[ruby-core:112925] [Ruby master Bug#19468] Ruby 3.2: net/http sets UTF-8 encoding for binary responses
From:
"naruse (Yui NARUSE) via ruby-core" <ruby-core@...>
Date:
2023-03-17 04:36:42 UTC
List:
ruby-core #112925
Issue #19468 has been updated by naruse (Yui NARUSE).
Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE
ruby_3_2 b309c246ee70926d593d3857e1625202e2d0f67b merged revision(s) d78ae78fd76e556e281a743c75bea4c0bb81ed8c.
----------------------------------------
Bug #19468: Ruby 3.2: net/http sets UTF-8 encoding for binary responses
https://bugs.ruby-lang.org/issues/19468#change-102446
* Author: romuloceccon (Rômulo Ceccon)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE
----------------------------------------
net/http on Ruby 3.2 has changed the encoding of binary responses from SSL connected hosts (non-SSL connections are not affected):
``` ruby
# req.rb
require 'openssl'
require 'net/http'
puts "openssl ext: #{OpenSSL::VERSION}"
puts "openssl lib: #{OpenSSL::OPENSSL_VERSION}"
puts "net-protocol: #{Net::Protocol::VERSION}"
puts "net-http: #{Net::HTTP::VERSION}"
puts Net::HTTP.get(URI(ARGV.first)).encoding
```
Ruby 3.1 (with updated net-protocol and net-http libs):
```
$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
$ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
openssl ext: 3.0.0
openssl lib: OpenSSL 1.1.1n 15 Mar 2022
net-protocol: 0.2.1
net-http: 0.3.2
ASCII-8BIT # <== CORRECT
```
Ruby 3.2 (latest git revision):
```
$ ruby -v
ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
$ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
openssl ext: 3.1.0
openssl lib: OpenSSL 1.1.1n 15 Mar 2022
net-protocol: 0.2.1
net-http: 0.3.2
UTF-8 # <== WRONG
```
I've tracked the problem down to the SSL socket call at https://github.com/ruby/ruby/blob/9557c8edf2dcf18fdece066c596a71696b2f2b30/lib/net/protocol.rb#L218.
The string returned has the encoding set to `ASCII-8BIT`, but `#ascii_only?` also always reports true, even when there are non-ASCII bytes. This seems to be a bug, and is probably the cause of the change in behavior in net/http. On Ruby 3.1, concatenating the result of reading the SSL socket to a UTF-8 string produces an ASCII-8BIT string. On Ruby 3.2, the concatenation produces a UTF-8 string.
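To make the concatenation rule concrete, here is a minimal sketch that needs no network access. The helper name `encoding_after_append` and the sample byte values are illustrative assumptions, not code from net/http. It appends a binary chunk to an empty UTF-8 buffer and reports the resulting encoding:

```ruby
# Hypothetical helper (not part of net/http): append a chunk to an empty
# UTF-8 buffer and return the encoding the buffer ends up with.
def encoding_after_append(chunk)
  buffer = String.new(encoding: Encoding::UTF_8)
  buffer << chunk
  buffer.encoding
end

# Binary string that contains only ASCII bytes, like an HTTP header.
ascii_chunk = "HTTP/1.1 200 OK\r\n".b

# Binary string with bytes >= 0x80, like the body of a PDF (sample bytes).
binary_chunk = "%PDF-1.5\n".b + [0xD0, 0xD4].pack("C*")

p [ascii_chunk.ascii_only?, encoding_after_append(ascii_chunk)]
p [binary_chunk.ascii_only?, encoding_after_append(binary_chunk)]
```

With correctly flagged strings, the first buffer should stay UTF-8 and the second should become ASCII-8BIT. If `#ascii_only?` wrongly reports true for the binary chunk, the append would take the first path and leave the buffer as UTF-8, which matches the behavior described above.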
Here's a program demonstrating the behavior of the SSL socket:
```ruby
# ssltest.rb
require 'openssl'
require 'uri'
url = URI(ARGV.first)
path = url.path
path += '?' + url.query if url.query
req = "GET #{path} HTTP/1.1\r\nHost: #{url.hostname}\r\nAccept: */*\r\n\r\n"
# URI already fills in the default port for https URLs; the fallback is a safeguard.
sock = OpenSSL::SSL::SSLSocket.open(url.hostname, url.port || URI::HTTPS::DEFAULT_PORT)
sock.connect
sock.write(req)
sleep(1)
loop do
  sleep(0.1)
  b = ''.b
  r = sock.read_nonblock(1024 * 16, b, exception: false)
  # read_nonblock returns :wait_readable or nil when no more data is available
  break unless String === r
  p [r.bytesize, r.encoding.to_s, r.ascii_only?]
end
```
Ruby 3.1:
```
$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
[475, "ASCII-8BIT", true]
[16384, "ASCII-8BIT", false] # <== always false (except HTTP header): CORRECT
[16384, "ASCII-8BIT", false]
...
[13927, "ASCII-8BIT", false]
```
Ruby 3.2:
```
$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
[475, "ASCII-8BIT", true]
[16384, "ASCII-8BIT", true] # <== always true: WRONG
[16384, "ASCII-8BIT", true]
...
[13927, "ASCII-8BIT", true]
```
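One way to spot an affected chunk without reading the output by eye is to compare what `#ascii_only?` reports against a byte-by-byte scan. The sketch below is only an illustration: the helper name `inconsistent_ascii_flag?` and the sample strings are assumptions, not part of the original report.

```ruby
# Hypothetical helper: true when ascii_only? claims the string is pure ASCII
# although a byte-by-byte scan finds at least one byte >= 0x80.
def inconsistent_ascii_flag?(str)
  str.ascii_only? && str.each_byte.any? { |byte| byte >= 0x80 }
end

# Binary string with ASCII bytes only.
header = "HTTP/1.1 200 OK\r\n".b

# "%PDF" followed by two high bytes (sample values).
body = [0x25, 0x50, 0x44, 0x46, 0xD0, 0xD4].pack("C*")

p inconsistent_ascii_flag?(header)
p inconsistent_ascii_flag?(body)
```

For strings built in plain Ruby like these, the helper should return false in both cases. Calling it on each `r` inside the loop of `ssltest.rb` would presumably return true for the body chunks on an affected Ruby 3.2 build.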
-- 
https://bugs.ruby-lang.org/