From: romuloceccon via ruby-core
Date: 2023-02-28T12:27:53+00:00
Subject: [ruby-core:112630] [Ruby master Bug#19468] Ruby 3.2: net/http sets UTF-8 encoding for binary responses

Issue #19468 has been reported by romuloceccon (Rômulo Ceccon).

----------------------------------------
Bug #19468: Ruby 3.2: net/http sets UTF-8 encoding for binary responses
https://bugs.ruby-lang.org/issues/19468

* Author: romuloceccon (Rômulo Ceccon)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
net/http on Ruby 3.2 has changed the encoding of binary responses from SSL-connected hosts (non-SSL connections are not affected):

``` ruby
# req.rb
require 'openssl'
require 'net/http'

puts "openssl ext: #{OpenSSL::VERSION}"
puts "openssl lib: #{OpenSSL::OPENSSL_VERSION}"
puts "net-protocol: #{Net::Protocol::VERSION}"
puts "net-http: #{Net::HTTP::VERSION}"

puts Net::HTTP.get(URI(ARGV.first)).encoding
```

Ruby 3.1 (with updated net-protocol and net-http libs):

```
$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
$ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
openssl ext: 3.0.0
openssl lib: OpenSSL 1.1.1n 15 Mar 2022
net-protocol: 0.2.1
net-http: 0.3.2
ASCII-8BIT   # <== CORRECT
```

Ruby 3.2 (latest git revision):

```
$ ruby -v
ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
$ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
openssl ext: 3.1.0
openssl lib: OpenSSL 1.1.1n 15 Mar 2022
net-protocol: 0.2.1
net-http: 0.3.2
UTF-8        # <== WRONG
```

I've tracked the problem down to the SSL socket call at https://github.com/ruby/ruby/blob/9557c8edf2dcf18fdece066c596a71696b2f2b30/lib/net/protocol.rb#L218. The returned string has its encoding set to `ASCII-8BIT`, as expected, but `#ascii_only?` always reports `true`, even when the string contains non-ASCII bytes.
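For comparison, `ascii_only?` is meant to reflect a string's bytes, not just its encoding tag: an `ASCII-8BIT` string holding any byte at or above 0x80 should report `false`. A minimal sketch of that expectation, using made-up sample bytes (not the actual PDF data) and printing the same triple as the test program further down:

```ruby
# Made-up sample data for illustration; not taken from the PDF in the report.
header = "HTTP/1.1 200 OK\r\n".b         # ASCII-8BIT, only ASCII bytes
body   = "%PDF-1.5\n\xE2\xE3\xCF\xD3".b  # ASCII-8BIT, includes bytes >= 0x80

# Same [bytesize, encoding, ascii_only?] triple that ssltest.rb prints.
p [header.bytesize, header.encoding.to_s, header.ascii_only?]
p [body.bytesize, body.encoding.to_s, body.ascii_only?]
```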
This seems to be a bug, and it is the probable cause of the change in behavior in net/http. On Ruby 3.1, concatenating the data read from the SSL socket onto a UTF-8 string produces an ASCII-8BIT string; on Ruby 3.2 the same concatenation produces a UTF-8 string.

Here's a program demonstrating the behavior of the SSL socket:

```ruby
# ssltest.rb
require 'openssl'
require 'uri'

url = URI(ARGV.first)
path = url.path
path += '?' + url.query if url.query
req = "GET #{path} HTTP/1.1\r\nHost: #{url.hostname}\r\nAccept: */*\r\n\r\n"

sock = OpenSSL::SSL::SSLSocket.open(url.hostname, url.port || URI::HTTPS.default_port)
sock.connect
sock.write(req)

sleep(1)
loop do
  sleep(0.1)
  b = ''.b
  # With exception: false, read_nonblock returns a symbol such as
  # :wait_readable (or nil at EOF) instead of raising; stop on either.
  r = sock.read_nonblock(1024 * 16, b, exception: false)
  break unless String === r
  p [r.bytesize, r.encoding.to_s, r.ascii_only?]
end
```

Ruby 3.1:

```
$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
[475, "ASCII-8BIT", true]
[16384, "ASCII-8BIT", false]   # <== always false (except HTTP header): CORRECT
[16384, "ASCII-8BIT", false]
...
[13927, "ASCII-8BIT", false]
```

Ruby 3.2:

```
$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
[475, "ASCII-8BIT", true]
[16384, "ASCII-8BIT", true]    # <== always true: WRONG
[16384, "ASCII-8BIT", true]
...
[13927, "ASCII-8BIT", true]
```
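The link between the wrong `ascii_only?` result and the UTF-8 response follows from how Ruby picks the encoding when joining an ASCII-only UTF-8 string with an `ASCII-8BIT` string: the result is binary only if the binary side actually contains non-ASCII bytes; otherwise it keeps UTF-8. So socket data that is wrongly flagged as ASCII-only yields a UTF-8 result. A minimal sketch with hand-made strings (the variable names and sample bytes are illustrative assumptions, not net/http internals), plus `String#b` as one possible caller-side workaround, assuming the caller knows the body is binary:

```ruby
# Hypothetical buffer: an ASCII-only UTF-8 string, as a plain literal would be.
buffer = "HTTP/1.1 200 OK\r\n"

binary_chunk = "%PDF-1.5\n\xE2\xE3\xCF\xD3".b # really contains high bytes
ascii_chunk  = "plain text".b                  # binary tag, ASCII-only bytes

with_binary = buffer + binary_chunk
with_ascii  = buffer + ascii_chunk

p with_binary.encoding == Encoding::BINARY # true: high bytes force binary
p with_ascii.encoding == Encoding::UTF_8   # true: ASCII-only chunk keeps UTF-8

# Possible workaround (assumption: the body is known to be binary data):
# String#b returns a copy tagged ASCII-8BIT with the same bytes.
mislabeled = with_binary.dup.force_encoding(Encoding::UTF_8)
p mislabeled.b.encoding == Encoding::BINARY # true
```

In the report's terms, the Ruby 3.1 output corresponds to the `with_binary` case and the Ruby 3.2 output to the `with_ascii` case, even though the bytes read are the same.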