From: fabio.ornellas@...
Date: 2016-03-08T14:25:55+00:00
Subject: [ruby-core:74222] [Ruby trunk Feature#2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Fabio Pugliese Ornellas.

Hello, here are my two cents:

~~~
require 'net/http'

class Net::HTTPResponse
  def read_body(dest = nil, &block)
    if @read
      raise IOError, "#{self.class}\#read_body called twice" if dest or block
      return @body
    end

    # Force encoding for streamed response bodies
    final_block = if block
      proc do |chunk|
        if type_params['charset']
          block.call(chunk.force_encoding(type_params['charset']))
        else
          block.call(chunk)
        end
      end
    end

    to = procdest(dest, final_block)
    stream_check
    if @body_exist
      read_body_0 to
      @body = to
    else
      @body = nil
    end
    @read = true

    # Force encoding for a String @body (dest objects without
    # force_encoding are left alone)
    if type_params['charset'] && @body.respond_to?(:force_encoding)
      @body.force_encoding(type_params['charset'])
    end

    @body
  end
end
~~~

These changes:

* Make Net::HTTP respect https://tools.ietf.org/html/rfc7231#section-3.1.1.2
* Work for both cases: Net::HTTPResponse#body and Net::HTTPResponse#read_body.
* If the server is misconfigured and the Content-Type charset does not match the actual encoding of the response body, defer the resulting encoding exceptions to body processing outside the Net::HTTP code, which makes the cause clearer to the user.
* Still allow users to call force_encoding to work around a server misconfiguration.

I understand Ruby libraries must obey RFCs by default and let users get real exceptions when something is not right. As it is now, body strings come back inconsistent: sometimes I get ASCII-8BIT, sometimes UTF-8, depending on which path the code inside Net::HTTP takes, and the RFC is not obeyed.

I believe this change might create problems for code that "works by coincidence" due to the current behavior. For example, if the server is misconfigured and sets the charset to iso8859-1 while the response body is actually UTF-8, it currently works, but with the proposed patch it will break.
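The misconfiguration case can be reproduced without a network connection. This is a minimal sketch that assumes a hypothetical server sending the UTF-8 bytes of "café" while declaring `charset=ISO-8859-1`; it mimics what the patch does by calling `force_encoding` with the declared charset, then shows where the mismatch surfaces and how the caller can override it.

```ruby
# Hypothetical wire bytes: "café" encoded as UTF-8 (63 61 66 C3 A9), sent by
# a misconfigured server whose Content-Type says charset=ISO-8859-1.
wire_bytes = "caf\u00e9".b
declared_charset = 'ISO-8859-1'

# What the proposed patch does: relabel the bytes with the declared charset.
# force_encoding does not validate, and every byte is valid ISO-8859-1.
declared = wire_bytes.dup.force_encoding(declared_charset)
puts declared.valid_encoding?   # prints true
puts declared.encode('UTF-8')   # prints cafÃ© (C3 and A9 read as two Latin-1 chars)

# The mismatch surfaces later, outside Net::HTTP, when the body meets other
# non-ASCII UTF-8 text.
begin
  puts "na\u00efve " + declared
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end

# The caller can still override the server's label.
fixed = declared.dup.force_encoding('UTF-8')
puts fixed == "caf\u00e9"       # prints true
```

Because ISO-8859-1 accepts every byte, nothing fails at the moment the label is applied; the error only appears once the body is combined with other text, which matches the point about postponing encoding exceptions to code outside Net::HTTP.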
In such a case, however, it is a server issue, not a client-side issue. It certainly is a risk, but not following RFCs is already bad as it is.

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-57357

* Author: Ryan Sims
* Status: Assigned
* Priority: Normal
* Assignee: Yui NARUSE
----------------------------------------
=begin
A string returned by an HTTP GET does not have its encoding set according to the charset field, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.

  #!/usr/bin/ruby -w
  # encoding: UTF-8

  require 'net/http'

  uri = URI.parse('http://www.hearya.com/feed/')
  result = Net::HTTP.start(uri.host, uri.port) {|http|
    http.get(uri.request_uri)
  }

  p result['content-type'] # "text/xml; charset=UTF-8" <- correct
  p result.content_type    # "text/xml" <- incorrect; truncates the charset field
  puts result.body.encoding # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end
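Without patching Net::HTTP, a caller can get a similar effect for the report's example by relabelling the body with the charset from `type_params`: `content_type` returns only the media type, while the parameters, including `charset`, are exposed by `type_params`. `apply_declared_charset` below is a hypothetical helper, not part of Net::HTTP, and this is a sketch that assumes the whole body is already in memory as a String (it does not cover the streaming `read_body` case).

```ruby
require 'net/http'

# Hypothetical helper, not part of Net::HTTP: returns a copy of +body+
# labelled with the charset declared in the Content-Type parameters.
# Falls back to the unchanged body when there is no charset or when Ruby
# does not recognise the declared name.
def apply_declared_charset(body, type_params)
  return body unless body.is_a?(String)

  # type_params keeps parameter names as the server sent them, so match
  # "charset" case-insensitively.
  _name, charset = type_params.find { |key, _value| key.downcase == 'charset' }
  return body if charset.nil?

  begin
    # Strip optional quotes, e.g. charset="utf-8".
    encoding = Encoding.find(charset.delete('"'))
  rescue ArgumentError
    return body # unknown encoding name: keep the raw bytes
  end

  # force_encoding only relabels the bytes; dup avoids mutating the response.
  body.dup.force_encoding(encoding)
end

# Usage sketch with a locally built response; no network access.
response = Net::HTTPOK.new('1.1', '200', 'OK')
response['Content-Type'] = 'text/xml; charset=UTF-8'
body = apply_declared_charset("caf\u00e9".b, response.type_params)
puts body.encoding # prints UTF-8
```

For the report's example this would be `apply_declared_charset(result.body, result.type_params)`. The helper deliberately falls back to the raw bytes for a missing or unrecognised charset, because `Encoding.find` (like `force_encoding`) raises `ArgumentError` for an encoding name Ruby does not know.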