From: hugo.corbucci@...
Date: 2014-07-22T17:29:41+00:00
Subject: [ruby-core:63931] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Hugo Corbucci.

I've just hit this problem again. I've read all the comments and I see 3 different opinions:

1) Content type is unreliable, so clients of Net::HTTP should force the encoding to whatever they want whenever they use the body of a response.
2) Content type is unreliable, but that's the web server's fault, so Net::HTTP should force the encoding of the body to whatever the content type specifies, if any, or to default_encoding otherwise. Clients that access an unreliable web server should force the encoding themselves.
3) Content type is unreliable, so Net::HTTP should try to detect the encoding from the body and then force the body into whatever is found, or default_encoding otherwise.

1) requires no work and is the currently implemented solution.
2) needs a patch which is a subset of the one posted by NARUSE.
3) needs a patch which is something close to NARUSE's suggestion (if not all of it).

Changing from 1) to 2) is a breaking change for every user of Net::HTTP who doesn't currently force the encoding and relies on it being ASCII-8BIT.
Changing from 1) to 3) is a breaking change in some cases (the ones where the detection algorithm is wrong) if the user of Net::HTTP doesn't currently force the encoding.

It seems to me that if a user is properly using the solution in 1), changing to either 2) or 3) doesn't affect anything. If the user is not forcing the encoding, then there is already a potential problem waiting to happen.

I would honestly prefer Net::HTTP to rely on the data the server provides with the body, meaning I would trust the header it sent along with the body to describe that data correctly. If it doesn't, I need to act on this anyway. But if it behaves correctly, I don't have to do anything.
That seems better than forcing me to do extra work even though all sides are behaving nicely.

What is stopping this feature from being implemented? A patch?

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-47959

* Author: Ryan Sims
* Status: Assigned
* Priority: Low
* Assignee: Yui NARUSE
* Category: lib
* Target version: next minor
----------------------------------------
=begin
A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.

  #!/usr/bin/ruby -w
  # encoding: UTF-8

  require 'net/http'

  uri = URI.parse('http://www.hearya.com/feed/')
  result = Net::HTTP.start(uri.host, uri.port) {|http|
    http.get(uri.request_uri)
  }

  p result['content-type']   # "text/xml; charset=UTF-8" <- correct
  p result.content_type      # "text/xml" <- incorrect; truncates the charset field
  puts result.body.encoding  # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

-- 
https://bugs.ruby-lang.org/
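
For reference, here is a minimal sketch of what opinion 1) asks of each client today, and roughly what opinion 2) would move into Net::HTTP. The helper name force_body_encoding and its fallback parameter are hypothetical and not part of Net::HTTP or of NARUSE's patch; the fallback defaults to ASCII-8BIT only to mirror the current behavior. It takes the raw Content-Type header value (what result['content-type'] returns above) so it can be tried without a network connection.

```ruby
# Hypothetical helper, not part of Net::HTTP: tag a response body with the
# charset named in a Content-Type header value. force_encoding only relabels
# the bytes; it does not transcode them.
def force_body_encoding(body, content_type, fallback: Encoding::BINARY)
  # Pull the charset parameter out of e.g. "text/xml; charset=UTF-8".
  charset = content_type.to_s[/charset\s*=\s*"?([^\s;"]+)/i, 1]

  encoding =
    begin
      charset ? Encoding.find(charset) : fallback
    rescue ArgumentError
      # The server named a charset Ruby does not know; use the fallback.
      fallback
    end

  # Work on a copy so the caller's string keeps its original encoding.
  body.dup.force_encoding(encoding)
end

raw  = "caf\xC3\xA9".b   # bytes as Net::HTTP hands them over (ASCII-8BIT)
text = force_body_encoding(raw, 'text/xml; charset=UTF-8')
puts text.encoding
puts text.valid_encoding?
```

With a real response this would be called as force_body_encoding(result.body, result['content-type']). A client that distrusts the server can still check valid_encoding? afterwards and override the result, which is the escape hatch opinion 2) leaves open.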