From: Eric Hodel
Date: 2011-11-22T13:46:28+09:00
Subject: [ruby-core:41190] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Eric Hodel.

=begin
What should the user expect when the response headers are wrong? For example, the response Content-Type claims ISO-8859-1 but the content is actually UTF-8. (Yes, this really happens.)

If Net::HTTP forces the encoding to ISO-8859-1, you will get undetectably garbled text:

  $ ruby -e 's = "��"; s.force_encoding Encoding::ISO_8859_1; puts s.valid_encoding?, s.encode(Encoding::UTF_8)'
  true
  ����

I think the best approach is to leave the response in binary encoding and let users apply the proper heuristics to determine the encoding of their content.

If you wish to read HTML documents, perhaps mechanize is a better choice, as it implements a heuristic similar to the one in HTML5 to find the encoding of the document despite potential lies from the server or document header.
=end
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]

=begin
A string returned by an HTTP GET does not have its encoding set according to the charset field, nor does content_type report the charset.

Example code demonstrating incorrect behavior is below.

  #!/usr/bin/ruby -w
  # encoding: UTF-8

  require 'net/http'

  uri = URI.parse('http://www.hearya.com/feed/')
  result = Net::HTTP.start(uri.host, uri.port) {|http| http.get(uri.request_uri) }

  p result['content-type']   # "text/xml; charset=UTF-8" <- correct
  p result.content_type      # "text/xml" <- incorrect; truncates the charset field
  puts result.body.encoding  # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

-- 
http://redmine.ruby-lang.org
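
Eric Hodel's suggestion in the reply above (keep the body binary and let the caller apply its own heuristic) could look roughly like the sketch below. It is only an illustration under stated assumptions: guess_body_encoding is a hypothetical helper, not part of Net::HTTP, and the order of checks (try UTF-8 first, then the declared charset, then give up) is one possible policy, not the heuristic mechanize or HTML5 uses. UTF-8 is tried first because, as the ISO-8859-1 example shows, valid_encoding? is always true for single-byte encodings and so cannot expose a wrong header.

```ruby
# Hypothetical helper (an assumption for illustration, not a Net::HTTP API).
# body         - the raw response body, as returned in binary (ASCII-8BIT)
# content_type - the raw Content-Type header value, e.g. result['content-type']
# Returns a copy of body tagged with the guessed encoding; bytes are never changed.
def guess_body_encoding(body, content_type)
  # Pull the charset parameter out of the header, if there is one.
  declared = content_type.to_s[/charset="?([^\s;"]+)/i, 1]

  # Assumed policy: try UTF-8 first, then whatever the server declared.
  candidates = [Encoding::UTF_8]
  begin
    candidates << Encoding.find(declared) if declared
  rescue ArgumentError
    # The server sent a charset label Ruby does not know; ignore it.
  end

  candidates.each do |enc|
    copy = body.dup.force_encoding(enc)
    return copy if copy.valid_encoding?
  end

  # Nothing fit: hand the bytes back as binary and let the caller decide.
  body.dup.force_encoding(Encoding::BINARY)
end
```

With the script from the original report, this would be called as guess_body_encoding(result.body, result['content-type']). A body that is really UTF-8 comes back tagged UTF-8 even when the header claims ISO-8859-1, while a genuine ISO-8859-1 body containing non-ASCII bytes is invalid as UTF-8 and falls through to the declared charset.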