From: Eric Hodel
Date: 2011-07-26T16:02:45+09:00
Subject: [ruby-core:38515] [Ruby 1.9 - Feature #2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Eric Hodel.

The problem is not so much forcing the user to figure out how to get correct encoding (charset) but making sure the encoding returned is accurate.

If we can add this feature to Net::HTTP in a way that works for most cases that's great. Unfortunately websites outside of the US seem to have big problems with guessing the encoding correctly and require an attempt at parsing the document first.

Most bugs in mechanize about setting the encoding correctly came from people parsing non-English and non-Latin websites (so UTF-8 or ISO-8859-1 won't work).

If we can do this without needing to parse the document that's great, but I think that is very difficult to do.

Having a broken or inaccurate way of choosing the encoding will be worse than having no way.

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 1.9.x
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]

=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.

  #!/usr/bin/ruby -w
  # encoding: UTF-8

  require 'net/http'

  uri = URI.parse('http://www.hearya.com/feed/')
  result = Net::HTTP.start(uri.host, uri.port) {|http|
    http.get(uri.request_uri)
  }
  p result['content-type']    # "text/xml; charset=UTF-8" <- correct
  p result.content_type       # "text/xml" <- incorrect; truncates the charset field
  puts result.body.encoding   # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

-- 
http://redmine.ruby-lang.org
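
For illustration only, here is a minimal sketch of what a caller can do today with the charset declared in the Content-Type header. The helper name body_with_declared_encoding is hypothetical and is not part of Net::HTTP. It trusts the header alone and does not parse the document, so it does not address the inaccurate-header problem described above; it only declines to relabel the body when the bytes are invalid in the declared charset.

```ruby
# Hypothetical helper (an assumption for illustration, not a Net::HTTP API):
# label a response body with the charset declared in its Content-Type value.
def body_with_declared_encoding(content_type, body)
  # Pull the charset parameter out of e.g. "text/xml; charset=UTF-8".
  charset = content_type.to_s[/charset\s*=\s*"?([^\s;"]+)/i, 1]
  return body unless charset

  begin
    encoding = Encoding.find(charset)
  rescue ArgumentError
    # Unknown charset label: leave the body as it was.
    return body
  end

  # force_encoding relabels the bytes; it does not transcode them.
  labeled = body.dup.force_encoding(encoding)

  # If the bytes are not valid in the declared charset, the header is
  # probably wrong, so fall back to the untouched string.
  labeled.valid_encoding? ? labeled : body
end

# Stand-in for result.body, which arrives tagged as ASCII-8BIT.
raw = "caf\xC3\xA9".b
text = body_with_declared_encoding("text/xml; charset=UTF-8", raw)
puts text.encoding   # UTF-8
```

With a real response this would be called as body_with_declared_encoding(result['content-type'], result.body). A header that names a valid but wrong charset still slips through whenever the bytes happen to be valid in it, which is the case that seems to require looking at the document itself.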