From: Alex Young
Date: 2011-11-22T19:03:04+09:00
Subject: [ruby-core:41198] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Alex Young.

Surely setting the encoding to whatever the content-type header declares doesn't stop mechanize from performing that heuristic? Setting it to binary (incorrectly, in my view) forces me to fix it manually even when I know everything's lined up properly. Worse, in order to do it, I have to string match on the content-type header itself, when Net::HTTP has already done that work and has the information available.

If I'm *not* reading HTML documents (which is *far* from uncommon, certainly in my case), the idea of having to apply a heuristic on top of Net::HTTP just doesn't make sense: the information is there in the headers, and Net::HTTP is best placed to interpret it. It's not the transport layer's job to make assumptions about the content it's transporting.

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]

=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.

  #!/usr/bin/ruby -w
  # encoding: UTF-8

  require 'net/http'

  uri = URI.parse('http://www.hearya.com/feed/')
  result = Net::HTTP.start(uri.host, uri.port) {|http|
    http.get(uri.request_uri)
  }

  p result['content-type']   # "text/xml; charset=UTF-8" <- correct
  p result.content_type      # "text/xml" <- incorrect; truncates the charset field
  puts result.body.encoding  # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

-- 
http://redmine.ruby-lang.org
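
For illustration, the manual workaround described in the reply above (string matching on the content-type header and then fixing the body's encoding by hand) might look roughly like the sketch below. The helper names `charset_from_content_type` and `body_with_declared_encoding` are hypothetical and not part of Net::HTTP. The sample body is a hand-built binary string standing in for `result.body`, so the sketch runs without a network connection.

```ruby
# Hypothetical helper (not part of Net::HTTP): extract the charset
# parameter from a raw Content-Type header value, or nil if absent.
def charset_from_content_type(content_type)
  return nil if content_type.nil?
  match = content_type.match(/;\s*charset\s*=\s*"?([^\s;"]+)"?/i)
  match && match[1]
end

# Hypothetical helper: return a copy of the body tagged with the declared
# encoding. The bytes are only relabelled, never transcoded.
def body_with_declared_encoding(body, content_type)
  charset = charset_from_content_type(content_type)
  return body if charset.nil?
  body.dup.force_encoding(Encoding.find(charset))
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not know;
  # in that case leave the body as it was.
  body
end

# Stand-in for result.body: UTF-8 bytes tagged as ASCII-8BIT,
# which is what the issue reports Net::HTTP returning.
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

fixed = body_with_declared_encoding(raw, 'text/xml; charset=UTF-8')
puts fixed.encoding
puts fixed.valid_encoding?
```

With a real response, the second argument would be `result['content-type']`. This is the step the reply argues Net::HTTP could do itself, since it has already parsed the header.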