From: Benoit Daloze
Date: 2011-11-24T01:57:03+09:00
Subject: [ruby-core:41255] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Benoit Daloze.

Eric Hodel wrote:
> So giving the user undetectably garbled text is acceptable to both of you? I wish to clarify.

Yes, as it should be garbled only when the response has a wrong Content-Type, in which case the user needs to check if it is the right encoding anyway.
(And AFAIK, Firefox always reported garbled text if I set the meta tag to the right encoding and the Content-Type header to the wrong encoding in my tries.)

> If the Content-Type header is used as you propose and the user sets the default_internal encoding what should happen?

I think leaving it as BINARY (as now) is fine in this case. Assuming default_internal is the right encoding does not seem to be a good heuristic.

> If the server lies and the response body is transcoded data may be lost or an exception may be raised. Should this exception be rescued by Net::HTTP? What should the result encoding be if it is?

I think Net::HTTP should not transcode (#encode) the response, just set the right encoding if the information is available.

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]

=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.

 #!/usr/bin/ruby -w
 # encoding: UTF-8

 require 'net/http'

 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
   http.get(uri.request_uri)
 }
 p result['content-type']   # "text/xml; charset=UTF-8" <- correct
 p result.content_type      # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding  # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

-- 
http://redmine.ruby-lang.org
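
For illustration, here is a minimal sketch of the behavior suggested in the reply above: label the body with the charset named in the Content-Type header, without transcoding, and leave it as BINARY when no usable charset is available. The helpers `charset_from` and `label_body` are hypothetical names, not part of Net::HTTP, and the regular expression is a simplifying assumption about how the header is written.

```ruby
# Hypothetical helpers (not part of Net::HTTP).

# Extract the charset parameter from a Content-Type header value,
# e.g. "text/xml; charset=UTF-8" -> "UTF-8". Returns nil if absent.
def charset_from(content_type)
  return nil if content_type.nil?
  match = content_type.match(/;\s*charset\s*=\s*"?([^";\s]+)"?/i)
  match && match[1]
end

# Return a copy of the body tagged with the declared encoding.
# The bytes are never changed (force_encoding, not encode).
def label_body(body, content_type)
  name = charset_from(content_type)
  return body if name.nil?          # no charset given: leave as-is
  begin
    encoding = Encoding.find(name)
  rescue ArgumentError
    return body                     # unknown charset name: leave as-is
  end
  return body if encoding.nil?      # e.g. "internal" with no default_internal
  body.dup.force_encoding(encoding)
end

# Simulate a response body as Net::HTTP returns it today (ASCII-8BIT).
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::BINARY)
labeled = label_body(raw, "text/xml; charset=UTF-8")

puts raw.encoding             # ASCII-8BIT
puts labeled.encoding         # UTF-8
puts labeled.valid_encoding?  # true
```

If the server declares the wrong charset, `valid_encoding?` on the labeled string gives the caller one way to notice, and the original bytes are still intact because nothing was transcoded.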