From: Alex Young <alex@...>
Date: 2011-11-22T19:03:04+09:00
Subject: [ruby-core:41198] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly


Issue #2567 has been updated by Alex Young.


Surely setting the encoding to whatever the Content-Type header declares doesn't stop mechanize from performing that heuristic?  Setting it to binary (incorrectly, in my view) forces me to fix it manually even when I know the declared charset matches the body.  Worse, to do that I have to string-match on the Content-Type header myself, when Net::HTTP has already parsed it and has the information available.

If I'm *not* reading HTML documents (which is *far* from uncommon, certainly in my case), the idea of having to apply a heuristic on top of Net::HTTP just doesn't make sense: the information is there in the headers, and Net::HTTP is best placed to interpret it.  It's not the transport layer's job to make assumptions about the content it's transporting.
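
To make the manual fix-up concrete, here is a rough sketch of what callers end up writing today. The helper name apply_declared_charset is hypothetical and not part of Net::HTTP; it assumes the body arrives tagged as binary and that the charset label, if present, is one Ruby's Encoding.find recognises.

```ruby
# Hypothetical helper, not part of Net::HTTP: tag a response body with the
# charset declared in its Content-Type header value.
def apply_declared_charset(body, content_type)
  # Pull out the charset parameter, allowing optional quotes around it.
  charset = content_type.to_s[/;\s*charset\s*=\s*"?([^";\s]+)"?/i, 1]
  return body unless charset

  begin
    encoding = Encoding.find(charset)
  rescue ArgumentError
    return body # unknown charset label: leave the body tagged as binary
  end

  # force_encoding only relabels the bytes; it does not transcode them.
  body.dup.force_encoding(encoding)
end

raw  = "caf\xC3\xA9".b   # bytes as they would come off the wire (ASCII-8BIT)
text = apply_declared_charset(raw, 'text/xml; charset=UTF-8')
puts text.encoding       # prints "UTF-8"
```

With a real response this would be called as apply_declared_charset(result.body, result['content-type']), which is exactly the header string matching I'd rather not repeat in every caller.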
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]


=begin
 The body string returned by an HTTP GET does not have its encoding set from the charset parameter of the Content-Type header, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end



-- 
http://redmine.ruby-lang.org