From: Ricardo Amorim
Date: 2011-11-24T13:01:53+09:00
Subject: [ruby-core:41277] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by Ricardo Amorim.

Yui NARUSE wrote:
> It shouldn't effect because URI doesn't include non ASCII character.
> If you are talking about an existing implementation which sends Location header with non ASCII characters,
> such talk should be on real research.

I've seen a few ASP applications that do that. They redirect to a generic error page with an error message as an argument, e.g. below:

Location: error_page.asp?msg="Página com erro"

Well doing some research I've found:
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-17#section-3.2.1

"Historically, HTTP has allowed field content with text in the ISO-8859-1 [ISO-8859-1] character encoding and supported other character sets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII character encoding [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. Recipients SHOULD treat other (obs-text) octets in field content as opaque data."

It's not entirely clear if non US-ASCII chars are allowed in field contents.

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]

=begin
A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
  #!/usr/bin/ruby -w
  # encoding: UTF-8

  require 'net/http'

  uri = URI.parse('http://www.hearya.com/feed/')
  result = Net::HTTP.start(uri.host, uri.port) {|http|
    http.get(uri.request_uri)
  }

  p result['content-type']   # "text/xml; charset=UTF-8" <- correct
  p result.content_type      # "text/xml" <- incorrect; truncates the charset field
  puts result.body.encoding  # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

-- 
http://redmine.ruby-lang.org
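
Until Net::HTTP does this itself, one possible workaround is to read the charset parameter out of the raw Content-Type header and tag the body with it. The sketch below is only an illustration: the helper names charset_from_content_type and body_with_declared_encoding are hypothetical (they are not part of Net::HTTP), the regexp is a simplification of real header parsing, and falling back to the untouched body when the charset is missing or unknown is an assumption about the desired behavior.

```ruby
# Hypothetical helper (not part of Net::HTTP): pull the charset parameter
# out of a raw Content-Type header value, or return nil if there is none.
def charset_from_content_type(content_type)
  return nil if content_type.nil?
  match = content_type.match(/;\s*charset\s*=\s*"?([^\s";]+)"?/i)
  match && match[1]
end

# Hypothetical helper: return a copy of the body tagged with the declared
# charset. The bytes are not transcoded, only relabeled.
def body_with_declared_encoding(body, content_type)
  name = charset_from_content_type(content_type)
  return body if name.nil?
  body.dup.force_encoding(Encoding.find(name))
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not know;
  # assumption: in that case leave the body as it was.
  body
end

# Simulate what Net::HTTP hands back: UTF-8 bytes labeled ASCII-8BIT.
raw = "P\xC3\xA1gina com erro".dup.force_encoding(Encoding::ASCII_8BIT)
fixed = body_with_declared_encoding(raw, "text/xml; charset=UTF-8")

puts raw.encoding    # ASCII-8BIT
puts fixed.encoding  # UTF-8
```

With a real response this would be called as body_with_declared_encoding(result.body, result['content-type']). Using force_encoding rather than encode is deliberate here, since the bytes are assumed to already be in the declared charset and only the label is wrong.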