From: naruse@...
Date: 2017-12-15T08:22:10+00:00
Subject: [ruby-core:84282] [Ruby trunk Feature#2567] Net::HTTP does not handle encoding correctly

Issue #2567 has been updated by naruse (Yui NARUSE).

chucke (Tiago Cardoso) wrote:
> Bitten by this as well. I'd go the route proposed earlier:
>
> 1. By default, encode the body using the charset set in content-type header.

HTML's definition of encodings is a bit different from that of the usual encoding converters, as described in the WHATWG Encoding Standard.
https://encoding.spec.whatwg.org/

Also, the charset parameter has many aliases, which sometimes differ from the normal encoding aliases.
https://encoding.spec.whatwg.org/#names-and-labels

> 2. Provide an option to disable this, to keep old behaviour.

How to specify the option is the problem.
The encoding may differ per content (URL / path), so it should be specified with the get/post methods.
But there are already header and data hash arguments...

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-68437

* Author: slide_rule (Ryan Sims)
* Status: Assigned
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
* Target version:
----------------------------------------
=begin
 A string returned by an HTTP get does not have its encoding set according to the charset field, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.

 #!/usr/bin/ruby -w
 # encoding: UTF-8

 require 'net/http'

 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }

 p result['content-type']    # "text/xml; charset=UTF-8" <- correct
 p result.content_type       # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding   # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end

--
https://bugs.ruby-lang.org/
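For illustration only, the first proposal quoted above (tag the body with the charset declared in the content-type header) could be sketched roughly as follows. The helper names charset_from_content_type and body_with_declared_encoding are hypothetical and are not part of Net::HTTP. The sketch resolves the label with Ruby's own Encoding.find, so it does not implement the WHATWG label table mentioned in the reply; labels that Ruby does not know simply leave the body untouched.

```ruby
# Hypothetical helper (not a Net::HTTP API): pull the charset parameter
# out of a Content-Type header value such as "text/xml; charset=UTF-8".
def charset_from_content_type(content_type)
  return nil if content_type.nil?
  match = content_type.match(/;\s*charset\s*=\s*"?([^\s;"]+)"?/i)
  match && match[1]
end

# Hypothetical helper: return a copy of the body tagged with the declared
# charset. Assumption: Ruby's encoding names are used for lookup, which is
# not the same as the WHATWG names-and-labels table.
def body_with_declared_encoding(body, content_type)
  label = charset_from_content_type(content_type)
  return body if label.nil?

  encoding = begin
    Encoding.find(label)
  rescue ArgumentError
    nil # label unknown to Ruby: keep the old behaviour
  end
  return body if encoding.nil?

  # force_encoding only relabels the bytes; it does not transcode them.
  body.dup.force_encoding(encoding)
end

raw = "caf\xC3\xA9".b # binary (ASCII-8BIT) bytes, like an untagged response body
tagged = body_with_declared_encoding(raw, "text/xml; charset=UTF-8")
puts tagged.encoding        # UTF-8
puts tagged.valid_encoding? # true
```

Because the helper takes the header value as an argument, a caller could skip it for particular requests, which is one way to keep the old behaviour per URL without adding another argument to get/post.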