[#37730] [Ruby 1.9 - Bug #4962][Open] come back gem_prelude! — Yusuke Endoh <mame@...>

24 messages 2011/07/02

[#37840] [Ruby 1.9 - Feature #4985][Open] Add %S[] support for making a list of symbols — Aaron Patterson <aaron@...>

23 messages 2011/07/07

[#37866] [Backport87 - Feature #4996][Open] About 1.8.7 EOL — Shyouhei Urabe <shyouhei@...>

22 messages 2011/07/08

[#37913] [Ruby 1.9 - Bug #5003][Open] Enumerator#next segfaults in OS X Lion (10.7) — Ganesh Gunasegaran <ganesh.gunas@...>

16 messages 2011/07/09

[#37917] [Ruby 1.9 - Feature #5005][Open] Provide convenient access to original methods — Lazaridis Ilias <ilias@...>

13 messages 2011/07/09

[#37932] [Ruby 1.9 - Feature #5008][Open] Equal rights for Hash (like Array, String, Integer, Float) — Suraj Kurapati <sunaku@...>

31 messages 2011/07/09

[#37936] [Ruby 1.9 - Feature #5010][Open] Add Slop(-like) in stdlib and deprecate current OptionParser API — Rodrigo Rosenfeld Rosas <rr.rosas@...>

29 messages 2011/07/09

[#37968] [Ruby 1.9 - Bug #5015][Open] method_added" is called in addition to "method_undefined — Lazaridis Ilias <ilias@...>

14 messages 2011/07/10

[#38096] [Ruby 1.9 - Feature #5033][Open] PATCH: 1.9: gc_mark_children: Avoid gc_mark() tail recursion, use goto again. — Kurt Stephens <ks.ruby@...>

14 messages 2011/07/16

[#38109] [Ruby 1.9 - Bug #5034][Open] C Source Code formatting — Lazaridis Ilias <ilias@...>

18 messages 2011/07/16

[#38171] [Ruby 1.9 - Bug #5047][Open] Segfault (most likely involving require) — Jack Christensen <jack@...>

21 messages 2011/07/18

[#38182] [Ruby 1.9 - Feature #5054][Open] Compress a sequence of ends — ANDO Yasushi ANDO <andyjpn@...>

68 messages 2011/07/19

[#38197] [Ruby 1.9 - Feature #5056][Open] About 1.9 EOL — Shyouhei Urabe <shyouhei@...>

39 messages 2011/07/19
[#38900] [Ruby 1.9 - Feature #5056] About 1.9 EOL — Shota Fukumori <sorah@...> 2011/08/10

[#38902] Re: [Ruby 1.9 - Feature #5056] About 1.9 EOL — Yukihiro Matsumoto <matz@...> 2011/08/10

Hi,

[#39048] Re: [Ruby 1.9 - Feature #5056] About 1.9 EOL — SASADA Koichi <ko1@...> 2011/08/22

Hi,

[#39055] Re: [Ruby 1.9 - Feature #5056] About 1.9 EOL — Lucas Nussbaum <lucas@...> 2011/08/23

On 23/08/11 at 06:50 +0900, SASADA Koichi wrote:

[#38295] [Ruby 1.9 - Feature #5064][Open] HTTP user-agent class — Eric Hodel <drbrain@...7.net>

15 messages 2011/07/21

[#38391] [Ruby 1.9 - Bug #5076][Open] Mac OS X Lion Support — Yui NARUSE <naruse@...>

17 messages 2011/07/22

[#38503] [Ruby 1.9 - Feature #5096][Open] offer Logger-compatibility for ext — Eric Wong <normalperson@...>

16 messages 2011/07/25

[#38510] [Ruby 1.9 - Feature #5097][Assigned] Supported platforms of Ruby 1.9.3 — Yui NARUSE <naruse@...>

42 messages 2011/07/26

[#38526] [Backport92 - Backport #5099][Open] Backport r31875 load path performance problem — Aaron Patterson <aaron@...>

19 messages 2011/07/26

[#38538] [Ruby 1.9 - Feature #5101][Open] allow optional timeout for TCPSocket.new — Eric Wong <normalperson@...>

15 messages 2011/07/27

[#38610] [Ruby 1.9 - Feature #5120][Open] String#split needs to be logical — Alexey Muranov <muranov@...>

18 messages 2011/07/30

[#38623] [Ruby 1.9 - Feature #5123][Open] Alias Hash 1.9 as OrderedHash — Alexey Muranov <muranov@...>

14 messages 2011/07/31

[ruby-core:38509] [Ruby 1.9 - Feature #2567] Net::HTTP does not handle encoding correctly

From: jonathan rochkind <jonathan@...>
Date: 2011-07-26 01:56:41 UTC
List: ruby-core #38509
Issue #2567 has been updated by jonathan rochkind.


how else is a developer/client going to figure out the encoding EXCEPT from the HTTP server response content-type?  You pretty much need to set the encoding in ruby 1.9 for the response to be useable.  If the Content-Type isn't trustworthy enough for Net::HTTP... what else is available for the developer?  In the vast majority of cases, the developer will have to map from content-type.  

If Net::HTTP doesn't do it, then every developer has to figure out how to do it on their own and write extra code to do it. Setting the encoding from the HTTP response is going to be by far the most common use pattern. I think Net::HTTP should do it.   

If there are certain edge cases where a knowledgeable developer knows s/he wants something different, then perhaps a "set_encoding?" attribute can be provided set to true by default, but settable to false. Alternately, the developer can always call force_encoding themselves as well. 

The knowledgeable developer perhaps needs ways to disable this behavior, but the beginning developer needs reasonable behavior as a default, and right now they don't get it -- character encoding is one of the most confusing issues in ruby 1.9 (esp to debug), and the more standard libraries set it appropriately when appropriate information is available, the less of a barrier this is to getting started with ruby 1.9.  When you're reading from an arbitrary data stream without encoding metadata (like the file system), it's one thing, the standard libraries can't possibly know what to do.  But when you're reading from a stream with encoding metadata (HTTP headers), using a library specifically designed for that kind of stream (stdlib Net::HTTP), it's unforgiveable that ruby makes each (newbie!) developer reinvent the wheel again and again. 
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 1.9.x
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]


=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end



-- 
http://redmine.ruby-lang.org

In This Thread

Prev Next