[ruby-core:63931] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly
From: hugo.corbucci@...
Date: 2014-07-22 17:29:41 UTC
List: ruby-core #63931
Issue #2567 has been updated by Hugo Corbucci.
I've just hit this problem again.
I've read all comments and it seems like I see 3 different opinions:
1) Content type is unreliable, so clients of Net::HTTP should force the encoding to whatever they want whenever they want to use the body of a response.
2) Content type is unreliable, but that's the webserver's fault, so Net::HTTP should force the encoding of the body to whatever the content type specifies, if any, or to the default_encoding otherwise. Clients who are accessing an unreliable webserver should force the encoding themselves.
3) Content type is unreliable, so Net::HTTP should try to detect the encoding from the body and then force the body into whatever is found, or into default_encoding otherwise.
1) requires no work and is the currently implemented solution.
2) needs a patch which is a subset of the one posted by NARUSE.
3) needs a patch which is something close to NARUSE's suggestion (if not all of it).
Changing from 1) to 2) causes a breaking change for every user of Net::HTTP that doesn't currently force the encoding and relies on it being ASCII-8BIT.
Changing from 1) to 3) causes a breaking change in some cases (the ones where the detection algorithm is wrong) if the user of Net::HTTP doesn't currently force the encoding.
It seems to me that a user who already forces the encoding as in 1) is not affected by a change to either 2) or 3). A user who is not forcing the encoding already has a potential problem waiting to happen.
I would honestly prefer Net::HTTP to rely on the data provided by the server, meaning I would trust the header it sent along with the body to describe the data correctly. If the header is wrong, I need to act on this anyway. But if the server behaves correctly, I don't have to do anything. That seems better than forcing me to do extra work even though all sides are behaving nicely.
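To make the difference between 1) and 2) concrete, here is a minimal sketch of what trusting the header could look like on the client side today. The helper name body_with_declared_encoding and its fallback argument are hypothetical and not part of Net::HTTP; the sketch only relabels the bytes and assumes the server's charset is correct.

```ruby
# Hypothetical helper, not part of Net::HTTP: relabel a response body
# using the charset parameter of a Content-Type header value.
def body_with_declared_encoding(body, content_type, fallback = Encoding::BINARY)
  # Pull out charset=..., with or without surrounding quotes.
  charset = content_type.to_s[/charset\s*=\s*"?([^\s;"]+)/i, 1]
  encoding = begin
    charset ? Encoding.find(charset) : fallback
  rescue ArgumentError
    fallback # unknown charset name: keep the fallback
  end
  # force_encoding only relabels the bytes; it does not transcode them.
  body.dup.force_encoding(encoding)
end

raw  = "caf\xC3\xA9".b   # binary (ASCII-8BIT) string, like the bodies Net::HTTP returns
text = body_with_declared_encoding(raw, 'text/xml; charset=UTF-8')
puts text.encoding        # UTF-8
puts text.valid_encoding? # true
```

With a missing or unknown charset the body stays ASCII-8BIT, which matches the current behaviour described in 1).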
What is stopping this feature from being implemented? A patch?
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-47959
* Author: Ryan Sims
* Status: Assigned
* Priority: Low
* Assignee: Yui NARUSE
* Category: lib
* Target version: next minor
----------------------------------------
=begin
A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
#!/usr/bin/ruby -w
# encoding: UTF-8
require 'net/http'
uri = URI.parse('http://www.hearya.com/feed/')
result = Net::HTTP.start(uri.host, uri.port) {|http|
  http.get(uri.request_uri)
}
p result['content-type'] # "text/xml; charset=UTF-8" <- correct
p result.content_type # "text/xml" <- incorrect; truncates the charset field
puts result.body.encoding # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end
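For reference, the charset that content_type drops can still be read through type_params, which Net::HTTPHeader provides. The sketch below builds a response object by hand so that it runs without network access; the header value is copied from the example above and nothing here depends on the feed URL.

```ruby
require 'net/http'

# Build a response object by hand so the sketch needs no network access.
res = Net::HTTPOK.new('1.1', '200', 'OK')
res['content-type'] = 'text/xml; charset=UTF-8'

p res.content_type                    # "text/xml" (media type only)
charset = res.type_params['charset']  # parameters after the first ';'
p charset                             # "UTF-8"
```

A client can pass this value to String#force_encoding on the body, assuming the server reports the charset correctly.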
--
https://bugs.ruby-lang.org/