[#40602] [ruby-trunk - Bug #5532][Open] Compile problem for bigdecimal on cygwin — Martin Dürst <duerst@...>

14 messages 2011/11/01

[#40617] [ruby-trunk - Feature #5534][Open] Redefine Range class and introduce RelativeNumeric and RelativeRange — Alexey Muranov <muranov@...>

17 messages 2011/11/01

[#40646] [ruby-trunk - Bug #5541][Open] Better configure error message when llvm-gcc is the default compiler — Eric Hodel <drbrain@...7.net>

10 messages 2011/11/01

[#40648] [ruby-trunk - Feature #5543][Open] rb_thread_blocking_region() API is poorly designed — Christopher Huff <cjameshuff@...>

14 messages 2011/11/01

[#40684] [ruby-trunk - Feature #5555][Open] rename #include? to #includes? — Alexey Muranov <muranov@...>

20 messages 2011/11/02

[#40688] [ruby-trunk - Bug #5556][Open] SIGHUP no longer ignored when sent to process group from a subprocess — Brian Ford <brixen@...>

12 messages 2011/11/02

[#40706] [ruby-trunk - Feature #5562][Open] Improvement of Windows IO performance — Hiroshi Shirosaki <h.shirosaki@...>

39 messages 2011/11/03

[#40737] [ruby-trunk - Bug #5570][Open] Encoding of environment variables on Windows — Nikolai Weibull <now@...>

11 messages 2011/11/04

[#40748] Proposal for sustainable branch maintenance — "Yuki Sonoda (Yugui)" <yugui@...>

-----BEGIN PGP SIGNED MESSAGE-----

14 messages 2011/11/05

[#40770] [ruby-trunk - Feature #5578][Open] Embedded YAML for Ruby 2.0 — Thomas Sawyer <transfire@...>

17 messages 2011/11/06

[#40806] [ruby-trunk - Feature #5583][Open] Optionally typing — Yasushi ANDO <andyjpn@...>

21 messages 2011/11/07

[#40824] [ruby-trunk - Feature #5588][Open] add negation flag (v) to Regexp — Suraj Kurapati <sunaku@...>

38 messages 2011/11/08

[#40865] IO.copy_stream creates files with restrictive permissions — Eric Wong <normalperson@...>

I'm not sure if this is a bug or intended as spec.

16 messages 2011/11/09
[#41151] Re: IO.copy_stream creates files with restrictive permissions — Tanaka Akira <akr@...> 2011/11/19

2011/11/9 Eric Wong <normalperson@yhbt.net>:

[#41166] Re: IO.copy_stream creates files with restrictive permissions — KOSAKI Motohiro <kosaki.motohiro@...> 2011/11/20

>> I noticed when a file name argument is passed to the IO.copy_stream, the

[#41168] Re: IO.copy_stream creates files with restrictive permissions — Clifford Heath <clifford.heath@...> 2011/11/20

On 20/11/2011, at 5:09 PM, KOSAKI Motohiro wrote:

[#41176] Re: IO.copy_stream creates files with restrictive permissions — Tanaka Akira <akr@...> 2011/11/21

2011/11/20 Clifford Heath <clifford.heath@gmail.com>:

[#41180] Re: IO.copy_stream creates files with restrictive permissions — KOSAKI Motohiro <kosaki.motohiro@...> 2011/11/21

>> I think documentation is the wrong answer. The security defects are not caused

[#40908] [ruby-trunk - Feature #5607][Open] Inconsistent reaction in Range of String — Yen-Nan Lin <redmine@...>

15 messages 2011/11/10

[#40941] [ruby-trunk - Feature #5617][Open] Allow install RubyGems into dediceted directory — Vit Ondruch <v.ondruch@...>

22 messages 2011/11/11

[#40951] [Backport93 - Backport #5621][Open] Please backport thread-safe autoloading patch — Mike Perham <mperham@...>

25 messages 2011/11/12
[#40971] [Backport93 - Backport #5621] Please backport thread-safe autoloading patch — Mike Perham <mperham@...> 2011/11/12

[#40972] Re: [Backport93 - Backport #5621] Please backport thread-safe autoloading patch — Yehuda Katz <wycats@...> 2011/11/12

Unfortunately ruby-head has a deadlock in one of my go-to scenarios for

[#40976] Re: [Backport93 - Backport #5621] Please backport thread-safe autoloading patch — Hiroshi Nakamura <nahi@...> 2011/11/13

-----BEGIN PGP SIGNED MESSAGE-----

[#41128] Re: [Backport93 - Backport #5621] Please backport thread-safe autoloading patch — Charles Oliver Nutter <headius@...> 2011/11/18

On Sat, Nov 12, 2011 at 7:24 PM, Hiroshi Nakamura <nahi@ruby-lang.org> wrote:

[#41129] Re: [Backport93 - Backport #5621] Please backport thread-safe autoloading patch — Hiroshi Nakamura <nahi@...> 2011/11/18

-----BEGIN PGP SIGNED MESSAGE-----

[#41142] Re: [Backport93 - Backport #5621] Please backport thread-safe autoloading patch — Charles Oliver Nutter <headius@...> 2011/11/18

On Fri, Nov 18, 2011 at 12:15 AM, Hiroshi Nakamura <nahi@ruby-lang.org> wro=

[#40982] [ruby-trunk - Bug #5625][Open] Remove profanity and pejoratives — Andrew Grimm <andrew.j.grimm@...>

30 messages 2011/11/13

[#41004] [ruby-trunk - Feature #5628][Open] Module#basename — Thomas Sawyer <transfire@...>

18 messages 2011/11/14

[#41024] [ruby-trunk - Feature #5632][Open] Attempt to open included class shades it instead. — Boris Stitnicky <boris@...>

12 messages 2011/11/14

[#41025] Proposal to add new methods: positive? negative? natural? — JosFrancisco Calvo Moreno <josefranciscocalvo@...>

Hi all!

11 messages 2011/11/14
[#41027] Re: Proposal to add new methods: positive? negative? natural? — Jeremy Evans <code@...> 2011/11/14

On 11/15 12:58, Jos? Francisco Calvo Moreno wrote:

[#41031] Re: Proposal to add new methods: positive? negative? natural? — JosFrancisco Calvo Moreno <josefranciscocalvo@...> 2011/11/14

Hi Jeremy,

[#41038] [ruby-trunk - Bug #5634][Open] yield and binding — Thomas Sawyer <transfire@...>

17 messages 2011/11/14

[#41086] [ruby-trunk - Feature #5644][Open] add Enumerable#exclude? antonym — Suraj Kurapati <sunaku@...>

14 messages 2011/11/17

[#41175] [ruby-trunk - Feature #5654][Open] Introduce global lock to avoid concurrent require — Hiroshi Nakamura <nakahiro@...>

12 messages 2011/11/21

[#41200] [ruby-trunk - Bug #5659][Open] bug releasing a gem created with rails 3.1 — Vinicius Gati <viniciusgati@...>

14 messages 2011/11/22

[#41212] [ruby-trunk - Feature #5662][Open] inject-accumulate, or Haskell's mapAccum* — Edvard Majakari <edvard.majakari@...>

12 messages 2011/11/22

[#41213] [ruby-trunk - Bug #5663][Open] Combined map/select method — Yehuda Katz <wycats@...>

62 messages 2011/11/22

[#41317] [ruby-trunk - Bug #5676][Open] miniruby linking error: undefined reference to ___stack_chk_guard — Martin Dürst <duerst@...>

10 messages 2011/11/27

[#41404] [ruby-trunk - Bug #5690][Open] Module#qualified_const_get — Yehuda Katz <wycats@...>

31 messages 2011/11/30

[ruby-core:41256] [ruby-trunk - Feature #2567] Net::HTTP does not handle encoding correctly

From: Yui NARUSE <naruse@...>
Date: 2011-11-23 17:15:54 UTC
List: ruby-core #41256
Issue #2567 has been updated by Yui NARUSE.


I don't decide whether merge this or not yet, an experimental patch is following:

diff --git a/lib/net/http.rb b/lib/net/http.rb
index 1c594e0..0abcaa5 100644
--- a/lib/net/http.rb
+++ b/lib/net/http.rb
@@ -2723,6 +2723,8 @@ module Net   #:nodoc:
       end
       @read = true
 
+      enc = detect_encoding(@body)
+      @body.force_encoding(enc) if enc
       @body
     end
 
@@ -2807,6 +2809,167 @@ module Net   #:nodoc:
       end
     end
 
+    private
+    # :nodoc:
+    def detect_encoding(str, encoding=nil)
+      if encoding
+      elsif encoding = type_params['charset']
+      elsif encoding = check_bom(str)
+      else
+        case main_type.downcase
+        when %r{text/x(?:ht)?ml|application/(?:[^+]+\+)?xml}
+          /\A<xml[ \t\r\n]+
+            version[ \t\r\n]*=[ \t\r\n]*(?:"[0-9.]+"|'[0-9.]*')[ \t\r\n]+
+            encoding[ \t\r\n]*=[ \t\r\n]*
+            (?:"([A-Za-z][\-A-Za-z0-9._]*)"|'([A-Za-z][\-A-Za-z0-9._]*)')/x =~ str
+          encoding = $1 || $2 || Encoding::UTF_8
+        when %r{text/html.*}
+          sniff_encoding(str, encoding=nil)
+        end
+      end
+      return encoding
+    end
+
+    # :nodoc:
+    def sniff_encoding(str, encoding=nil)
+      # the encoding sniffing algorithm
+      # http://www.w3.org/TR/html5/parsing.html#determining-the-character-encoding
+      return enc if enc = scanning_meta(str)
+      # 6. last visited page or something
+      # 7. frequency
+      if str.ascii_only?
+        return Encoding::US_ASCII
+      else
+        utf8str = str.dup.force_encoding(Encoding::UTF_8)
+        return utf8str if utf8str.valid_encoding?
+      end
+      # 8. implementation-defined or user-specified
+    end
+
+    # :nodoc:
+    def check_bom(str)
+      case str.byteslice(0, 2)
+      when "\xFE\xFF"
+        return Encoding::UTF_16BE
+      when "\xFF\xFE"
+        return Encoding::UTF_16LE
+      end
+      if "\xEF\xBB\xBF" == str.byteslice(0, 3)
+        return Encoding::UTF_8
+      end
+      nil
+    end
+
+    # :nodoc:
+    def scanning_meta(str)
+      require 'strscan'
+      ss = StringScanner.new(str)
+      while true
+        if ss.skip(/<!--.*?-->/)
+        elsif ss.skip(/meta[\t\n\f\r ]*/)
+          attrs = {} # attribute_list
+          got_pragma = false
+          need_pragma = nil
+          charset = nil
+
+          # step: Attributes
+          while attr = get_attribute(ss)
+            name, value = *attr
+            next if attrs[name]
+            attrs[name] = true
+            case name
+            when 'http-equev'
+              got_pragma = true if value == 'content-type'
+            when 'content'
+              encoding = extracting_encodings_from_meta_elements(value)
+              unless charset
+                charset = encoding
+              end
+              need_pragma = true
+            when 'charset'
+              need_pragma = false
+              charset = value
+            end
+          end
+
+          # step: Processing
+          next if need_pragma.nil?
+          next if need_pragma && !got_pragma
+          charset = Encoding.find(charset) rescue nil
+          next unless charset
+          charset = Encoding::UTF_8 if charset == Encoding::UTF_16
+          return charset # tentative
+        elsif ss.skip(/<\/?[A-Za-z][^\t\n\f\r ]*/)
+          1 while get_attribute(ss)
+        elsif ss.skip(/<[!\/?][^>]*>/)
+        elsif ss.getch
+        end
+      end
+      nil
+    end
+
+    def get_attribute(ss)
+      ss.scan(/[\t\n\f\r \/]*/)
+      if ss.peek(1) == '>'
+        ss.getch
+        return nil
+      end
+      name = ss.scan(/[^=\t\n\f\r \/>]*/)
+      name.downcase!
+      raise if name.empty?
+      ss.skip(/[\t\n\f\r ]*/)
+      if ss.getch != '='
+        value = ''
+        return [name, value]
+      end
+      ss.skip(/[\t\n\f\r ]*/)
+      case ss.peek(1)
+      when '"'
+        ss.getch
+        value = ss.scan(/[^"]+/)
+        value.downcase!
+        ss.getch
+      when "'"
+        ss.getch
+        value = ss.scan(/[^']+/)
+        value.downcase!
+        ss.getch
+      when '>'
+        value = ''
+      else
+        value = ss.scan(/[^\t\n\f\r >]+/)
+        value.downcase!
+      end
+      [name, value]
+    end
+
+    def extracting_encodings_from_meta_elements(value)
+      # http://dev.w3.org/html5/spec/fetching-resources.html#algorithm-for-extracting-an-encoding-from-a-meta-element
+      if /charset[\t\n\f\r ]*=(?:"([^"]*)"|'([^']*)'|["']|\z|([^\t\n\f\r ;]+))/i =~ value
+        return $1 || $2 || $3
+      end
+      return nil
+    end
+
+    # http://dev.w3.org/html5/spec/parsing.html#table-encoding-overrides
+    TABLE_ENCODING_OVERRIDES = {
+      'EUC-KR'         => Encoding::CP949,
+      'EUC-JP'         => Encoding::CP51932,
+      'GB2312'         => Encoding::GBK,
+      'GB_2312-80'     => Encoding::GBK,
+      'ISO-8859-1'     => Encoding::Windows_1252,
+      'ISO-8859-9'     => Encoding::Windows_1254,
+      'ISO-8859-11'    => Encoding::Windows_874,
+      'KS_C_5601-1987' => Encoding::CP949,
+      'SHIFT_JIS'      => Encoding::Windows_31J,
+      'TIS-620'        => Encoding::Windows_874,
+      'US-ASCII'       => Encoding::Windows_1252,
+    }
+
+    # :nodoc:
+    def override_encoding(enc)
+      TABLE_ENCODING_OVERRIDES[enc.strip.upcase] || enc
+    end
   end
 
 
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]


=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end



-- 
http://redmine.ruby-lang.org

In This Thread

Prev Next