Re: [ ruby-Bugs-5711 ] REXML fails to parse UTF-16 XML.

From: Yukihiro Matsumoto <matz@...>
Date: 2006-09-11 02:45:29 UTC
List: ruby-core #8833
Hi,

In message "Re: [ ruby-Bugs-5711 ] REXML fails to parse UTF-16 XML."
    on Mon, 11 Sep 2006 01:25:58 +0900, <noreply@rubyforge.org> writes:

|REXML fails to parse some XML documents written in UTF-16.

REXML converts the body twice: once from initialize, and once more from
XMLDECL_START.  I made a patch.  If Sean Russell accepts it, it will be
merged into 1.8.
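
A minimal sketch of how a setter can report whether it did any work, so
that an overriding method runs its follow-up only once.  The names
Convertible, Source and conversions are hypothetical and are not REXML's
actual classes; only the "return unless super" idea is taken from the
patch.

```ruby
# Hypothetical stand-in for the encoding module: the setter returns
# true only when the encoding really changed.
module Convertible
  def encoding=(enc)
    enc = enc.nil? ? nil : enc.upcase
    return false if defined?(@encoding) && enc == @encoding
    @encoding = enc
    true
  end
end

# Hypothetical stand-in for a source object that converts its buffer.
class Source
  include Convertible
  attr_reader :conversions

  def initialize
    @conversions = 0
  end

  def encoding=(enc)
    # super yields the method's real return value, even though
    # "obj.encoding = x" as an expression always evaluates to x.
    return unless super
    @conversions += 1   # stands in for re-encoding the buffer
  end
end

src = Source.new
src.encoding = "utf-16"   # first call: the buffer is converted
src.encoding = "UTF-16"   # same encoding again: skipped
puts src.conversions
```

The second assignment is a no-op here because the name is upcased before
the comparison, so "utf-16" and "UTF-16" count as the same encoding.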

Changes:

  * Encoding#encoding= now returns a boolean value to tell whether the
    body was really converted or not.
  * A specific conversion library (e.g. rexml/encodings/UTF-16.rb) now
    takes precedence over ICONV.
  * UTF-16#decode_utf16 should work on strings without a BOM.
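
A rough illustration of the last point, assuming big-endian input and
code points in the Basic Multilingual Plane only (no surrogate pairs).
The helper name decode_utf16be is made up for this sketch and is not
part of REXML.

```ruby
# Hypothetical helper: decode big-endian UTF-16 bytes to UTF-8,
# dropping a leading BOM (0xFE 0xFF) only when one is present.
def decode_utf16be(bytes)
  bytes = bytes.b                                    # binary copy
  bytes = bytes[2..-1] if bytes.start_with?("\xFE\xFF".b)
  # 'n*' reads big-endian 16-bit units; 'U*' writes them out as UTF-8.
  bytes.unpack('n*').pack('U*')
end

with_bom    = decode_utf16be("\xFE\xFF\x00A\x00B")
without_bom = decode_utf16be("\x00A\x00B")
```

Both calls should give the same result, since the BOM is stripped when
found and decoding always starts at offset 0 of what remains.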

							matz.

--- lib/rexml/encoding.rb	22 Aug 2006 15:25:43 -0000	1.10
+++ lib/rexml/encoding.rb	11 Sep 2006 02:36:44 -0000
@@ -26,17 +26,18 @@ module REXML
         $VERBOSE = false
-        return if defined? @encoding and enc == @encoding
+				enc = enc.nil? ? nil : enc.upcase
+        return false if defined? @encoding and enc == @encoding
         if enc and enc != UTF_8
-          @encoding = enc.upcase
-          begin
-            require 'rexml/encodings/ICONV.rb'
-            Encoding.apply(self, "ICONV")
-          rescue LoadError, Exception => err
-            raise ArgumentError, "Bad encoding name #@encoding" unless @encoding =~ /^[\w-]+$/
-            @encoding.untaint 
-            enc_file = File.join( "rexml", "encodings", "#@encoding.rb" )
-            begin
-              require enc_file
-              Encoding.apply(self, @encoding)
-            rescue LoadError
-              puts $!.message
+					@encoding = enc
+					raise ArgumentError, "Bad encoding name #@encoding" unless @encoding =~ /^[\w-]+$/
+					@encoding.untaint 
+					enc_file = File.join( "rexml", "encodings", "#@encoding.rb" )
+					begin
+						require enc_file
+						Encoding.apply(self, @encoding)
+          rescue LoadError, Exception
+						begin
+							require 'rexml/encodings/ICONV.rb'
+							Encoding.apply(self, "ICONV")
+            rescue LoadError => err
+              puts err.message
               raise ArgumentError, "No decoder found for encoding #@encoding.  Please install iconv."
@@ -52,2 +53,3 @@ module REXML
       end
+			true
     end
Index: lib/rexml/source.rb
===================================================================
RCS file: /var/cvs/src/ruby/lib/rexml/source.rb,v
retrieving revision 1.9
diff -p -u -1 -r1.9 source.rb
--- lib/rexml/source.rb	22 Aug 2006 15:25:43 -0000	1.9
+++ lib/rexml/source.rb	11 Sep 2006 02:36:44 -0000
@@ -46,3 +46,3 @@ module REXML
 		def encoding=(enc)
-			super
+			return unless super
 			@line_break = encode( '>' )
Index: lib/rexml/encodings/UTF-16.rb
===================================================================
RCS file: /var/cvs/src/ruby/lib/rexml/encodings/UTF-16.rb,v
retrieving revision 1.5
diff -p -u -1 -r1.5 UTF-16.rb
--- lib/rexml/encodings/UTF-16.rb	9 Apr 2005 17:03:32 -0000	1.5
+++ lib/rexml/encodings/UTF-16.rb	11 Sep 2006 02:36:44 -0000
@@ -18,5 +18,6 @@ module REXML
     def decode_utf16(str)
+      str = str[2..-1] if /^\376\377/ =~ str
       array_enc=str.unpack('C*')
       array_utf8 = []
-      2.step(array_enc.size-1, 2){|i| 
+      0.step(array_enc.size-1, 2){|i| 
         array_utf8 << (array_enc.at(i+1) + array_enc.at(i)*0x100)
