From: Yui NARUSE <naruse@...>
Date: 2011-09-06T10:37:27+09:00
Subject: [ruby-core:39293] [Ruby 1.9 - Bug #5278][Assigned] REXML -- Malformed comment

Issue #5278 has been updated by Yui NARUSE.

Status changed from Open to Assigned
Assignee set to Kouhei Sutou
Target version set to 1.9.3

----------------------------------------
Bug #5278: REXML -- Malformed comment
http://redmine.ruby-lang.org/issues/5278

Author: Thomas Fritzsche
Status: Assigned
Priority: Normal
Assignee: Kouhei Sutou
Category:
Target version: 1.9.3
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin11.1.0]

Hi Ruby-Team,

I use the REXML library for XML parsing.
Kanjidic2 XML file: http://www.csse.monash.edu.au/~jwb/kanjidic2/
(I have not attached the file because it is too large.)

Parsing works with version 1.8.7, but with 1.9.2 a ParseException ("Malformed comment") is raised in lib/rexml/parsers/baseparser.rb.

My code looks like this:
------
require 'rexml/document'
require 'rexml/streamlistener'

class KanjiListener
  include REXML::StreamListener
end

f = File.new("kanji.xml","rb")
list = KanjiListener.new
REXML::Document.parse_stream(f, list)
-----

The XML file from the link above has a comment section that looks like this:
...
<!-- Version 1.6 - April 2008
This is the DTD of the XML-format kanji file combining information from
the KANJIDIC and KANJD212 files. It is intended to be largely self-
documenting, with each field being accompanied by an explanatory
comment.
-->
...

Strangely, the parser fails at the line that ends in "self-" (just before "documenting").
The issue comes up here (at about line 345):
...
if md[0][2] == ?-
  md = @source.match( COMMENT_PATTERN, true )
  case md[1]
  when /--/, /-$/
    raise REXML::ParseException.new("Malformed comment", @source)
  end
...

The match group md[1] contains the complete comment, and then the regular expression /-$/ matches.
From debugging, my guess is that the original buffer is read with "readline" and somehow still includes the end-of-line markers.
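To illustrate the suspicion above, here is a minimal sketch that does not use REXML at all. It only relies on the fact that in Ruby regular expressions `$` anchors at the end of any line, while `\z` anchors at the end of the whole string. The variable names and the shortened comment text are made up for the example, and whether `\z` is the appropriate fix inside REXML is an assumption, not something this sketch establishes.

```ruby
# Hypothetical stand-in for md[1]: a multi-line comment body in which one
# line ends with a hyphen, as in the kanjidic2 DTD comment quoted above.
comment_body = " Version 1.6 - April 2008\n" \
               "It is intended to be largely self-\n" \
               "documenting, with each field explained.\n"

# `$` matches before every newline, so the hyphen at the end of the
# second line satisfies /-$/ even though the comment does not end there.
matches_line_end = !!(comment_body =~ /-$/)

# `\z` matches only at the very end of the string. The last character
# here is "\n", not "-", so /-\z/ does not match.
matches_string_end = !!(comment_body =~ /-\z/)

puts "/-$/   matches: #{matches_line_end}"
puts "/-\\z/  matches: #{matches_string_end}"
```

Under these assumptions the first check reports a match and the second does not, which is consistent with the exception being triggered by a hyphen at the end of a line inside the comment rather than at the end of the comment itself.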
I tried opening the original File object with different newline parameters, but nothing helped.
I tried different Ruby versions (including today's 1.9.3-head); all of 1.9 seems to have the problem, while 1.8 works.

In the meantime I have switched to the Nokogiri XML parser, which handles the file without problems on 1.9.x, and I would expect REXML to be able to parse it too.

As a test, I changed a single character in the file so that /-$/ no longer matches "self-" in the original XML file, and then it works.

--
http://redmine.ruby-lang.org