From: Stephen Bannasch
Date: 2010-09-02T06:21:57+09:00
Subject: [ruby-core:32003] [Ruby-Bug#3780][Open] RDoc::Parser.binary? broken for some utf8 files longer than 1024 bytes

Bug #3780: RDoc::Parser.binary? broken for some utf8 files longer than 1024 bytes
http://redmine.ruby-lang.org/issues/show/3780

Author: Stephen Bannasch
Status: Open, Priority: Normal
Category: core
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]

RDoc truncates files at 1024 bytes when checking whether the file is binary. If the truncation falls in the middle of a utf8 char, the truncated string has an invalid encoding, which causes RDoc to exit.

I found this problem when running rdoc on the ruby 1.9.2 source.

  $ ruby -v
  ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
  $ rdoc --version
  rdoc 2.5.11

A fuller description of the bug and a patch with a failing test are in this issue on the RubyForge rdoc issue tracker:

http://rubyforge.org/tracker/index.php?func=detail&aid=28525&group_id=627&atid=2472

The same issue appears to be in the 1_9 source, see:

http://github.com/ruby/ruby/blob/trunk/lib/rdoc/parser.rb#L70

I find it confusing knowing where to create an RDoc issue: RubyForge or here -- so I've created an issue in both places.

This gist: http://gist.github.com/561350 (possible_fix.rb) shows how I changed RDoc::Parser.binary? locally -- but I don't think it is correct to classify as binary every utf8 file that becomes invalid when truncated at 1024 bytes.

That same gist (show_parsing_error.rb) also shows another strategy for solving the invalid encoding issue, but there are probably better ways to determine if a file is binary.

----------------------------------------
http://redmine.ruby-lang.org
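
For illustration, here is a minimal standalone sketch of the failure mode described in the report. It is not RDoc's code and not the patch from the gist. The sample string and the helper name `trim_partial_char` are hypothetical, and dropping the dangling bytes of a cut-off character is just one possible way to keep a 1024-byte sample valid before inspecting it.

```ruby
# Build a UTF-8 string in which a 2-byte character ("\u00e9") starts at
# byte 1024, so a 1024-byte cut lands in the middle of that character.
text = ("a" * 1023) + "\u00e9" + "tail"

# Simulate reading only the first 1024 bytes and tagging them as UTF-8.
head = text.dup.force_encoding("BINARY")[0, 1024].force_encoding("UTF-8")
puts head.valid_encoding?   # false: the last byte is half of a character

# Hypothetical helper: drop up to 3 trailing bytes (the most that a cut-off
# UTF-8 character can leave behind) until the sample is valid again.
def trim_partial_char(bytes, encoding = "UTF-8")
  s = bytes.dup.force_encoding(encoding)
  3.times do
    break if s.valid_encoding?
    s = s.dup.force_encoding("BINARY")[0...-1].force_encoding(encoding)
  end
  s
end

trimmed = trim_partial_char(head)
puts trimmed.valid_encoding?   # true
puts trimmed.bytesize          # 1023
```

If the sample is still invalid after trimming, the file really does contain bytes that are not valid in the assumed encoding, which is a different case from the truncation artifact described above.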