From: "alpaca-tc (Hiroyuki Ishii) via ruby-core"
Date: 2025-08-02T14:23:57+00:00
Subject: [ruby-core:122899] [Ruby Bug#21528] SyntaxError#message may have broken encoding with multibyte source under Prism

Issue #21528 has been reported by alpaca-tc (Hiroyuki Ishii).

----------------------------------------
Bug #21528: SyntaxError#message may have broken encoding with multibyte source under Prism
https://bugs.ruby-lang.org/issues/21528

* Author: alpaca-tc (Hiroyuki Ishii)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Since the introduction of Prism, parsing Ruby source code that contains multibyte characters can produce a SyntaxError#message with an invalid encoding.

Here is a reproducible example:

```
begin
  RubyVM::InstructionSequence.compile(<<~CODE, nil, nil, 1)
    if a
    # 0000000000000������������������
    #
  CODE
rescue SyntaxError => e
  $e = e

  puts e.message # the string contains a multibyte character that is cut off mid-character (trailing \xE3)
  # :3: syntax errors found
  #   1 | if a
  # > 2 | # 0000000000000���������������\xE3 ...
  #     | ^ expected an `end` to close the conditional clause
  # > 3 | #
  #     | ^ unexpected end-of-input, assuming it is closing the parent top level context

  puts e.message.valid_encoding? #=> expected true, but got false
end
```

This appears to be caused by truncation in Prism's error message generation that does not take multibyte character boundaries into account.

See: The truncation logic around [prism_compile.c L10696-L10709](https://github.com/ruby/ruby//blob/30a20bc166bc37acd7dcb3788686df149c7f428a/prism_compile.c#L10696-L10709)

I'm not sure how to fix it correctly because I lack knowledge about safe byte truncation.

I discovered this issue through irb, which attempts to display source code even when it contains syntax errors. Because irb uses `SyntaxError#message`, it raised an `ArgumentError: invalid byte sequence in UTF-8`.
See: https://github.com/ruby/irb/blob/f60dfa8549f746f69e9a6d160604a7a4974ffac1/lib/irb/ruby-lex.rb#L255-L256

If this is considered an irb issue, I already have a patch for IRB that handles it.

--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
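
On the question of safe byte truncation raised in the report, here is a rough Ruby-level sketch of the idea, not the actual fix in prism_compile.c. It assumes the input string is valid UTF-8 to begin with. The helper name `truncate_on_char_boundary` is hypothetical, and U+3042 is only a stand-in for the multibyte characters in the reproduction. The last lines show a possible caller-side fallback with `String#scrub`, which may or may not match what the IRB patch does.

```ruby
# Hypothetical helper (not part of Ruby, Prism, or irb): keep at most
# max_bytes bytes of str without ending in the middle of a character.
# Assumes str is valid in its own encoding before truncation.
def truncate_on_char_boundary(str, max_bytes)
  return str.dup if str.bytesize <= max_bytes

  cut = str.byteslice(0, max_bytes)
  # A raw byte cut of valid UTF-8 leaves at most three dangling bytes,
  # so this loop only runs a few times before the string is valid again.
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

# Stand-in source line: ASCII prefix plus six 3-byte characters (U+3042).
line = "# 000" + "\u3042" * 6      # 5 + 18 = 23 bytes

naive = line.byteslice(0, 21)      # raw cut, ends with a lone lead byte \xE3
safe  = truncate_on_char_boundary(line, 21)

puts naive.valid_encoding?         # false
puts safe.valid_encoding?          # true
puts safe.bytesize                 # 20

# Caller-side fallback: replace the dangling byte instead of raising later.
scrubbed = naive.scrub("?")
puts scrubbed.valid_encoding?      # true
```

Backing up to the previous character boundary keeps the message valid at the cost of showing slightly fewer bytes than the limit. Scrubbing on the consumer side only hides the symptom, so it seems better suited as a defensive measure than as the primary fix.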