From: zn@...
Date: 2018-02-12T06:36:33+00:00
Subject: [ruby-core:85507] [Ruby trunk Bug#14458] RubyVM::InstructionSequence compilation loses Regexp encoding

Issue #14458 has been updated by znz (Kazuhiro NISHIYAMA).

I think `/p{Alnum}/` is US-ASCII only, so encoding is US-ASCII.

```
% irb -r irb/completion --simple-prompt
>> puts "/\p{Alnum}/.encoding"
/p{Alnum}/.encoding
=> nil
>> eval "/\p{Alnum}/.encoding"
=> #
```

You can use `"/\\p{Alnum}/.encoding"` or `'/\p{Alnum}/.encoding'`.

----------------------------------------
Bug #14458: RubyVM::InstructionSequence compilation loses Regexp encoding
https://bugs.ruby-lang.org/issues/14458#change-70300

* Author: dannyfallon (Danny Fallon)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin16]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
We appear to be losing encoding information for a Regexp object when we pass it through the compiler:

~~~ ruby
irb(main):001:0> "Test".encoding
=> #
irb(main):002:0> RubyVM::InstructionSequence.compile("'Test'.encoding").eval
=> #
irb(main):003:0> /\p{Alnum}/.encoding
=> #
irb(main):004:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/.encoding").eval
=> #
~~~

I think the encoding should be retained, much like it is for strings. Adding /u to the Regexp object does retain the encoding but that feels like a burden we shouldn't have to bear?

~~~
irb(main):005:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/u.encoding").eval
=> #
~~~

--
https://bugs.ruby-lang.org/
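
For anyone reproducing this outside irb, here is a minimal sketch of the quoting difference described in the reply above. The variable names are only illustrative, and the script assumes MRI (CRuby), since `RubyVM::InstructionSequence` is specific to that implementation. It prints the source text that actually reaches the compiler under each quoting style, then the encoding each compiled snippet reports.

```ruby
# In a double-quoted string "\p" is not a recognized escape sequence,
# so Ruby drops the backslash and keeps only the "p".
double_quoted = "/\p{Alnum}/.encoding"

# Single quotes, or a doubled backslash, keep the backslash in the source text.
single_quoted = '/\p{Alnum}/.encoding'
escaped       = "/\\p{Alnum}/.encoding"

# Show what the compiler will actually be handed in each case.
puts double_quoted
puts single_quoted
puts escaped

# Compile and evaluate each source string, then print the Regexp encoding.
# RubyVM::InstructionSequence is MRI-specific (assumption: running on CRuby).
[double_quoted, single_quoted, escaped].each do |source|
  encoding = RubyVM::InstructionSequence.compile(source).eval
  puts "#{source.inspect} => #{encoding.inspect}"
end
```

If the first and second lines printed differ only by the backslash, the encoding difference comes from the string literal and not from the compilation step.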