From: "jeremyevans0 (Jeremy Evans) via ruby-core"
Date: 2025-11-24T16:54:42+00:00
Subject: [ruby-core:123895] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape

Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Feedback

This is not a bug, it is deliberate behavior for ASCII-only strings in `rb_reg_quote` (internal function called by `Regexp.escape`):

```c
if (ascii_only) {
    rb_enc_associate(tmp, rb_usascii_encoding());
}
```

`US-ASCII` strings will be automatically converted to UTF-8 if necessary:

```ruby
("foo".encode("US-ASCII") + "\u1234").encoding
# => #<Encoding:UTF-8>
```

Does this behavior cause any problems in your application?

----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115299

* Author: thyresias (Thierry Lambert)
* Status: Feedback
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
```ruby
%w(foo être).each do |s|
  puts "string: #{s.inspect} -> #{s.encoding}"
  puts "escaped: #{Regexp.escape(s).inspect} -> #{Regexp.escape(s).encoding}"
end
```

Output:

```
string: "foo" -> UTF-8
escaped: "foo" -> US-ASCII
string: "être" -> UTF-8
escaped: "être" -> UTF-8
```

The result should always match the encoding of the argument.
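
For readers who do need the escaped string to carry the argument's encoding, the following is a minimal sketch of a workaround, not something proposed in the thread. The helper name `escape_keeping_encoding` is hypothetical and not part of Ruby. It assumes that retagging is safe because the escaped result of an ASCII-only string contains only ASCII bytes, which are valid in any ASCII-compatible encoding such as UTF-8.

```ruby
# Hypothetical helper (an assumption for illustration, not a Ruby API):
# escape a string and make sure the result carries the argument's encoding.
def escape_keeping_encoding(str)
  escaped = Regexp.escape(str)
  # Regexp.escape returns a new string, so retagging it does not touch str.
  # Only the encoding label changes; the bytes stay the same.
  escaped.force_encoding(str.encoding) unless escaped.encoding == str.encoding
  escaped
end

plain    = "foo"          # ASCII-only, UTF-8 source literal
accented = "\u00EAtre"    # "être"

puts escape_keeping_encoding(plain).encoding
puts escape_keeping_encoding(accented).encoding

# Without the helper, the two results are still compatible, which is the
# automatic conversion described in the reply above.
puts Encoding.compatible?(Regexp.escape(plain), Regexp.escape(accented))
```

In most programs the helper is unnecessary, since concatenating or interpolating the escaped string with UTF-8 text already yields UTF-8. It may matter only where code compares `String#encoding` values directly.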