From: "jeremyevans0 (Jeremy Evans) via ruby-core"
Date: 2025-11-24T21:50:49+00:00
Subject: [ruby-core:123899] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape

Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).

thyresias (Thierry Lambert) wrote in #note-4:

> Ok for the workaround, but don't you think all this is inconsistent?
> For me, it's a bug, not a feature. ^_^

I agree this represents a bug, which is why I changed the status back to Open. However, I think the bug is in the literal Regexp support, not in `Regexp.escape`. In general, US-ASCII strings are implicitly convertible to UTF-8 strings, so having `Regexp.escape` return a US-ASCII string for data that is solely US-ASCII is reasonable. This implicit use of US-ASCII happens in other cases:

```
# Literal Symbol
$ ruby -e "p :a.encoding"
#<Encoding:US-ASCII>

# Array#join
$ ruby -e "p [].join.encoding"
#<Encoding:US-ASCII>

# Literal Regexp
$ ruby -e "p //.encoding"
#<Encoding:US-ASCII>
```

----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115303

* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

```ruby
%w(foo être).each do |s|
  puts "string: #{s.inspect} -> #{s.encoding}"
  puts "escaped: #{Regexp.escape(s).inspect} -> #{Regexp.escape(s).encoding}"
end
```

Output:

```
string: "foo" -> UTF-8
escaped: "foo" -> US-ASCII
string: "être" -> UTF-8
escaped: "être" -> UTF-8
```

The result should always match the encoding of the argument.
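
For readers following along, here is a minimal sketch of what "implicitly convertible" means in practice, plus one possible way to get a result that keeps the argument's encoding. The helper name `escape_preserving_encoding` is hypothetical and not part of Ruby, and it is not necessarily the workaround discussed earlier in the thread; it only assumes that re-tagging an ASCII-only result with `String#force_encoding` is acceptable for your use case.

```ruby
# Hypothetical helper (not part of Ruby): escape a string, then tag the
# result with the encoding of the argument.
def escape_preserving_encoding(str)
  Regexp.escape(str).force_encoding(str.encoding)
end

ascii_only = "foo"            # ASCII-only content in a UTF-8 string
word       = "\u00EAtre"      # the non-ASCII sample word from the report
escaped    = Regexp.escape(ascii_only)

p escaped.encoding            # US-ASCII per the report (ruby 3.4.7)

# An ASCII-only string mixes freely with a UTF-8 string: no
# Encoding::CompatibilityError is raised, and the result is UTF-8.
p Encoding.compatible?(escaped, word)   # => #<Encoding:UTF-8>
combined = escaped + word
p combined.encoding                     # => #<Encoding:UTF-8>

# Building a pattern from the combined string works as expected.
pattern = Regexp.new(combined)
p pattern.match?("foo\u00EAtre")        # => true

# With the helper, the escaped string reports the argument's encoding.
p escape_preserving_encoding(ascii_only).encoding   # => #<Encoding:UTF-8>
```

The helper only changes the encoding tag, not the bytes, so it is cheap; whether hiding the US-ASCII result this way is desirable is a judgment call, since the behavior itself is what the issue is debating.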