From: "jeremyevans0 (Jeremy Evans) via ruby-core"
Date: 2025-11-24T18:52:51+00:00
Subject: [ruby-core:123897] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape

Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Feedback to Open

thyresias (Thierry Lambert) wrote in #note-2:

> > Does this behavior cause any problems in your application?
>
> Yes:
> ```ruby
> search_text = "foo"
> s_search = Regexp.escape(search_text)
> re_prefix = /\p{In_Arabic}.+ /
> s_search.prepend re_prefix.source
> _re = /^#{s_search}|(?<=��� |: )#{s_search}/ #=> encoding mismatch in dynamic regexp : US-ASCII and UTF-8 (RegexpError)
> ```

Thank you for providing an example. This seems more like an issue with the literal Regexp support in general than with `Regexp.escape`. You can trigger the issue without `Regexp.escape`:

```ruby
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}\u1234/
# encoding mismatch in dynamic regexp : US-ASCII and UTF-8
```

It seems to require you specify unicode properties inside an interpolated string that isn't in UTF-8.
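
To illustrate that condition from the other side, here is a minimal sketch that transcodes the fragment to UTF-8 before building the pattern. `to_utf8_fragment` is a hypothetical helper introduced only for this example, and the sketch assumes the fragment is ASCII-only so that transcoding cannot fail. It builds the pattern with `Regexp.new` from a plain string rather than with a Regexp literal.

```ruby
# Sketch only: `to_utf8_fragment` is a hypothetical helper, not a Ruby API.
# Assumes the fragment is ASCII-only, so transcoding to UTF-8 cannot fail.
def to_utf8_fragment(fragment)
  fragment.encode(Encoding::UTF_8)
end

ascii_fragment = "\\p{In_Arabic}".encode("US-ASCII")
utf8_fragment  = to_utf8_fragment(ascii_fragment)

# Build the pattern from a UTF-8 string instead of a Regexp literal.
re = Regexp.new(utf8_fragment + "\u1234")

puts ascii_fragment.encoding       # encoding of the original fragment
puts utf8_fragment.encoding        # encoding after transcoding
puts re.encoding                   # encoding of the compiled pattern
puts re.match?("\u0627\u1234")     # U+0627 is in the Arabic block
```

Whether the same transcoding also avoids the error inside a Regexp literal is not verified here.
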
You get a different error without the trailing `\u1234` character:

```ruby
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}/
# invalid character property name {In_Arabic}: /\p{In_Arabic}/
```

Using `Regexp.new` instead of a literal Regexp may work around the issue:

```ruby
search_text = "foo"
s_search = Regexp.escape(search_text)
re_prefix = /\p{In_Arabic}.+ /
s_search.prepend re_prefix.source
_re = Regexp.new("^#{s_search}|(?<=��� |: )#{s_search}")
```

----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115301

* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
```ruby
%w(foo être).each do |s|
  puts "string: #{s.inspect} -> #{s.encoding}"
  puts "escaped: #{Regexp.escape(s).inspect} -> #{Regexp.escape(s).encoding}"
end
```

Output:

```
string: "foo" -> UTF-8
escaped: "foo" -> US-ASCII
string: "être" -> UTF-8
escaped: "être" -> UTF-8
```

The result should always match the encoding of the argument.

--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/