From: jiri.marsik@...
Date: 2021-06-15T11:59:28+00:00
Subject: [ruby-core:104276] [Ruby master Bug#17990] Inconsistent behavior of Regexp quantifiers over characters with complex case foldings

Issue #17990 has been reported by jirkamarsik (Jirka Marsik).

----------------------------------------
Bug #17990: Inconsistent behavior of Regexp quantifiers over characters with complex case foldings
https://bugs.ruby-lang.org/issues/17990

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
With case-insensitive Regexps, the string `"ff"` is considered equal to the string `"\ufb00"`, which consists of a single ligature character.

```
irb(main):001:0> /ff/i.match("\ufb00")
=> #
```

This behavior persists when the string `"ff"` does not appear literally in the Regexp source but is expressed with a fixed-length quantifier:

```
irb(main):002:0> /f{2}/i.match("\ufb00")
=> #
irb(main):003:0> /f{2,2}/i.match("\ufb00")
=> #
```

However, this does not hold in general. With other quantifiers, the ligature character `"\ufb00"` is not recognized as a sequence of two `"f"` characters:

```
irb(main):004:0> /f*/i.match("\ufb00")
=> #
irb(main):005:0> /f+/i.match("\ufb00")
=> nil
irb(main):006:0> /f{1,}/i.match("\ufb00")
=> nil
irb(main):007:0> /f{1,2}/i.match("\ufb00")
=> nil
irb(main):008:0> /f{,2}/i.match("\ufb00")
=> #
irb(main):009:0> /ff?/i.match("\ufb00")
=> nil
```

This produces inconsistent behavior: a Regexp like `/f{1,2}/i` matches *fewer* strings than the stricter Regexp `/f{2,2}/i`. I suspect the cause is that the pattern analyzer directly expands `/f{2}/i` and `/f{2,2}/i` into `/ff/i`.
However, this optimization changes the semantics of the Regexp, because otherwise a single ligature character cannot be matched by multiple repetitions of a quantified expression.

While experimenting with this case, I also discovered a related issue. It is caused by the problematic expansion of `/f{n}/i` together with the issue reported here: https://bugs.ruby-lang.org/issues/17989. All of the following match:

```
/f{100}/i.match("f" * 100)
/f{100}/i.match("\ufb00" * 50)
/f{100}/i.match("\ufb00" * 49 + "ff")
/f{100}/i.match("ff" + "\ufb00" * 49)
```

However, the following does not match, even though its subject also case-folds to exactly 100 `f` characters (1 + 2 × 49 + 1):

```
/f{100}/i.match("f" + "\ufb00" * 49 + "f")
```
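To compare the observed results against one consistent semantics, here is a minimal sketch under a stated assumption. The assumption is that a case-insensitive match should behave as if the subject were fully case-folded before matching. The helper names `reference_full_match?` and `engine_full_match?` are hypothetical. The model only covers whole-string matches of `f{min,max}`, anchored with `\A`/`\z` unlike the unanchored examples above, and it is not how Onigmo actually implements `/i`. It relies on `String#downcase(:fold)`, which applies Unicode full case folding, so `"\ufb00"` becomes `"ff"`.

```ruby
# Hypothetical reference model (an assumption, not Onigmo's algorithm):
# a whole-string match of /\Af{min,max}\z/i succeeds iff the fully
# case-folded subject consists only of "f" and its length is in range.
# A nil max means "unbounded", as in f{1,} (endless range, Ruby 2.6+).
def reference_full_match?(subject, min, max)
  folded = subject.downcase(:fold) # full folding turns U+FB00 into "ff"
  folded.match?(/\Af*\z/) && (min..max).cover?(folded.length)
end

# What the running Ruby's Regexp engine reports for the same anchored pattern.
def engine_full_match?(subject, min, max)
  source = "\\Af{#{min},#{max}}\\z" # interpolating nil yields "", i.e. f{1,}
  Regexp.new(source, Regexp::IGNORECASE).match?(subject)
end

CASES = [
  ["\uFB00", 2, 2],                      # like /f{2}/i
  ["\uFB00", 1, 2],                      # like /f{1,2}/i
  ["\uFB00", 1, nil],                    # like /f+/i
  ["\uFB00" * 50, 100, 100],
  ["\uFB00" * 49 + "ff", 100, 100],
  ["f" + "\uFB00" * 49 + "f", 100, 100], # also folds to 100 "f"s
].freeze

# Print, for each case, whether the engine agrees with the folding model.
CASES.each do |subject, min, max|
  expected = reference_full_match?(subject, min, max)
  actual = engine_full_match?(subject, min, max)
  flag = expected == actual ? "ok" : "MISMATCH"
  puts "#{flag}: f{#{min},#{max}} on #{subject.inspect[0, 20]} " \
       "model=#{expected} engine=#{actual}"
end
```

Under this model, widening the quantifier range can only add accepted lengths, so `f{1,2}` can never accept fewer subjects than `f{2,2}`. Which lines print `MISMATCH` depends on the Ruby version being run, so no particular output is claimed here.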