From: jiri.marsik@...
Date: 2020-11-24T11:17:59+00:00
Subject: [ruby-core:101048] [Ruby master Bug#17341] Unsound quantifier reduction with nested quantifiers

Issue #17341 has been updated by jirkamarsik (Jirka Marsik).

Thanks for the quick reply! Your fix looks great.

----------------------------------------
Bug #17341: Unsound quantifier reduction with nested quantifiers
https://bugs.ruby-lang.org/issues/17341#change-88719

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
The rules for reducing nested quantifiers can produce quantifiers with semantics which differ from the original quantifiers. This can then lead to the regular expressions matching different strings.

```
irb(main):001:0> /(?:a+?)*/.match('aa')
(irb):1: warning: nested repeat operator '+?' and '*' was replaced with '+? and ?' in regular expression: /(?:a+?)*/
=> #<MatchData "a">
irb(main):002:0> /(a+?)*/.match('aa')
=> #<MatchData "aa" 1:"a">
```

In the above, we can see that by inserting a capture group between the two quantifiers, we prevent quantifier reduction from occurring and we get a regexp that matches the whole input. If we let quantifier reduction happen, we get a resulting regexp that only matches the first character. I think quantifier reduction should not change the behavior of a regexp, as it is just an optimization.

I found the quantifier reduction rules in `ReduceTypeTable` in `regparse.c`. I haven't checked them all but the ones that replace two quantifiers by two other quantifiers caught my eye.
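
The comparison above can also be run as a small script rather than in irb. This is only a sketch: `matched_text` is a hypothetical helper introduced here for illustration, not something from `regparse.c` or the Ruby API. It assumes that an unaffected build returns the same overall match for both patterns, so it prints what each one matches instead of hard-coding a result, since the non-capturing outcome depends on the Ruby version in use.

```ruby
# Hypothetical helper (not part of Ruby or the report): return the text
# matched by `pattern` against `input`, or nil when there is no match.
def matched_text(pattern, input)
  m = Regexp.new(pattern).match(input)
  m && m[0]
end

input = 'aa'

# Nested quantifiers with a non-capturing group: eligible for reduction.
reduced = matched_text('(?:a+?)*', input)

# Same quantifiers with a capture group in between: reduction is not applied.
unreduced = matched_text('(a+?)*', input)

puts "non-capturing: #{reduced.inspect}"
puts "capturing:     #{unreduced.inspect}"
puts(reduced == unreduced ? 'results agree' : 'results differ (reduction changed the match)')
```

On the 2.7.2 build from the report, the transcript above implies the two results differ; on a build where the reduction rule has been corrected, they should agree.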