From: jiri.marsik@...
Date: 2020-11-24T11:17:59+00:00
Subject: [ruby-core:101048] [Ruby master Bug#17341] Unsound quantifier reduction with nested quantifiers

Issue #17341 has been updated by jirkamarsik (Jirka Marsik).


Thanks for the quick reply! Your fix looks great.

----------------------------------------
Bug #17341: Unsound quantifier reduction with nested quantifiers
https://bugs.ruby-lang.org/issues/17341#change-88719

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
The rules for reducing nested quantifiers can produce quantifiers whose semantics differ from those of the original quantifiers. As a result, the reduced regular expression can match different strings than the expression as written.

```
irb(main):001:0> /(?:a+?)*/.match('aa')
(irb):1: warning: nested repeat operator '+?' and '*' was replaced with '+? and ?' in regular expression: /(?:a+?)*/
=> #<MatchData "a">
irb(main):002:0> /(a+?)*/.match('aa')
=> #<MatchData "aa" 1:"a">
```

In the above, we can see that inserting a capture group between the two quantifiers prevents quantifier reduction, and the resulting regexp matches the whole input. When quantifier reduction does happen, the resulting regexp matches only the first character. Since quantifier reduction is just an optimization, I think it should not change the behavior of a regexp.
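
For anyone who wants to reproduce this outside irb, the comparison above can be sketched as a small script. The helper name `same_match?` is made up for illustration, and the patterns are built from strings with `Regexp.new` so that both are compiled at run time.

```ruby
# Hypothetical helper: true when both patterns match the same text in +input+.
def same_match?(source_a, source_b, input)
  a = Regexp.new(source_a).match(input)
  b = Regexp.new(source_b).match(input)
  (a && a[0]) == (b && b[0])
end

grouped   = '(?:a+?)*' # subject to quantifier reduction
capturing = '(a+?)*'   # the capture group blocks the reduction

# The capturing form shows the semantics of the pattern as written.
puts Regexp.new(capturing).match('aa')[0] # prints "aa"

# Prints false on affected versions such as 2.7.2, where the grouped form matches only "a".
puts same_match?(grouped, capturing, 'aa')
```

If the reduction preserved semantics, the last line would print `true`.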

I found the quantifier reduction rules in `ReduceTypeTable` in `regparse.c`. I haven't checked them all, but the ones that replace two quantifiers with two other quantifiers caught my eye.
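
One way to check the remaining combinations without reading the table would be a brute-force sweep like the sketch below. It assumes that the capturing form `(a<inner>)<outer>` is a faithful reference for the non-capturing form `(?:a<inner>)<outer>`, which is the same assumption the example above relies on. The names `matched_text` and `find_discrepancies` and the choice of inputs are mine, not anything from the Ruby sources.

```ruby
# Greedy and lazy forms of ?, * and +.
QUANTIFIERS = %w[? * + ?? *? +?].freeze
INPUTS = ['', 'a', 'aa', 'aaa'].freeze

# Hypothetical helper: returns the matched text, or nil when there is no match.
def matched_text(source, input)
  m = Regexp.new(source).match(input)
  m && m[0]
end

# Collects every (inner, outer, input) combination where the two forms disagree.
def find_discrepancies
  QUANTIFIERS.product(QUANTIFIERS, INPUTS).map do |inner, outer, input|
    reduced   = matched_text("(?:a#{inner})#{outer}", input) # may be reduced
    reference = matched_text("(a#{inner})#{outer}", input)   # assumed not reduced
    next if reduced == reference

    { inner: inner, outer: outer, input: input,
      reduced: reduced, reference: reference }
  end.compact
end

find_discrepancies.each { |d| puts d.inspect }
```

An empty output would mean the two forms agree on these inputs; it would not prove that the rules are sound in general.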



-- 
https://bugs.ruby-lang.org/
