[ruby-core:105633] [Ruby master Bug#17774] Quantified empty group causes regex to fail
From: "jeremyevans0 (Jeremy Evans)" <noreply@...>
Date: 2021-10-13 16:43:28 UTC
List: ruby-core #105633
Issue #17774 has been updated by jeremyevans0 (Jeremy Evans).
I looked into fixing this by removing the `#define` of `USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT`, as @mame indicated: https://github.com/ruby/ruby/commit/018922ba15eb7aea86957789d7defae9ffc43688
It ends up breaking a few specs. For example, it changes the behavior of:
```ruby
/(a|\2b|())*/.match("aaabbb").to_a
# Before:
# => ["aaabbb", "", ""]
# After:
# => ["aaa", "", ""]
```
For this example, Ruby 1.8 returns `["aaa", "a", nil]`. The equivalent in Perl returns `["aaa", "", ""]`. The equivalent in Python 2 and 3 returns `["aaabbb", "", ""]`. I think the `["aaabbb", "", ""]` result is the best one for a greedy match, since it matches the most characters. However, I can also see why an implementation would return one of the other results if it stops repeating as soon as an iteration makes no forward progress.
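To see which of these results a particular Ruby build produces, a small script along these lines could be used. It is only a sketch: the `reported` table restates the results quoted above and is not re-verified here, and the output depends on the Ruby version and the regex engine it bundles.

```ruby
# The pattern from the example above: \2 is a forward reference to the
# empty group inside a quantified alternation.
pattern = /(a|\2b|())*/
result = pattern.match("aaabbb").to_a

puts "Ruby #{RUBY_VERSION}: #{result.inspect}"

# Results as reported in this thread (not re-checked by this script).
reported = {
  "Ruby 1.8"       => ["aaa", "a", nil],
  "Perl"           => ["aaa", "", ""],
  "Python 2 and 3" => ["aaabbb", "", ""],
}

reported.each do |name, expected|
  marker = (result == expected) ? "same as this build" : "differs from this build"
  puts "#{name}: #{expected.inspect} (#{marker})"
end
```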
Anyway, if we are OK with this behavior change for empty capture groups, I can submit the commit as a pull request. However, I think it would be better to wait for a fix in Onigmo.
----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774#change-94123
* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.
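As a rough illustration of why this works: in each loop iteration, `(x*)` followed by `(?=\2$)` can only succeed on a non-empty remainder by consuming exactly half of it, so the loop keeps halving until a single `x` is left. The sketch below compares the regex against that arithmetic; the method name `halves_to_one?` and the range `1..64` are illustrative choices, not part of the report.

```ruby
POWER_OF_TWO = /^((x*)(?=\2$))*x$/

# Arithmetic counterpart of the regex loop: keep halving while the
# remaining length is even, and succeed only if exactly 1 is left.
def halves_to_one?(n)
  return false if n < 1
  while n > 1
    return false if n.odd?
    n /= 2
  end
  true
end

(1..64).each do |n|
  regex_says = POWER_OF_TWO.match?("x" * n)
  next unless regex_says || halves_to_one?(n)
  puts "#{n}: regex=#{regex_says} halving=#{halves_to_one?(n)}"
end
```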
Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.
The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```
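To probe where the behavior changes, the two commands above can be generalized into a loop over repetition counts. This is a sketch: the helper name `pattern_with_empty_group` and the range `1..8` are arbitrary, and the output for counts of 5 and above depends on whether the Ruby build in use has the bug.

```ruby
# Builds the pattern from the report with the empty group repeated
# `count` times, e.g. count = 5 gives ^((x*)(){5}(?=\2$))*x$
def pattern_with_empty_group(count)
  Regexp.new("^((x*)(){#{count}}(?=\\2$))*x$")
end

subject = "x" * 32

(1..8).each do |count|
  matched = pattern_with_empty_group(count).match?(subject)
  puts "(){#{count}}: #{matched ? 'match' : 'no match'}"
end
```

Since 32 is a power of 2, every line should say `match` on a build without the bug.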
Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`
Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.
--
https://bugs.ruby-lang.org/