[ruby-core:109696] [Ruby master Bug#18797] Third argument to Regexp.new is a bit broken
From: "jeremyevans0 (Jeremy Evans)" <noreply@...>
Date: 2022-08-25 19:41:25 UTC
List: ruby-core #109696
Issue #18797 has been updated by jeremyevans0 (Jeremy Evans).
matz (Yukihiro Matsumoto) wrote in #note-1:
> This is indeed an obsolete feature for long time. And the third argument is ignored (IIRC).
Unfortunately, the third argument is not ignored:
```ruby
p Regexp.new("\u1234", nil, "n").encoding
# => #<Encoding:ASCII-8BIT>
p Regexp.new("\u1234", nil).encoding
# => #<Encoding:UTF-8>
```
Would you like to change the behavior to ignore the third argument without deprecating it? Or would you prefer to keep the current behavior and document it?
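For anyone who wants to check which behavior a given Ruby build has, here is a minimal probe. The method name `third_argument_behavior` is hypothetical, and the `rescue` branch assumes that some Ruby versions may reject the call outright rather than accept a third argument:

```ruby
# Hypothetical helper: reports how the running Ruby treats a third
# argument of "n" passed to Regexp.new.
def third_argument_behavior
  re = Regexp.new("\u1234", nil, "n")
  # If the argument is honored, the regexp is forced to ASCII-8BIT;
  # if it is ignored, the regexp keeps the encoding of the pattern string.
  re.encoding == Encoding::ASCII_8BIT ? :honored : :ignored
rescue ArgumentError, RegexpError
  # Assumption: some Ruby versions may refuse the call entirely.
  :rejected
end

puts third_argument_behavior
```

The printed symbol depends on the interpreter version, so no expected output is shown.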
----------------------------------------
Bug #18797: Third argument to Regexp.new is a bit broken
https://bugs.ruby-lang.org/issues/18797#change-98921
* Author: janosch-x (Janosch Müller)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
## Situation
`'n'` or `'N'` can be passed as a third argument to `Regexp.new`. However, the result differs from both the literal `n` flag and the `Regexp::NOENCODING` option, and it makes the encoding of the `Regexp` and the encoding of its `#source` string diverge:
```ruby
/😃/n # => SyntaxError
Regexp.new('😃', Regexp::NOENCODING) # => RegexpError
re = Regexp.new('😃', nil, 'n') # => /😃/
re.options == Regexp::NOENCODING # => true
re.encoding # => ASCII-8BIT
re.source.encoding # => UTF-8
re =~ '😃' # => Encoding::CompatibilityError
```
## Code
The relevant code in `re.c` is [here](https://github.com/ruby/ruby/blob/b41de3a1e8c36a5cc336b6f7cd3cb71126cf1a60/re.c#L3622-L3658). There is also a test for the resulting encoding [here](https://github.com/ruby/ruby/blob/cf2bbcfff2985c116552967c7c4522f4630f2d18/test/ruby/test_regexp.rb#L564), but it is effectively a no-op because the whole test file is already set to that encoding via a magic comment.
The third argument was added when ASCII was still the default Ruby encoding, so I guess Regexp and source encoding still matched at that point.
## Solution
It could be fixed, but my impression is that it is not useful anymore.
It was probably only added because `Regexp::NOENCODING` wasn't available at the time, so I think it could be deprecated like so:
> Passing a third argument to Regexp.new is deprecated. Use `Regexp::NOENCODING` as second argument instead.
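As a rough sketch of what the suggested replacement could look like in user code, the two-argument form can be combined with a binary pattern string. The helper name `binary_regexp` is hypothetical, and converting the pattern with `String#b` is my assumption about how callers would avoid the `RegexpError` shown above, not something the deprecation message prescribes:

```ruby
# Hypothetical helper: build a binary regexp using only the
# two-argument form of Regexp.new.
def binary_regexp(pattern)
  # String#b returns a copy tagged ASCII-8BIT, so non-ASCII bytes in the
  # pattern are treated as raw bytes rather than as UTF-8 characters.
  Regexp.new(pattern.b, Regexp::NOENCODING)
end

re = binary_regexp("\xFF")
p re.encoding == Encoding::ASCII_8BIT        # => true
p re.source.encoding == Encoding::ASCII_8BIT # => true
p re.match?("a\xFFb".b)                      # => true
```

With a non-ASCII binary pattern like this one, the regexp and its source report the same encoding, and matching against a binary string works without an `Encoding::CompatibilityError`.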
--
https://bugs.ruby-lang.org/