From: janosch-x
Date: 2022-05-17T20:26:13+00:00
Subject: [ruby-core:108602] [Ruby master Feature#18788] Support passing Regexp options as String to Regexp.new

Issue #18788 has been updated by janosch-x (Janosch Müller).

I agree that the use cases are fairly limited. They are not totally uncommon, though.

`Regexp.new` is mostly needed for recursion on Ruby and metaprogramming, and is used this way e.g. in [rubocop](https://github.com/rubocop/rubocop-ast/blob/816dfe7f2ca4e92c7eda226a9e8b44aa9fa81e81/lib/rubocop/ast/node_pattern/lexer.rb#L21-L58), ruby_parser, and [parser](https://github.com/whitequark/parser/blob/09d681e534885f1aa22f0099089841ae9d86f847/lib/parser/builders/default.rb#L2224-L2242).

Sometimes it is used to build Regexps based on input, e.g. in [capybara](https://github.com/teamcapybara/capybara/blob/922d6614762518c82bc53a3bc83b816a4beb186d/lib/capybara/queries/text_query.rb#L71-L72), prawn, and [psych](https://github.com/ruby/ruby/blob/c1a6ff046d4f27c972adf96f9a6724abc2f0647a/ext/psych/lib/psych/visitors/to_ruby.rb#L96-L111). There might also be a few CMSes that allow admins to type in validation patterns.

Many people are also seemingly unaware that Regexp literals support interpolation, or maybe they just find this interpolation hard to read. Either way, they often use `Regexp.new` instead, passing it an interpolated or concatenated String or `Regexp.union` output, as can be seen e.g. in css_parser, haml, net-ssh, or [uri](https://github.com/ruby/ruby/blob/10ad81eb2d4bf44b5d5350e3ea28e6248f550128/lib/uri/rfc2396_parser.rb#L500-L506). (css_parser [actually uses the only coincidentally working "i"](https://github.com/premailer/css_parser/blob/23a8f8a4a7b96b0c0c93ea5e36ed101956444f8f/lib/css_parser/regexps.rb#L5) as a second argument.)

In some other cases, `Regexp.new` is used to avoid a `SyntaxError` or a warning on older Rubies, e.g. sinatra does this.
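To illustrate the interpolation point, here is a minimal sketch comparing an interpolated literal with the `Regexp.new` spellings mentioned above. The variable names and the sample input `'a.b'` are made up for illustration and are not taken from any of the linked projects.

```ruby
# Hypothetical input; in real code this might come from user data.
word = 'a.b'

# Regexp literal with interpolation; Regexp.escape keeps the "." literal.
literal = /\A#{Regexp.escape(word)}\z/i

# The same pattern built from a String, with the flag passed as an Integer.
built = Regexp.new("\\A#{Regexp.escape(word)}\\z", Regexp::IGNORECASE)

# Regexp.union escapes String arguments itself; its source can be reused
# to attach flags afterwards.
union = Regexp.new(Regexp.union('a.b', 'c+d').source, Regexp::IGNORECASE)

puts literal.source == built.source # both describe the same pattern
puts union.match?('C+D')
```

The literal form avoids the doubled backslashes that the String form needs, which may be part of why some people find one or the other easier to read.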
----------------------------------------
Feature #18788: Support passing Regexp options as String to Regexp.new
https://bugs.ruby-lang.org/issues/18788#change-97640

* Author: janosch-x (Janosch Müller)
* Status: Open
* Priority: Normal
----------------------------------------
## Current situation

`Regexp.new` takes an integer as second argument which needs to be ORed together from multiple constants:

```
Regexp.new('foo', Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::EXTENDED) # => /foo/imx
```

Any other non-nil value is treated as `i` flag:

```
Regexp.new('foo', Object.new) # => /foo/i
```

## Suggestion

`Regexp.new` should support passing the regexp flags not only as an Integer, but also as a String or Symbol, like so:

```
Regexp.new('foo', 'i')   # => /foo/i
Regexp.new('foo', :i)    # => /foo/i
Regexp.new('foo', 'imx') # => /foo/imx
Regexp.new('foo', :imx)  # => /foo/imx

# edge cases
Regexp.new('foo', 'iii') # => /foo/i
Regexp.new('foo', :iii)  # => /foo/i
Regexp.new('foo', '')    # => /foo/
Regexp.new('foo', :'')   # => /foo/

# unsupported flags could be ignored -
# or raise an ArgumentError to reveal changed behavior?
Regexp.new('foo', 'jmq') # => /foo/m
Regexp.new('foo', :jmq)  # => /foo/m
Regexp.new('foo', '-m')  # => /foo/m
Regexp.new('foo', :'-m') # => /foo/m
```

## Reasons

1. The constants are a bit cumbersome to use, particularly when building the regexp from variable data:

```
def make_regexp(regexp_body, opt_string)
  opt_int = 0
  opt_int |= Regexp::IGNORECASE if opt_string.include?('i')
  opt_int |= Regexp::MULTILINE  if opt_string.include?('m')
  opt_int |= Regexp::EXTENDED   if opt_string.include?('x')
  Regexp.new(regexp_body, opt_int)
end
```

2. Passing a String or Symbol is already silently accepted, and people might get the wrong impression that it works:

```
Regexp.new('foo', 'i') # => /foo/i
Regexp.new('foo', :i)  # => /foo/i
```

... but it doesn't really work:

```
Regexp.new('foo', 'x') # => /foo/i
Regexp.new('foo', :x)  # => /foo/i
```

## Backwards compatibility

This change would not be fully backwards compatible.

Code that relies on the second argument being either a String/Symbol or nil to decide whether the Regexp should be case insensitive would break (unless the String or Symbol contains "i").

I can't come up with a scenario where one would write such code, though - except maybe code golfing?
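For comparison, the stricter alternative raised in the Suggestion section (raising an `ArgumentError` for unsupported flags rather than ignoring them) could be approximated in user code roughly as follows. This is only a sketch: `FLAG_BITS` and `regexp_from_flags` are hypothetical names, not part of Ruby or of any proposed patch.

```ruby
# Hypothetical lookup table from flag characters to Regexp constants.
FLAG_BITS = {
  'i' => Regexp::IGNORECASE,
  'm' => Regexp::MULTILINE,
  'x' => Regexp::EXTENDED
}.freeze

# Hypothetical helper: accepts flags as a String or Symbol and raises
# ArgumentError on any character that is not i, m, or x.
def regexp_from_flags(body, flags)
  bits = flags.to_s.each_char.reduce(0) do |acc, char|
    bit = FLAG_BITS.fetch(char) do
      raise ArgumentError, "unknown regexp flag: #{char.inspect}"
    end
    acc | bit
  end
  Regexp.new(body, bits)
end

p regexp_from_flags('foo', 'imx')
p regexp_from_flags('foo', :iii)
p regexp_from_flags('foo', '')
```

Under this variant, a call such as `regexp_from_flags('foo', 'jmq')` raises instead of silently dropping `j` and `q`, which would make the behavior change visible to callers that currently pass arbitrary Strings.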