From: "zeke (Zeke Gabrielse)"
Date: 2022-04-27T15:00:05+00:00
Subject: [ruby-core:108417] [Ruby master Feature#18757] Introduce %R for anchored regular expression patterns

Issue #18757 has been updated by zeke (Zeke Gabrielse).

Description updated

Fix `validates_format` pattern

----------------------------------------
Feature #18757: Introduce %R for anchored regular expression patterns
https://bugs.ruby-lang.org/issues/18757#change-97449

* Author: zeke (Zeke Gabrielse)
* Status: Open
* Priority: Normal
----------------------------------------
When defining regular expression patterns, it's often the case that you want to anchor with `\A` and `\z` to match the full text input, rather than with `^` and `$`, respectively, which match at line boundaries and so may (unintentionally) match input containing newlines. This is especially true in the context of a web application such as a Rails app.

Unfortunately, `\A` and `\z` reduce the legibility of a regular expression. For example, take this `ActionMailbox` usage:

```ruby
class ApplicationMailbox < ActionMailbox::Base
  routing %r{\Areplies\+.*?@ruby-lang\.org\z}i => :replies
  routing %r{\Asales@.*?\z}i => :leads
end
```

At first glance, the second route may appear to match `Asales`; on closer inspection, it does not. Readability suffers with both `\A` and `\z`, but especially with `\A`, so a developer may be tempted to use `^` instead to improve legibility. In other cases, developers simply forget to use `\A` and `\z` over `^` and `$` when validating or matching against user input.

I propose that Ruby introduce a new percent-notation, `%R{}`, for defining interpolated regular expression patterns that are automatically anchored with `\A` and `\z`.
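To make the newline concern concrete, here is a minimal sketch that runs on current Ruby. The `anchored` helper is hypothetical (it is not part of Ruby or of this proposal's implementation) and only approximates what `%R{}` might do; wrapping the source in a non-capturing group is my own assumption, made so that alternations are anchored as a whole.

```ruby
# Line anchors (^, $) versus text anchors (\A, \z) on multi-line input.
INPUT         = "sales@example.com\n<script>"
LINE_ANCHORED = /^sales@.*?$/i
TEXT_ANCHORED = /\Asales@.*?\z/i

# ^ and $ match around the first line, so the trailing line slips through.
puts LINE_ANCHORED.match?(INPUT) # => true
# \A and \z require the whole string to match.
puts TEXT_ANCHORED.match?(INPUT) # => false

# Hypothetical helper approximating the proposed %R{} literal.
# Accepts a String or a Regexp and keeps the original options (e.g. /i).
def anchored(pattern)
  re = Regexp.new(pattern)
  # Assumption: group the source so that `a|b` is anchored as a whole.
  Regexp.new("\\A(?:#{re.source})\\z", re.options)
end

puts anchored(/sales@.*?/i).match?("Sales@example.com") # => true
puts anchored(/sales@.*?/i).match?(INPUT)               # => false
```

The helper only mimics the anchoring; the readability benefit of a dedicated literal is, of course, not something a method call can reproduce.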
For example, the `ApplicationMailbox` routes above would become:

```ruby
class ApplicationMailbox < ActionMailbox::Base
  routing %R{replies\+.*?@ruby-lang\.org}i => :replies
  routing %R{sales@.*?}i => :leads
end
```

This is much more readable, and it's safer: developers using `%R{}` are not going to accidentally use `^` or `$` instead of `\A` and `\z`, respectively (the former being vulnerable to matching input data containing newlines).

This is especially useful when pattern matching data where some values may be a symbol or a string, depending on where the data originated (internally vs externally):

```ruby
data = { type: :foo, id: 1 } # Could also be: { type: 'foo', id: 1 }

case data
in type: %R(foo), id:
  # ...
else
end
```

Formally, the new anchored regex percent notation would work as follows:

```ruby
re = %R(test) # => /\Atest\z/

re.match?('test')    # => true
re.match?('testing') # => false
re.match?('a test')  # => false

re.match?(:test)     # => true
re.match?(:testing)  # => false
re.match?(:a_test)   # => false
```

This would also be useful for data validation purposes, where a developer could clean up patterns that previously used regular expressions with `\A...\z` and `^...$`, such as with Rails model validations, e.g. `validates_format(with: %R{[-a-z0-9]+})`.

I do understand that an uppercase `%R` would behave differently from other percent notations (i.e. lowercase is typically non-interpolated, uppercase interpolated), but since `%r` already allows interpolation, I figured it was okay to be a bit different. Regardless, I'm open to other syntax suggestions.

-- 
https://bugs.ruby-lang.org/