[ruby-core:109025] [Ruby master Feature#18838] Avoid swallowing regexp escapes in the lexer

From: "jeremyevans0 (Jeremy Evans)" <noreply@...>
Date: 2022-06-20 22:11:56 UTC
List: ruby-core #109025
Issue #18838 has been updated by jeremyevans0 (Jeremy Evans).

Backport deleted (2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN)
ruby -v deleted (3.0.3)
Subject changed from Regexp#source behaves inconsistently with / to Avoid swallowing regexp escapes in the lexer
Tracker changed from Bug to Feature

In the `/\//` and `%r/\//` cases, the regexp source is transformed from `\/` to `/` in the lexer (`tokadd_string`) before it even hits the parser, let alone the regexp engine.  From a regexp perspective, `/\//` and `%r/\//` are treated as `Regexp.new('/')`, and `%r{\/}` as `Regexp.new('\/')`.
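As a minimal sketch of the equivalence described above, the two regexps can be built directly with `Regexp.new`, which bypasses the literal lexer entirely. The variable names (`plain`, `escaped`, `sample`, `same_match`) are illustrative only and do not come from the issue.

```ruby
# Build the two regexps explicitly, so no literal lexing is involved.
plain   = Regexp.new('/')   # what /\// and %r/\// are said to become
escaped = Regexp.new('\/')  # what %r{\/} is said to become

sample = "a/b"
# Both patterns match a string containing a slash.
same_match = plain.match?(sample) && escaped.match?(sample)

puts plain.source.inspect    # "/"
puts escaped.source.inspect  # "\\/"
puts same_match              # true
puts plain == escaped        # false, because Regexp#== compares the source
```

The two objects match the same input but are not equal, which is why a change to what the lexer passes through shows up in `Regexp#source` and in equality tests.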

Regexp#source should provide the source of the regexp, which is not necessarily the text as written in the source code.  The statement `escape sequences are retained as is` refers to Regexp escape sequences, and the `\` in `/\//` and `%r/\//` is not a regexp escape sequence, but a lexer escape sequence (similar to `%/\//` or `%s/\//`).  This issue is not specific to `/`; it occurs for most terminators: `%r,\,,.source # => ","`

Note that in cases where escaping would actually change regexp behavior, the lexer doesn't swallow the escape character: `%r$\$$.source # => "\\$"`
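To illustrate why dropping the backslash would matter for a terminator such as `$`, here is a small sketch using `Regexp.new` so that no literal lexing is involved. The names `anchor` and `literal_dollar` are illustrative only.

```ruby
# Without the backslash, `$` is the end-of-line anchor.
anchor = Regexp.new('$')
# With the backslash, `\$` matches a literal dollar sign.
literal_dollar = Regexp.new('\$')

puts anchor.match?("abc")          # true: every string has an end of line
puts literal_dollar.match?("abc")  # false: there is no "$" in the string
puts literal_dollar.match?("a$b")  # true
```

The two patterns behave differently, so the escape has to survive for the regexp to keep its meaning.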

It's fairly simple to remove this behavior from the lexer just by deleting code:

```diff
diff --git a/parse.y b/parse.y
index 167f064b31..523d5a85b3 100644
--- a/parse.y
+++ b/parse.y
@@ -7130,19 +7130,6 @@ tokadd_mbchar(struct parser_params *p, int c)
     return c;
 }

-static inline int
-simple_re_meta(int c)
-{
-    switch (c) {
-      case '$': case '*': case '+': case '.':
-      case '?': case '^': case '|':
-      case ')': case ']': case '}': case '>':
-       return TRUE;
-      default:
-       return FALSE;
-    }
-}
-
 static int
 parser_update_heredoc_indent(struct parser_params *p, int c)
 {
@@ -7277,10 +7264,6 @@ tokadd_string(struct parser_params *p,
                       }
                     }

-                   if (c == term && !simple_re_meta(c)) {
-                       tokadd(p, c);
-                       continue;
-                   }
                    pushback(p, c);
                    if ((c = tokadd_escape(p, enc)) < 0)
                        return -1;
```
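The condition removed by the diff can be modeled in Ruby as a rough sketch. `SIMPLE_RE_META` mirrors the character list in the C function `simple_re_meta`; the helper `swallows_escape?` is a hypothetical name introduced here for illustration and is not part of Ruby or `parse.y`.

```ruby
# Characters for which the C code's simple_re_meta returns TRUE.
SIMPLE_RE_META = ['$', '*', '+', '.', '?', '^', '|', ')', ']', '}', '>'].freeze

# Hypothetical helper modeling `c == term && !simple_re_meta(c)`:
# true when the backslash before the escaped character would be dropped.
def swallows_escape?(char, term)
  char == term && !SIMPLE_RE_META.include?(char)
end

puts swallows_escape?('/', '/')  # true:  like /\// and %r/\//
puts swallows_escape?(',', ',')  # true:  like %r,\,,
puts swallows_escape?('$', '$')  # false: like %r$\$$
puts swallows_escape?('/', '}')  # false: like %r{\/}, where / is not the terminator
```

This only models the single condition shown in the diff; the rest of `tokadd_string` is not represented.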

However, it breaks 3 tests in `test_regexp.rb`: `test_source_unescaped`, `test_source`, and `test_equal`. It also breaks a couple of specs, `Literal Regexps supports escaping characters when used as a terminator` and `Regexp#source will remove escape characters`.

Since the current behavior is clearly by design in the tests and specs, I can safely conclude this is not a bug, or at most, it is a minor documentation bug (I'll update the documentation).  Switching to feature request.  I'll add this to the list of tickets to review at the next developer meeting, since while I'm not in favor of making the change, I do think this issue warrants discussion.

----------------------------------------
Feature #18838: Avoid swallowing regexp escapes in the lexer
https://bugs.ruby-lang.org/issues/18838#change-98139

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
----------------------------------------
According to the `Regexp#source` documentation:

```
Returns the original string of the pattern.
/ab+c/ix.source #=> "ab+c"

Note that escape sequences are retained as is.
/\x20\+/.source  #=> "\\x20\\+"
```

It works well, but an escaped slash (`\/`) is processed differently by different regexp literal forms.

Examples:

```ruby
/\//.source # => "/"
%r/\//.source # => "/"
%r{\/}.source # => "\\/"
```

Expected result: the result is the same in all cases.

Moreover, the documentation states that `escape sequences are retained as is`. So I would say that only `%r{}` works properly.

The issue was reported here https://github.com/oracle/truffleruby/issues/2569.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
