[ruby-core:104440] [Ruby master Bug#18013] Unexpected results when mixing negated character classes and case-folding

From: jiri.marsik@...
Date: 2021-06-29 12:05:15 UTC
List: ruby-core #104440
Issue #18013 has been updated by jirkamarsik (Jirka Marsik).


duerst (Martin Dürst) wrote in #note-2:
> Just a question: What's the purpose of nested character classes?

They are useful in combination with the set intersection operator `&&`. They let you, e.g., exclude characters from a character set, as in the example below, which matches all lowercase letters except the English vowels `aeiou`.

```
irb(main):001:0> /[\p{Ll}&&[^aeiou]]/u.match("a")
=> nil
irb(main):002:0> /[\p{Ll}&&[^aeiou]]/u.match("b")
=> #<MatchData "b">
irb(main):003:0> /[\p{Ll}&&[^aeiou]]/u.match(".")
=> nil
irb(main):004:0> /[\p{Ll}&&[^aeiou]]/u.match("α")
=> #<MatchData "α">
``` 


----------------------------------------
Bug #18013: Unexpected results when mixing negated character classes and case-folding
https://bugs.ruby-lang.org/issues/18013#change-92692

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
```
irb(main):001:0> /[^a-c]/i.match("A")
=> nil
irb(main):002:0> /[[^a-c]]/i.match("A")
=> #<MatchData "A">
```

The two regular expressions above match different strings because their character classes denote different sets of characters. To make `/[^a-c]/i` produce correct results, Oniguruma introduced a fix that is still easy to find in the code, since it is guarded by an always-on preprocessor flag (`CASE_FOLD_IS_APPLIED_INSIDE_NEGATIVE_CCLASS`, https://github.com/ruby/ruby/blob/9eae8cdefba61e9e51feb30a4b98525593169666/regparse.c#L5528). The idea of the fix is to case-fold a character class first and only then apply the negation (essentially moving the case-fold operator *inside* the negation).

In the case of our first regular expression, `[a-c]` is case-folded into `[a-cA-C]`, which is then inverted into `[^a-cA-C]`, the expected result. However, this case-folding logic is currently applied only to the top-most character class, so with a nested negated character class the order of the operations is reversed.

With our second regular expression, `[a-c]` will first be negated to yield `[^a-c]`, which will then be case-folded into `.`, the set of all characters (since `[^a-c]` contains `A-C`, which case-fold into `a-c`).

A way to fix this would be to apply case-folding for nested character classes as well, so that the nested character classes behave the same as the top-most character class. Then, we would get the same semantics for both expressions.
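
The difference between the two orders of operations can be sketched with plain Ruby sets. This is only a toy model, not Oniguruma's implementation: the universe is restricted to ASCII letters, and `fold` and `negate` are hypothetical helpers introduced for illustration.

```ruby
require "set"

# Toy universe: ASCII letters only (an assumption made for this sketch).
UNIVERSE = (("a".."z").to_a + ("A".."Z").to_a).to_set

# Hypothetical helper: close a set under case swapping.
def fold(set)
  set | set.map(&:swapcase).to_set
end

# Hypothetical helper: complement within the toy universe.
def negate(set)
  UNIVERSE - set
end

a_to_c = ("a".."c").to_set

# Order described for the top-most class in /[^a-c]/i: fold, then negate.
fold_then_negate = negate(fold(a_to_c))

# Order described for the nested class in /[[^a-c]]/i: negate, then fold.
negate_then_fold = fold(negate(a_to_c))

puts fold_then_negate.include?("A") # false: "A" is excluded, as expected
puts negate_then_fold.include?("A") # true: "A" sneaks back in
puts negate_then_fold == UNIVERSE   # true: the class degenerates to "everything"
```

In this model, applying `fold` before `negate` at every nesting level would make both expressions denote the same set, which is the behavior proposed above.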



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
