[#104169] [Ruby master Feature#17938] Keyword alternative for boolean positional arguments — matheusrichardt@...

Issue #17938 has been reported by matheusrich (Matheus Richard).

12 messages 2021/06/04

[#104213] [Ruby master Feature#17942] Add a `initialize(public @a, private @b)` shortcut syntax for defining public/private accessors for instance vars — tyler@...

Issue #17942 has been reported by TylerRick (Tyler Rick).

6 messages 2021/06/09

[#104288] [Ruby master Bug#17992] Upstreaming the htmlentities gem into CGI#.(un)escape_html — alexandermomchilov@...

Issue #17992 has been reported by AMomchilov (Alexander Momchilov).

9 messages 2021/06/15

[#104338] [Ruby master Misc#17997] DevelopersMeeting20210715Japan — mame@...

Issue #17997 has been reported by mame (Yusuke Endoh).

10 messages 2021/06/17

[#104361] [Ruby master Bug#18000] have_library doesn't work when ruby is compiled with --disable-shared --disable-install-static-library — jean.boussier@...

Issue #18000 has been reported by byroot (Jean Boussier).

9 messages 2021/06/18

[#104401] [Ruby master Feature#18007] Help developers of C extensions meet requirements in "doc/extension.rdoc" — mike.dalessio@...

Issue #18007 has been reported by mdalessio (Mike Dalessio).

16 messages 2021/06/25

[#104430] [Ruby master Bug#18011] `Method#parameters` is incorrect for forwarded arguments — josh.cheek@...

Issue #18011 has been reported by josh.cheek (Josh Cheek).

12 messages 2021/06/29

[ruby-core:104435] [Ruby master Bug#18012] Case-insensitive character classes can only match multiple code points when top-level character class is not negated

From: jiri.marsik@...
Date: 2021-06-29 08:35:07 UTC
List: ruby-core #104435
Issue #18012 has been reported by jirkamarsik (Jirka Marsik).

----------------------------------------
Bug #18012: Case-insensitive character classes can only match multiple code points when top-level character class is not negated
https://bugs.ruby-lang.org/issues/18012

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Some Unicode characters case-fold to strings of multiple code points, e.g. the ligature `\ufb00` can match the string `ff`.

```
irb(main):001:0> /\A[\ufb00]\z/i.match("\ufb00")
=> #<MatchData "ff">
irb(main):002:0> /\A[\ufb00]\z/i.match("ff")
=> #<MatchData "ff">
```

As expected, when we negate this character class, we can no longer match neither the ligature character `\ufb00` nor the string `ff`.

```
irb(main):003:0> /\A[^\ufb00]\z/i.match("\ufb00")
=> nil
irb(main):004:0> /\A[^\ufb00]\z/i.match("ff")
=> nil
```

Then, when we add a second negation, the `\ufb00` ligature reappears in the character set but the string `ff` is no longer accepted.

```
irb(main):005:0> /\A[^[^\ufb00]]\z/i.match("\ufb00")
=> #<MatchData "ff">
irb(main):006:0> /\A[^[^\ufb00]]\z/i.match("ff")
=> nil
```

This reveals that the multi-code-point matches in character classes are blocked by negation. However, this is implemented only by checking whether the topmost character class is negated. If we wrap the character class in another set of brackets, the semantics change.

```
irb(main):007:0> /\A[[^[^\ufb00]]]\z/i.match("\ufb00")
=> #<MatchData "ff">
irb(main):008:0> /\A[[^[^\ufb00]]]\z/i.match("ff")
=> #<MatchData "ff">
```

The cause behind this discrepancy (the fact that `[^[^\ufb00]]` and `[[^[^\ufb00]]]` match different strings) is the extra `IS_NCCLASS_NOT` check in `i_apply_case_fold` (https://github.com/ruby/ruby/blob/9eae8cdefba61e9e51feb30a4b98525593169666/regparse.c#L5568).




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread

Prev Next