[#104169] [Ruby master Feature#17938] Keyword alternative for boolean positional arguments — matheusrichardt@...

Issue #17938 has been reported by matheusrich (Matheus Richard).

12 messages 2021/06/04

[#104213] [Ruby master Feature#17942] Add a `initialize(public @a, private @b)` shortcut syntax for defining public/private accessors for instance vars — tyler@...

Issue #17942 has been reported by TylerRick (Tyler Rick).

6 messages 2021/06/09

[#104288] [Ruby master Bug#17992] Upstreaming the htmlentities gem into CGI#.(un)escape_html — alexandermomchilov@...

Issue #17992 has been reported by AMomchilov (Alexander Momchilov).

9 messages 2021/06/15

[#104338] [Ruby master Misc#17997] DevelopersMeeting20210715Japan — mame@...

Issue #17997 has been reported by mame (Yusuke Endoh).

10 messages 2021/06/17

[#104361] [Ruby master Bug#18000] have_library doesn't work when ruby is compiled with --disable-shared --disable-install-static-library — jean.boussier@...

Issue #18000 has been reported by byroot (Jean Boussier).

9 messages 2021/06/18

[#104401] [Ruby master Feature#18007] Help developers of C extensions meet requirements in "doc/extension.rdoc" — mike.dalessio@...

Issue #18007 has been reported by mdalessio (Mike Dalessio).

16 messages 2021/06/25

[#104430] [Ruby master Bug#18011] `Method#parameters` is incorrect for forwarded arguments — josh.cheek@...

Issue #18011 has been reported by josh.cheek (Josh Cheek).

12 messages 2021/06/29

[ruby-core:104275] [Ruby master Bug#17989] Case insensitive Regexps do not handle characters with overlapping case foldings

From: jiri.marsik@...
Date: 2021-06-15 11:43:14 UTC
List: ruby-core #104275
Issue #17989 has been reported by jirkamarsik (Jirka Marsik).

----------------------------------------
Bug #17989: Case insensitive Regexps do not handle characters with overlapping case foldings
https://bugs.ruby-lang.org/issues/17989

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
When a Regexp uses the case-insensitive flag, strings are compared by first case folding them and then comparing the case foldings for equality. When a literal string is encountered in a Regexp source, the pattern analyzer tries to enumerate all possible strings that would case fold to the same string as the string in the pattern. In this way, case folding can be avoided when the Regexp is used to match. However, the algorithm used to enumerate all the possible strings which case fold to the same string is not complete. It assumes that the case foldings of different characters do not overlap (i.e. the multi-character case folding of some character cannot be a prefix or suffix of the multi-character case folding of some other character). However, this is not the case for several Unicode characters.

In the code below, many of the equalities `A == B`, tested via `/A/i.match("B")`, do not hold. Those that do hold hold only because the number of case-equivalent strings detected by the analyzer crosses a threshold at which point the analyzer abandons this optimization.

```
/\ufb00/i.match("ff")  # LATIN SMALL LIGATURE FF
/\ufb01/i.match("fi")  # LATIN SMALL LIGATURE FI
/\ufb02/i.match("fl")  # LATIN SMALL LIGATURE FL
/\ufb03/i.match("ffi") # LATIN SMALL LIGATURE FFI
/\ufb04/i.match("ffl") # LATIN SMALL LIGATURE FFL

# (ff)i == (ffi)
/\ufb00i/i.match("\ufb03")
# (ffi) == (ff)i
/\ufb03/i.match("\ufb00i")
# f(fi) == (ffi)
/f\ufb01/i.match("\ufb03")
# (ffi) == f(fi)
/\ufb03/i.match("f\ufb01")

# (ff)l == (ffl)
/\ufb00l/i.match("\ufb04")
# (ffl) == (ff)l
/\ufb04/i.match("\ufb00l")
# f(fl) == (ffl)
/f\ufb02/i.match("\ufb04")
# (ffl) == f(fl)
/\ufb04/i.match("f\ufb02")


/\u1f50/i.match("\u03c5\u0313")       # GREEK SMALL LETTER UPSILON WITH PSILI
/\u1f52/i.match("\u03c5\u0313\u0300") # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA
/\u1f54/i.match("\u03c5\u0313\u0301") # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
/\u1f56/i.match("\u03c5\u0313\u0342") # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI

# (upsilon psili) varia == (upsilon psili varia)
/\u1f50\u0300/i.match("\u1f52")
# (upsilon psili varia) == (upsilon psili) varia
/\u1f52/i.match("\u1f50\u0300")

# (upsilon psili) oxia == (upsilon psili oxia)
/\u1f50\u0301/i.match("\u1f54")
# (upsilon psili oxia) == (upsilon psili) oxia
/\u1f54/i.match("\u1f50\u0301")

# (upsilon psili) perispomeni == (upsilon psili perispomeni)
/\u1f50\u0342/i.match("\u1f56")
# (upsilon psili perispomeni) == (upsilon psili) perispomeni
/\u1f56/i.match("\u1f50\u0342")


/\u1fb6/i.match("\u03b1\u0342")       # GREEK SMALL LETTER ALPHA WITH PERISPOMENI
/\u1fb7/i.match("\u03b1\u0342\u03b9") # GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI

# (alpha perispomeni) ypogegrammeni == (alpha perispomeni ypogegrammeni)
/\u1fb6\u03b9/i.match("\u1fb7")
# (alpha perispomeni ypogegrammeni) == (alpha perispomeni) ypogegrammeni
/\u1fb7/i.match("\u1fb6\u03b9")


/\u1fc6/i.match("\u03b7\u0342")       # GREEK SMALL LETTER ETA WITH PERISPOMENI
/\u1fc7/i.match("\u03b7\u0342\u03b9") # GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI

# (eta perispomeni) ypogegrammeni == (eta perispomeni ypogegrammeni)
/\u1fc6\u03b9/i.match("\u1fc7")
# (eta perispomeni ypogegrammeni) == (eta perispomeni) ypogegrammeni
/\u1fc7/i.match("\u1fc6\u03b9")


/\u1ff6/i.match("\u03c9\u0342")       # GREEK SMALL LETTER OMEGA WITH PERISPOMENI
/\u1ff7/i.match("\u03c9\u0342\u03b9") # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI

# (omega perispomeni) ypogegrammeni == (omega perispomeni ypogegrammeni)
/\u1ff6\u03b9/i.match("\u1ff7")
# (omega perispomeni ypogegrammeni) == (omega perispomeni) ypogegrammeni
/\u1ff7/i.match("\u1ff6\u03b9")
```



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread

Prev Next