[ruby-core:123148] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
From: "ima1zumi (Mari Imaizumi) via ruby-core" <ruby-core@...>
Date: 2025-09-01 00:50:27 UTC
List: ruby-core #123148
Issue #21559 has been updated by ima1zumi (Mari Imaizumi).
Assignee set to ima1zumi (Mari Imaizumi)
This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained. https://unicode.org/reports/tr15/#Design_Goals
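The TR15 identity can be checked directly with `String#unicode_normalize`. This is a minimal sketch. The helper name `nfd_stable_under_nfc?` is hypothetical and not part of Ruby.

~~~ruby
# Hypothetical helper (not a Ruby API): checks the TR15 design goal
# toNFD(x) == toNFD(toNFC(x)) using String#unicode_normalize.
def nfd_stable_under_nfc?(str)
  nfd = str.unicode_normalize(:nfd)
  nfd == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

p nfd_stable_under_nfc?("e\u0301")                            # => true
p nfd_stable_under_nfc?("s\u{11930}\u{323}\u{11930}\u{307}")  # input from this report
~~~

The second call exercises the string from the report, so its result depends on the Ruby version under test.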
It seems the NFC process combines characters across U+11930, even though U+11930's canonical combining class (CCC) is 0 and should therefore block composition.
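The blocking rule mentioned above can be sketched as follows. This is an illustration only, not Ruby's implementation: the CCC values are hard-coded for just the four code points involved (assumption: copied from UnicodeData.txt, since Ruby exposes no public CCC lookup), and `blocked?` is a hypothetical helper.

~~~ruby
# Canonical combining class values for the code points in this report,
# hard-coded by hand (assumption: taken from UnicodeData.txt).
CCC = {
  0x0073  => 0,    # LATIN SMALL LETTER S
  0x11930 => 0,    # DIVES AKURU VOWEL SIGN AA
  0x0323  => 220,  # COMBINING DOT BELOW
  0x0307  => 230   # COMBINING DOT ABOVE
}.freeze

# TR15 "blocked" test (hypothetical helper): the character at index c is
# blocked from the starter at index l if some character B strictly between
# them has ccc(B) == 0 or ccc(B) >= ccc(C).
def blocked?(cps, l, c)
  ccc_c = CCC.fetch(cps[c])
  cps[(l + 1)...c].any? do |b|
    ccc_b = CCC.fetch(b)
    ccc_b.zero? || ccc_b >= ccc_c
  end
end

p blocked?("s\u{11930}\u{323}".codepoints, 0, 2) # => true  (U+11930 sits in between)
p blocked?("s\u{323}".codepoints, 0, 1)          # => false (nothing in between)
~~~

Under this rule U+0323 can never compose with the leading `s` once U+11930 separates them.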
CC: @duerst
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114480
* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: ima1zumi (Mari Imaizumi)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
I expect `nfd(nfc(str)) == nfd(str)` to hold, but found a string for which it does not.
~~~ruby
# Ruby 3.1 - 3.5
str = "s\u{11930}\u{323}\u{11930}\u{307}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
~~~
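To see where the two results diverge, it can help to print both as code points. A small sketch. `codepoints` is a hypothetical helper name. No output is shown because it depends on the Ruby version. If our reading of TR15 is right, none of these characters decomposes, no adjacent pair has a primary composite, and U+11930 blocks the marks from the leading `s`. A conforming implementation should then print the same sequence on both lines.

~~~ruby
# Hypothetical helper: render a string as "U+XXXX" code points so the
# NFD and NFD(NFC) results can be compared side by side.
def codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

str = "s\u{11930}\u{323}\u{11930}\u{307}"
nfd = str.unicode_normalize(:nfd)
round_trip = str.unicode_normalize(:nfc).unicode_normalize(:nfd)
puts "NFD:      #{codepoints(nfd).join(' ')}"
puts "NFD(NFC): #{codepoints(round_trip).join(' ')}"
~~~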
~~~ruby
# ruby 3.5.0dev
str = "s\u{1611e}\u{323}\u{1611e}\u{307}\u{1611f}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
~~~
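One way to look for further counterexamples is to enumerate every short string over a small alphabet and test the identity on each. This is a brute-force sketch, not how the strings above were found. `find_violations` is a hypothetical helper. The alphabet choice is an assumption; the code points from the second example could be substituted on the version it names.

~~~ruby
# Hypothetical search helper: enumerate every string of length 1..max_len
# over a small alphabet and collect those that violate
# toNFD(x) == toNFD(toNFC(x)).
def find_violations(alphabet, max_len)
  (1..max_len).flat_map do |len|
    alphabet.repeated_permutation(len).map(&:join).reject do |s|
      s.unicode_normalize(:nfd) ==
        s.unicode_normalize(:nfc).unicode_normalize(:nfd)
    end
  end
end

# Alphabet built from the code points in the first example (4^1 + ... + 4^5
# = 1364 candidate strings, so this finishes quickly).
alphabet = ["s", "\u{11930}", "\u{323}", "\u{307}"]
find_violations(alphabet, 5).each do |s|
  puts s.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end
~~~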
--
https://bugs.ruby-lang.org/