From: psychoslave@...
Date: 2020-05-31T16:00:08+00:00
Subject: [ruby-core:98600] [Ruby master Bug#16927] String.tr won't return the expected result for some sign with diacritics

Issue #16927 has been reported by psychoslave (mathieu lovato stumpf guntz).

----------------------------------------
Bug #16927: String.tr won't return the expected result for some sign with diacritics
https://bugs.ruby-lang.org/issues/16927

* Author: psychoslave (mathieu lovato stumpf guntz)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
# Context

This is not essential to the bug, but I always appreciate being given more context.

So, as part of a larger project, I needed a way to utter every number from zero to 255 as a single syllable, written as a consonant-vowel-consonant (CVC) sequence in IPA. To avoid any ambiguity, the nomenclature also had to avoid colliding with existing numerical terms, like six and ten, in every language for which documentation was found. As that was not nerdy enough, I came up with the idea of using diacritics to mark primes and congruence with 2, 8, 12, and 16 (optional, and without any intended phonological alteration).

If you are curious about it, you can look at [the algorithm](https://gitlab.com/psychoslave/isotopy/-/blob/master/tool/combinations/trigrams.rb) I used to build the nomenclature matching [the specification](https://gitlab.com/psychoslave/isotopy/-/issues/4).

# Code to reproduce the bug

``` ruby
#!/bin/env ruby
translated = 'aeiou'.tr('aeiou', '��������������������')
substitued = 'aeiou'.sub(/aeiou/, '��������������������')
puts `ruby -v`, translated == substitued, translated, substitued
# Actual result
```

On my box this outputs:

```
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
false
����������
��������������������
```

# Expected result

`tr` should behave consistently: either it should fail for all signs with similar diacritics or (preferably) it should return the specified Unicode glyphs. That is, in the code above, `translated == substitued` should be true.

# Remarks

I am not a Unicode guru: the missing signs behind the difference may come from the way they are encoded. I am aware that some glyphs exist in two forms, as a single code point and as a combining code point sequence. However, I cannot tell whether the code above uses a mixture of both.
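
The encoding question raised in the remarks can be checked by listing code points. The sketch below is only an illustration under stated assumptions: the strings are made-up examples (not the ones from the report, whose exact code points are not known here), and `codepoints_of` is a hypothetical helper. It shows that `String#tr` pairs characters one code point at a time, so a replacement string built from base letters plus combining marks contains more code points than visible glyphs, and that normalizing to NFC first may avoid the mismatch when precomposed forms exist.

```ruby
# Illustrative strings only (assumed, not taken from the report):
# the same two accented vowels written in two different ways.
precomposed = "\u00E1\u00E9"    # each glyph is a single code point
decomposed  = "a\u0301e\u0301"  # base letter followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical helper: list the code points of a string as U+XXXX.
def codepoints_of(str)
  str.each_codepoint.map { |cp| format('U+%04X', cp) }
end

p codepoints_of(precomposed)  # two code points for two glyphs
p codepoints_of(decomposed)   # four code points for two glyphs

# tr maps the n-th character of the first argument to the n-th
# character (code point) of the second, not to the n-th visible glyph.
with_precomposed = 'ae'.tr('ae', precomposed)
with_decomposed  = 'ae'.tr('ae', decomposed)

p codepoints_of(with_precomposed)
p codepoints_of(with_decomposed)  # 'e' was mapped to the bare combining mark

# Normalizing to NFC composes base letter + mark where a precomposed
# code point exists, so both spellings then behave the same under tr.
normalized = decomposed.unicode_normalize(:nfc)
p 'ae'.tr('ae', normalized) == with_precomposed
```

Running `codepoints_of` on the replacement string from the report would tell whether it mixes precomposed and combining forms. Note that NFC cannot help for glyphs that have no precomposed code point; in that case a per-glyph mapping (for example a `Hash` passed to `gsub`) would be needed instead of `tr`.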