From: duerst
Date: 2022-02-23T08:17:52+00:00
Subject: [ruby-core:107731] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE

Issue #18590 has been updated by duerst (Martin Dürst).

Status changed from Assigned to Closed

andrykonchin (Andrew Konchin) wrote in #note-3:

> Thank you for the suggestion.
>
> I am wondering whether `String#downcase` (when called without arguments) follows only Unicode case mapping rules (as stated in the [documentation]). Or also the folding ones?
>
> I would expect that a call of `String#downcase` without arguments uses the one-to-one case mapping rules, that are specified in the [UnicodeData.txt] file.

It should use the mappings in https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt. And that is 0069 0307 (i.e. 'i' followed by dot above) for 'İ'.downcase.

> [documentation]: https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase
> [UnicodeData.txt]: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

The data in UnicodeData is restricted to simple case mappings (i.e. mappings that don't change the length of the string in terms of number of codepoints). In Ruby, there is no need for such a restriction. See also https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/, slide 23.

I'm closing this, because it works as intended/described, as far as I can see.

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96654

* Author: andrykonchin (Andrew Konchin)
* Status: Closed
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
Downcasing for "İ" character works in an unexpected way:

```ruby
'İ'.downcase
=> "i̇"
```

Expected result - downcasing should return "i".
Instead, it returns small "i" and additional "dot" character:

```ruby
'İ'.downcase.chars
=> ["i", "̇"]
```

According to the standard Unicode case mapping character 'İ' (0130) maps to lowercased 'i' (0069).

```
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
```

https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
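
For illustration, the difference between the two mappings discussed in this thread can be sketched as below. This is only a minimal sketch: `SIMPLE_LOWERCASE` and `simple_downcase` are hypothetical names that are not part of Ruby, and the table holds just the single UnicodeData.txt entry quoted above. The character is written as the escape `"\u0130"` so the snippet does not depend on the file encoding.

```ruby
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
capital_i_with_dot = "\u0130"

# String#downcase without arguments applies the full mapping
# (SpecialCasing.txt), which may change the number of codepoints.
full = capital_i_with_dot.downcase
full_hex = full.codepoints.map { |cp| format("%04X", cp) }

# Hypothetical helper, not part of Ruby: a one-to-one mapping in the
# style of UnicodeData.txt. Only the entry from this report is included.
SIMPLE_LOWERCASE = { 0x0130 => 0x0069 }.freeze

def simple_downcase(string)
  # Map each codepoint individually; unknown codepoints pass through unchanged.
  string.codepoints.map { |cp| SIMPLE_LOWERCASE.fetch(cp, cp) }.pack("U*")
end

simple = simple_downcase(capital_i_with_dot)

puts full_hex.join(" ") # codepoints produced by the full mapping
puts simple             # result of the one-to-one mapping
```

The full mapping keeps the dot as a separate combining character (U+0307), while the simple mapping returns a single codepoint and so preserves the string length.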