From: "mame (Yusuke Endoh)"
Date: 2022-02-21T03:34:03+00:00
Subject: [ruby-core:107677] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE

Issue #18590 has been updated by mame (Yusuke Endoh).

Assignee set to duerst (Martin Dürst)
Status changed from Open to Assigned

The Unicode case folding data file (http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt) says:

```
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
```

"F" is for "full case folding", and "T" is for "Turkic languages". String#downcase uses full Unicode case mapping by default (see https://docs.ruby-lang.org/en/3.0/String.html#method-i-downcase).

You can get the result you expected with the `:turkic` option.

```
'İ'.downcase(:turkic).chars
=> ["i"]
```

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96596

* Author: andrykonchin (Andrew Konchin)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
Downcasing the "İ" character works in an unexpected way:

```ruby
'İ'.downcase
=> "i̇"
```

Expected result: downcasing should return "i". Instead, it returns a small "i" followed by an additional combining "dot" character:

```ruby
'İ'.downcase.chars
=> ["i", "̇"]
```

According to the standard Unicode case mapping, the character 'İ' (0130) maps to lowercase 'i' (0069).

```
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
```

https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

--
https://bugs.ruby-lang.org/
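
The two behaviours discussed in this thread can be compared side by side with a short script. This is only a minimal sketch: `hex_codepoints` is a hypothetical helper introduced here for readability, and the character is written as the escape `"\u0130"` so the result does not depend on how the source file is encoded.

```ruby
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, written as an escape.
capital_i_dot = "\u0130"

# Hypothetical helper (not part of Ruby): render each code point as U+XXXX.
def hex_codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# Default: full Unicode case mapping, which yields "i" plus COMBINING DOT ABOVE.
full = capital_i_dot.downcase

# :turkic option: mapping adapted for Turkic languages, which yields "i" alone.
turkic = capital_i_dot.downcase(:turkic)

puts hex_codepoints(full).inspect    # ["U+0069", "U+0307"]
puts hex_codepoints(turkic).inspect  # ["U+0069"]
```

Note that `:turkic` changes the mapping for the whole string, not only for U+0130, so it is probably best reserved for text known to be in a Turkic language.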