From: duerst@...
Date: 2020-12-17T06:56:01+00:00
Subject: [ruby-core:101494] [Ruby master Bug#17400] Incorrect character downcase for Greek Sigma

Issue #17400 has been updated by duerst (Martin Dürst).

I have to acknowledge that I 'cut some corners'. It's essentially table 3.17 on pp. 151-152 of the Unicode Standard (see https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf).

The problem on the implementation side is that it requires context of possibly unlimited length. The context before the character is somewhat easier to handle ('just' needs a little state machine) than the context after the character (which needs lookahead). Another potential problem is that programs using downcase (and capitalize and swapcase) may not provide all the necessary context, because they may apply the operation in pieces. But that's their problem.

The problem on the user side is that it isn't (and can't be made) perfect, as e.g. the example in https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf shows. I seem to remember that John Cowan also gave another example, where a final sigma (ς) appeared in the middle of a Greek word, at the boundary between two components. I haven't found that example in my archives, but I may get back to John and ask him again. Still, using final sigma in whatever context Unicode defines as appropriate is definitely much closer to what the user may want.

I'll try to think about how to improve our implementation, but can't promise to get to it before February, sorry.
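For illustration, here is a minimal Ruby sketch of the Final_Sigma context condition from the Unicode Standard: capital sigma lowercases to ς when it is preceded by a cased letter followed by zero or more case-ignorable characters, and is not followed by zero or more case-ignorable characters and then a cased letter. It assumes Onigmo accepts the derived properties `\p{Cased}` and `\p{Case_Ignorable}`; the method names `final_sigma?` and `downcase_with_final_sigma` are hypothetical and not part of Ruby's String API. It works on a whole array of characters and scans in both directions, rather than using the streaming state machine plus lookahead described above.

```ruby
# Hypothetical sketch of Unicode's Final_Sigma casing context.
# Assumes Onigmo supports the derived properties \p{Cased} and
# \p{Case_Ignorable} (from DerivedCoreProperties.txt).
CASED = /\p{Cased}/
CASE_IGNORABLE = /\p{Case_Ignorable}/

# True if the capital sigma at chars[i] is in the Final_Sigma context:
#   before: a cased letter, then zero or more case-ignorable chars
#   after:  NOT zero or more case-ignorable chars, then a cased letter
def final_sigma?(chars, i)
  # Look behind: skip case-ignorable chars until a cased char is found;
  # any other character (or the start of the string) means "not preceded".
  preceded = false
  (i - 1).downto(0) do |j|
    c = chars[j]
    if c.match?(CASED)
      preceded = true
      break
    end
    break unless c.match?(CASE_IGNORABLE)
  end
  return false unless preceded

  # Look ahead: skip case-ignorable chars; reaching a cased char
  # means the sigma is word-internal, so the final form does not apply.
  ((i + 1)...chars.length).each do |j|
    c = chars[j]
    return false if c.match?(CASED)
    break unless c.match?(CASE_IGNORABLE)
  end
  true
end

# Downcases str, choosing ς or σ for capital sigma by context and
# delegating every other character to String#downcase.
def downcase_with_final_sigma(str)
  chars = str.chars
  chars.each_with_index.map do |c, i|
    if c == "Σ"
      final_sigma?(chars, i) ? "ς" : "σ"
    else
      c.downcase
    end
  end.join
end
```

With this sketch, `downcase_with_final_sigma("ΣΑΣ")` should return `"σας"`: the first Σ has no preceding cased letter, while the last one has no following cased letter. The lookahead also shows the imperfection mentioned above: in `"ΑΣ.Β"` the full stop is case-ignorable, so the Σ stays σ.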
----------------------------------------
Bug #17400: Incorrect character downcase for Greek Sigma
https://bugs.ruby-lang.org/issues/17400#change-89273

* Author: xfalcox (Rafael Silva)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: ruby 3.0.0dev (2020-12-16T18:46:44Z master 93ba3ac036) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
An issue caused by this bug was first reported in the Discourse support community at https://meta.discourse.org/t/unicode-username-results-in-error-loading-profile-page/173182?u=falco.

The issue is that in Greek, there are two ways to downcase the letter “Σ”:

- “ς” when it is used at the end of a word
- “σ” anywhere else

NodeJS follows this rule:

```
$ node
Welcome to Node.js v12.11.1.
Type ".help" for more information.
> "������������".toLowerCase()
'������������'
```

Python too:

```
$ python
Python 3.8.2 (default, Nov 23 2020, 16:33:30)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "������������".lower()
'������������'
```

Ruby (both 2.7 and 3) doesn't:

```
$ ruby --version
ruby 3.0.0dev (2020-12-16T18:46:44Z master 93ba3ac036) [x86_64-linux]
$ irb
irb(main):001:0> "������������".downcase
=> "������������"
```

```
$ ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]
$ irb
irb(main):001:0> "������������".downcase
=> "������������"
```