From: Run Paint Run Run <redmine@...>
Date: 2009-08-05T15:34:30+09:00
Subject: [ruby-core:24775] [Feature #1889] Teach Onigurma Unicode 5.0 Character Properties

Feature #1889: Teach Onigurma Unicode 5.0 Character Properties
http://redmine.ruby-lang.org/issues/show/1889

Author: Run Paint Run Run
Status: Open, Priority: Low
Category: M17N

Oniguruma understands named category properties, such that:

  >> 0x012c.chr('utf-8')
  => "Ĭ"
  >> 0x012c.chr('utf-8') =~ /\p{Lu}/
  => 0

By my reckoning, there are about 3,000 characters in the current UnicodeData.txt for which it has no property mappings. For example, U+AA59 (CHAM DIGIT NINE) is in the Nd category (http://unicode.org/cldr/utility/character.jsp?a=AA59), yet:

  >> puts 0xaa59.chr('utf-8')
  ꩙
  => nil
  >> 0xaa59.chr('utf-8') =~ /\p{Nd}/
  => nil

I've attached two patches for the two categories I've updated, in the hope that somebody familiar with the code can either tell me I'm on the right track or explain a better approach. :-) If they look OK, I'll try adding the remainder. (The diffs are a bit noisy because I tried to retain the original ordering and layout of the code.)

----------------------------------------
http://redmine.ruby-lang.org
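For anyone wanting to reproduce the count, here is a rough sketch of how the unmapped characters could be found. It assumes input lines in UnicodeData.txt's semicolon-separated layout (code point, name, General_Category, ...); `unmatched_properties` is a hypothetical helper name, not part of Ruby or Oniguruma, and the sample lines are trimmed to their first three fields for illustration.

```ruby
# Sketch only: report code points whose General_Category (third field of a
# UnicodeData.txt-style line) is NOT matched by the corresponding \p{..}.
# `unmatched_properties` is a hypothetical helper, not an existing API.
def unmatched_properties(lines)
  lines.each_with_object([]) do |line, missing|
    fields = line.chomp.split(';')
    next if fields.size < 3            # skip blank or malformed lines

    codepoint = fields[0].hex
    category  = fields[2]
    next if category == 'Cs'           # surrogates are not valid UTF-8 characters

    char   = [codepoint].pack('U')     # UTF-8 string holding that one code point
    regexp = Regexp.new("\\p{#{category}}")
    unless char =~ regexp
      missing << format('U+%04X %s (%s)', codepoint, fields[1], category)
    end
  end
end

# Sample lines, trimmed to the first three fields (assumption: the real file
# carries more fields after these, which this sketch ignores).
sample = [
  '012C;LATIN CAPITAL LETTER I WITH BREVE;Lu',
  'AA59;CHAM DIGIT NINE;Nd'
]
puts unmatched_properties(sample)
```

On a build with the gap described in this report, U+AA59 would be expected to show up in the result; on a build whose property tables cover it, the list comes back empty. Feeding it `File.foreach('UnicodeData.txt')` instead of `sample` should give the full list, assuming the file is in the current directory.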