From: Run Paint Run Run <redmine@...>
Date: 2009-08-05T15:34:30+09:00
Subject: [ruby-core:24775] [Feature #1889] Teach Oniguruma Unicode 5.0 Character Properties

Feature #1889: Teach Oniguruma Unicode 5.0 Character Properties
http://redmine.ruby-lang.org/issues/show/1889

Author: Run Paint Run Run
Status: Open, Priority: Low
Category: M17N

Oniguruma understands named general-category properties, such that:

  >> 0x012c.chr('utf-8')
  => "Ĭ"
  >> 0x012c.chr('utf-8') =~ /\p{Lu}/
  => 0

By my reckoning, the current UnicodeData.txt contains about 3,000 characters for which Oniguruma has no property mappings. For example, U+AA59 (CHAM DIGIT NINE) is in the Nd category (http://unicode.org/cldr/utility/character.jsp?a=AA59), yet:

  >> puts 0xaa59.chr('utf-8')
  ꩙
  => nil
  >> 0xaa59.chr('utf-8') =~ /\p{Nd}/
  => nil
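
A check like the one above could be repeated over every entry in UnicodeData.txt to list the characters whose general category the regexp engine does not recognise. The following is only a rough sketch, not the method used to produce the estimate: the helper names parse_unicode_data and unmapped are hypothetical, and the three sample lines stand in for the real file, which would normally be read from disk.

```ruby
# Abbreviated sample in the UnicodeData.txt field layout:
# code point; name; general category; (remaining fields are ignored here).
SAMPLE = <<-DATA
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
012C;LATIN CAPITAL LETTER I WITH BREVE;Lu;0;L;0049 0306;;;;N;LATIN CAPITAL LETTER I BREVE;;;012D;
AA59;CHAM DIGIT NINE;Nd;0;L;;9;9;9;N;;;;;
DATA

# Hypothetical helper: returns [codepoint, category] pairs, one per data line.
def parse_unicode_data(text)
  text.each_line.map do |line|
    fields = line.strip.split(';', -1)
    next if fields.size < 3            # skip blank or malformed lines
    [fields[0].hex, fields[2]]
  end.compact
end

# Hypothetical helper: keeps the entries whose character does NOT match
# \p{<its own category>} in the running Ruby's regexp engine.
def unmapped(entries)
  entries.reject do |codepoint, category|
    begin
      codepoint.chr('utf-8') =~ Regexp.new("\\p{#{category}}")
    rescue RangeError
      true                             # not encodable in UTF-8 (e.g. surrogates): skip
    end
  end
end

unmapped(parse_unicode_data(SAMPLE)).each do |codepoint, category|
  puts format('U+%04X is %s but does not match \p{%s}', codepoint, category, category)
end
```

Which lines this prints, if any, depends on the property tables built into the Ruby that runs it. Range entries in the real file (the "First"/"Last" pairs) would need extra handling that this sketch omits.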

I've attached two patches for the two categories I've updated in the hope that somebody familiar with the code can either tell me I'm on the right track, or explain a better approach. :-) If they look OK I'll try adding the remainder. 

(The diffs are a bit noisy because I tried to retain the original ordering and layout of the code.)

