From: Jani Patokallio <jpatokal@...>
Date: 2011-11-29T13:10:44+09:00
Subject: [ruby-core:41386] [ruby-trunk - Bug #5685][Open] Oniguruma does not recognize U+30FC as Katakana

Issue #5685 has been reported by Jani Patokallio.

----------------------------------------
Bug #5685: Oniguruma does not recognize U+30FC as Katakana
http://redmine.ruby-lang.org/issues/5685

Author: Jani Patokallio
Status: Open
Priority: Normal
Assignee:
Category:
Target version: 1.9.3
ruby -v: ruby 1.9.3dev (2011-09-23 revision 33323) [x86_64-darwin10.8.0]

The character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK (Japanese choonpu) belongs to the Unicode Katakana block (U+30A0-30FF), but it is not matched by /\p{Katakana}/.

Demonstration:

  "������������������������������������������������������".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, 'X')
  => "XーX"

In other words, all kana and kanji in that string except U+30FC are matched. And it really is 30FC/12540:

  "������������������������������������������������������".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, '').unpack("U*")
  => [12540]

Also occurs in Ruby 1.8 with the Oniguruma library.

-- 
http://redmine.ruby-lang.org
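
A possible workaround, sketched under the assumption that the mismatch comes from \p{Katakana} testing the Unicode Script property (the Unicode data lists U+30FC with Script=Common) rather than membership in the Katakana block. The names CHOONPU, KATAKANA_BLOCK, JAPANESE_RUN and mask_japanese are hypothetical and are not part of Ruby or Oniguruma. The sketch matches the block by an explicit code point range and adds U+30FC to the combined pattern from the report:

```ruby
# Hypothetical helper names; only the regexp syntax itself is standard Ruby 1.9+.

# U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, written as an escape so the
# snippet does not depend on the source file's encoding.
CHOONPU = "\u30FC"

# Block-based match: every code point in the Katakana block U+30A0..U+30FF,
# given as an explicit range instead of a named property.
KATAKANA_BLOCK = /[\u30A0-\u30FF]/

# The pattern from the report, extended so that U+30FC counts as part of a
# run of kana or kanji. Uses a non-capturing group.
JAPANESE_RUN = /(?:\p{Katakana}|\p{Hiragana}|\p{Han}|\u30FC)+/

# Replace every run of kana/kanji (including the prolonged sound mark) with 'X'.
def mask_japanese(text)
  text.gsub(JAPANESE_RUN, 'X')
end
```

With this pattern, a word such as "\u30E9\u30FC\u30E1\u30F3" (katakana with an embedded U+30FC) is treated as a single run rather than being split at the prolonged sound mark. Whether the script-based or block-based behaviour is the intended one for \p{Katakana} is a separate question that this sketch does not settle.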