From: "phluid61 (Matthew Kerwin)"
Date: 2012-12-19T08:25:55+09:00
Subject: [ruby-core:50970] [ruby-trunk - Bug #4044] Regex matching errors when using \W character class and /i option

Issue #4044 has been updated by phluid61 (Matthew Kerwin).

ben_h (Ben Hoskings) wrote:
> But, I'm not sure how [^\W] should treat these characters:
> 0x00DF (Latin small letter sharp s)
> 0x017F (Latin small letter long s)
> 0x212A (Kelvin sign)

Can you just fall back on the Unicode categories? If we define "word characters" as Letters and Numbers, U+212A is {Lu} and thus a word character. Similarly, U+017F is {Ll}.

Seems a bit weird in the case of Kelvin (also the Angstrom Sign, U+212B = {Lu}), but at least Unicode is a fixed and universally accessible standard.

----------------------------------------
Bug #4044: Regex matching errors when using \W character class and /i option
https://bugs.ruby-lang.org/issues/4044#change-34836

Author: ben_h (Ben Hoskings)
Status: Feedback
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: core
Target version: 1.9.2
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]

=begin
Hi all,

Josh Bassett and I just discovered an issue with regex matches on ruby-1.9.2p0. (We reduced it while we were hacking on gemcutter.)

The case-insensitive (/i) option together with the non-word character class (\W) matches inconsistently against the alphabet. Specifically, the regex doesn't match properly against the letters 'k' and 's'.

The following expression demonstrates the problem in irb:

  puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/i] ].inspect }

As a reference, the following two expressions are working properly:

  puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/] ].inspect }
  puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[\w]/i] ].inspect }

Cheers
Ben Hoskings & Josh Bassett
=end

-- 
http://bugs.ruby-lang.org/
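
The Unicode-category fallback suggested in the reply above can be sketched roughly as follows. This is only an illustration of the idea, not how Ruby's regex engine defines \w or \W: the names CATEGORY_PATTERNS, general_category and unicode_word_char? are hypothetical helpers, and the "Letters and Numbers" rule is taken from the reply (so, unlike \w, it does not count the underscore).

```ruby
# encoding: utf-8

# Hypothetical lookup table: a handful of Unicode general categories,
# each tested with Ruby's \p{...} property syntax.
CATEGORY_PATTERNS = {
  "Lu" => /\A\p{Lu}\z/, # Letter, uppercase
  "Ll" => /\A\p{Ll}\z/, # Letter, lowercase
  "Lt" => /\A\p{Lt}\z/, # Letter, titlecase
  "Lm" => /\A\p{Lm}\z/, # Letter, modifier
  "Lo" => /\A\p{Lo}\z/, # Letter, other
  "Nd" => /\A\p{Nd}\z/, # Number, decimal digit
  "Nl" => /\A\p{Nl}\z/, # Number, letter
  "No" => /\A\p{No}\z/  # Number, other
}

# Returns the category name for a single character, or nil if it is
# neither a Letter nor a Number.
def general_category(char)
  pair = CATEGORY_PATTERNS.find { |_name, pattern| char =~ pattern }
  pair && pair.first
end

# Assumption from the reply: "word character" means Letter or Number.
def unicode_word_char?(char)
  !general_category(char).nil?
end

# The characters discussed in the thread.
SAMPLES = {
  "\u00DF" => "Latin small letter sharp s",
  "\u017F" => "Latin small letter long s",
  "\u212A" => "Kelvin sign",
  "\u212B" => "Angstrom sign"
}

SAMPLES.each do |char, name|
  puts format("U+%04X %-28s %-4s word=%s",
              char.ord, name, general_category(char).inspect,
              unicode_word_char?(char))
end
```

Under this rule, all four characters count as word characters regardless of the /i option, which is the consistency the reply is asking for.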