From: Yui NARUSE <naruse@...>
Date: 2012-01-10T18:07:13+09:00
Subject: [ruby-core:42029] [ruby-trunk - Bug #5871] regexp \W matches some word characters when inside a case-insensitive character class


Issue #5871 has been updated by Yui NARUSE.


Martin Dürst wrote:
> Shohei Urabe writes:
> 
> > Martin Dürst wrote:
> > > Shouhei Urabe writes:
> > > 
> > > > Quite generally speaking you are advised not to use /i in Unicode.
> > > 
> > > Are there other examples where /i is advised against? If yes, please let's look at them and try to fix them, too.
> > 
> > /Dĳkstra/i.match("DIJKSTRA") or something like that.
> 
> What about /Dĳkstra/.match("Dijkstra") ?
> $ ruby -e "puts /D\u0133kstra/.match('Dijkstra').inspect"
> nil

It is not an issue of case equivalence.

> If this doesn't match without case equivalence, why should it match with case equivalence?
> (I'm assuming that matching is transitive and that matching by /i should be a superset of matching without.)

irb(main):005:0> /[^a-z]/=~"A"
=> 0
irb(main):006:0> /[^a-z]/i=~"A"
=> nil
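
The same point as a small script, in case it helps; this is only a sketch, and the constant names PLAIN and FOLDED are made up for illustration. With /i the set inside the brackets is case-folded before it is negated, so [^a-z] under /i excludes "A" as well, and /i matches less here, not more.

```ruby
# Illustrative names only; they are not part of any Ruby API.
PLAIN  = /[^a-z]/    # negated class, case-sensitive
FOLDED = /[^a-z]/i   # same class, case-insensitive

# "A" is outside a-z, so the case-sensitive class matches it at offset 0.
p PLAIN =~ "A"    # => 0

# Under /i, a-z also covers A-Z, so the negated class no longer matches "A".
p FOLDED =~ "A"   # => nil

# The ligature U+0133 is a single code point, not the two letters "ij",
# so this fails regardless of case folding (as in the quoted example).
p(/D\u0133kstra/.match("Dijkstra"))   # => nil
```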
----------------------------------------
Bug #5871: regexp \W matches some word characters when inside a case-insensitive character class
https://bugs.ruby-lang.org/issues/5871

Author: Gareth Adams
Status: Rejected
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]


=begin
The following replacement, which should do nothing, has removed the upper- and lower-case "K"s and "S"s from the result:

    > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/[\W]/i,"")
    => "ABCDEFGHIJLMNOPQRTUVWXYZabcdefghijlmnopqrtuvwxyz"

The result is correct (the same as the input string) if I remove either the character class:
 
    > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/\W/i,"")
    => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 

or the case insensitive flag:

    > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/[\W]/,"")
    => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

This has been observed in two separate ruby 1.9 installs:

* ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]
* ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]
  
but works correctly in 1.8
=end
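
A possible way to check the report on other installs, written as a sketch: the helper name removed_letters and the constant LETTERS are hypothetical, chosen only for this example. It lists which ASCII letters a pattern strips from the alphabet string used above. No expected output is given for /[\W]/i, since that is the case under discussion and may differ between Ruby versions.

```ruby
# Hypothetical helper for reproducing the report; not part of Ruby itself.
LETTERS = [*"A".."Z", *"a".."z"].join

# Returns the ASCII letters that +pattern+ removes from LETTERS.
def removed_letters(pattern)
  LETTERS.chars - LETTERS.gsub(pattern, "").chars
end

# The three patterns from the report, labelled by their source text.
{
  '/[\W]/i' => /[\W]/i,
  '/\W/i'   => /\W/i,
  '/[\W]/'  => /[\W]/,
}.each do |label, pattern|
  puts "#{label}: removed #{removed_letters(pattern).inspect}"
end
```

An empty array means the replacement left the string unchanged, which is what the report expects for all three patterns.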



-- 
http://bugs.ruby-lang.org/