[ruby-core:43612] [ruby-trunk - Bug #4044] Regex matching errors when using \W character class and /i option

From: duerst (Martin Dürst) <duerst@...>
Date: 2012-03-25 06:04:57 UTC
List: ruby-core #43612
Issue #4044 has been updated by duerst (Martin Dürst).


Hello Yui,

We discussed this issue at today's developers' meeting in Akihabara.

There was wide consensus among the attendees that it is very strange to have 'k' and 's' included in the set of non-word (\W) characters. Therefore we are sorry, but we don't agree with your https://bugs.ruby-lang.org/issues/4044#note-7.

duerst (Martin Dürst) wrote:
> My current proposal is that we analyse what casing data is being used in what places when using /i (case insensitive matching) in regular expressions, and that we then fix that.

We have discussed this a bit. The first question is what \w should refer to in Ruby. I personally would hope that in the long term, we can extend this to include all word characters (i.e. also non-ASCII Latin, other scripts, Hiragana, Katakana, Kanji, ...). But the general opinion today was that we should keep this ASCII-only for now. Anyway, this bug is independent of that question, because in both cases \w includes 'k' and 's', and therefore in both cases \W must include neither 'k' nor 's'.

Also, we noted that regular expression components such as \w or \W should be independent of whether /i is set or not. The reason for that is that \w already takes care of combining lower- and upper-case characters. So there's nothing a /i can improve, and it should not make things worse.
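One way to check this expectation is to compare each pattern with and without /i over the lowercase ASCII letters. The following is only a sketch: the helper name letters_changed_by_i and the two constants are made up for illustration, and the result for the negated class depends on the Ruby version in use (an affected build should list 'k' and 's', a fixed one should list nothing).

```ruby
# Hypothetical helper (not part of Ruby): returns the lowercase ASCII
# letters whose match result changes when /i is added to the pattern.
def letters_changed_by_i(source)
  plain  = Regexp.new(source)
  folded = Regexp.new(source, Regexp::IGNORECASE)
  ('a'..'z').select { |c| plain.match(c).nil? != folded.match(c).nil? }
end

# \w should behave the same with and without /i.
WORD_DIFF = letters_changed_by_i('\w')

# [^\W] should be equivalent to \w, so /i should not change it either.
# On an affected Ruby this is where 'k' and 's' are expected to show up.
NEGATED_DIFF = letters_changed_by_i('[^\W]')

puts "\\w    changed by /i: #{WORD_DIFF.inspect}"
puts "[^\\W] changed by /i: #{NEGATED_DIFF.inspect}"
```

If both lists come back empty, the components are independent of /i on that build, which is the behaviour argued for above.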

> By the way, can somebody explain the following difference:
> 
> $ ruby -e "puts /[\W]|\u1234/i.match('k').inspect"
> #<MatchData "k">
> 
> $ ruby -e "puts /\W|\u1234/i.match('k').inspect"
> nil
> 
> (|\u1234 is there just to force the regexp to be in UTF-8.)

I suspect that this is due to the fact that \W in character classes gets expanded to an actual list of characters (or ranges) before case-extension (/i), whereas \W outside character classes does not get affected by case-extension.
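The two commands quoted above can be extended to the whole lowercase alphabet to see whether the bracketed and bare forms diverge anywhere else. This is a sketch, not a diagnosis: the constant names are made up here, the `|\u1234` alternative is kept only to force a UTF-8 regexp as in the quoted commands, and the bracketed result is expected to vary by Ruby version.

```ruby
# \W inside a character class, as in the first quoted command.
BRACKETED = /[\W]|\u1234/i
# \W outside a character class, as in the second quoted command.
BARE = /\W|\u1234/i

# Collect the lowercase letters each form matches. No letter is a
# non-word character, so ideally both lists are empty.
BRACKETED_HITS = ('a'..'z').select { |c| BRACKETED.match(c) }
BARE_HITS      = ('a'..'z').select { |c| BARE.match(c) }

puts "[\\W] with /i matches: #{BRACKETED_HITS.inspect}"
puts "\\W   with /i matches: #{BARE_HITS.inspect}"
```

A non-empty first list together with an empty second list would be consistent with the suspicion that only the character-class form goes through case-extension.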

Given the above, I have reopened this bug. I hope to be able to help you over the next two weeks, but I hope you can take the lead.

Regards,   Martin.

----------------------------------------
Bug #4044: Regex matching errors when using \W character class and /i option
https://bugs.ruby-lang.org/issues/4044#change-25109

Author: ben_h (Ben Hoskings)
Status: Feedback
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: core
Target version: 1.9.2
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]


=begin
 Hi all,
 
 Josh Bassett and I just discovered an issue with regex matches on ruby-1.9.2p0. (We reduced it while we were hacking on gemcutter.)
 
 The case-insensitive (/i) option together with the non-word character class (\W) match inconsistently against the alphabet. Specifically the regex doesn't match properly against the letters 'k' and 's'.
 
 The following expression demonstrates the problem in irb:
 
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/i] ].inspect }
 
 As a reference, the following two expressions are working properly:
 
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/] ].inspect }
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[\w]/i] ].inspect }
 
 Cheers
 Ben Hoskings & Josh Bassett
=end



-- 
http://bugs.ruby-lang.org/
