From: "Martin J. Dürst" <duerst@...>
Date: 2016-02-03T16:21:19+09:00
Subject: [ruby-core:73666] Re: [Ruby trunk Bug#4044]Regex matching errors when using \W character class and /i option

On 2016/02/03 12:21, matthew@kerwin.net.au wrote:

> I want to write a spec for this, but some of the details are unclear to me. Can we confirm whether each of the following are spec?

Please don't just assume that the current behavior is spec. If it 
contradicts common sense in some respect, it's very clear that we have 
to fix it. There may be borderline cases that are up for discussion, but 
most of the examples I have seen are not borderline.

My understanding was that Ken Takata fixed the problem with r47598, but 
I'll try to have another look at that.

When I looked at Ken's solution last time (the details, in Japanese, 
are at https://github.com/k-takata/Onigmo/issues/4), it included some 
aspects related to ASCII, which keep confusing me.

The relevant specification is Unicode Technical Standard #18, Unicode 
Regular Expressions, in particular 
http://www.unicode.org/reports/tr18/#Simple_Loose_Matches. There are 
various choices at the end of that section that are relevant to this issue.

My personal preference among the choices A-D is B. As far as I 
understand it, it would mean that while a /i option would change how 
literal characters are matched, it would not change what properties 
such as \W match.

My justification for this is as follows: If I want e.g. a word 
character, then that already should include all the necessary 
characters, both upper and lower case (and title case just in case you 
forgot about it :-). It's difficult to see why I'd want the set of 
characters to change when adding /i. The same argument can be applied to 
\W and most if not all similar cases.
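
As a rough illustration of this argument (a sketch only, not a claim 
about what any particular Ruby version does with /i), the snippet below 
checks that a letter property already covers lower, upper and title 
case without /i. The names samples, all_letters and changed_by_i are 
made up for the example, and \p{L} stands in for "a property" in 
general.

```ruby
# Sample characters: lowercase, uppercase, and titlecase
# (U+01C5, general category Lt).
samples = ["a", "A", "\u01C5"]

# The letter property already covers all three cases; no /i is needed.
all_letters = samples.all? { |c| c.match?(/\p{L}/) }

# Under choice B, adding /i should not change which characters the
# property matches. This collects any sample whose result differs.
changed_by_i = samples.reject do |c|
  c.match?(/\p{L}/) == c.match?(/\p{L}/i)
end

puts "all letters without /i: #{all_letters}"
puts "changed by /i: #{changed_by_i.inspect}"
```

The same kind of comparison can be run with \W in place of \p{L} to 
look for characters whose result depends on /i.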

The case that I think can be up for discussion is explicit character 
classes, such as [a-z]. Here, in effect automatically adding A-Z (and 
some other case equivalents) may indeed make sense.
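
A small sketch of the explicit-class case, for comparison. The variable 
names are only for illustration. The KELVIN SIGN check is included as an 
example of "other case equivalents"; its result is recorded rather than 
assumed, because that is the kind of detail under discussion.

```ruby
# Explicit class: without /i, only lowercase ASCII letters match.
plain  = "Q".match?(/[a-z]/)    # false
folded = "Q".match?(/[a-z]/i)   # true, /i in effect adds A-Z

# U+212A KELVIN SIGN is a case equivalent of "k" under Unicode case
# folding. Whether [a-z] with /i picks it up depends on the
# implementation, so the result is only printed, not assumed.
kelvin = "\u212A".match?(/[a-z]/i)

puts "Q vs [a-z]:        #{plain}"
puts "Q vs [a-z]/i:      #{folded}"
puts "U+212A vs [a-z]/i: #{kelvin}"
```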
