From: "Martin J. Dürst"
Date: 2016-02-03T16:21:19+09:00
Subject: [ruby-core:73666] Re: [Ruby trunk Bug#4044]Regex matching errors when using \W character class and /i option

On 2016/02/03 12:21, matthew@kerwin.net.au wrote:

> I want to write a spec for this, but some of the details are unclear to me. Can we confirm whether each of the following are spec?

Please don't just assume that the current behavior is spec. If it doesn't match common sense in some way, it's very clear that we have to fix it. There may be borderline cases that are up for discussion, but most of the examples I have seen so far are not borderline.

My understanding was that Ken Takata fixed the problem with r47598, but I'll try to have another look at that. When I last looked at Ken's solution (the details, in Japanese, are at https://github.com/k-takata/Onigmo/issues/4 ), it included some aspects related to ASCII, which keep confusing me.

The relevant specification is Unicode Technical Standard #18, Unicode Regular Expressions, in particular http://www.unicode.org/reports/tr18/#Simple_Loose_Matches. There are various choices at the end of that section that are relevant to this issue.

My personal preference among the choices A-D is B. As far as I understand it, B means that the /i option changes how literal characters are matched, but does not change what properties such as \W match.

My justification for this is as follows: If I want e.g. a word character, then that already should include all the necessary characters, both upper and lower case (and title case, just in case you forgot about it :-). It's difficult to see why I'd want the set of characters to change when adding /i. The same argument can be applied to \W and most if not all similar cases.

The case that I think can be up for discussion is explicit character classes, such as [a-z]. Here, in effect automatically adding A-Z (and some other case equivalents) may indeed make sense.
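
The three cases above (literal characters, properties such as \W, and explicit classes such as [a-z]) can be checked side by side with a small script. This is only a minimal sketch: the helper name `matches?` and the sample characters are assumptions chosen for illustration, not part of any existing spec. Whether the \W lines change under /i depends on the Ruby version, so no expected output is shown for them.

```ruby
# Hypothetical helper (not part of Ruby): true if `pattern` matches `str`.
def matches?(pattern, str)
  !(pattern =~ str).nil?
end

# 1. Literal characters: /i is expected to change what matches.
puts "literal s vs 'S':  plain=#{matches?(/s/, 'S')} with_i=#{matches?(/s/i, 'S')}"

# 2. Explicit character class: adding A-Z under /i may make sense.
puts "[a-z] vs 'A':      plain=#{matches?(/[a-z]/, 'A')} with_i=#{matches?(/[a-z]/i, 'A')}"

# 3. Property \W: under choice B, adding /i should not change the result.
#    The sample characters are assumptions; extend the list as needed.
%w[a A s S k K _ -].each do |ch|
  plain  = matches?(/\W/, ch)
  with_i = matches?(/\W/i, ch)
  note   = plain == with_i ? "same" : "DIFFERS"
  puts "\\W vs #{ch.inspect}: plain=#{plain} with_i=#{with_i} (#{note})"
end
```

Any line marked DIFFERS in the third group would be a case where /i changes what \W matches, which is the behavior choice B rules out.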