[ruby-core:71756] [Ruby trunk - Bug #10891] /[[:punct:]]/ POSIX group broken (with string literals?)

From: shugo@...
Date: 2015-11-30 13:48:56 UTC
List: ruby-core #71756
Issue #10891 has been updated by Shugo Maeda.


Yui NARUSE wrote:
> It follows UTR#18's Standard Recommendation.
> http://www.unicode.org/reports/tr18/#punct

In general, it would be a reasonable choice.

However, in Ruby, the problem is that it's hard to guess the programmer's intention from the code,
because the behavior is decided not by the regular expression, but by the target string.

```
def do_something(s)
  ...
  if /[[:punct:]]/ =~ s  # should "<" match, or shouldn't?
    ...
  end
  ...
end
```

If you want to reject symbols, `/\p{P}/` can be used instead, and it's more readable.
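
As a minimal sketch of that suggestion (the helper names `punctuation?` and `symbol?` are hypothetical, and the explicit `encode("UTF-8")` is an assumption made here so that the result does not depend on the encoding of the incoming string):

```ruby
# \p{P} is Unicode general category P (punctuation);
# \p{S} is general category S (symbols), which is where "<" lives.
PUNCTUATION = /\p{P}/
SYMBOL      = /\p{S}/

# Hypothetical helper: true if s contains a punctuation character.
def punctuation?(s)
  !(PUNCTUATION =~ s.encode("UTF-8")).nil?
end

# Hypothetical helper: true if s contains a symbol character.
def symbol?(s)
  !(SYMBOL =~ s.encode("UTF-8")).nil?
end

puts punctuation?("!")
puts punctuation?("<")
puts symbol?("<")
```

With this split, the nine characters from the report (`$ + < = > ^ ` | ~`) fall under `\p{S}` rather than `\p{P}`, so the intent is visible in the pattern itself.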


----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-55167

* Author: Tom Lord
* Status: Feedback
* Priority: Normal
* Assignee: Yui NARUSE
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
The regular expression: `/[[:punct:]]/` should match the following characters:

    ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

However, it only works for these characters:

    ! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { }

And does not work for these characters:

    $ + < = > ^ ` | ~

However, this is where it gets really weird... Consider the following:

    60.chr == "<" # => true
    60.chr =~ /[[:punct:]]/ # => 0
    "<" =~ /[[:punct:]]/ # => nil

So, it seems that the regular expression only fails for string literals!
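
One way to see what differs between the two strings is to compare their encodings. This is only a sketch: it pins the literal to UTF-8 explicitly (an assumption matching the default source encoding since Ruby 2.0), and it deliberately does not state the match results, since those depend on the Ruby version in use.

```ruby
from_chr = 60.chr               # Integer#chr with no argument gives a US-ASCII string for values below 128
literal  = "<".encode("UTF-8")  # stands in for a string literal in a UTF-8 source file

puts from_chr == literal        # the contents compare equal
puts from_chr.encoding          # US-ASCII
puts literal.encoding           # UTF-8

# Print the match result for each encoding; results vary by Ruby version.
[from_chr, literal].each do |s|
  puts "#{s.encoding}: #{(/[[:punct:]]/ =~ s).inspect}"
end
```

The two strings are equal in content but differ in encoding, which is consistent with the behavior being decided by the target string rather than by whether it was written as a literal.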



-- 
https://bugs.ruby-lang.org/
