From: "charliesome (Charlie Somerville)"
Date: 2012-12-03T21:27:33+09:00
Subject: [ruby-core:50522] [ruby-trunk - Bug #7501] \w in a regular expression doesn't match international characters

Issue #7501 has been updated by charliesome (Charlie Somerville).

/[[:alpha:]]+/ should behave as you expect.

----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34360

Author: eltomito (Tomas Partl)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]

When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "��������������a��������������" should all be matched by \w but aren't.

This program demonstrates the bug:
--------------------------------------------------------
# encoding: utf-8
match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s
match = /\w+/.match( "����������������������������" ) #some Czech characters
puts match.to_s
match = /\w+/.match( "������" ) #some German characters
puts match.to_s
----------------------------------------------------------

Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
����������������������������
������
----------------------------------------------------------

Actual output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
----------------------------------------------------------

--
http://bugs.ruby-lang.org/
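
The suggestion in the reply can be checked with a short sketch. It assumes Ruby 1.9 or later, where \w covers only ASCII word characters while POSIX bracket classes and \p{...} properties are Unicode-aware. The sample words are stand-ins (the report's own non-ASCII characters are unreadable in this copy), and `first_match` is a hypothetical helper, not part of Ruby.

```ruby
# encoding: utf-8

# Stand-in samples, written with \u escapes so the file stays ASCII-only.
CZECH  = "p\u0159\u00edli\u0161"        # a Czech word with three accented letters
GERMAN = "\u00e4\u00f6\u00fc\u00df"     # German umlauts and sharp s

# Hypothetical helper: return the first match as a String, or nil if none.
def first_match(pattern, text)
  m = pattern.match(text)
  m ? m.to_s : nil
end

# \w is ASCII-only here: it stops at the first non-ASCII letter.
p first_match(/\w+/, CZECH)             # => "p"
p first_match(/\w+/, GERMAN)            # => nil

# [[:alpha:]] is Unicode-aware and matches the whole word.
p first_match(/[[:alpha:]]+/, CZECH) == CZECH     # => true
p first_match(/[[:alpha:]]+/, GERMAN) == GERMAN   # => true

# [[:word:]] is the closer analogue of \w (letters, digits, underscore).
p first_match(/[[:word:]]+/, "#{GERMAN}_42") == "#{GERMAN}_42"  # => true

# \p{L} matches any Unicode letter.
p first_match(/\p{L}+/, CZECH) == CZECH           # => true
```

Note that [[:alpha:]] matches letters only, so it is not a drop-in replacement for \w when digits or underscores should also count as word characters; [[:word:]] covers that case.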