From: "eltomito (Tomas Partl)"
Date: 2012-12-03T18:22:11+09:00
Subject: [ruby-core:50516] [ruby-trunk - Bug #7501][Open] \w in a regular expression doesn't match international characters

Issue #7501 has been reported by eltomito (Tomas Partl).

----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501

Author: eltomito (Tomas Partl)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]

When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "��������������a��������������" should all be matched by \w but aren't.

This program demonstrates the bug:
--------------------------------------------------------
# encoding: utf-8

match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s

match = /\w+/.match( "����������������������������" ) #some Czech characters
puts match.to_s

match = /\w+/.match( "������" ) #some German characters
puts match.to_s
----------------------------------------------------------

Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
����������������������������
������
----------------------------------------------------------

Actual output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
----------------------------------------------------------

--
http://bugs.ruby-lang.org/
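
A note on the behaviour reported above, offered as a sketch rather than an authoritative answer: in Ruby 1.9 and later, \w is documented as matching only [a-zA-Z0-9_], while the Unicode property \p{Word} and the POSIX bracket [[:alpha:]] also match non-ASCII letters. The sample strings and the helper first_match below are illustrative assumptions, not taken from the report. The strings are written with \u escapes so the example does not depend on the encoding of the source file.

```ruby
# encoding: utf-8

# Hypothetical sample strings (not the ones from the report), spelled with
# \u escapes so they survive any re-encoding of this file.
CZECH  = "\u017Elu\u0165ou\u010Dk\u00FD"         # a Czech word with diacritics
GERMAN = "\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC"  # German umlauts, lower and upper case

# Hypothetical helper: text of the first match, or "" when nothing matches.
def first_match(pattern, text)
  pattern.match(text).to_s
end

# \w covers only [a-zA-Z0-9_], so non-ASCII letters end or prevent a match.
puts first_match(/\w+/, CZECH)
puts first_match(/\w+/, GERMAN)

# \p{Word} (letters, marks, numbers, connector punctuation) is Unicode-aware.
puts first_match(/\p{Word}+/, CZECH)
puts first_match(/\p{Word}+/, GERMAN)

# The POSIX bracket [[:alpha:]] also matches non-ASCII alphabetic characters.
puts first_match(/[[:alpha:]]+/, CZECH)
puts first_match(/[[:alpha:]]+/, GERMAN)
```

With \w+ the German string yields no match at all and the Czech word matches only its first ASCII run, whereas \p{Word}+ and [[:alpha:]]+ match both strings in full. Which of the two replacements fits depends on whether digits and underscores should count as word characters.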