From: "shyouhei (Shyouhei Urabe)"
Date: 2012-12-04T03:43:41+09:00
Subject: [ruby-core:50537] [ruby-trunk - Bug #7501][Rejected] \w in a regular expression doesn't match international characters

Issue #7501 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Open to Rejected

If I remember correctly, this is intentional design. As Unicode versions advance, the definition of what is and is not a word character changes from time to time, and it is hard for us to follow that.

----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34380

Author: eltomito (Tomas Partl)
Status: Rejected
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]

When using regexp matching, \w doesn't match characters that are not in the English alphabet. For example, the characters "��������������a��������������" should all be matched by \w but aren't.

This program demonstrates the bug:
--------------------------------------------------------
# encoding: utf-8
match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s
match = /\w+/.match( "����������������������������" ) #some Czech characters
puts match.to_s
match = /\w+/.match( "������" ) #some German characters
puts match.to_s
----------------------------------------------------------
Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
����������������������������
������
----------------------------------------------------------
Actual output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
----------------------------------------------------------

--
http://bugs.ruby-lang.org/
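
For readers who hit the behaviour described in this report, here is a minimal sketch of how it looks in practice, assuming Ruby 1.9 or later and UTF-8 strings. The sample words and the helper name first_match are illustrative and do not come from the original report. The sketch relies on \w being limited to [a-zA-Z0-9_], and on \p{Word} and the POSIX bracket [[:alpha:]] following Unicode properties, as the Regexp documentation describes.

```ruby
# encoding: utf-8

# Illustrative sample words (assumptions, not taken from the report).
CZECH_WORD  = "žluťoučký"
GERMAN_WORD = "Größe"

# Hypothetical helper: returns the first matched substring, or nil.
def first_match(pattern, text)
  m = pattern.match(text)
  m && m[0]
end

# \w is ASCII-only, so it skips the leading "ž" and stops at "ť".
puts first_match(/\w+/, CZECH_WORD).inspect           # "lu"

# \p{Word} covers Unicode letters, marks, numbers and connector punctuation.
puts first_match(/\p{Word}+/, CZECH_WORD).inspect     # "žluťoučký"

# [[:alpha:]] matches alphabetic characters, including non-ASCII ones.
puts first_match(/[[:alpha:]]+/, GERMAN_WORD).inspect # "Größe"
```

Whether \p{Word} or [[:alpha:]] is the better replacement depends on whether digits and underscores should count as part of a word, as they do with \w.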