From: "sawa (Tsuyoshi Sawada)"
Date: 2013-04-06T09:58:50+09:00
Subject: [ruby-core:54046] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex

Issue #8210 has been updated by sawa (Tsuyoshi Sawada).

=begin
The problem seems to happen with the combination of certain tokens, `?`, and `\z`.

  "���������������" =~ /a?\z/ # => nil
  "���������������" =~ / ?\z/ # => nil
  "���������������" =~ /\t?\z/ # => nil
  "���������������" =~ /\n?\z/ # => nil
  "���������������" =~ /\s?\z/ # => nil
  "���������������" =~ /.?\z/ # => 4
  "���������������" =~ /\S?\z/ # => 4
  "���������������" =~ /\W?\z/ # => 5
  "���������������" =~ /���?\z/ # => 5
  "���������������" =~ /\w?\z/ # => 5
=end

----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38278

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0

=begin
With this regex:

  regex1 = /\z/

the following strings match as expected:

  "hello" =~ regex1 # => 5
  "���������������" =~ regex1 # => 5

but with these regexes:

  regex2 = /#$/?\z/
  regex3 = /\n?\z/

they behave differently:

  "hello" =~ regex2 # => 5
  "hello" =~ regex3 # => 5
  "���������������" =~ regex2 # => nil
  "���������������" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., `$/` is `"\n"`). I expect the multibyte string to behave the same as the ASCII one, and believe this is a bug.
=end

--
http://bugs.ruby-lang.org/
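
To re-check the comparisons in this report on a given Ruby build, something like the following sketch may help. The multibyte string from the original report did not survive archiving, so `multibyte` below is a hypothetical stand-in (any five-character UTF-8 string), and the list of patterns is only a sample of those reported. What it prints for the multibyte column will presumably depend on the Ruby version in use.

```ruby
# Minimal sketch for re-checking the report; not the reporter's own script.
# ASSUMPTION: "こんにちは" is a stand-in for the lost original string.
ascii     = "hello"
multibyte = "こんにちは" # hypothetical five-character UTF-8 string

# A sample of the patterns mentioned in the report.
patterns = [/\z/, /\n?\z/, /a?\z/, /.?\z/, /\w?\z/]

# For each pattern, record the match index (or nil) for both strings.
results = patterns.map do |re|
  [re.source, ascii =~ re, multibyte =~ re]
end

results.each do |source, ascii_index, multibyte_index|
  puts format("%-8s ascii=%-5s multibyte=%s",
              source, ascii_index.inspect, multibyte_index.inspect)
end
```

If the two columns disagree for a pattern such as `\n?\z` while agreeing for plain `\z`, that would match the behavior described above.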