From: "k_takata (Ken Takata)"
Date: 2013-04-10T00:41:29+09:00
Subject: [ruby-core:54145] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex

Issue #8210 has been updated by k_takata (Ken Takata).

File fix-8210-1.diff added
File fix-8210-2.diff added

This problem was caused by the optimization of \z.
I wrote two patches to fix it.
fix-8210-1.diff is probably more efficient than fix-8210-2.diff,
but it tries to do a backward search when 'start==range' after
'start' has been adjusted. This behavior is a little confusing.

----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38399

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0

=begin
With this regex:

  regex1 = /\z/

the following strings match as expected:

  "hello" =~ regex1 # => 5
  "こんにちは" =~ regex1 # => 5

but with these regexes:

  regex2 = /#$/?\z/
  regex3 = /\n?\z/

the results differ:

  "hello" =~ regex2 # => 5
  "hello" =~ regex3 # => 5
  "こんにちは" =~ regex2 # => nil
  "こんにちは" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., `$/` is `"\n"`).

I expect them to behave the same, and believe this is a bug.
=end

-- 
http://bugs.ruby-lang.org/
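
For anyone who wants to check an interpreter against the report, here is a small reproduction sketch. It is not taken from either patch. The helper name end_anchor_matches is hypothetical, and the multibyte sample string is an assumption (any UTF-8 string of multibyte characters should serve). The `$/` variant is built with Regexp.escape rather than the literal `/#$/?\z/` form, which should be equivalent when `$/` is "\n".

```ruby
# Hypothetical helper (not part of fix-8210-1.diff or fix-8210-2.diff):
# returns the match index of each end-anchored pattern against str, so that
# ASCII and multibyte inputs can be compared side by side.
def end_anchor_matches(str)
  {
    '\z'    => (str =~ /\z/),                          # regex1 in the report
    '$/?\z' => (str =~ /#{Regexp.escape($/)}?\z/),     # regex2, assuming $/ is "\n"
    '\n?\z' => (str =~ /\n?\z/),                       # regex3 in the report
  }
end

ascii     = "hello"
multibyte = "こんにちは"  # assumed sample: five UTF-8 characters, three bytes each

[ascii, multibyte].each do |s|
  # String#=~ returns a character index (not a byte offset), or nil on no match.
  p [s, end_anchor_matches(s)]
end
```

On an interpreter without the bug, every entry should be 5 for both strings, which is the behavior the report expects. On an affected build, the report says the last two entries come back as nil for the multibyte string.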