From: naruse@...
Date: 2017-09-15T06:18:56+00:00
Subject: [ruby-core:82806] [Ruby trunk Bug#13892] Matching the end of a string followed by an empty greedy regex and a word boundary (.*\b) fails in all versions >= 1.9

Issue #13892 has been updated by naruse (Yui NARUSE).

As far as I understand, this is intentional behavior of Oniguruma (Onigmo), which Ruby uses.
What do you think, k-takata?

----------------------------------------
Bug #13892: Matching the end of a string followed by an empty greedy regex and a word boundary (.*\b) fails in all versions >= 1.9
https://bugs.ruby-lang.org/issues/13892#change-66684

* Author: jhriggs (Jim Riggs)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: 1.8.7, 1.9.3p551, 2.0.0p648, 2.1.9p490, 2.2.7p470, 2.3.4p301, 2.4.1p111
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
This is a very specific regex failure that occurs when the final character of the string is matched by the end of a pattern that terminates with `.*\b`. For example:

~~~ ruby
"abc" =~ /c.*\b/
"abc" =~ /abc.*\b/
"abc" =~ /\b.*abc.*\b/
~~~

In Ruby 1.8.7 and every other language I have tested (perl, pcre, javascript, python, go, ...) this matches. Starting in 1.9, it appears that the greedy `.*` causes the `\b` to fail, though it should match. This only occurs when the pattern matches at the end of the string being matched.
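
To check the three patterns above on whichever interpreter is at hand, a minimal sketch along these lines could be used. The `PATTERNS` constant and the `check_patterns` helper are hypothetical names introduced only for this illustration, and the sketch assumes Ruby 1.9 or later. It makes no claim about what a given version prints; it only reports what the running interpreter does.

~~~ ruby
# Patterns copied from the report; each one ends in `.*\b`.
PATTERNS = [/c.*\b/, /abc.*\b/, /\b.*abc.*\b/].freeze

# Hypothetical helper: returns { pattern source => true/false } for one subject.
def check_patterns(subject, patterns = PATTERNS)
  patterns.each_with_object({}) do |re, results|
    # =~ returns the match offset or nil; !! turns that into a boolean.
    results[re.source] = !!(subject =~ re)
  end
end

RESULTS = check_patterns("abc")

# One line per pattern, labelled with the version of the running interpreter.
RESULTS.each do |source, matched|
  puts "ruby#{RUBY_VERSION} : /#{source}/ : #{matched}"
end
~~~

Running the same file under several installed interpreters gives one block of lines per version, which makes the difference between versions easy to compare.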

Based on my non-exhaustive testing, this only occurs with `.*\b`; other patterns like `.?\b` and specific characters such as `d*\b` work as expected:

~~~
ruby1.8 : /c.*\b/ : true
ruby1.8 : /abc.*\b/ : true
ruby1.8 : /\b.*abc.*\b/ : true
ruby1.8 : /c.?\b/ : true
ruby1.8 : /abc.?\b/ : true
ruby1.8 : /\b.?abc.?\b/ : true
ruby1.8 : /d*\b/ : true
ruby1.8 : /abcd*\b/ : true
ruby1.8 : /\b.*abcd*\b/ : true
ruby1.9 : /c.*\b/ : false
ruby1.9 : /abc.*\b/ : false
ruby1.9 : /\b.*abc.*\b/ : false
ruby1.9 : /c.?\b/ : true
ruby1.9 : /abc.?\b/ : true
ruby1.9 : /\b.?abc.?\b/ : true
ruby1.9 : /d*\b/ : true
ruby1.9 : /abcd*\b/ : true
ruby1.9 : /\b.*abcd*\b/ : true
ruby2.0 : /c.*\b/ : false
ruby2.0 : /abc.*\b/ : false
ruby2.0 : /\b.*abc.*\b/ : false
ruby2.0 : /c.?\b/ : true
ruby2.0 : /abc.?\b/ : true
ruby2.0 : /\b.?abc.?\b/ : true
ruby2.0 : /d*\b/ : true
ruby2.0 : /abcd*\b/ : true
ruby2.0 : /\b.*abcd*\b/ : true
ruby2.1 : /c.*\b/ : false
ruby2.1 : /abc.*\b/ : false
ruby2.1 : /\b.*abc.*\b/ : false
ruby2.1 : /c.?\b/ : true
ruby2.1 : /abc.?\b/ : true
ruby2.1 : /\b.?abc.?\b/ : true
ruby2.1 : /d*\b/ : true
ruby2.1 : /abcd*\b/ : true
ruby2.1 : /\b.*abcd*\b/ : true
ruby2.2 : /c.*\b/ : false
ruby2.2 : /abc.*\b/ : false
ruby2.2 : /\b.*abc.*\b/ : false
ruby2.2 : /c.?\b/ : true
ruby2.2 : /abc.?\b/ : true
ruby2.2 : /\b.?abc.?\b/ : true
ruby2.2 : /d*\b/ : true
ruby2.2 : /abcd*\b/ : true
ruby2.2 : /\b.*abcd*\b/ : true
ruby2.3 : /c.*\b/ : false
ruby2.3 : /abc.*\b/ : false
ruby2.3 : /\b.*abc.*\b/ : false
ruby2.3 : /c.?\b/ : true
ruby2.3 : /abc.?\b/ : true
ruby2.3 : /\b.?abc.?\b/ : true
ruby2.3 : /d*\b/ : true
ruby2.3 : /abcd*\b/ : true
ruby2.3 : /\b.*abcd*\b/ : true
ruby2.4 : /c.*\b/ : false
ruby2.4 : /abc.*\b/ : false
ruby2.4 : /\b.*abc.*\b/ : false
ruby2.4 : /c.?\b/ : true
ruby2.4 : /abc.?\b/ : true
ruby2.4 : /\b.?abc.?\b/ : true
ruby2.4 : /d*\b/ : true
ruby2.4 : /abcd*\b/ : true
ruby2.4 : /\b.*abcd*\b/ : true
~~~

See also:

* https://regex101.com/r/JBzSic/2 (PHP/PCRE, Javascript, Python, Go)
* http://fiddle.re/gkm4ad (Go, Java, Javascript, .Net, Perl, PHP, Python, XRegExp)
* http://java-regex-tester.appspot.com/regex/04925044-ca95-46c6-bec5-329057c04ab2 (Java)

-- 
https://bugs.ruby-lang.org/