From: duerst@...
Date: 2019-09-17T09:37:54+00:00
Subject: [ruby-core:94944] [Ruby master Bug#16158] "st" Character Sequence In Regex Look-Behind Causes Illegal Pattern Error When Combined With POSIX Bracket Expressions And Case Insensitivity Flag

Issue #16158 has been updated by duerst (Martin Dürst).

Some more information:

The onigmo documentation says (https://github.com/k-takata/Onigmo/blob/master/doc/RE#L270):

```
Subexp of look-behind must be fixed-width.
But top-level alternatives can be of various lengths.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
```

Now what onigmo does internally is apparently that it considers the st ligature as case equivalent to upper-case ST, which is again case equivalent to lowercase st. You can see that as follows:

```
$ ruby -ve 'puts(/\uFB06/i =~ "most")'
ruby 2.7.0dev (2019-07-06T03:43:38Z trunk f296c260ef) [x86_64-cygwin]
2
```

The st ligature is a single character, so its length is 1, but the length of ST and st is 2. So with the //i option, st seems to no longer be fixed width, and that's why onigmo refuses to deal with this and produces an error.

So in some way, this is as per spec, although it's surprising and annoying.

----------------------------------------
Bug #16158: "st" Character Sequence In Regex Look-Behind Causes Illegal Pattern Error When Combined With POSIX Bracket Expressions And Case Insensitivity Flag
https://bugs.ruby-lang.org/issues/16158#change-81561

* Author: michaeltomko (Michael Tomko)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.7.0dev (2019-09-11 master 146677a1e7) [x86_64-openbsd6.5]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
*This is my first Ruby bug submission. Please let me know if there is anything else that I can provide that would be helpful.
Thanks for your time!*

I've tried just about as many combinations as I can think of and I have been able to narrow down the issue to the following components being present in a regular expression.

* The character sequence "st" either preceded by any characters OR being a part of a top-level alternation inside of a look-behind. The issue occurs with both positive and negative look-behinds. ex: `(? pat = /(? pat = /(?i)(? pat = /(? pat = /((? /((? pat = /((? /((? pat = /((? /((? pat = /(? /(? pat = /(? /(?
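
For anyone who wants to poke at this locally, here is a minimal sketch that probes the fixed-width rule quoted from the Onigmo documentation and the ligature case folding described in the reply. The helper name `compile_status` is made up for illustration and is not part of Ruby, and the last pattern is an assumed combination of the three ingredients (an "st" preceded by a character in a look-behind, a POSIX bracket expression, and the case-insensitivity flag), not one of the patterns from the original report. Results for the last two probes may differ between Ruby versions, so no expected output is given for them.

```ruby
# Hypothetical helper: report whether a pattern compiles instead of raising.
def compile_status(source, options = 0)
  Regexp.new(source, options)
  :ok
rescue RegexpError => e
  "error: #{e.message}"
end

# The two examples from the Onigmo documentation excerpt.
puts compile_status('(?<=a|bc)x')         # top-level alternatives of different lengths
puts compile_status('(?<=aaa(?:b|cd))x')  # nested alternatives of different lengths

# Case-insensitive match of the st ligature (U+FB06) against plain "st".
# The reply above reports index 2 on ruby 2.7.0dev.
p(/\uFB06/i =~ "most")

# Assumed example combining "st" in a look-behind, a POSIX bracket
# expression, and the case-insensitivity flag.
puts compile_status('(?<!ast)[[:space:]]', Regexp::IGNORECASE)
```

Going through `Regexp.new` rather than a regexp literal keeps a rejected pattern from aborting the whole script at parse time, so all probes run and their outcomes can be compared side by side.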