From: merch-redmine@...
Date: 2021-05-13T15:12:50+00:00
Subject: [ruby-core:103836] [Ruby master Bug#14367] Wrong interpretation of backslash C in regexp literals

Issue #14367 has been updated by jeremyevans0 (Jeremy Evans).


jeremyevans0 (Jeremy Evans) wrote in #note-8:
> nobu (Nobuyoshi Nakada) wrote in #note-7:
> > Agree that the previous behavior might not be intentional, but commit:11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 also seems something broken on other than US-ASCII encoding.
> >
> > ```
> > $ LANG=en_US.UTF-8 ./ruby -vce '/\c\xFF/'
> > ruby 3.1.0dev (2021-05-13T01:55:43Z master 11ae581a4a) [x86_64-darwin19]
> > -e:1: invalid multibyte escape: /\x9F/
> > -e:1: warning: possibly useless use of a literal in void context
> > ```
>
> The previous behavior also ended up with a regexp which matches a 8-bit character, so maybe Ruby should have given the same error before? Alternatively, I can revert if that is better?

My previous statement was incorrect. The reason it worked before is that `\c` behavior in regexps was wrong and did not result in the 8-bit character it should have. If you used a character resulting in a high bit, you did get the same error:

```
$ LANG=en_US.UTF-8 ruby -vce '/\M-a/'
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-openbsd]
-e:1: too short escaped multibyte character: /\M-a/
-e:1: warning: possibly useless use of a literal in void context
```

You would also get an error if you created a regexp using a string instead of using a literal regexp:

```
$ LANG=en_US.UTF-8 ruby -ve '/#{s="\c\xff"}/'
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-openbsd]
-e:1: warning: possibly useless use of a literal in void context
-e:1:in `': invalid multibyte character (ArgumentError)
```

So I don't think anything is broken on UTF-8 (or other encodings). Before, it should have raised an error and it didn't because the incorrect algorithm resulted in the wrong character. Now it raises an error as it should.

----------------------------------------
Bug #14367: Wrong interpretation of backslash C in regexp literals
https://bugs.ruby-lang.org/issues/14367#change-91954

* Author: shyouhei (Shyouhei Urabe)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 2.6.0dev (2018-01-16 trunk 61875) [x86_64-darwin15]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Following ruby code returns nil.

```sh
% LC_ALL=C ruby -ve 'p(/\c\xFF/ =~ "\c\xFF")'
ruby 2.6.0dev (2018-01-16 trunk 61875) [x86_64-darwin15]
nil
```

Is this intentional?

--
https://bugs.ruby-lang.org/

Unsubscribe:
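
For readers following the thread, the byte arithmetic behind the quoted errors can be sketched roughly as below. This is only an illustration, not the parser's actual code: it assumes `\c` masks the byte with `0x9F` (consistent with the `/\x9F/` shown in the error for `/\c\xFF/`) and that `\M-` sets the high bit. The helper names `control_byte`, `meta_byte`, and `valid_utf8_byte?` are hypothetical and exist only for this example.

```
# Hypothetical helpers modelling the escape arithmetic; not part of Ruby's API.

# Assumption: \c keeps only the bits in 0x9F (clears 0x40 and 0x20).
def control_byte(byte)
  byte & 0x9F
end

# Assumption: \M- sets the high bit (0x80).
def meta_byte(byte)
  byte | 0x80
end

# True if the single byte is a complete, valid UTF-8 string on its own.
def valid_utf8_byte?(byte)
  [byte].pack("C").force_encoding(Encoding::UTF_8).valid_encoding?
end

ctrl_ff = control_byte(0xFF)    # the byte that /\c\xFF/ should produce
meta_a  = meta_byte("a".ord)    # the byte that /\M-a/ should produce
ctrl_a  = control_byte("a".ord) # an ordinary control character, for contrast

[ctrl_ff, meta_a, ctrl_a].each do |byte|
  puts format("0x%02X valid alone in UTF-8: %s", byte, valid_utf8_byte?(byte))
end
```

Under these assumptions, both `\c\xFF` and `\M-a` yield a byte with the high bit set, which cannot stand alone in a UTF-8 regexp. That would explain why both cases raise an error under `LANG=en_US.UTF-8`, while a plain control character stays in the ASCII range.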