From: "dbrown9@... (Dustin Brown) via ruby-core"
Date: 2023-12-04T21:54:46+00:00
Subject: [ruby-core:115588] [Ruby master Bug#20039] Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error

Issue #20039 has been reported by dbrown9@gmail.com (Dustin Brown).

----------------------------------------
Bug #20039: Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error
https://bugs.ruby-lang.org/issues/20039

* Author: dbrown9@gmail.com (Dustin Brown)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0dev (2023-12-03 master 85bc80a)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Matching a US-ASCII string against a UTF-8 encoded regexp with multibyte characters works as expected.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(re)
# => false
```

However, if that regexp is used to initialize a new regexp, the match fails with an "invalid multibyte character" error.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(Regexp.new(re))
# => ArgumentError: regexp preprocess failed: invalid multibyte character
```

--
https://bugs.ruby-lang.org/

______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
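
For anyone trying to reproduce the report, the two cases can be compared side by side with a small script. This is only a sketch: `try_match` is a hypothetical helper that is not part of the report, and the idea that `Regexp#encoding` and `Regexp#fixed_encoding?` are the relevant properties to compare between the original and the copy is an assumption, not something the report states. What the copy prints depends on the Ruby build, so no expected output is shown.

```ruby
# Hypothetical helper (not from the report): run match? and return either
# the boolean result or the exception object, so both cases can be printed.
def try_match(str, regexp)
  str.match?(regexp)
rescue ArgumentError, Encoding::CompatibilityError => e
  e
end

source = Regexp.new("\u2018".encode("UTF-8")) # UTF-8 regexp with a multibyte character
copy   = Regexp.new(source)                   # regexp initialized from another regexp
ascii  = "".encode("US-ASCII")                # empty US-ASCII string, as in the report

[["original", source], ["copy", copy]].each do |label, re|
  # Assumption: the encoding and fixed-encoding flag are what differ, if anything.
  puts "#{label}: encoding=#{re.encoding} " \
       "fixed_encoding?=#{re.fixed_encoding?} " \
       "match?=#{try_match(ascii, re).inspect}"
end
```

On an affected build, the `copy` line should show the `ArgumentError` from the report in place of `false`.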