From: duerst via ruby-core
Date: 2024-04-04T00:09:16+00:00
Subject: [ruby-core:117437] [Ruby master Misc#20406] Question about Regexp encoding negotiation

Issue #20406 has been updated by duerst (Martin Dürst).

This is a more general comment, but my impression is that the encoding flags on regular expressions may be outdated. They have existed since before Ruby introduced encoding information for Strings,... in Ruby 1.9. It may be time now to look into how/when they can be deprecated.

----------------------------------------
Misc #20406: Question about Regexp encoding negotiation
https://bugs.ruby-lang.org/issues/20406#change-107813

* Author: andrykonchin (Andrew Konchin)
* Status: Open
----------------------------------------
I am wondering what the rules are for calculating the encoding of a Regexp literal when an encoding modifier is specified.

From the documentation:

> By default, a regexp with only US-ASCII characters has US-ASCII encoding:
> ...
> A regular expression containing non-US-ASCII characters is assumed to use the source encoding. This can be overridden with one of the following modifiers.
> //n ...
> //u ...
> //e ...
> //s ...

Looking at the following examples, I would assume that these rules are followed, except in one case:

```ruby
p /\xc2\xa1/e .encoding # EUC-JP
p /#{ }\xc2\xa1/e .encoding # EUC-JP
p /a/e .encoding # EUC-JP
p /a #{} a/e .encoding # EUC-JP
p /#{} a/e .encoding # US-ASCII
```

The last Regexp, `/#{} a/e`, is supposed to have `EUC-JP` encoding but has `US-ASCII`. So I am wondering what rule is applied in this case.

--
https://bugs.ruby-lang.org/
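
For readers who want to check the documented rules quoted above on their own Ruby build, here is a minimal sketch. The helper name `regexp_encoding_info` is hypothetical and not part of the report; it only wraps the existing `Regexp#encoding` and `Regexp#fixed_encoding?` methods. The sketch covers plain literals without interpolation, so it does not reproduce or explain the `/#{} a/e` case.

```ruby
# Hypothetical helper (an assumption for illustration, not from the report):
# returns the regexp's encoding and whether that encoding is fixed.
def regexp_encoding_info(re)
  [re.encoding, re.fixed_encoding?]
end

# ASCII-only literals, without and with an encoding modifier.
samples = {
  "no modifier" => /a/,
  "//n"         => /a/n,
  "//u"         => /a/u,
  "//e"         => /a/e,
  "//s"         => /a/s,
}

samples.each do |label, re|
  encoding, fixed = regexp_encoding_info(re)
  puts "#{label}: encoding=#{encoding}, fixed_encoding?=#{fixed}"
end
```

Comparing this output with the interpolated literals in the report may help narrow down whether the difference comes from the modifier itself or from how interpolated fragments are combined.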