From: "k0kubun (Takashi Kokubun) via ruby-core"
Date: 2025-07-14T21:57:49+00:00
Subject: [ruby-core:122772] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does

Issue #21503 has been updated by k0kubun (Takashi Kokubun).

The patch to master modified `CR_Word` in `enc/unicode/16.0.0/name2ctype.h`, but Ruby 3.4 uses `enc/unicode/15.0.0/name2ctype.h`, whose `CR_Word` table has different contents. I'm not sure how to backport this properly. Could @procmarco or someone else look into opening a backport PR against the `ruby_3_4` branch on GitHub?

----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-114051

* Author: procmarco (Marco Concetto Rudilosso)
* Status: Closed
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
----------------------------------------
The [docs](https://ruby-doc.org/3.4.1/Regexp.html#:~:text=/%5Cp%7B-,Word,-%7D/%3A%20A%20member) state that `\p{Word}` matches the equivalent of `[\p{M}\p{Nd}\p{Pc}\p{Alpha}\p{Join_Control}]`, which is also how the [Unicode spec](https://unicode.org/reports/tr18/#word) defines it. However, `\p{Word}` does not actually match `\p{Join_Control}` characters such as U+200D ZERO WIDTH JOINER:

```
irb(main):018> REGEX = /\p{Word}/u
=> /\p{Word}/
irb(main):019> "\u200D".gsub(REGEX, "-")
=> "\u200D"  # unchanged: the invisible ZERO WIDTH JOINER was not replaced
irb(main):020> REGEX2 = /\p{Join_Control}/u
=> /\p{Join_Control}/
irb(main):021> "\u200D".gsub(REGEX2, "-")
=> "-"
```

There are two possible fixes: change the docs or change the code.
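To check whether a particular Ruby build is affected (for example, before and after applying a backport to `ruby_3_4`), the irb comparison above can be scripted. This is a minimal sketch, not part of the original report; the names `JOIN_CONTROLS`, `DOCUMENTED_WORD`, and `word_consistency_report` are hypothetical. It compares `\p{Word}` against the documented expansion for both Join_Control code points (U+200C and U+200D). Going by the report's irb session, the 3.4.4 build should show `\p{Word}=false` for U+200D; a build with the fix should show both columns agreeing.

```ruby
# Hypothetical helper names, for illustration only.
# Join_Control consists of exactly these two code points.
JOIN_CONTROLS = {
  "U+200C ZERO WIDTH NON-JOINER" => "\u200C",
  "U+200D ZERO WIDTH JOINER"     => "\u200D"
}.freeze

# The character class the Regexp docs (and UTS #18) give as equivalent to \p{Word}.
DOCUMENTED_WORD = /[\p{M}\p{Nd}\p{Pc}\p{Alpha}\p{Join_Control}]/

# Returns one row per Join_Control character, recording whether
# \p{Word} matches it and whether the documented expansion matches it.
def word_consistency_report
  JOIN_CONTROLS.map do |name, char|
    {
      name: name,
      word: char.match?(/\p{Word}/),
      documented: char.match?(DOCUMENTED_WORD)
    }
  end
end

if __FILE__ == $PROGRAM_NAME
  puts "Ruby #{RUBY_VERSION}"
  word_consistency_report.each do |row|
    status = row[:word] == row[:documented] ? "consistent" : "MISMATCH"
    puts format("%-28s \\p{Word}=%-5s documented=%-5s %s",
                row[:name], row[:word], row[:documented], status)
  end
end
```

Any `MISMATCH` line means the build still disagrees with the documented definition of `\p{Word}`.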