From: "nobu (Nobuyoshi Nakada) via ruby-core"
Date: 2023-09-06T16:13:06+00:00
Subject: [ruby-core:114664] [Ruby master Bug#19867] Unicode line and paragraph separator are not stripped

Issue #19867 has been updated by nobu (Nobuyoshi Nakada).

Yes, `\s`, `\w` etc. match only single-byte ASCII characters.
I don't think changing the behavior by default is a good idea.
An optional (keyword) argument may be better.

----------------------------------------
Bug #19867: Unicode line and paragraph separator are not stripped
https://bugs.ruby-lang.org/issues/19867#change-104492

* Author: iainbeeston (Iain Beeston)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Unicode line and paragraph separators are not removed by any of the strip methods:

`"\u2028\u2029\u0000\t\n\v\f\r ".strip # => "\u2028\u2029"`

I would have expected `strip` (and `lstrip`, `rstrip`) to remove Unicode whitespace as well. It looks like #7154 reported something similar, but for regular expressions and way back in Ruby 1.9.

I think that fixing this should be simple (just checking for `\x2028` and `\x2029` in ctype.h), but I'm not sure if it's supposed to behave this way or if changing it could introduce unexpected consequences.
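
Until something like the optional argument suggested above exists, a user-level workaround is possible. The following is a minimal sketch, not a proposed patch: `unicode_strip` and `UNICODE_STRIPPABLE` are hypothetical names introduced here for illustration. It assumes UTF-8 strings and relies on the POSIX bracket `[[:space:]]`, which, unlike `\s`, matches Unicode whitespace such as U+2028 and U+2029.

```ruby
# Hypothetical helper, not part of Ruby's String API.
# [[:space:]] matches Unicode whitespace (including U+2028 and U+2029) in
# UTF-8 strings, whereas \s is ASCII-only. \x00 is included because the
# report's example shows the built-in strip removing NUL as well.
UNICODE_STRIPPABLE = /\A[[:space:]\x00]+|[[:space:]\x00]+\z/

# Returns a copy of str with leading and trailing Unicode whitespace
# (and NUL characters) removed; inner whitespace is left untouched.
def unicode_strip(str)
  str.gsub(UNICODE_STRIPPABLE, "")
end

sample = "\u2028\u2029\u0000\t\n\v\f\r "
p sample.strip                                # the report shows "\u2028\u2029" on ruby 3.2.2
p unicode_strip(sample)                       # every character is strippable here
p unicode_strip("\u2028 hello world \u2029")  # only the edges are trimmed
```

Keeping this as a separate helper leaves the default behaviour of `strip` unchanged, which is the compatibility concern raised in the reply.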