From: "iainbeeston (Iain Beeston) via ruby-core"
Date: 2023-09-06T15:36:11+00:00
Subject: [ruby-core:114663] [Ruby master Bug#19867] Unicode line and paragraph separator are not stripped

Issue #19867 has been updated by iainbeeston (Iain Beeston).

I can see that the `[[:space:]]` regex class does match Unicode whitespace characters (`"\u2028" =~ /[[:space:]]/ # => 0`), but `\s` does not (`"\u2028" =~ /\s/ # => nil`).

----------------------------------------
Bug #19867: Unicode line and paragraph separator are not stripped
https://bugs.ruby-lang.org/issues/19867#change-104491

* Author: iainbeeston (Iain Beeston)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
The Unicode line and paragraph separators (U+2028 and U+2029) are not removed by any of the strip methods:

`"\u2028\u2029\u0000\t\n\v\f\r ".strip # => "\u2028\u2029"`

I would have expected `strip` (and `lstrip` and `rstrip`) to remove Unicode whitespace as well.

It looks like #7154 reported something similar, but for regular expressions and way back in Ruby 1.9.

I think that fixing this should be simple (just checking for U+2028 and U+2029 in ctype.h), but I'm not sure whether it's supposed to behave this way or whether changing it could introduce unexpected consequences.
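Until the built-in methods change (if they ever do), the `[[:space:]]` observation above suggests a regex-based workaround. The sketch below is illustrative only and is not the ctype.h change proposed in the report. `UNICODE_STRIP_RE` and `unicode_strip` are hypothetical names, not part of Ruby's `String` API. The sketch assumes UTF-8 strings, for which `[[:space:]]` is Unicode-aware. Because NUL is not Unicode whitespace, it is added to the character class explicitly so that the helper still removes what `strip` already removes.

```ruby
# Hypothetical helper (not part of Ruby's String API). It strips leading and
# trailing characters matched by the POSIX bracket [[:space:]], which is
# Unicode-aware for UTF-8 strings, plus NUL, which String#strip also removes.
UNICODE_STRIP_RE = /\A[[:space:]\u0000]+|[[:space:]\u0000]+\z/

def unicode_strip(str)
  str.gsub(UNICODE_STRIP_RE, "")
end

s = "\u2028\u2029\u0000\t\n\v\f\r "
p s.strip          # the report shows "\u2028\u2029" on Ruby 3.2.2
p unicode_strip(s) # every character is stripped, leaving an empty string

# \s only matches ASCII whitespace, so it does not see U+2028.
p("\u2028" =~ /\s/)          # nil
p("\u2028" =~ /[[:space:]]/) # 0
```

Interior separators are left alone: only runs anchored at `\A` or `\z` are removed. For example, `unicode_strip("a\u2028b")` keeps the `\u2028` between the letters.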