[ruby-core:109668] [Ruby master Bug#18931] Inconsistent handling of invalid codepoints in String#lstrip and String#rstrip

From: "jeremyevans0 (Jeremy Evans)" <noreply@...>
Date: 2022-08-24 19:38:57 UTC
List: ruby-core #109668
Issue #18931 has been updated by jeremyevans0 (Jeremy Evans).


I submitted a pull request that makes `rstrip` always raise an exception for strings with a broken coderange: https://github.com/ruby/ruby/pull/6282

That may not match the `lstrip` behavior exactly, because it also raises in cases where the invalid bytes appear before the last non-whitespace character. However, it seems the simplest solution, and I'm not sure we want to go out of our way to support broken strings in `rstrip`.
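The semantics proposed in the pull request can be sketched in plain Ruby. This is a hypothetical helper (`strict_rstrip` is not part of Ruby), and it assumes `String#valid_encoding?` is an adequate stand-in for the C-level `ENC_CODERANGE_BROKEN` check. It raises for any broken string, even when the invalid byte sits before the last non-whitespace character, which is exactly where it can differ from `lstrip`.

```ruby
# Hypothetical helper modelling the proposed rstrip semantics:
# reject any string whose encoding is broken, wherever the bad byte is.
def strict_rstrip(str)
  unless str.valid_encoding?
    raise ArgumentError, "invalid byte sequence in #{str.encoding}"
  end
  str.rstrip
end

p strict_rstrip("abc  ")       # => "abc"
begin
  strict_rstrip("a\x80bc  ")   # bad byte is before the last non-space character
rescue ArgumentError => e
  p e.message                  # => "invalid byte sequence in UTF-8"
end
```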

----------------------------------------
Bug #18931: Inconsistent handling of invalid codepoints in String#lstrip and String#rstrip
https://bugs.ruby-lang.org/issues/18931#change-98890

* Author: nirvdrum (Kevin Menard)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
When attempting to strip a string, there are three basic options when an invalid code point is encountered:

1) Ignore the code point
2) Strip the code point
3) Raise an exception
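The three options can be made concrete with a small pure-Ruby model of a right-strip. `rstrip_with_policy` and its `:ignore`, `:strip`, and `:raise` symbols are hypothetical names for illustration only. The sketch relies on `String#chars` returning each invalid byte of a broken UTF-8 string as its own one-byte string, which `valid_encoding?` then reports as invalid.

```ruby
# Hypothetical right-strip that applies one of three policies when it meets
# an invalid code point. A sketch for discussion, not Ruby's implementation.
STRIPPABLE = [" ", "\t", "\n", "\v", "\f", "\r", "\0"].freeze

def rstrip_with_policy(str, policy)
  chars = str.chars          # each invalid byte comes back as its own string
  keep = str.bytesize        # number of leading bytes to keep
  while (last = chars.pop)
    unless last.valid_encoding?
      case policy
      when :ignore           # 1) treat it as a non-whitespace boundary
        break
      when :strip            # 2) remove it as if it were whitespace
        keep -= last.bytesize
        next
      when :raise            # 3) refuse to strip the string
        raise ArgumentError, "invalid byte sequence in #{str.encoding}"
      else
        raise ArgumentError, "unknown policy: #{policy.inspect}"
      end
    end
    break unless STRIPPABLE.include?(last)
    keep -= last.bytesize
  end
  str.byteslice(0, keep)
end

p rstrip_with_policy("abc \x80 ", :ignore)  # => "abc \x80"
p rstrip_with_policy("abc \x80 ", :strip)   # => "abc"
```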

For background, Ruby does not consider the string's code range for `lstrip` or `rstrip`. It permits stripping strings whose code range is `ENC_CODERANGE_BROKEN`, so long as no invalid code points are encountered while performing the loop to remove whitespace. What it does when such a code point is encountered, however, is not consistent between `lstrip` and `rstrip`.
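From Ruby code, a string with a broken code range is one for which `String#valid_encoding?` returns `false`. A caller who wants predictable stripping regardless of how this issue is resolved could check that first and, if replacing bad bytes is acceptable, sanitize with `String#scrub`, which substitutes U+FFFD for invalid bytes in Unicode encodings. This is a caller-side workaround sketch, not something proposed in the issue; `safe_lstrip` is a hypothetical name.

```ruby
# Hypothetical caller-side workaround: repair a broken string before stripping
# so that lstrip never sees an invalid byte.
def safe_lstrip(str)
  str = str.scrub unless str.valid_encoding? # invalid bytes become U+FFFD
  str.lstrip
end

p " \x80abc".valid_encoding?  # => false, i.e. the code range is broken
p safe_lstrip(" \x80abc")     # leading space removed; U+FFFD and "abc" remain
```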

`String#lstrip` will always raise an invalid byte sequence error when it encounters an invalid code point while scanning leading whitespace:

```
> ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]

> ruby -e 'p " \x80abc".lstrip'
-e:1:in `lstrip': invalid byte sequence in UTF-8 (ArgumentError)
	from -e:1:in `<main>'

> ruby -e 'p " \x80 abc".lstrip'
-e:1:in `lstrip': invalid byte sequence in UTF-8 (ArgumentError)
	from -e:1:in `<main>'

> ruby -e 'p "\x80 abc".lstrip'
-e:1:in `lstrip': invalid byte sequence in UTF-8 (ArgumentError)
	from -e:1:in `<main>'

> ruby -e 'p "\x80".lstrip'
-e:1:in `lstrip': invalid byte sequence in UTF-8 (ArgumentError)
	from -e:1:in `<main>'

> ruby -e ' p " a\x80bc".lstrip'
"a\x80bc"   # This one is okay because the broken code point appears after a non-whitespace code point.
```
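These `lstrip` results are consistent with a loop that decodes code points from the left and raises as soon as it reaches an invalid one, as the issue says `rb_enc_codepoint_len` does. A minimal pure-Ruby model under that assumption (`model_lstrip` is a hypothetical name, and this is not the C source):

```ruby
# Hypothetical model of the lstrip loop: walk code points left to right and
# raise on an invalid one reached before the first non-whitespace character.
LSTRIP_SPACES = [" ", "\t", "\n", "\v", "\f", "\r", "\0"].freeze

def model_lstrip(str)
  offset = 0
  str.each_char do |ch|
    # String#each_char yields each invalid byte of a broken string on its own.
    raise ArgumentError, "invalid byte sequence in #{str.encoding}" unless ch.valid_encoding?
    break unless LSTRIP_SPACES.include?(ch)
    offset += ch.bytesize
  end
  str.byteslice(offset, str.bytesize - offset)
end

p model_lstrip(" a\x80bc")  # => "a\x80bc"
```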

Things get a lot messier with `String#rstrip`, however. Depending on context, `rstrip` may raise an exception, treat the broken code point as a non-whitespace boundary and stop processing, or treat the broken code point as if it were whitespace and remove it.

`String#rstrip` will ignore the invalid code point if it immediately follows a non-whitespace code point:

```
> ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]

> ruby -e 'p "abc\x80 ".rstrip'
"abc\x80"

> ruby -e 'p "abc\x80".rstrip'
"abc\x80"
```

`String#rstrip` will remove the invalid code point if it is preceded by whitespace:

```
> ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]

> ruby -e 'p "abc \x80".rstrip'
"abc"

> ruby -e 'p "abc \x80 ".rstrip'
"abc"

> ruby -e 'p " \x80 ".rstrip'
""
```

`String#rstrip` will raise an exception if the invalid code point is at the very start of the string, with no code point of any kind before it:

```
> ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]

> ruby -e 'p "\x80 ".rstrip'
-e:1:in `rstrip': invalid byte sequence in UTF-8 (ArgumentError)
	from -e:1:in `<main>'

> ruby -e 'p "\x80".rstrip'
-e:1:in `rstrip': invalid byte sequence in UTF-8 (ArgumentError)
	from -e:1:in `<main>'
```

It looks to me like the current behavior is a byproduct of the functions chosen for finding code point boundaries rather than a deliberate design decision. For example, `rb_str_lstrip` calls `rb_enc_codepoint_len`, which raises on invalid code points, while `rb_str_rstrip` calls `rb_enc_prev_char`, which doesn't perform the same code point validation. I think it would make for a better user experience if `lstrip` and `rstrip` behaved consistently with each other, which would in turn make `rstrip` consistent with itself. What that behavior should be still needs to be decided, and I'm hoping to reach consensus on the semantics in this issue.
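To see how the `rb_enc_prev_char` path can produce all three `rstrip` outcomes above, here is a pure-Ruby model of the Ruby 3.1 loop for UTF-8 strings. It is a reconstruction under two assumptions, not the actual C source: stepping back to a character head skips over UTF-8 continuation bytes (so a stray `\x80` is absorbed into the character before it), and only the character starting at that head is then validated. The names `prev_char_head` and `model_rstrip_31` are hypothetical.

```ruby
# Hypothetical model of the Ruby 3.1 rstrip loop for UTF-8 strings,
# reconstructed to reproduce the outputs quoted in this issue.
RSTRIP_SPACES = [" ", "\t", "\n", "\v", "\f", "\r", "\0"].freeze

# Stand-in for rb_enc_prev_char: step back one byte, then keep stepping back
# over UTF-8 continuation bytes (0b10xxxxxx) until a lead byte or the start.
def prev_char_head(bytes, pos)
  return nil if pos <= 0
  head = pos - 1
  head -= 1 while head > 0 && (bytes[head] & 0xC0) == 0x80
  head
end

def model_rstrip_31(str)
  bytes = str.bytes
  t = bytes.size
  while (head = prev_char_head(bytes, t))
    # Validate only the character that starts at the head.
    ch = str.byteslice(head, bytes.size - head).each_char.first
    raise ArgumentError, "invalid byte sequence in #{str.encoding}" unless ch.valid_encoding?
    break unless RSTRIP_SPACES.include?(ch)
    t = head
  end
  str.byteslice(0, t)
end

p model_rstrip_31("abc \x80 ")  # => "abc"
p model_rstrip_31("abc\x80 ")   # => "abc\x80"
```

Under this model, a stray continuation byte such as `\x80` is only validated when it is the first byte of the string; everywhere else it is folded into the preceding character, which is why `"abc \x80"` loses it along with the space while `"\x80 "` raises.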





-- 
https://bugs.ruby-lang.org/
