From: "YO4 (Yoshinao Muramatsu) via ruby-core" <ruby-core@...>
Date: 2025-11-12T15:12:28+00:00
Subject: [ruby-core:123778] [Ruby Bug#21683] IO#each_codepoint do not take care of encoding when IO uses encoding conversion for reading.

Issue #21683 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Bug #21683: IO#each_codepoint do not take care of encoding when IO uses encoding conversion for reading.
https://bugs.ruby-lang.org/issues/21683

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* ruby -v: ruby 3.5.0dev (2025-11-03T10:33:44Z master 0832e954c9) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Without encoding conversion:
```irb
irb(main):001> open(File::NULL, 'r') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["3042", "3044", "3046"] # => valid
```

With encoding conversion:
```irb
irb(main):001> open(File::NULL, 'rt') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"] # => invalid
```
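
For reference, the second result matches the raw UTF-8 bytes of the pushed-back string rather than its code points. The following is only an illustrative sketch that needs no IO object; the variable names `str`, `codepoints`, and `bytes` are made up for this example, and it assumes the script is read as UTF-8 (Ruby's default source encoding).

```ruby
# Illustration only: compare code points and raw bytes of the same UTF-8 string.
str = "\u{3042}\u{3044}\u{3046}"

# Code points, which is what IO#each_codepoint is expected to yield.
codepoints = str.each_codepoint.map { |c| c.to_s(16) }

# Individual UTF-8 bytes, which is what the 'rt' case above yields instead.
bytes = str.each_byte.map { |b| b.to_s(16) }

p codepoints
p bytes
```

Each of the three characters takes three bytes in UTF-8, which accounts for the nine elements in the 'rt' result.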

Ruby versions prior to 3.4 lack commit 6cd98c24fe9aeea3829ac3d554a277f053cec0be (Allow IO#each_codepoint to work with unetc even when encoding conversion active).
Using ungetbyte reproduces the problem in the same way:
```irb
irb(main):001> open(File::NULL, 'rt') { |f| f.ungetbyte(%Q[\u{3042}\u{3044}\u{3046}]); p f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"]
```


-- 
https://bugs.ruby-lang.org/