From: "YO4 (Yoshinao Muramatsu) via ruby-core"
Date: 2025-11-12T15:12:28+00:00
Subject: [ruby-core:123778] [Ruby Bug#21683] IO#each_codepoint do not take care of encoding when IO uses encoding conversion for reading.

Issue #21683 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Bug #21683: IO#each_codepoint do not take care of encoding when IO uses encoding conversion for reading.
https://bugs.ruby-lang.org/issues/21683

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* ruby -v: ruby 3.5.0dev (2025-11-03T10:33:44Z master 0832e954c9) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Without encoding conversion, IO#each_codepoint yields one value per character:

```irb
irb(main):001> open(File::NULL, 'r') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["3042", "3044", "3046"] # => valid
```

With encoding conversion (mode 'rt'), it yields one value per byte instead:

```irb
irb(main):001> open(File::NULL, 'rt') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"] # => invalid
```

Ruby versions prior to 3.4 lack commit 6cd98c24fe9aeea3829ac3d554a277f053cec0be (Allow IO#each_codepoint to work with unetc even when encoding conversion active).

The same behavior can be reproduced using ungetbyte:

```irb
irb(main):001> open(File::NULL, 'rt') { |f| f.ungetbyte(%Q[\u{3042}\u{3044}\u{3046}]); p f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"]
```

-- 
https://bugs.ruby-lang.org/
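
For reference, the result marked "invalid" in the report is exactly the UTF-8 byte sequence of the test string, which suggests the characters are being walked byte by byte rather than decoded. The following is a minimal sketch that compares the two views of the same string using only String methods, with no IO involved. The helper names `hex_codepoints` and `hex_bytes` are hypothetical and introduced here purely for illustration.

```ruby
# Hypothetical helpers for illustration; they are not part of the original report.

# Hex representation of each Unicode codepoint in the string.
def hex_codepoints(str)
  str.each_codepoint.map { |c| c.to_s(16) }
end

# Hex representation of each raw byte in the string.
def hex_bytes(str)
  str.each_byte.map { |b| b.to_s(16) }
end

# Same test string as in the report; "\u{...}" literals are always UTF-8.
sample = "\u{3042}\u{3044}\u{3046}"

p hex_codepoints(sample) # matches the result marked "valid" in the report
p hex_bytes(sample)      # matches the result marked "invalid" in the report
```

Each of the three characters encodes to three bytes in UTF-8, so the byte view has nine entries where the codepoint view has three.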