From: "javanthropus (Jeremy Bopp) via ruby-core" <ruby-core@...>
Date: 2024-11-12T14:32:00+00:00
Subject: [ruby-core:119895] [Ruby master Bug#20889] IO#ungetc and IO#ungetbyte should not cause IO#pos to report an inaccurate position

Issue #20889 has been reported by javanthropus (Jeremy Bopp).

----------------------------------------
Bug #20889: IO#ungetc and IO#ungetbyte should not cause IO#pos to report an inaccurate position
https://bugs.ruby-lang.org/issues/20889

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetbyte(93)
  f.pos       # => -1; negative value is surprising!
end

Tempfile.open(encoding: 'utf-8') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-8'))
  f.pos       # => -1; similar to the ungetbyte case
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  f.pos       # => 0; maybe should be -2 to match the previous ungetc case?
end
```

It doesn't seem logical that `IO#pos` should ever be affected by `IO#ungetc` or `IO#ungetbyte`.  The pushed characters or bytes aren't really in the stream source.  The value of `IO#pos` implies that jumping directly to that position via `IO#seek` and reading from there would return the same character or byte that was pushed, but the pushed characters or bytes are lost when the operation to seek in the stream is performed.  In the case where `IO#pos` is a negative value, attempting to seek to that position actually raises an exception.
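
A minimal sketch of the two seek problems described above. The helper names and the `CONTENT` constant are made up for illustration, and the exact value `IO#pos` reports may differ between Ruby versions, so no specific position is assumed. The first helper pushes a byte back mid-stream, seeks to whatever `IO#pos` reports, and reads from there. The second attempts to seek to a negative absolute offset, which is what following a negative `IO#pos` would require.

```ruby
require 'tempfile'

CONTENT = '0123456789'

# Hypothetical helper: push a byte back mid-stream, then seek to the position
# that IO#pos reports and read one byte from there.
def byte_after_seeking_to_reported_pos
  Tempfile.open(encoding: 'utf-8') do |f|
    f.write(CONTENT)
    f.rewind
    f.read(5)
    f.ungetbyte(93)      # 93 is "]", which never occurs in CONTENT
    reported = f.pos
    f.seek(reported)     # the pushed byte does not survive the seek
    [reported, f.getbyte]
  end
end

# Hypothetical helper: try to seek to a negative absolute offset and return
# the exception that was raised, or nil if none was.
def error_from_negative_seek
  Tempfile.open(encoding: 'utf-8') do |f|
    f.write(CONTENT)
    begin
      f.seek(-1, IO::SEEK_SET)
      nil
    rescue SystemCallError => e
      e
    end
  end
end

reported, byte = byte_after_seeking_to_reported_pos
puts "pos reported #{reported}, byte read after seek: #{byte}"
puts "negative seek raised: #{error_from_negative_seek.class}"
```

The byte read after the seek comes from the file itself rather than from the pushed value, so the position reported while the pushed byte was pending did not describe anything that can actually be reached by seeking.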

In the `IO#ungetc` case with character conversion above, it seems unreasonable to make `IO#pos` report an even less correct position. To match the previous `IO#ungetc` case, the position would need to move back by 2 bytes, the size of the pushed character in the stream's internal encoding. That would be completely inconsistent with how `IO#pos` behaves when reading from the stream normally, where it reports the underlying stream's byte position and not the number of transcoded bytes that have been read:

```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.getc.bytesize # => 2; due to the internal encoding of the stream
  f.pos           # => 1; reports actual bytes read from the stream, not transcoded bytes
end
```

Attempting to use `IO#pos` when there are characters or bytes pushed into the read buffer by way of `IO#ungetc` or `IO#ungetbyte` should result in one of the following behaviors:
1. Raise an exception
2. Return the stream's position, clearing the read buffer entirely
3. Return the stream's position, ignoring the pushed characters or bytes, and produce a warning
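
To make the three options concrete, here is a rough sketch using a hypothetical `PushbackReader` wrapper. It is not part of Ruby and not a proposed patch. The class name, the `on_pending:` keyword, and the choice of `IOError` are all assumptions made for illustration. The wrapper keeps pushed bytes in its own array, so the wrapped stream's position is never disturbed, and `pos` applies one of the three behaviors when pushed bytes are pending.

```ruby
require 'stringio'

# Hypothetical wrapper, for illustration only: pushed-back bytes live in a
# separate array instead of the wrapped IO's read buffer.
class PushbackReader
  def initialize(io, on_pending: :warn)
    @io = io
    @pushed = []
    @on_pending = on_pending   # :raise, :clear, or :warn
  end

  def ungetbyte(byte)
    @pushed.push(byte)
    nil
  end

  # Pushed bytes are returned first, most recently pushed byte first.
  def getbyte
    @pushed.empty? ? @io.getbyte : @pushed.pop
  end

  # Always reports the wrapped stream's own position.
  def pos
    unless @pushed.empty?
      case @on_pending
      when :raise then raise IOError, 'pos is undefined while pushed bytes are pending'
      when :clear then @pushed.clear
      when :warn  then warn 'pos ignores pending pushed bytes'
      end
    end
    @io.pos
  end
end

reader = PushbackReader.new(StringIO.new('0123456789'), on_pending: :clear)
reader.ungetbyte(93)
reader.pos       # the wrapped stream's position; the pushed byte is dropped
reader.getbyte   # reads from the wrapped stream
```

With `on_pending: :raise` the call to `pos` raises instead, and with `on_pending: :warn` it returns the wrapped stream's position while leaving the pushed byte available to the next `getbyte`.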


-- 
https://bugs.ruby-lang.org/