[ruby-core:119773] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
From:
"javanthropus (Jeremy Bopp) via ruby-core" <ruby-core@...>
Date:
2024-11-06 15:08:09 UTC
List:
ruby-core #119773
Issue #20869 has been updated by javanthropus (Jeremy Bopp).
I think your code change highlights another bug caused by the current behavior where `IO#pos` can report negative values. Oddly, `IO#seek(0, :CUR)` still returns 0:
```ruby
require 'tempfile'
Tempfile.open do |f|
f.write('0123456789')
f.rewind
f.ungetbyte(97)
f.pos # => -1
f.seek(0, :CUR) # => 0
end
```
Note that `IO#pos` works correctly when used with `IO#ungetc` while transcoding since that cases causes an entirely different buffer to be used which is currently ignored by the seeking functions. As demonstrated in the issue description though, that buffer isn't ever cleared when using the seeking functions.
Conceptually, it makes sense to me that the seeking functions should only care about bytes from the underlying stream since that's what they operate on. They should ignore read buffer manipulation by `IO#ungetbyte` and `IO#ungetc` since the data pushed by those methods have no relationship to the bytes of the stream. What I don't see anywhere I've looked is a statement regarding how `IO#ungetbyte` and `IO#ungetc` *should* interact with seeking operations. The existing specs and docs don't seem to cover those cases.
It would be great to get clarification here before working on solutions. While I think the best solution would be to disregard the bytes added by `IO#ungetbyte` and `IO#ungetc` and to clear the relevant buffers when seeking, I can imagine others may prefer to preserve the buffers. Maybe the solution is to leave the behavior deliberately undefined and just warn people against mixing these methods via documentation.
----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110439
* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When performing any of the seek based operations on IO (IO#seek, IO#pos=, or IO#rewind), the read buffer is inconsistently cleared:
```ruby
require 'tempfile'
Tempfile.open do |f|
f.write('0123456789')
f.rewind
# Calling #ungetbyte as the first read buffer
# operation uses a buffer that is preserved during
# seek operations
f.ungetbyte(97)
# Byte buffer will not be cleared
f.seek(2, :SET)
f.getbyte # => 97
end
Tempfile.open do |f|
f.write('0123456789')
f.rewind
# Calling #getbyte before #ungetbyte uses a
# buffer that is not preserved when seeking
f.getbyte
f.ungetbyte(97)
# Byte buffer will be cleared
f.seek(2, :SET)
f.getbyte # => 50
end
```
Similar behavior happens when reading characters:
```ruby
require 'tempfile'
Tempfile.open do |f|
f.write('0123456789')
f.rewind
# Calling #ungetc as the first read buffer
# operation uses a buffer that is preserved during
# seek operations
f.ungetc('a')
# Character buffer will not be cleared
f.seek(2, :SET)
f.getc # => 'a'
end
Tempfile.open do |f|
f.write('0123456789')
f.rewind
# Calling #getc before #ungetc uses a
# buffer that is not preserved when seeking
f.getc
f.ungetc('a')
# Character buffer will be cleared
f.seek(2, :SET)
f.getc # => '2'
end
```
When transcoding, however, the character buffer is never cleared when seeking:
```ruby
require 'tempfile'
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
f.write('0123456789')
f.rewind
f.ungetc('a'.encode('utf-16le'))
# Character buffer will not be cleared
f.seek(2, :SET)
f.getc # => 'a'.encode('utf-16le')
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
f.write('0123456789')
f.rewind
f.getc
f.ungetc('a'.encode('utf-16le'))
# Character buffer will not be cleared
f.seek(2, :SET)
f.getc # => 'a'.encode('utf-16le')
end
```
I would expect the buffers to be cleared in all cases except possibly when the seek operation doesn't actually move the file pointer such as when calling IO#pos or IO#seek(0, :CUR). The inconsistent behavior demonstrated here is a problem regardless though.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/