From: "javanthropus (Jeremy Bopp) via ruby-core" Date: 2024-11-06T15:08:09+00:00 Subject: [ruby-core:119773] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking Issue #20869 has been updated by javanthropus (Jeremy Bopp). I think your code change highlights another bug caused by the current behavior where `IO#pos` can report negative values. Oddly, `IO#seek(0, :CUR)` still returns 0: ```ruby require 'tempfile' Tempfile.open do |f| f.write('0123456789') f.rewind f.ungetbyte(97) f.pos # => -1 f.seek(0, :CUR) # => 0 end ``` Note that `IO#pos` works correctly when used with `IO#ungetc` while transcoding since that cases causes an entirely different buffer to be used which is currently ignored by the seeking functions. As demonstrated in the issue description though, that buffer isn't ever cleared when using the seeking functions. Conceptually, it makes sense to me that the seeking functions should only care about bytes from the underlying stream since that's what they operate on. They should ignore read buffer manipulation by `IO#ungetbyte` and `IO#ungetc` since the data pushed by those methods have no relationship to the bytes of the stream. What I don't see anywhere I've looked is a statement regarding how `IO#ungetbyte` and `IO#ungetc` *should* interact with seeking operations. The existing specs and docs don't seem to cover those cases. It would be great to get clarification here before working on solutions. While I think the best solution would be to disregard the bytes added by `IO#ungetbyte` and `IO#ungetc` and to clear the relevant buffers when seeking, I can imagine others may prefer to preserve the buffers. Maybe the solution is to leave the behavior deliberately undefined and just warn people against mixing these methods via documentation. ---------------------------------------- Bug #20869: IO buffer handling is inconsistent when seeking https://bugs.ruby-lang.org/issues/20869#change-110439 * Author: javanthropus (Jeremy Bopp) * Status: Open * ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux] * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- When performing any of the seek based operations on IO (IO#seek, IO#pos=, or IO#rewind), the read buffer is inconsistently cleared: ```ruby require 'tempfile' Tempfile.open do |f| f.write('0123456789') f.rewind # Calling #ungetbyte as the first read buffer # operation uses a buffer that is preserved during # seek operations f.ungetbyte(97) # Byte buffer will not be cleared f.seek(2, :SET) f.getbyte # => 97 end Tempfile.open do |f| f.write('0123456789') f.rewind # Calling #getbyte before #ungetbyte uses a # buffer that is not preserved when seeking f.getbyte f.ungetbyte(97) # Byte buffer will be cleared f.seek(2, :SET) f.getbyte # => 50 end ``` Similar behavior happens when reading characters: ```ruby require 'tempfile' Tempfile.open do |f| f.write('0123456789') f.rewind # Calling #ungetc as the first read buffer # operation uses a buffer that is preserved during # seek operations f.ungetc('a') # Character buffer will not be cleared f.seek(2, :SET) f.getc # => 'a' end Tempfile.open do |f| f.write('0123456789') f.rewind # Calling #getc before #ungetc uses a # buffer that is not preserved when seeking f.getc f.ungetc('a') # Character buffer will be cleared f.seek(2, :SET) f.getc # => '2' end ``` When transcoding, however, the character buffer is never cleared when seeking: ```ruby require 'tempfile' Tempfile.open(encoding: 'utf-8:utf-16le') do |f| f.write('0123456789') f.rewind f.ungetc('a'.encode('utf-16le')) # Character buffer will not be cleared f.seek(2, :SET) f.getc # => 'a'.encode('utf-16le') end Tempfile.open(encoding: 'utf-8:utf-16le') do |f| f.write('0123456789') f.rewind f.getc f.ungetc('a'.encode('utf-16le')) # Character buffer will not be cleared f.seek(2, :SET) f.getc # => 'a'.encode('utf-16le') end ``` I would expect the buffers to be cleared in all cases except possibly when the seek operation doesn't actually move the file pointer such as when calling IO#pos or IO#seek(0, :CUR). The inconsistent behavior demonstrated here is a problem regardless though. -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/