From: "byroot (Jean Boussier) via ruby-core"
Date: 2024-11-05T17:52:55+00:00
Subject: [ruby-core:119748] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking

Issue #20869 has been updated by byroot (Jean Boussier).

I just looked into this a bit. I'm not familiar enough with the code to really propose a fix, but I get what is happening: `ungetbyte` just shifts the buffer offset, but the FD offset is unchanged.

```c
static void
io_ungetbyte(VALUE str, rb_io_t *fptr)
{
    // snip...
    // ungetbyte just shifts the buffer offset, but the FD offset is unchanged
    fptr->rbuf.off-=(int)len;
    fptr->rbuf.len+=(int)len;
    MEMMOVE(fptr->rbuf.ptr+fptr->rbuf.off, RSTRING_PTR(str), char, len);
}
```

`fptr->rbuf.len == 1`, but the real FD offset is 0, so we're doing `lseek(-1)`, which fails with `EINVAL`:

```c
static void
io_unread(rb_io_t *fptr)
{
    rb_off_t r;
    rb_io_check_closed(fptr);
    if (fptr->rbuf.len == 0 || fptr->mode & FMODE_DUPLEX)
        return;
    /* xxx: target position may be negative if buffer is filled by ungetc */
    errno = 0;
    // fptr->rbuf.len == 1, but real FD offset is 0
    // So we're doing lseek(-1) which fails with EINVAL
    r = lseek(fptr->fd, -fptr->rbuf.len, SEEK_CUR);
    if (r < 0 && errno) {
        if (errno == ESPIPE)
            fptr->mode |= FMODE_DUPLEX;
        return;
    }
    fptr->rbuf.off = 0;
    fptr->rbuf.len = 0;
    return;
}
```

So I suppose some more tracking info is needed to know that the real FD position and the buffer offset are desynced.

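The failing system call can be looked at in isolation. Below is a minimal sketch, assuming a POSIX-like platform where `IO#sysseek` goes straight to `lseek(2)`. The helper name `seek_back_from` is made up for illustration and is not part of Ruby. It places the descriptor at a given offset and then seeks backwards relative to it, which is roughly what `io_unread` attempts with `-fptr->rbuf.len`. When the seek fails inside `io_unread`, the early `return` leaves `rbuf` untouched, so the ungot byte survives.

```ruby
require 'tempfile'

# Hypothetical helper (not a Ruby API): put the descriptor at `start`, then
# seek `back` bytes backwards relative to the current position.
# Returns the new offset, or the exception class if the system call fails.
def seek_back_from(path, start, back)
  File.open(path, 'rb') do |f|
    # sysseek bypasses Ruby's read buffer and calls lseek(2) directly.
    f.sysseek(start, IO::SEEK_SET)
    begin
      f.sysseek(-back, IO::SEEK_CUR)
    rescue SystemCallError => e
      e.class
    end
  end
end

Tempfile.create('seek-demo') do |tmp|
  tmp.write('0123456789')
  tmp.flush

  # FD offset 0 with one "buffered" byte: the target offset would be -1.
  p seek_back_from(tmp.path, 0, 1)   # expected on Linux: Errno::EINVAL

  # FD offset 10 with ten buffered bytes: the target offset is 0, which is valid.
  p seek_back_from(tmp.path, 10, 10) # expected: 0
end
```

The second call mirrors the case where a read has already filled the buffer: the descriptor has advanced far enough that rewinding by the buffer length stays at or above 0, so the seek succeeds and the buffer can be dropped.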
----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110411

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When performing any of the seek based operations on IO (IO#seek, IO#pos=, or IO#rewind), the read buffer is inconsistently cleared:

```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #ungetbyte as the first read buffer
  # operation uses a buffer that is preserved during
  # seek operations
  f.ungetbyte(97)

  # Byte buffer will not be cleared
  f.seek(2, :SET)

  f.getbyte # => 97
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #getbyte before #ungetbyte uses a
  # buffer that is not preserved when seeking
  f.getbyte
  f.ungetbyte(97)

  # Byte buffer will be cleared
  f.seek(2, :SET)

  f.getbyte # => 50
end
```

Similar behavior happens when reading characters:

```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #ungetc as the first read buffer
  # operation uses a buffer that is preserved during
  # seek operations
  f.ungetc('a')

  # Character buffer will not be cleared
  f.seek(2, :SET)

  f.getc # => 'a'
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #getc before #ungetc uses a
  # buffer that is not preserved when seeking
  f.getc
  f.ungetc('a')

  # Character buffer will be cleared
  f.seek(2, :SET)

  f.getc # => '2'
end
```

When transcoding, however, the character buffer is never cleared when seeking:

```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer will not be cleared
  f.seek(2, :SET)

  f.getc # => 'a'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.getc
  f.ungetc('a'.encode('utf-16le'))

  # Character buffer will not be cleared
  f.seek(2, :SET)

  f.getc # => 'a'.encode('utf-16le')
end
```

I would expect the buffers to be cleared in all cases except possibly when the seek operation doesn't actually move the file pointer such as when calling IO#pos or IO#seek(0, :CUR). The inconsistent behavior demonstrated here is a problem regardless though.

-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/