[ruby-core:119748] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
From: "byroot (Jean Boussier) via ruby-core" <ruby-core@...>
Date: 2024-11-05 17:52:55 UTC
List: ruby-core #119748
Issue #20869 has been updated by byroot (Jean Boussier).
I just looked into this a bit. I'm not familiar enough with the code to propose a fix, but I can see what is happening:
`ungetbyte` just shifts the buffer offset, but the FD offset is unchanged.
```c
static void
io_ungetbyte(VALUE str, rb_io_t *fptr)
{
    // snip...
    // ungetbyte just shifts the buffer offset, but the FD offset is unchanged
    fptr->rbuf.off-=(int)len;
    fptr->rbuf.len+=(int)len;
    MEMMOVE(fptr->rbuf.ptr+fptr->rbuf.off, RSTRING_PTR(str), char, len);
}
```
`fptr->rbuf.len == 1`, but the real FD offset is 0.
So we end up calling `lseek(fd, -1, SEEK_CUR)`, which fails with `EINVAL`, and `io_unread` returns early without resetting the buffer:
```c
static void
io_unread(rb_io_t *fptr)
{
    rb_off_t r;
    rb_io_check_closed(fptr);
    if (fptr->rbuf.len == 0 || fptr->mode & FMODE_DUPLEX)
        return;
    /* xxx: target position may be negative if buffer is filled by ungetc */
    errno = 0;
    // fptr->rbuf.len == 1, but real FD offset is 0
    // So we're doing lseek(fd, -1, SEEK_CUR), which fails with EINVAL
    r = lseek(fptr->fd, -fptr->rbuf.len, SEEK_CUR);
    if (r < 0 && errno) {
        if (errno == ESPIPE)
            fptr->mode |= FMODE_DUPLEX;
        return;
    }
    fptr->rbuf.off = 0;
    fptr->rbuf.len = 0;
    return;
}
```
So I suppose some more tracking info is needed to know that the real FD position and the buffer offset are desynced.
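To make the desync concrete, here is a toy Ruby model of the two code paths quoted above. It is only a sketch, not how `io.c` is written: `ToyBufferedFile` and all of its methods are made-up names, the read buffer is a plain array, the whole file is buffered on the first read, and a negative seek target stands in for `lseek` failing with `EINVAL`.

```ruby
# Toy model only: every name here is hypothetical, not a Ruby internal.
class ToyBufferedFile
  attr_reader :fd_pos

  def initialize(data)
    @data   = data.bytes # file contents as an array of byte values
    @fd_pos = 0          # stands in for the kernel's file offset
    @rbuf   = []         # stands in for the unread bytes in fptr->rbuf
  end

  # Mirrors the io_ungetbyte excerpt: only the buffer grows, fd_pos stays put.
  def ungetbyte(byte)
    @rbuf.unshift(byte)
  end

  def getbyte
    fill if @rbuf.empty?
    @rbuf.shift
  end

  # Mirrors the io_unread excerpt: move the offset back by the buffered length.
  # Returns false and keeps the buffer when the target would be negative,
  # which is the case where the real lseek fails with EINVAL.
  def unread
    return true if @rbuf.empty?
    target = @fd_pos - @rbuf.length
    return false if target.negative?
    @fd_pos = target
    @rbuf.clear
    true
  end

  def seek(pos)
    unread
    @fd_pos = pos
  end

  private

  # Simplification: buffer everything from fd_pos to the end of the data.
  def fill
    chunk = @data[@fd_pos..] || []
    @rbuf.concat(chunk)
    @fd_pos += chunk.length
  end
end

# Case 1: ungetbyte first. The buffer holds 1 byte while fd_pos is 0,
# so unread cannot rewind and the stale byte survives the seek.
a = ToyBufferedFile.new('0123456789')
a.ungetbyte(97)
a.seek(2)
first = a.getbyte  # => 97

# Case 2: getbyte first. fd_pos is already 10, so unread succeeds
# and the buffer is dropped before seeking.
b = ToyBufferedFile.new('0123456789')
b.getbyte
b.ungetbyte(97)
b.seek(2)
second = b.getbyte # => 50

puts first, second
```

In this model the only difference between the two cases is whether `fd_pos` is large enough to absorb the buffered length, which matches the behavior described in the report below.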
----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110411
* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When performing any of the seek-based operations on IO (IO#seek, IO#pos=, or IO#rewind), the read buffer is inconsistently cleared:
```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  # Calling #ungetbyte as the first read buffer
  # operation uses a buffer that is preserved during
  # seek operations
  f.ungetbyte(97)
  # Byte buffer will not be cleared
  f.seek(2, :SET)
  f.getbyte # => 97
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  # Calling #getbyte before #ungetbyte uses a
  # buffer that is not preserved when seeking
  f.getbyte
  f.ungetbyte(97)
  # Byte buffer will be cleared
  f.seek(2, :SET)
  f.getbyte # => 50
end
```
Similar behavior happens when reading characters:
```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  # Calling #ungetc as the first read buffer
  # operation uses a buffer that is preserved during
  # seek operations
  f.ungetc('a')
  # Character buffer will not be cleared
  f.seek(2, :SET)
  f.getc # => 'a'
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  # Calling #getc before #ungetc uses a
  # buffer that is not preserved when seeking
  f.getc
  f.ungetc('a')
  # Character buffer will be cleared
  f.seek(2, :SET)
  f.getc # => '2'
end
```
When transcoding, however, the character buffer is never cleared when seeking:
```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer will not be cleared
  f.seek(2, :SET)
  f.getc # => 'a'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.getc
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer will not be cleared
  f.seek(2, :SET)
  f.getc # => 'a'.encode('utf-16le')
end
```
I would expect the buffers to be cleared in all cases, except possibly when the seek operation doesn't actually move the file pointer, such as when calling IO#pos or IO#seek(0, :CUR). Regardless, the inconsistent behavior demonstrated here is a problem.
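For anyone who wants to check a particular interpreter, the byte-oriented case can be condensed into a small probe. This is only a sketch: `byte_after_seek` and its `prefill:` keyword are hypothetical names introduced here, and the result for the non-prefilled case depends on the Ruby version being run.

```ruby
require 'tempfile'

# Hypothetical helper: returns the byte read after #ungetbyte followed by
# seek(2, :SET). With prefill: true, #getbyte is called first so the read
# buffer is already populated from the file.
def byte_after_seek(prefill:)
  Tempfile.open do |f|
    f.write('0123456789')
    f.rewind
    f.getbyte if prefill
    f.ungetbyte(97)
    f.seek(2, :SET)
    f.getbyte
  end
end

results = {
  without_prefill: byte_after_seek(prefill: false),
  with_prefill:    byte_after_seek(prefill: true)
}

# 50 is the byte '2' at offset 2 (buffer cleared); 97 is the pushed-back byte.
results.each do |name, byte|
  state = byte == 50 ? 'buffer cleared' : 'buffer preserved'
  puts "#{name}: #{byte} (#{state})"
end
```

If both lines report the same state, the running interpreter handles the two cases consistently.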