From: csaba@...
Date: 2021-08-09T23:20:30+00:00
Subject: [ruby-core:104856] [Ruby master Bug#18074] ARGF.read(length) exhibits short rea

Issue #18074 has been updated by csabahenk2 (Csaba Henk).

[`argf_read`](https://github.com/ruby/ruby/blob/v3_0_2/io.c#L12307-L12351) pseudocodized:

``` c
/*  1 */ argf_read(VALUE *length)
/*  2 */ {
/*  3 */     len = NUM2LONG(length);
/*  4 */     str = StringValue();
/*  5 */ retry:
/*  6 */     tmp = io_read(length, ARGF.current_file);
/*  7 */     if (tmp != NIL_P)
/*  8 */         rb_str_append(str, tmp);
/*  9 */     if (tmp == NIL_P) {
/* 10 */         ARGF.close();
/* 11 */         ARGF.next();
/* 12 */         goto retry;
/* 13 */     } else {
/* 14 */         slen = RSTRING_LEN(str);
/* 15 */         if (slen < len) {
/* 16 */             len -= slen;
/* 17 */             length = LONG2NUM(len);
/* 18 */             goto retry;
/* 19 */         }
/* 20 */     }
/* 21 */     return str;
/* 22 */ }
```

The bug resides in line 16, where `len` is reduced. The joint effect of lines 16-17 is that both `len` and `length` are reduced by the length of the string being assembled (`str`). Here `len` is a C long that represents the required total read length (judging by the role it plays in the guard condition of line 15), and `length` is a Ruby numeric value representing the remaining length (after the read attempts of line 6 that took place up to that point). So modifying `len` is incorrect.

Why do we need three files for the bug to manifest?

- The bug is triggered in the guard condition of line 15, with certain corrupt values of `len`.
- That requires passing through line 15 twice, as the initial `len` is not yet corrupt; it only becomes so in the next line (line 16).

Assuming `ARGV` contains three files, and `argf_read` is invoked with a `length` that exceeds the combined length of the first two files, the control flow goes as follows:

- The first I/O activity is reading in the first file, upon hitting line 6.
  From there the control flow passes on to line 15, where the condition holds, as we have read less than the required length, so we get to line 16, where the corruption occurs.
- We are still at the first file; I/O is attempted on it again, but as we have already reached its end, `io_read()` returns nil (stored in `tmp`).
- A nil `tmp` lands us in the block of lines 9-13, where we step to the second file.
- This also means bypassing the critical line 15.
- We move on to do I/O on the second file, reading it in, too.
- As we still have data to read (residing in the third file), we should enter the code block of lines 15-19, from where we would go on to do I/O again.
- However, the guard condition of line 15, which should let us into this code block, can wrongly fail due to the corrupt, incorrectly reduced `len`.

The exact condition for `length` to trigger the bug: if the three files are of size _s1_, _s2_, _s3_, then, as said before:

- _s1_, _s2_, _s3_ > 0
- _s1_ + _s2_ < `length`

and also:

- 2*_s1_ + _s2_ >= `length`

Checking these criteria is left to the reader. In short: at the second pass through line 15, `str` holds _s1_ + _s2_ bytes while `len` has been reduced to `length` - _s1_, so the guard fails when _s1_ + _s2_ >= `length` - _s1_.

----------------------------------------
Bug #18074: ARGF.read(length) exhibits short rea
https://bugs.ruby-lang.org/issues/18074#change-93204

* Author: csabahenk2 (Csaba Henk)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.0dev (2021-08-04T09:29:42Z master 6e55facdb3) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
If ruby is invoked with three file name arguments (which refer to existing non-empty regular files), and `ARGF.read` is called with a length argument that exceeds the combined size of the first two files[1], then only the content of the first two files will be collected in the resulting string.

``` shell
$ for f in a b c; do echo -n $f > $f; done
$ ruby -e 'p ARGF.read(3)' a b c
"ab"
```

[1]: This is actually just a necessary condition; the exact criteria will be provided in a follow-up analysis.
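
For readers who would rather not check the criteria by hand, here is a minimal sketch that models the control flow of the pseudocode in this message. It is an illustration only, not the real `io.c` code: files are represented purely by their sizes, and the names `simulate_argf_read` and `stops_after_second_file?` are made up for this example. The `fixed: true` branch shows one possible correction consistent with the analysis (leave `len` alone and recompute only `length`); it is not necessarily the patch applied upstream. The predicate encodes _s1_ + _s2_ < `length` <= 2*_s1_ + _s2_, with a non-strict upper bound, which is what the guard `slen < len` yields and what the `a b c` example with length 3 needs.

``` ruby
# Hypothetical model of the argf_read pseudocode: each file is just a size,
# and "reading" only counts bytes. Comments refer to the pseudocode lines.
def simulate_argf_read(sizes, length, fixed: false)
  len = length      # line 3: required total length
  str_len = 0       # line 4: number of bytes gathered in `str`
  idx = 0           # index of the current file
  pos = 0           # read offset within the current file
  loop do
    return str_len if idx >= sizes.size   # no more files to step to
    remaining = sizes[idx] - pos
    if remaining.zero?                    # io_read returns nil at EOF
      idx += 1                            # lines 10-11: close, next file
      pos = 0
      next                                # line 12: goto retry
    end
    got = [remaining, length].min         # line 6: read up to `length` bytes
    pos += got
    str_len += got                        # line 8: append to `str`
    return str_len unless str_len < len   # line 15: the guard
    if fixed
      length = len - str_len              # assumed fix: `len` stays intact
    else
      len -= str_len                      # line 16: the corruption
      length = len                        # line 17
    end
  end
end

# Condition under which the model stops right after the second file.
def stops_after_second_file?(sizes, length)
  s1, s2 = sizes
  s1 + s2 < length && 2 * s1 + s2 >= length
end

p simulate_argf_read([1, 1, 1], 3)               # 2, i.e. "ab"
p simulate_argf_read([1, 1, 1], 3, fixed: true)  # 3
```

The model covers only the symptom discussed here, where the read stops after the second file; it says nothing about generic input such as stdin, which `argf_read` handles through a separate path.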