From: "YO4 (Yoshinao Muramatsu) via ruby-core"
Date: 2025-12-04T15:29:16+00:00
Subject: [ruby-core:124029] [Ruby Bug#21765] stop using the C runtime _read() on Windows

Issue #21765 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Bug #21765: stop using the C runtime _read() on Windows
https://bugs.ruby-lang.org/issues/21765

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
When an IO instance is created on Windows, the default data mode is text mode.
In practice, when no encoding conversion is performed, the IO encoding conversion mechanism is not used. Instead, the CRLF conversion provided by the C runtime's `_read()` is used. This was done explicitly for speed.
https://bugs.ruby-lang.org/issues/6401#note-4

As a trade-off, `SET_BINARY_MODE(fptr)` and `SET_BINARY_MODE_WITH_SEEK_CUR(fptr)` are used in various places within io.c, altering the state of the file descriptor.
This has made the flow of operations difficult to understand and changes hard to implement, especially for developers on other platforms.

Additionally, the issues I recently reported were discovered while verifying the impact of modifying the CRLF conversion to use the encoding conversion mechanism.
#21691 On Windows some of binary read functions of IO are not functional
#21687 IO#pos goes wrong after EOF character(ctrl-z) met
#21634 Combining read(1) with eof? causes dropout of results unexpectedly on Windows.

These issues arise either because the data read into the rbuf does not match the stream due to newline conversion, or because the buffer end and the file position do not align when CTRLZ is detected.

As a fix for Bug #21687, I created PR #15216. However, that fix relies on the internal behavior of the C runtime's `_read()` function, and there seems to be no way to avoid this dependency.
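
To make the rbuf/stream mismatch concrete, here is a minimal Ruby sketch. It is not the io.c code and not the real C runtime: `crt_text_read` and `naive_unread_position` are hypothetical names, and the model only assumes the text-mode behavior described above (CRLF collapsed to LF, CTRLZ treated as end of input). It does not try to reproduce where the real file position ends up after CTRLZ.

```ruby
# Hypothetical, simplified model of a text-mode read. NOT the real _read().
# Returns [converted_bytes, raw_bytes_consumed].
CTRL_Z = "\x1A".b

def crt_text_read(raw)
  raw = raw.b
  out = "".b
  i = 0
  while i < raw.bytesize
    byte = raw.byteslice(i, 1)
    break if byte == CTRL_Z                 # CTRLZ is treated as end of input
    if byte == "\r".b && raw.byteslice(i + 1, 1) == "\n".b
      out << "\n".b                         # CRLF collapses to a single LF
      i += 2
    else
      out << byte
      i += 1
    end
  end
  [out, i]
end

# Hypothetical helper: rewinding by the number of *buffered* bytes, which is
# only correct when buffer contents and stream contents are the same length.
def naive_unread_position(file_pos, buffered_bytes)
  file_pos - buffered_bytes
end

raw = "a\r\nb\r\nc".b                        # 7 raw bytes in the stream
buffered, consumed = crt_text_read(raw)      # 5 converted bytes, 7 consumed
wrong_pos = naive_unread_position(consumed, buffered.bytesize)
puts "buffered=#{buffered.bytesize} consumed=#{consumed} rewound_to=#{wrong_pos}"
```

In this model nothing has been handed to the caller yet, so an exact unread should rewind to position 0. Rewinding by the converted length lands on position 2 instead, which is the kind of discrepancy the issues above describe.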

**I propose removing the use of C runtime _read().**

Reason for Proposal
- The mismatch between rbuf and stream contents complicates io_unread() and makes maintenance difficult.
- Changing the O_BINARY/O_TEXT state of the file descriptor in various places hinders understanding of the behavior and makes modifications difficult.

Two methods to remove C runtime _read() while maintaining current behavior
1. Interpret CRLF and CTRLZ when reading rbuf within io.c.
2. Interpret CRLF and CTRLZ within the encoding conversion framework.

My initial idea was to implement the second method, using encoding conversion.
However, this internally changes the read operation from rbuf to cbuf, which changes the behavior of ungetc.
The proposal in Bug #21682 attempted to generalize this change to minimize its impact.
https://bugs.ruby-lang.org/issues/21682

**This issue proposes the first method, CRLF conversion during rbuf read.**
Problems caused by inconsistencies between the rbuf and stream contents are avoided, and io_unread() becomes the same as on other platforms.
Compared to implementing it as an encoding conversion, the advantage is that there is no change in behavior.
The drawback is that each read method in io.c requires individual handling, whereas the encoding conversion approach would keep the changes more localized.
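
The first method can be sketched in Ruby as follows. This is only an illustration of the idea, not the proposed io.c patch: `RawBuffer`, `read_text`, and `unread_length` are hypothetical names. The buffer keeps the raw bytes exactly as they came from the stream, and CRLF and CTRLZ are interpreted only when bytes are taken out of it.

```ruby
# Hypothetical sketch of "interpret CRLF and CTRLZ when reading rbuf".
# The buffer holds unconverted stream bytes, so its length always matches
# the stream and an unread can rewind by an exact raw byte count.
class RawBuffer
  CTRL_Z = 0x1A
  CR = 0x0D
  LF = 0x0A

  attr_reader :off

  def initialize(raw)
    @bytes = raw.b.bytes   # raw stream contents, unconverted
    @off = 0               # number of raw bytes consumed so far
    @eof = false
  end

  # Return up to n converted bytes, advancing @off by the raw bytes used.
  # Simplification: a CR at the very end of the buffer is passed through;
  # a real implementation would need to look at the next buffer fill.
  def read_text(n)
    out = []
    while out.size < n && !@eof && @off < @bytes.size
      byte = @bytes[@off]
      if byte == CTRL_Z
        @eof = true                      # stop at CTRLZ without consuming it
      elsif byte == CR && @bytes[@off + 1] == LF
        out << LF                        # CRLF becomes LF, two raw bytes used
        @off += 2
      else
        out << byte
        @off += 1
      end
    end
    out.pack("C*")
  end

  # Raw bytes still buffered: the exact distance an unread has to seek back.
  def unread_length
    @bytes.size - @off
  end
end

buf = RawBuffer.new("a\r\nb\r\nc")
buf.read_text(2)                         # two converted bytes, three raw bytes
puts "raw offset=#{buf.off} unread=#{buf.unread_length}"
```

Because the buffer never diverges from the stream, the unread distance is simply the count of remaining raw bytes, which is the property that would let io_unread() behave the same as on other platforms.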