[ruby-core:108944] [Ruby master Bug#18833] Documentation for IO#gets is inaccurate (bytes versus characters)

From: "adh1003 (Andrew Hodgkinson)" <noreply@...>
Date: 2022-06-16 04:02:43 UTC
List: ruby-core #108944
Issue #18833 has been updated by adh1003 (Andrew Hodgkinson).


For avoidance of doubt, the behaviour of Ruby itself is (IMHO) sensible and working well. The only change needed is to alter the word "bytes" to "characters" for the `IO#gets` description of the `limit` parameter.

----------------------------------------
Bug #18833: Documentation for IO#gets is inaccurate (bytes versus characters)
https://bugs.ruby-lang.org/issues/18833#change-98039

* Author: adh1003 (Andrew Hodgkinson)
* Status: Open
* Priority: Normal
* ruby -v: N/A
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
Please see https://ruby-doc.org/core-3.1.2/IO.html#method-i-gets:

> With integer argument `limit` given, returns up to `limit+1` bytes:

In relation to https://github.com/janko/down/pull/74, I discovered a difference between `IO#read` and `IO#gets`. When asked to read a specific number of bytes, `IO#read` ignores the stream's specified encoding and does exactly that: it reads the requested number of 8-bit bytes. `IO#gets`, by contrast, respects the encoding if given a `limit`, and the **number provided is characters, not bytes**. This means that more actual bytes might be read from the file (advancing its file pointer accordingly), both because of things like a BOM and because of multi-byte encodings. Moreover, the number of bytes in the returned data can exceed the number passed to the method (because it's a number of characters, contrary to the documentation), and the data won't necessarily include some bytes from the very start of the file (a UTF-8 BOM is stripped, for example). `IO#gets` *does* correctly handle a multibyte character that would be split at the requested limit if that limit were taken as bytes: it continues reading more bytes until it has read only complete characters.
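A minimal sketch of how the difference could be observed, assuming a UTF-8 file of 2-byte characters. The sample text, the helper name `compare_read_and_gets` and the temporary file are all made up for illustration; only `IO#read`, `IO#gets` and standard `File`/`Tempfile` calls are used. It shows what each call returns for the same `limit`, and is not meant to settle how `limit` should be worded in the documentation.

```ruby
require "tempfile"

# Hypothetical sample data: four 2-byte Cyrillic characters and a newline,
# 9 bytes in total when encoded as UTF-8.
SAMPLE = "\u0442\u0435\u0441\u0442\n"

# Reads from the same file twice with the same limit, once with IO#read and
# once with IO#gets, and reports what each call returned.
def compare_read_and_gets(limit)
  Tempfile.create("gets-limit") do |tmp|
    tmp.close
    File.binwrite(tmp.path, SAMPLE)
    File.open(tmp.path, "r:UTF-8") do |io|
      from_read = io.read(limit)   # length-limited read, ignores the encoding
      io.rewind
      from_gets = io.gets(limit)   # line read with a limit, encoding-aware
      {
        read_bytesize: from_read.bytesize,
        read_encoding: from_read.encoding,
        gets_bytesize: from_gets.bytesize,
        gets_valid: from_gets.valid_encoding?,
      }
    end
  end
end

result = compare_read_and_gets(1)
puts result
```

With a limit of 1, `read` should hand back a single binary byte (half of the first character), while `gets` should return one complete character whose byte size is larger than the limit that was passed.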

(It is in fact clearly unavoidable that it works in an encoding-aware fashion, else it would be unable to accurately interpret the `sep` parameter. Coercing everything down to a pure 8-bit byte stream and naively matching the separator's bytes against it would risk a false match within the wider file byte stream at a non-character boundary.)
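The risk of a byte-level mismatch can be illustrated with a small sketch, assuming UTF-16LE text. The sample string is invented for this purpose: U+010A is encoded in UTF-16LE as the bytes `0A 01`, so its first byte has the same value as a newline byte.

```ruby
# Hypothetical sample: U+010A, "a", a newline, "b", all encoded as UTF-16LE.
text = "\u010Aa\nb".encode("UTF-16LE")
newline = "\n".encode("UTF-16LE")

# Naive approach: treat everything as raw bytes and look for the byte 0x0A.
naive_byte_offset = text.b.index("\n".b)

# Encoding-aware approach: search for the newline as a UTF-16LE character.
aware_char_index = text.index(newline)

puts "naive byte offset:   #{naive_byte_offset}"
puts "aware character idx: #{aware_char_index}"
```

The naive search stops at byte offset 0, inside the first character, whereas the encoding-aware search finds the real newline at character index 2.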

This is causing confusion for people implementing IO subclasses or IO-like classes and I'm sure you recognise that it is of critical importance that the distinction between bytes and characters is made accurately, especially in such a crucial low-level piece of documentation as IO.

If you wish, I can have a go at figuring out a PR for it (not really done that outside of GitHub before, so something of a learning curve!).





-- 
https://bugs.ruby-lang.org/

