From: "Eregon (Benoit Daloze)"
Date: 2022-02-08T10:15:11+00:00
Subject: [ruby-core:107518] [Ruby master Feature#18576] Rename `ASCII-8BIT` encoding to `BINARY`

Issue #18576 has been updated by Eregon (Benoit Daloze).

+1000 for this. I think ASCII-8BIT is always extremely confusing, and BINARY is much more revealing (= we don't know what the actual encoding is, or it might be binary data and not text).
I've seen many Ruby users confused by this.
I'm not sure why I never thought to propose it here TBH.

I've literally never used the `Encoding::ASCII_8BIT` form in code (and rarely if ever seen it), but I've used `Encoding::BINARY` many times.

The property that bytes < 128 are interpreted as US-ASCII is nothing special; every encoding for which `Encoding#ascii_compatible?` is true behaves like that.
And almost all non-dummy Ruby encodings are `#ascii_compatible?`; the only two exceptions are UTF-16/32 (both LE/BE).

Two things are particularly confusing about the name ASCII-8BIT:
* It's completely unclear that it might mean binary data or an unknown encoding.
* ISO-8859-* and many other encodings are 8-bit ASCII-compatible encodings. Yet ASCII-8BIT, whose name seems to imply something similar, is nothing like that (the 8th bit is undefined, uninterpreted but valid).
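
A minimal Ruby sketch of the points above, assuming a reasonably recent CRuby. The variable names are only illustrative, and the exact contents of `Encoding.list` can differ between Ruby versions and implementations, so no output is claimed for it here.

```ruby
# Both constants are expected to refer to the same Encoding object.
same_object = Encoding::BINARY.equal?(Encoding::ASCII_8BIT)

# Collect the non-dummy encodings that are not ASCII-compatible.
non_ascii_compatible = Encoding.list
  .reject(&:dummy?)
  .reject(&:ascii_compatible?)
  .map(&:name)
  .sort

# In a binary string every byte sequence is valid, including bytes >= 128.
high_bytes = "\xFF\x80".b
high_bytes_valid = high_bytes.valid_encoding?

# Bytes < 128 behave like US-ASCII, so a binary string holding only such
# bytes can still be concatenated with non-ASCII UTF-8 text.
combined = "f\u00E9e" + "abc".b

# Once the binary string holds a byte >= 128, the same operation raises.
error = nil
begin
  "f\u00E9e" + "\xFF".b
rescue Encoding::CompatibilityError => e
  error = e
end

puts same_object
puts non_ascii_compatible.inspect
puts high_bytes_valid
puts combined.encoding
puts error.class
```

The ASCII-only case succeeding while the `\xFF` case fails is the behaviour shared by every ASCII-compatible encoding, which is why it says little about what makes this encoding different.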
(FWIW JCodings, the Java library for Ruby encodings, has `ASCIIEncoding.INSTANCE` for BINARY. That's even worse, as it's even more easily confused with US-ASCII. I've been thinking about how to fix that in JCodings in a compatible way.)

----------------------------------------
Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`
https://bugs.ruby-lang.org/issues/18576#change-96423

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context

I'm now used to it, but something that confused me for years was errors such as:

```ruby
>> "fée" + "\xFF".b
(irb):3:in `+': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
```

When you aren't that familiar with Ruby, it's really not evident that `ASCII-8BIT` basically means "no encoding" or "binary".

And even when you know it, if you don't read carefully it's very easily confused with `US-ASCII`.

The `Encoding::BINARY` alias is much more telling IMHO.

### Proposal

Since `Encoding::ASCII_8BIT` has been aliased as `Encoding::BINARY` for years, I think renaming it to `BINARY` and then making `ASCII_8BIT` the alias would significantly improve usability without backward compatibility concerns.

The only concern I could see would be the consistency with a handful of C API functions:

  - `rb_encoding *rb_ascii8bit_encoding(void)`
  - `int rb_ascii8bit_encindex(void)`
  - `VALUE rb_io_ascii8bit_binmode(VALUE io)`

But that's for much more advanced users, so I don't think it's much of a concern.

--
https://bugs.ruby-lang.org/