ruby-core

Issue #18576 has been updated by naruse (Yui NARUSE).


The name `ASCII-8BIT` reflects how deeply we thought about what "binary" is. Ruby 1.9's encoding system is a series of inventions. Among the ideas Ruby invented are ASCII COMPATIBLE and ASCII-8BIT.

> Two things particularly confusing about the name ASCII-8BIT:
>
> * It's completely unclear it might mean binary data or unknown encoding
> * ISO-8859-* and many other encodings are 8-bit ascii-compatible encodings. Yet ASCII-8BIT which name seems to imply something close is nothing like that (the 8th bit is undefined, uninterpreted but valid).

Your two points raise very good questions. The answers to them are tightly coupled with the name `ASCII-8BIT`.

> * It's completely unclear it might mean binary data or unknown encoding

Let me ask how often you can actually distinguish the two. Ruby's assumption is that developers cannot distinguish them in normal use cases. If so, Ruby doesn't need to provide two objects, and if Ruby provides only one object for both, developers don't need to decide which one they have.

> ISO-8859-* and many other encodings are 8-bit ascii-compatible encodings. Yet ASCII-8BIT which name seems to imply something close is nothing like that (the 8th bit is undefined, uninterpreted but valid).

This is a very good question. Ruby's answer is "yes, ASCII-8BIT is similar to ISO-8859-*". As you say, the 8-bit range of an ASCII-8BIT string is undefined, but Ruby doesn't care about that. Such data does turn up in the real world.

For example, the charset of HTTP headers is usually ISO-8859-1. Many languages have struggled with how to handle these octets. Python and .NET handle them as binary, which prevents applying powerful String methods to such data. Ruby handles them as ASCII-8BIT. Ruby's insight is that the binaries Ruby handles are usually such octets, and the name `ASCII-8BIT` reflects that insight.
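As a rough illustration of the HTTP header case, here is a minimal sketch of ordinary String methods working on ASCII-8BIT octets. The header line and the `split_header` helper are made up for this example, and ISO-8859-1 is only an assumed charset for the final conversion.

```ruby
# Hypothetical helper (not part of Ruby): split one raw header line
# into its name and value using plain String methods.
def split_header(raw_line)
  name, value = raw_line.chomp.split(": ", 2)
  [name, value]
end

# Bytes as they might arrive from a socket; \xE9 is "é" in ISO-8859-1.
raw = "X-Author: Andr\xE9\r\n".b

name, value = split_header(raw)
value.encoding            # still ASCII-8BIT: the high byte is kept, not interpreted
value.start_with?("Andr") # ASCII-range comparisons work as usual

# Reinterpret the bytes only once the charset is known (assumed ISO-8859-1 here).
text = value.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
```

The string stays usable as text in its ASCII range, while bytes above 0x7F are carried through untouched until the caller decides what they mean.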

Therefore, the answer to your points is that they are simply how the real world is. The name just reflects that.


Anyway, Rails programmers usually don't need this understanding. If renaming helps people who only scratch the surface of this chaos, it may be worth considering, though changing `encoding.name` may cause compatibility issues.

----------------------------------------
Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`
https://bugs.ruby-lang.org/issues/18576#change-96438

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context

I'm now used to it, but something that confused me for years was errors such as:

```ruby
>> "fée" + "\xFF".b
(irb):3:in `+': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
```

When you aren't that familiar with Ruby, it's really not evident that `ASCII-8BIT` basically means "no encoding" or "binary".

And even when you know it, if you don't read carefully it's very easily confused with `US-ASCII`.
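To make the difference from `US-ASCII` concrete, here is a small sketch. The `concat_outcome` helper is hypothetical and exists only to capture either the resulting encoding or the error class.

```ruby
# Hypothetical helper: return the encoding of left + right,
# or the error class if the two strings cannot be combined.
def concat_outcome(left, right)
  (left + right).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

utf8 = "f\u00E9e" # "fée" as a UTF-8 string

concat_outcome(utf8, "abc".b)  # ASCII-only binary bytes combine fine with UTF-8
concat_outcome(utf8, "\xFF".b) # a byte above 0x7F triggers the CompatibilityError

# Every byte is valid in ASCII-8BIT, but not in US-ASCII.
"\xFF".b.valid_encoding?
"\xFF".b.force_encoding(Encoding::US_ASCII).valid_encoding?
```

The error in the report therefore depends on the content of the binary string, not only on its encoding label, which is part of why the message is hard to read for newcomers.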

The `Encoding::BINARY` alias is much more telling IMHO.

### Proposal

Since `Encoding::ASCII_8BIT` has been aliased as `Encoding::BINARY` for years, I think renaming it to `BINARY` and then making `ASCII_8BIT` the alias would significantly improve usability without backward compatibility concerns.
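The existing alias can be checked with a few lines. This sketch only inspects identity and the registered names, since the primary name is exactly what the proposal would change.

```ruby
# Both constants refer to the same Encoding object.
Encoding::BINARY.equal?(Encoding::ASCII_8BIT)

# Both names resolve to that object through Encoding.find.
Encoding.find("BINARY").equal?(Encoding.find("ASCII-8BIT"))

# The registered names for this encoding include "BINARY".
Encoding::BINARY.names.include?("BINARY")

# String#b produces a string tagged with this encoding.
"\xFF".b.encoding.equal?(Encoding::BINARY)
```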

The only concern I could see would be the consistency with a handful of C API functions:

  - `rb_encoding *rb_ascii8bit_encoding(void)`
  - `int rb_ascii8bit_encindex(void)`
  - `VALUE rb_io_ascii8bit_binmode(VALUE io)`

But that's for much more advanced users, so I don't think it's much of a concern.




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
