From: ruby@...
Date: 2017-03-03T01:32:42+00:00
Subject: [ruby-core:79884] [Ruby trunk Feature#13166] Feature Request: Byte	Arrays for Ruby 3

Issue #13166 has been updated by Kevin Menard.


I'm in favor of a separate byte type as well. I think it conveys intent much more clearly, is easier to reason about, is easier to optimize, and is less error-prone.

While an ASCII-8BIT string can do the work, it leads to two use cases for Strings that may be at odds with each other (e.g., code ranges don't really mean anything for binary data). It also requires extra diligence to get the desired outcome. It's very easy to look like you're doing what you want, but get different outcomes. E.g., assuming a default encoding of UTF-8:

```ruby
a = String.new
a << 0xff
a.encoding # => #<Encoding:ASCII-8BIT>
a.bytes # => [255]

b = ""
b << 0xff
b.encoding # => #<Encoding:UTF-8>
b.bytes # => [195, 191]

c = ""
c.encoding # => #<Encoding:UTF-8>
c << a
c.encoding # => #<Encoding:ASCII-8BIT>
c.bytes # => [255]
```

This may seem a bit contrived, but it's very easy to think you're doing one thing and actually be doing something else when working with Ruby strings if you're not paying careful attention to the encodings. String encoding negotiation might help fix common problems, but it could also silently change the encoding on you without your realization. Unfortunately, I've seen a fair bit of code that resorts to String#force_encoding to fix that problem. Now, if you're not careful you have a CR_BROKEN string and String operations aren't very well-defined on CR_BROKEN strings.

All of these issues are avoidable, but it requires a lot of forethought and, to some extent, familiarity with the way String is implemented to ensure you're maintaining the integrity of your binary data and achieving your performance objectives. I can appreciate that using String for two purposes like this looks attractive in Ruby because they're both backed by byte arrays in C. However, this is a situation where I think splitting the two use cases into different classes leads to more user-friendly code. As an added benefit, I believe a dedicated byte array type would be easier to optimize when it doesn't need to be concerned with adhering to the String API as well.

----------------------------------------
Feature #13166: Feature Request: Byte Arrays for Ruby 3
https://bugs.ruby-lang.org/issues/13166#change-63314

* Author: Jabari Zakiya
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I do a lot of numerically intensive applications.
In many instances I use arrays that contain boolean data (true|false or 1|0) values.

When I create such an array like:

`data = Array.new(size, value)` or just  `data = Array.new(size)`

is it correct that the default memory unit size is that of the cpu, i.e. (32|64)-bit?

Since almost all modern cpus are byte addressable, I want to optimally use their system memory 
by being able to explicitly create arrays of byte addressable elements.

For these use cases, this wlll allow my apps to extend their memory use capacity, instead
of wasting 31|63 bit of memory on 32|64 bit cpus systems just to store a boolean value.

To be clear, I am not talking about storing "strings" or "chars" but addessable 8-bit number elements.

I have not seen this capability documented in Ruby, thus I request this feature be added to
Ruby 3, and propose the following syntax that will be backwards compatible (non conflicting).

```ruby
data = Array8.new(size, value)
```

Having explicit addressable byte arrays not only will increase memory use compactness of many
applications, this compactness will directly contribute to the Ruby 3x3 goal for performance
by allowing more data to be held entirely in cache memory when possible.

Thanks in advance for its consideratoin.


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>