From: usa@...
Date: 2018-01-31T13:26:39+00:00
Subject: [ruby-core:85302] [Ruby trunk Bug#13950] String#tr incorrectly marks strings as CR_7BIT

Issue #13950 has been updated by usa (Usaku NAKAMURA).

Backport changed from 2.3: REQUIRED, 2.4: DONE to 2.3: DONE, 2.4: DONE

ruby_2_3 r62137 merged revision(s) 60060.

----------------------------------------
Bug #13950: String#tr incorrectly marks strings as CR_7BIT
https://bugs.ruby-lang.org/issues/13950#change-70082

* Author: nirvdrum (Kevin Menard)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
* Backport: 2.3: DONE, 2.4: DONE
----------------------------------------
String#tr has a curious bit of code attributable to r22547, dating back to Ruby 1.9.2. It seems to blindly change the calculated code range from `CR_VALID` to `CR_7BIT`:


From `tr_trans` in `string.c`:

```
if (cr == ENC_CODERANGE_VALID)
    cr = ENC_CODERANGE_7BIT;
```

The net result is that strings that can't possibly be `CR_7BIT`, simply by virtue of their encoding, end up incorrectly marked as `CR_7BIT`. For example:

```
s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)

p to
p to.encoding
p to.bytes
p to.ascii_only?

puts

p result
p result.encoding
p result.bytes
p result.ascii_only?

puts
p Encoding::UTF_16LE.ascii_compatible?

```

That produces the following output:

```
"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false
```

In this case, the original `to` string is identical to the `result` string. They have the same encoding and the same bytes. However, the result is marked as `CR_7BIT` (indicated by the `String#ascii_only?` value). UTF-16LE is not ASCII-compatible and should never have strings that are `CR_7BIT`.
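One way to make the mismatch visible is to compare the cached `String#ascii_only?` answer with the answer from a byte-for-byte copy whose code range has to be recomputed. This is only a sketch: `ascii_only_report` is a hypothetical helper, not part of Ruby, and it assumes CRuby's behaviour that `String#force_encoding` discards the cached code range.

```ruby
# Hypothetical helper (not part of Ruby): reports the cached ascii_only? value
# next to the value obtained from a rescanned copy of the same bytes.
def ascii_only_report(str)
  # String#b copies the bytes as binary; force_encoding re-tags the copy with
  # the original encoding and (in CRuby) drops any cached code range, so the
  # ascii_only? call below has to scan the bytes again.
  fresh = str.b.force_encoding(str.encoding)
  { cached: str.ascii_only?, rescanned: fresh.ascii_only? }
end

# Same tr call as in the report, plus an ASCII-compatible control case.
utf16 = "b".encode("utf-16le").tr("a-z".encode("utf-16le"), "*".encode("utf-16le"))
utf8  = "b".tr("a-z", "*")

p ascii_only_report(utf16)
p ascii_only_report(utf8)
```

On an affected build the two values for the UTF-16LE string should disagree, since only the cached one reflects the code range set by `tr_trans`. On a build that includes the fix they should match.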



-- 
https://bugs.ruby-lang.org/
