From: usa@...
Date: 2018-01-31T13:26:39+00:00
Subject: [ruby-core:85302] [Ruby trunk Bug#13950] String#tr incorrectly marks strings as CR_7BIT

Issue #13950 has been updated by usa (Usaku NAKAMURA).

Backport changed from 2.3: REQUIRED, 2.4: DONE to 2.3: DONE, 2.4: DONE

ruby_2_3 r62137 merged revision(s) 60060.

----------------------------------------
Bug #13950: String#tr incorrectly marks strings as CR_7BIT
https://bugs.ruby-lang.org/issues/13950#change-70082

* Author: nirvdrum (Kevin Menard)
* Status: Closed
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
* Backport: 2.3: DONE, 2.4: DONE
----------------------------------------
String#tr has a curious bit of code attributable to r22547, dating back to Ruby 1.9.2. It seems to blindly change the calculated code range from `CR_VALID` to `CR_7BIT`.

From `tr_trans` in `string.c`:

```
if (cr == ENC_CODERANGE_VALID)
    cr = ENC_CODERANGE_7BIT;
```

The net result is that strings that cannot possibly be `CR_7BIT`, simply by virtue of their encoding, end up incorrectly marked as `CR_7BIT`. For example:

```
s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")

result = s.tr(from, to)

p to
p to.encoding
p to.bytes
p to.ascii_only?

puts

p result
p result.encoding
p result.bytes
p result.ascii_only?

puts

p Encoding::UTF_16LE.ascii_compatible?
```

That produces the following output:

```
"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false
```

In this case, the original `to` string is identical to the `result` string: they have the same encoding and the same bytes. However, `result` is marked as `CR_7BIT` (as indicated by its `String#ascii_only?` value). UTF-16LE is not ASCII-compatible and should never have strings that are `CR_7BIT`.
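
The invariant this report relies on can be written as a small check. The following is only a minimal sketch, not code from the report or from Ruby itself: `coderange_consistent?` is a hypothetical helper name, and it assumes a non-empty string, for which `String#ascii_only?` is expected to be false whenever the encoding is not ASCII-compatible.

```ruby
# Hypothetical helper (an illustration, not part of Ruby or the report).
# Assumes a non-empty string: in an encoding that is not ASCII-compatible,
# such a string should not report ascii_only? (i.e. should not be CR_7BIT).
def coderange_consistent?(str)
  # ASCII-compatible encodings may legitimately be 7-bit, so nothing to check.
  return true if str.encoding.ascii_compatible?
  !str.ascii_only?
end

s    = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to   = "*".encode("utf-16le")

result = s.tr(from, to)

# A freshly encoded UTF-16LE string satisfies the invariant.
p coderange_consistent?(to)

# The String#tr result has the same bytes and encoding as `to`,
# so it should satisfy the invariant as well.
p coderange_consistent?(result)
```

On an affected build, such as the 2.4.2 release named in the report, the second check would be expected to print `false`. On builds that include the merged revision it should print `true`.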