From: naruse@...
Date: 2021-02-26T05:43:37+00:00
Subject: [ruby-core:102613] [Ruby master Bug#16842] `inspect` prints the UTF-8 character U+0085 (NEXT LINE) verbatim even though it is not printable

Issue #16842 has been updated by naruse (Yui NARUSE).

The reason U+0085 is categorized as `Print` in Ruby is historical: Oniguruma treats it that way.
https://moriyoshi.hatenablog.com/entry/20090307/1236410006

I'm neutral about the change, but I would like it to come with a detailed comment or a link to this ticket.

----------------------------------------
Bug #16842: `inspect` prints the UTF-8 character U+0085 (NEXT LINE) verbatim even though it is not printable
https://bugs.ruby-lang.org/issues/16842#change-90601

* Author: sawa (Tsuyoshi Sawada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: ruby 2.8.0dev (2020-05-09T13:24:57Z master 889b0fe46f) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
The UTF-8 character U+0085 (NEXT LINE) is not printable, but `inspect` prints the character verbatim (within double quotation marks):

```ruby
0x85.chr(Encoding::UTF_8).match?(/\p{print}/) # => false
0x85.chr(Encoding::UTF_8).inspect # => "\" \""
```

My understanding is that non-printable characters are not printed verbatim with `inspect`:

```ruby
"\n".match?(/\p{print}/) # => false
"\n".inspect # => "\"\\n\""
```

while printable characters are:

```ruby
"a".match?(/\p{print}/) # => true
"a".inspect # => "\"a\""
```

I ran the following script and found that U+0085 is the only character in the range U+0000 to U+FFFF that behaves like this.

```ruby
# True if `inspect` shows the character as-is rather than as a backslash escape.
def verbatim?(char)
  !char.inspect.start_with?(%r{\"\\[a-z]})
end

# True if the character matches the regexp property \p{print}.
def printable?(char)
  char.match?(/\p{print}/)
end

(0x0000..0xffff).each do |i|
  begin
    char = i.chr(Encoding::UTF_8)
  rescue RangeError
    # Skip code points that are not valid in UTF-8 (e.g. surrogates).
    next
  end
  puts '%#x' % i unless verbatim?(char) == printable?(char)
end
```

-- 
https://bugs.ruby-lang.org/
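
For anyone who wants to check a single code point on their own build, here is a minimal sketch that puts the two checks from the report side by side. The helper name `inspect_vs_print` and the shape of its return value are hypothetical and not part of the original script. It reuses the report's heuristic that an escaped character shows up in `inspect` as a backslash followed by a lowercase letter. What it reports for U+0085 depends on the Ruby version, so no expected output is given.

```ruby
# Hypothetical helper (not from the original report): for one code point,
# report whether String#inspect escapes it and whether /\p{print}/ matches.
def inspect_vs_print(codepoint)
  char = codepoint.chr(Encoding::UTF_8)
  body = char.inspect[1..-2] # drop the surrounding double quotes
  {
    codepoint: format('U+%04X', codepoint),
    # Same heuristic as the report: a backslash followed by a lowercase letter.
    escaped: body.match?(/\A\\[a-z]/),
    printable: char.match?(/\p{print}/)
  }
end

# Print the summary for the three characters discussed in the report.
[0x61, 0x0a, 0x85].each { |cp| p inspect_vs_print(cp) }
```

A code point is consistent in the sense of the report when `escaped` and `printable` have opposite values.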