From: merch-redmine@...
Date: 2021-02-25T23:54:06+00:00
Subject: [ruby-core:102611] [Ruby master Bug#16842] `inspect` prints the UTF-8 character U+0085 (NEXT LINE) verbatim even though it is not printable

Issue #16842 has been updated by jeremyevans0 (Jeremy Evans).

Assignee set to duerst (Martin Dürst)
Status changed from Open to Assigned

Behavior here seems to be dependent on the encoding:

```
$ LC_ALL=C ruby -e "p 0x85.chr(Encoding::UTF_8).inspect.b"
"\"\\u0085\""
$ LC_ALL=en_US.UTF-8 ruby -e "p 0x85.chr(Encoding::UTF_8).inspect.b"
"\"\xC2\x85\""
```

I've submitted a pull request to fix the behavior, though the implementation is rather crude: https://github.com/ruby/ruby/pull/4229

@duerst Is there a better fix by handling the unicode properties differently?

----------------------------------------
Bug #16842: `inspect` prints the UTF-8 character U+0085 (NEXT LINE) verbatim even though it is not printable
https://bugs.ruby-lang.org/issues/16842#change-90598

* Author: sawa (Tsuyoshi Sawada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: ruby 2.8.0dev (2020-05-09T13:24:57Z master 889b0fe46f) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
The UTF-8 character U+0085 (NEXT LINE) is not printable, but `inspect` prints the character verbatim (within double quotation):

```ruby
0x85.chr(Encoding::UTF_8).match?(/\p{print}/) # => false
0x85.chr(Encoding::UTF_8).inspect #=> "\" \""
```

My understanding is that non-printable characters are not printed verbatim with `inspect`:

```ruby
"\n".match?(/\p{print}/) # => false
"\n".inspect #=> "\"\\n\""
```

while printable characters are:

```ruby
"a".match?(/\p{print}/) # => true
"a".inspect # => "\"a\""
```

I ran the following script, and found that U+0085 is the only character within the range U+0000 to U+FFFF that behaves like this.
```ruby
def verbatim?(char)
  !char.inspect.start_with?(%r{\"\\[a-z]})
end

def printable?(char)
  char.match?(/\p{print}/)
end

(0x0000..0xffff).each do |i|
  begin
    char = i.chr(Encoding::UTF_8)
  rescue RangeError
    next
  end
  puts '%#x' % i unless verbatim?(char) == printable?(char)
end
```
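The encoding dependence reported at the top of this message can probably also be reproduced inside a single process, without changing `LC_ALL`. The following is only a minimal sketch: `inspect_under` is a hypothetical helper, not part of Ruby or of the pull request, and it assumes that `String#inspect` consults `Encoding.default_internal` and `Encoding.default_external` at call time.

```ruby
# Hypothetical helper (an assumption for illustration): call String#inspect
# with the process-wide default encodings temporarily overridden.
def inspect_under(encoding, str)
  saved_external = Encoding.default_external
  saved_internal = Encoding.default_internal
  saved_verbose  = $VERBOSE
  $VERBOSE = nil # silence the warning emitted when the defaults are reassigned
  Encoding.default_external = encoding
  Encoding.default_internal = nil
  str.inspect.b # .b exposes the raw bytes, as in the shell commands
ensure
  # Always restore the original settings, even if inspect raises.
  Encoding.default_external = saved_external
  Encoding.default_internal = saved_internal
  $VERBOSE = saved_verbose
end

nel = 0x85.chr(Encoding::UTF_8)
p inspect_under(Encoding::US_ASCII, nel) # corresponds to the LC_ALL=C run
p inspect_under(Encoding::UTF_8, nel)    # corresponds to the en_US.UTF-8 run
```

The second result depends on the Ruby version in use, so no expected output is given for it.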