From: shyouhei@...
Date: 2017-02-16T01:41:09+00:00
Subject: [ruby-core:79547] [Ruby trunk Bug#13216] Possible unexpected behaviour reading string starting with a byte order mark

Issue #13216 has been updated by Shyouhei Urabe.

Description updated

Hello.

Gabriel Giordano wrote:

> $ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.bytes'
> 239
> 187
> 191
> 105
> 100
>
> $ echo -n -e 'id' | ruby -e 'puts STDIN.read.bytes'
> 105
> 100

These two are as expected, aren't they?

> $ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.to_sym'
> id

I think it's the `puts` method that eats the BOM.

```
% echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.to_sym.to_s.dump'
"\uFEFFid"
```

This symbol actually includes U+FEFF, which is normally invisible in the middle of a string.

> $ echo -n -e 'id' | ruby -e 'puts STDIN.read.to_sym'
> id

This is OK I believe.

> $ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.to_sym == :id'
> false

Given the symbol generated by reading stdin does contain U+FEFF, this is natural.

> $ echo -n -e 'id' | ruby -e 'puts STDIN.read.to_sym == :id'
> true

No problem here.

> $ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.bytes.pack("U")'
> ��

This IS weird. Smells like a bug to me.

----

So all but the last one are working well (at least seems to me). The last one needs more inspection.

----------------------------------------
Bug #13216: Possible unexpected behaviour reading string starting with a byte order mark
https://bugs.ruby-lang.org/issues/13216#change-62987

* Author: Gabriel Giordano
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
Maybe the comparison between symbols has an unexpected behaviour.
Tested with ruby 2.4.0

```
$ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.bytes'
239
187
191
105
100

$ echo -n -e 'id' | ruby -e 'puts STDIN.read.bytes'
105
100

$ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.to_sym'
id

$ echo -n -e 'id' | ruby -e 'puts STDIN.read.to_sym'
id

$ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.to_sym == :id'
false

$ echo -n -e 'id' | ruby -e 'puts STDIN.read.to_sym == :id'
true

$ echo -n -e '\xEF\xBB\xBFid' | ruby -e 'puts STDIN.read.bytes.pack("U")'
��
```

-- 
https://bugs.ruby-lang.org/
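
For the last case in the report, it may help to look at the intermediate values rather than at what the terminal prints. The following is only a sketch, not a diagnosis of the issue: it assumes the input arrives as a UTF-8 string, and `strip_bom` is a hypothetical helper written for this illustration, not part of Ruby. It compares the `"U"` directive of `Array#pack`, which encodes an integer as a Unicode code point, with `"C*"`, which writes every integer back as a raw byte.

```ruby
# The same data the shell pipeline sends: UTF-8 BOM (EF BB BF) followed by "id".
raw = "\xEF\xBB\xBFid"
bytes = raw.bytes # [239, 187, 191, 105, 100]

# "U" treats an integer as a Unicode code point and encodes it as UTF-8.
# With no count it consumes only the first element, so 239 becomes U+00EF.
as_codepoint = bytes.pack("U")

# "C*" writes every integer back as one raw byte. The result is a binary
# string, so it is relabelled as UTF-8 to recover the original text.
round_trip = bytes.pack("C*").force_encoding(Encoding::UTF_8)

# Hypothetical helper (an assumption for this sketch): drop one leading U+FEFF.
def strip_bom(str)
  str.sub(/\A\uFEFF/, "")
end

puts as_codepoint.dump
puts round_trip.dump
puts round_trip.to_sym == :id            # false, the symbol still starts with U+FEFF
puts strip_bom(round_trip).to_sym == :id # true
```

Using `dump` here, as in the reply above, keeps the invisible U+FEFF from being hidden by the terminal.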