From: geoff@...
Date: 2014-12-10T00:55:39+00:00
Subject: [ruby-core:66761] [ruby-trunk - Bug #10584] [Open] String.valid_encoding?, String.ascii_only? fails to account for BOM.

Issue #10584 has been reported by Geoff Nixon.

----------------------------------------
Bug #10584: String.valid_encoding?, String.ascii_only? fails to account for BOM.
https://bugs.ruby-lang.org/issues/10584

* Author: Geoff Nixon
* Status: Open
* Priority: Normal
* Assignee:
* Category: core
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0preview2 (2014-11-28 trunk 48628) [x86_64-darwin14]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
IMO:

- A Unicode (UTF-16, UTF-32) string with a valid BOM should not be considered a valid encoding if endianness is changed.
- A UTF-8 string with BOM should not consider the BOM as a codepoint.

~~~sh
> file utf-16be-file
utf-16be-file: POSIX shell script, Big-endian UTF-16 Unicode text executable
> file utf-16le-file
utf-16le-file: POSIX shell script, Little-endian UTF-16 Unicode text executable
> file utf-8-with-bom-file
utf-8-with-bom-file: POSIX shell script, UTF-8 Unicode (with BOM) text executable
~~~

~~~sh
> ruby -e "p File.binread('utf-16le-file').force_encoding('UTF-16BE').valid_encoding?"
true # false
> ruby -e "p File.binread('utf-16be-file').force_encoding('UTF-16LE').valid_encoding?"
true # false
> ruby -e "p File.read('utf-8-with-bom-file').ascii_only?"
false # true
> ruby -e "p File.read('utf-8-with-bom-file')[0]"
"" # '#'
~~~

No?

---Files--------------------------------
utf-8-with-bom-file (14 Bytes)
utf-16be-file (2.45 KB)
utf-16le-file (2.46 KB)

-- 
https://bugs.ruby-lang.org/
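
For illustration only, here is a minimal sketch of how the two cases in the report could be checked by hand in Ruby. `BOMS` and `bom_encoding` are hypothetical names, not part of Ruby core, and the byte strings are small stand-ins for the attached files rather than their real contents. The only built-in behaviour relied on is the `BOM|UTF-8` prefix in the read mode, which asks Ruby to consume a leading BOM while reading.

```ruby
require 'tempfile'

# Hypothetical lookup table (not part of Ruby core): BOM bytes => encoding name.
# The UTF-32LE entry must come before UTF-16LE because both start with FF FE.
BOMS = {
  "\x00\x00\xFE\xFF".b => 'UTF-32BE',
  "\xFF\xFE\x00\x00".b => 'UTF-32LE',
  "\xEF\xBB\xBF".b     => 'UTF-8',
  "\xFE\xFF".b         => 'UTF-16BE',
  "\xFF\xFE".b         => 'UTF-16LE'
}.freeze

# Hypothetical helper: returns the encoding name implied by a leading BOM,
# or nil when the bytes do not start with a known BOM.
def bom_encoding(bytes)
  raw = bytes.b # compare as raw bytes, whatever encoding the string claims
  BOMS.each { |bom, name| return name if raw.start_with?(bom) }
  nil
end

# Case 1: little-endian bytes with a little-endian BOM, labelled big-endian.
# Stand-in data, assumed to resemble the start of utf-16le-file.
le_bytes    = "\xFF\xFE".b + 'hi'.encode('UTF-16LE').b
mislabelled = le_bytes.dup.force_encoding('UTF-16BE')
# The BOM disagrees with the forced encoding, so this comparison is false.
bom_matches = bom_encoding(le_bytes) == mislabelled.encoding.name

# Case 2: UTF-8 with a BOM. Reading with "BOM|UTF-8" drops the BOM,
# so the first character of the result is '#'.
utf8_bytes = "\xEF\xBB\xBF#!/bin/sh\n".b
stripped = Tempfile.create('bom-demo') do |f|
  f.binmode
  f.write(utf8_bytes)
  f.flush
  File.read(f.path, mode: 'r:BOM|UTF-8')
end
```

This does not change what `valid_encoding?` or `ascii_only?` return; it only shows one way a caller might compare a BOM against the encoding a string is labelled with, or strip the BOM at read time. Note that a UTF-16LE text whose first character after the BOM is U+0000 would be reported as UTF-32LE by this table.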