From: nobu@...
Date: 2017-11-23T12:42:02+00:00
Subject: [ruby-core:83869] [Ruby trunk Bug#14127] (CSV) generating UTF-16LE encoded file without BOM

Issue #14127 has been updated by nobu (Nobuyoshi Nakada).

laykou (Ladislav Gallay) wrote:
> This file should contain BOM information so that it is properly detected as UTF-16LE file.
>
> How to generate such file:
>
> ~~~ruby
> file = CSV.generate(encoding: 'UTF-16LE') do |csv|
>   csv << ['something', '������������������']
> end
> ~~~

csv.rb seems to have bugs in its support for ASCII-incompatible encodings.

> According to `file -I file.csv` this file is recognized as `application/octet-stream; charset=binary` because it is missing the BOM information.
>
> According to Wikipedia https://en.wikipedia.org/wiki/UTF-16 it should contain "\xFF\xFE" at the beginning of the document so that everyone knows it's UTF-16LE.

`CSV.generate` just builds a CSV string; it doesn't create a file.
Writing the result to a file with a BOM is the application's responsibility.

```ruby
CSV.open("utf16.csv", "w:UTF-16LE:utf-8") do |csv|
  csv.to_io.write "\uFEFF"
  csv << ['something', '������������������']
end
```

> Here is someone trying to fix this in a similar way: https://stackoverflow.com/a/22950912/1632815
> I did it: manually adding that BOM information.

```ruby
new_html_file = File.open("foo.txt", "w:UTF-16LE")
new_html_file << "\uFEFF" << some_text
```

----------------------------------------
Bug #14127: (CSV) generating UTF-16LE encoded file without BOM
https://bugs.ruby-lang.org/issues/14127#change-67906

* Author: laykou (Ladislav Gallay)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: 2.4.1
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
This file should contain BOM information so that it is properly detected as UTF-16LE file.
How to generate such file:

~~~ruby
file = CSV.generate(encoding: 'UTF-16LE') do |csv|
  csv << ['something', '������������������']
end
~~~

According to `file -I file.csv` this file is recognized as `application/octet-stream; charset=binary` because it is missing the BOM information.

According to Wikipedia https://en.wikipedia.org/wiki/UTF-16 it should contain "\xFF\xFE" at the beginning of the document so that everyone knows it's UTF-16LE.

Here is someone trying to fix this in a similar way: https://stackoverflow.com/a/22950912/1632815
I did it: manually adding that BOM information.

~~~ ruby
## Adds BOM, albeit in a somewhat hacky way.
new_html_file = File.open("foo.txt", "w:UTF-8")
new_html_file << "\xFF\xFE".force_encoding('utf-16le') + some_text.force_encoding('utf-8').encode('utf-16le')
~~~

--
https://bugs.ruby-lang.org/
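
For anyone who wants to check the result byte by byte, here is a minimal sketch that sidesteps csv.rb's handling of ASCII-incompatible encodings: it builds the CSV as UTF-8 first, then transcodes the finished string and prepends the BOM itself. The helper name `csv_utf16le_with_bom`, the sample row, and the temporary file name are assumptions made for illustration; they are not part of csv.rb or of the original report.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper (not part of csv.rb): generate the CSV in UTF-8,
# then transcode the finished string to UTF-16LE and prepend U+FEFF.
def csv_utf16le_with_bom(rows)
  utf8 = CSV.generate(encoding: "UTF-8") do |csv|
    rows.each { |row| csv << row }
  end
  "\uFEFF".encode("UTF-16LE") + utf8.encode("UTF-16LE")
end

rows = [["something", "caf\u00E9"]]   # sample data, assumed for this sketch
data = csv_utf16le_with_bom(rows)

rows_back = nil
head = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "utf16_bom_example.csv")

  # Binary write: the bytes go to disk exactly as built, with no transcoding.
  File.binwrite(path, data)

  # The first two bytes on disk should be the UTF-16LE BOM (FF FE).
  head = File.binread(path, 2).bytes

  # Read back: decode as UTF-16LE, drop the BOM, and parse as ordinary UTF-8 CSV.
  text = File.binread(path).force_encoding("UTF-16LE").encode("UTF-8")
  rows_back = CSV.parse(text.sub(/\A\uFEFF/, ""))
end
```

Writing in binary mode is a deliberate choice here: the string is already in its final encoding, so no external or internal encoding on the IO needs to be involved.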