From: "mame (Yusuke Endoh)"
Date: 2022-06-09T06:51:50+00:00
Subject: [ruby-core:108816] [Ruby master Bug#16143] BOM UTF-8 is not removed after rewind

Issue #16143 has been updated by mame (Yusuke Endoh).

Status changed from Open to Feedback

I think this issue can be easily worked around by using `IO#set_encoding_by_bom`, which was introduced by #15210.

```
require "csv"

csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
p csv.shift #=> #<CSV::Row ...>

# workaround
csv.rewind
csv.binmode
csv.to_io.set_encoding_by_bom
p csv.shift #=> #<CSV::Row ...>
```

Do we really need any change? It would be surprising to me if `IO#pos` were non-zero after `IO#rewind`. Something like `IO#rewind(bom: true)`, which @akr proposes, may be less surprising. But IMHO, `IO#set_encoding_by_bom` is enough.

----------------------------------------
Bug #16143: BOM UTF-8 is not removed after rewind
https://bugs.ruby-lang.org/issues/16143#change-97893

* Author: Dirk (Dirk Meier-Eickhoff)
* Status: Feedback
* Priority: Normal
* ruby -v: ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-darwin17]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
I have a CSV file with "forced quotes" and a UTF-8 BOM (\xEF\xBB\xBF) that CSV cannot read after a `rewind`. I get "CSV::MalformedCSVError: Illegal quoting in line 1."

My UTF-8 CSV file with BOM:

``` ruby
File.open('bom_test.csv', 'w') do |io|
  io.write("\xEF\xBB\xBF\"Name\",\"City\"\n\"John Doe\",\"New York\"")
end
```

Reproduce error:

``` ruby
require "csv"

# Case 1
csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
csv.shift # => #<CSV::Row ...>
csv.rewind
csv.shift # => CSV::MalformedCSVError (Illegal quoting in line 1.)

# Case 2
csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
csv.readline # => #<CSV::Row ...>
csv.rewind
csv.readline # => CSV::MalformedCSVError (Illegal quoting in line 1.)
```

Sutou Kouhei has posted another reproducible example in my original issue on the CSV gem: https://github.com/ruby/csv/issues/103

``` ruby
File.open("/tmp/a.txt", "w") do |x|
  x.puts("\xEF\xBB\xBFa,b,c")
end

File.open("/tmp/a.txt", "r:BOM|UTF-8") do |x|
  p x.gets.unpack("U*") # => [97, 44, 98, 44, 99, 10]
  x.rewind
  p x.gets.unpack("U*") # => [65279, 97, 44, 98, 44, 99, 10]
end
```

He said: "This [CSV] library rely on Ruby's BOM processing. It seems that Ruby's BOM processing doesn't support rewind."

My expectation is that reading a file with a BOM always returns the same content, regardless of whether it is the first read or a read after a rewind.
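The `IO#set_encoding_by_bom` workaround suggested in the reply can also be applied to Sutou Kouhei's plain-`File` reproduction, without CSV involved. The following is a minimal sketch, assuming Ruby 2.7 or later (where `IO#set_encoding_by_bom` exists); the scratch file name `bom_rewind_demo.txt` is hypothetical. After `rewind`, the IO is put back into binary mode so that `set_encoding_by_bom` is allowed to run, which consumes the BOM again and switches the external encoding to UTF-8. It also records `IO#pos`, which is 0 right after `rewind` (i.e. before the BOM) and 3 once the BOM has been skipped.

```ruby
# Hypothetical scratch file, mirroring the /tmp/a.txt reproduction above.
path = "bom_rewind_demo.txt"
File.binwrite(path, "\xEF\xBB\xBFa,b,c\n")

first = second = pos_after_rewind = pos_after_bom = nil
File.open(path, "r:BOM|UTF-8") do |io|
  first = io.gets               # BOM was already stripped when the file was opened
  io.rewind
  pos_after_rewind = io.pos     # back at byte 0, i.e. in front of the BOM
  io.binmode                    # binary mode; external encoding becomes ASCII-8BIT
  io.set_encoding_by_bom        # consumes the BOM again and sets UTF-8
  pos_after_bom = io.pos        # just past the 3-byte UTF-8 BOM
  second = io.gets
end
File.delete(path)

p first.unpack("U*")                 # => [97, 44, 98, 44, 99, 10]
p second.unpack("U*")                # => [97, 44, 98, 44, 99, 10]
p [pos_after_rewind, pos_after_bom]  # => [0, 3]
```

This keeps `IO#rewind` itself unchanged (its position is still 0), which matches the point that a non-zero `IO#pos` after `rewind` would be surprising; the BOM is skipped only because the caller explicitly asks for it.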