From: "YO4 (Yoshinao Muramatsu) via ruby-core"
Date: 2024-12-26T10:37:40+00:00
Subject: [ruby-core:120417] [Ruby master Bug#20526] File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows

Issue #20526 has been updated by YO4 (Yoshinao Muramatsu).

There is similar strangeness around encoding specifiers.

Preparation:

```ruby
RUBY_VERSION # => "3.3.5"
File.write("a.txt", "a\r\n")
File.binread("a.txt").bytes # => [97, 13, 13, 10]
```

Experiments:

```ruby
File.open("a.txt") {|f| f.read.bytes} # => [97, 13, 10] # expected(msvcrt[_*] newline)
File.open("a.txt", "r:utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", "r", encoding: "utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # XXX: universal newline enabled?
```

Omitting the mode parameter seems to enable universal newline.

```ruby
File.open("a.txt", "rt:utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt:bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
File.open("a.txt", "rt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
```

XXX: This is odd because the universal newline and msvcrt newline conversions appear to be cooperating.
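
To make "cooperating" concrete, here is a rough, platform-independent model of the two conversions. The helper names `crt_text_mode` and `universal_newline` are hypothetical, and this is only a sketch of the observed results, not how Ruby's IO layer is actually implemented. It assumes the CRT text mode turns CRLF into LF and leaves a lone CR alone, and that universal newline turns both CRLF and a lone CR into LF.

```ruby
# Illustrative model only: these helpers are hypothetical, not Ruby's IO code.

# Assumed CRT text-mode read: CRLF becomes LF, a lone CR is kept.
def crt_text_mode(str)
  str.gsub("\r\n", "\n")
end

# Universal newline via String#encode: CRLF and lone CR both become LF.
def universal_newline(str)
  str.encode(universal_newline: true)
end

# Bytes on disk as reported by File.binread above: [97, 13, 13, 10]
on_disk = "a\r\r\n"

crt_only  = crt_text_mode(on_disk)
univ_only = universal_newline(on_disk)
both      = universal_newline(crt_text_mode(on_disk))

p crt_only.bytes  # => [97, 13, 10]
p univ_only.bytes # => [97, 10, 10]
p both.bytes      # => [97, 10]
```

Under this model, the `[97, 10]` results marked XXX match applying both conversions in sequence, while the other results match applying only one of them.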
----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526#change-111198

* Author: kou (Kouhei Sutou)
* Status: Open
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
* Backport: 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------
I'm not sure whether this is intentional behavior or not, but it seems that `encoding: "utf-8"` doesn't change newline conversion while `encoding: "bom|utf-8"` does:

```ruby
File.write("a.txt", "a\r\n")
File.read("a.txt").bytes # => [97, 13, 10]
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10]
File.open("a.txt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] XXX: \r\n -> \n
File.open("a.txt", encoding: "bom|utf-8", universal_newline: false) {|f| f.read.bytes} # => [97, 13, 10]
```

Note the `XXX:` line in the code above. Is this intentional behavior?

--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/