From: duerst@...
Date: 2019-08-29T06:50:43+00:00
Subject: [ruby-core:94652] [Ruby master Bug#15908] Detecting BOM with non-UTF encoding

Issue #15908 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

Depending on usage, it may also be necessary to distinguish UTF-8 (with or without BOM), UTF-16LE without BOM, UTF-16BE with or without BOM, and so on. For Japanese, it has traditionally also been necessary to distinguish between EUC-JP, Shift_JIS, and ISO-2022-JP.

For more complex cases, heuristics are needed. On the other hand, some applications may not want to accept more than a well-defined subset of encodings, or may not be allowed to (e.g. in the bootstrap phase of an XML parser).

This kind of processing is therefore better left to applications.
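
As a rough illustration of what such application-level processing could look like, here is a minimal sketch that checks the leading bytes for a BOM and otherwise falls back to an encoding chosen by the application. The names `BOMS`, `sniff_encoding`, and `decode` are hypothetical and not part of Ruby; the Shift_JIS fallback is only an assumption for the example.

```ruby
# Hypothetical table of BOM byte sequences (not a Ruby API).
# UTF-32LE must come before UTF-16LE because its BOM starts with FF FE too.
BOMS = [
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
].freeze

# Returns [encoding, bom_length]; uses +fallback+ when no BOM is present.
def sniff_encoding(bytes, fallback)
  head = bytes.byteslice(0, 4).to_s.b
  BOMS.each do |bom, enc|
    return [enc, bom.bytesize] if head.start_with?(bom)
  end
  [fallback, 0]
end

# Strips the BOM (if any) and tags the remaining bytes with the chosen encoding.
def decode(bytes, fallback: Encoding::Shift_JIS)
  enc, skip = sniff_encoding(bytes, fallback)
  bytes.byteslice(skip, bytes.bytesize - skip).force_encoding(enc)
end
```

Even this small sketch has to make a policy decision: the bytes FF FE 00 00 are treated as a UTF-32LE BOM, although they could also be a UTF-16LE BOM followed by a NUL character. Which reading is right depends on the application.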

I'm closing this issue so that it isn't left dangling, but please feel free to reopen it if you disagree.

----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908#change-81251

* Author: nobu (Nobuyoshi Nakada)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Currently, the "bom|" encoding prefix to `File.open` is ignored if the encoding name is not a UTF encoding.
But one use of a BOM is to tell whether the stream is UTF or not, which is especially common on Windows, e.g. to distinguish UTF-16LE from the OEM code page (OEMCP).
So I think this restriction should be removed.
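
For reference, a minimal sketch of the prefix as it works today with a UTF encoding name: the BOM is consumed and does not appear in the text that is read. The helper name `read_with_bom_prefix` and the temporary file are assumptions made for this example only.

```ruby
require "tempfile"

# Hypothetical helper: opens +path+ with the "BOM|UTF-8" prefix and
# returns the resulting external encoding together with the file contents.
def read_with_bom_prefix(path)
  File.open(path, "r:BOM|UTF-8") do |io|
    text = io.read
    [io.external_encoding, text]
  end
end

Tempfile.create("bom-demo") do |f|
  f.binmode
  f.write("\xEF\xBB\xBFhello".b)  # UTF-8 BOM followed by ASCII text
  f.flush
  enc, text = read_with_bom_prefix(f.path)
  puts enc   # the encoding selected after the BOM check
  puts text  # the contents with the BOM already stripped
end
```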

---Files--------------------------------
0001-Enable-BOM-detection-with-non-UTF-encodings.patch (4.27 KB)


-- 
https://bugs.ruby-lang.org/
