From: merch-redmine@...
Date: 2021-06-24T23:50:31+00:00
Subject: [ruby-dev:51068] [Ruby master Bug#12052] String#encode with xml option returns wrong result

Issue #12052 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Assigned to Rejected

After an extensive session with gdb, I've determined that this isn't an issue with `String#encode`, and it isn't a bug.

`"<\0>\0".encode("utf-16le", "utf-16le", xml: :text)` returns the same string as `"&lt;\0&gt;\0".force_encoding("utf-16le")`. I think that's the correct behavior for `String#encode`, since you are specifying that the source and destination encodings match.

`"&lt;\0&gt;\0".force_encoding("utf-16le")` is the same string as `"\u6C26\u3B74\u2600\u7467;".encode("utf-16le")`. The 10 ASCII bytes are the same as the bytes for the 5 codepoints in UTF-16LE encoding. `String#inspect` processes the string, and formats each of the non-ASCII codepoints using the `\u` syntax, and the final codepoint (59) as a regular ASCII character.

As an example:

```ruby
"<\0>\0".encode("utf-16le", "utf-16le", xml: :text) == "&lt;\0&gt;\0".force_encoding("utf-16le")
# => true

"&lt;\0&gt;\0".force_encoding("utf-16le").codepoints
# => [27686, 15220, 9728, 29799, 59]

"&lt;\0&gt;\0".force_encoding("utf-16le").codepoints.map{|x| x >= 128 ? '-u%X'%x : x.chr}.join
# => "-u6C26-u3B74-u2600-u7467;"
```

----------------------------------------
Bug #12052: String#encode with xml option returns wrong result
https://bugs.ruby-lang.org/issues/12052#change-92642

* Author: nobu (Nobuyoshi Nakada)
* Status: Rejected
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
Calling `String#encode` with the `xml:` option to convert from an ASCII-incompatible encoding to the same encoding returns a strange result.
It appears to convert the string as if it were binary.

```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text) #=> "\u6C26\u3B74\u2600\u7467;"
```

--
https://bugs.ruby-lang.org/
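
For illustration, the byte-level reinterpretation described in the comment above can be checked without calling `String#encode` at all. This is a minimal sketch, not part of the original report: the variable names (`escaped`, `units`, `reinterpreted`) are made up for the example, and it assumes the XML-escaped bytes are exactly `&lt;\0&gt;\0`, as the codepoints quoted above imply.

```ruby
# The 10 bytes that result from XML-escaping "<\0>\0" byte by byte
# (assumption: "<" becomes "&lt;" and ">" becomes "&gt;", NULs untouched).
escaped = "&lt;\0&gt;\0".b
puts escaped.bytesize

# Read the same bytes as little-endian 16-bit units, which is how
# UTF-16LE groups them.
units = escaped.unpack("v*")
puts units.map { |u| format("U+%04X", u) }.join(" ")

# Relabeling the bytes as UTF-16LE yields those units as codepoints.
reinterpreted = escaped.dup.force_encoding("utf-16le")
puts reinterpreted.codepoints == units

# Transcoding to UTF-8 gives the string that String#inspect displayed.
puts reinterpreted.encode("utf-8") == "\u6C26\u3B74\u2600\u7467;"
```

Each pair of adjacent bytes becomes one 16-bit unit, which is why ten ASCII bytes show up as five codepoints, and why only the last one (`;` followed by a NUL byte) still looks like ASCII.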