From: merch-redmine@...
Date: 2021-06-25T17:34:06+00:00
Subject: [ruby-dev:51070] [Ruby master Bug#12052] String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings

Issue #12052 has been updated by jeremyevans0 (Jeremy Evans).

duerst (Martin Dürst) wrote in #note-2:

> Sorry to @jeremyevans0, but I have to disagree. This is a bug. We can disagree about how important it is to fix this bug, but it's a bug nevertheless. First, xml: :text works correctly in other encodings even if the source and destination encodings match.
> ```Ruby
> " => "<q&"
> ```
>
> The bug is that we process UTF-16LE as if it consisted of 1-byte ASCII-based code units. I still have to identify exactly where and when that happens.

Ah. So you are saying that `"<\0>\0".encode("utf-16le", "utf-16le", xml: :text)` needs to have the same result as `"<\0>\0".encode("utf-8", "utf-16le", xml: :text).encode("utf-16le")`. I agree, that makes more sense, and this is a bug.

It looks like this issue occurs when both the source and the destination encoding are ASCII-incompatible multibyte encodings such as UTF-16LE. If either the source or the destination encoding is ASCII-compatible (as UTF-8 is), the issue doesn't occur:

```ruby
# ASCII-incompatible source (UTF-16LE), ASCII-compatible destination (UTF-8)
"<\0>\0".encode("utf-8", "utf-16le", xml: :text).bytes
# => [38, 108, 116, 59, 38, 103, 116, 59]

# ASCII-compatible source (UTF-8), ASCII-incompatible destination (UTF-16LE)
"<>".encode("utf-16le", "utf-8", xml: :text).bytes
# => [38, 0, 108, 0, 116, 0, 59, 0, 38, 0, 103, 0, 116, 0, 59, 0]

# ASCII-incompatible source and destination (UTF-16LE to UTF-16LE): wrong result
"<\0>\0".encode("utf-16le", "utf-16le", xml: :text).bytes
# => [38, 108, 116, 59, 0, 38, 103, 116, 59, 0]
```

So a possible workaround, until the issue can be properly fixed, would be to detect the case where both source and destination are ASCII-incompatible, switch the destination to UTF-8, and then encode the result of that to the desired destination encoding.
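
A minimal sketch of that workaround follows. The helper name `encode_with_xml` and its signature are hypothetical (nothing like it exists in Ruby itself), and it assumes that `Encoding#ascii_compatible?` is the right test for the problematic case:

```ruby
# Hypothetical helper, not part of Ruby's API: applies the xml: option via a
# UTF-8 intermediate when neither encoding is ASCII-compatible.
def encode_with_xml(str, dst, src = str.encoding, xml: :text)
  # Accept either Encoding objects or encoding names.
  dst_enc = dst.is_a?(Encoding) ? dst : Encoding.find(dst)
  src_enc = src.is_a?(Encoding) ? src : Encoding.find(src)

  # Going by the examples above, String#encode gives the expected result
  # when at least one side is ASCII-compatible, so call it directly.
  if src_enc.ascii_compatible? || dst_enc.ascii_compatible?
    return str.encode(dst_enc, src_enc, xml: xml)
  end

  # Otherwise escape while transcoding to UTF-8, then transcode the escaped
  # text to the requested destination without the xml: option.
  str.encode(Encoding::UTF_8, src_enc, xml: xml).encode(dst_enc)
end

p encode_with_xml("<\0>\0", "utf-16le", "utf-16le").bytes
# => [38, 0, 108, 0, 116, 0, 59, 0, 38, 0, 103, 0, 116, 0, 59, 0]
```

The extra transcoding step costs a second pass over the string, but only in the case that currently returns a wrong result.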
----------------------------------------
Bug #12052: String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings
https://bugs.ruby-lang.org/issues/12052#change-92651

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
Calling `String#encode` with the `xml:` option to convert from an ASCII-incompatible encoding to the same encoding returns a strange result.
It appears that the string is being converted as if it were binary.

```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text) #=> "\u6C26\u3B74\u2600\u7467;"
```

--
https://bugs.ruby-lang.org/