From: "nobu (Nobuyoshi Nakada)"
Date: 2013-07-13T15:04:05+09:00
Subject: [ruby-core:55992] [ruby-trunk - Bug #8630][Rejected] Transcoding high-bit bytes from ASCII-8BIT to a text encoding should be :invalid, not :undef

Issue #8630 has been updated by nobu (Nobuyoshi Nakada).

Status changed from Open to Rejected

----------------------------------------
Bug #8630: Transcoding high-bit bytes from ASCII-8BIT to a text encoding should be :invalid, not :undef
https://bugs.ruby-lang.org/issues/8630#change-40483

Author: headius (Charles Nutter)
Status: Rejected
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0.0
Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN

When transcoding from ASCII-8BIT (BINARY) to a text encoding (e.g. UTF-8), MRI will raise an error for high-bit bytes:

    "\xC3".encode("utf-8", "binary") # => Encoding::UndefinedConversionError

This can be disabled by passing :undef => :replace as an option to the encode call.

I believe that "undef" is the wrong treatment for this error. Undef means that the input character has no representation in the target encoding. In this case, the error is raised because only US-ASCII range of bytes are *valid* for transcoding, so the transcoding of high-bit bytes is by definition *invalid*, not undefined. In other words, high-bit bytes in ASCII-8BIT/BINARY are *invalid* as characters.

The error raised should be InvalidByteSequenceError and it should be prevented by using :invalid => :replace option.

-- 
http://bugs.ruby-lang.org/
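
For readers who want to reproduce the behavior discussed in this report, here is a minimal sketch. It assumes MRI 2.0 or later (for String#b); the helper try_encode and the variable names are hypothetical and do not come from the report. It compares the default call, the :undef => :replace option that works today, and the :invalid => :replace option the report argues should apply.

```ruby
# Hypothetical helper (not from the report): returns the converted string,
# or the exception class if the conversion raises.
def try_encode(str, **opts)
  str.encode("utf-8", "binary", **opts)
rescue EncodingError => e
  e.class
end

high = "\xC3".b   # one high-bit byte, tagged ASCII-8BIT

default_result = try_encode(high)                     # no options
undef_result   = try_encode(high, undef: :replace)    # option that takes effect today
invalid_result = try_encode(high, invalid: :replace)  # option the report proposes
ascii_result   = try_encode("abc".b)                  # 7-bit bytes convert cleanly

p default_result   # Encoding::UndefinedConversionError
p undef_result     # "\uFFFD", the replacement character for Unicode targets
p invalid_result   # Encoding::UndefinedConversionError, since :invalid does not cover this case
p ascii_result     # "abc"
```

Since the issue was rejected, the :invalid => :replace call is expected to keep raising UndefinedConversionError; code that must tolerate high-bit bytes from BINARY input would pass :undef => :replace, or both options.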