From: duerst@...
Date: 2015-12-14T08:50:48+00:00
Subject: [ruby-core:72110] [Ruby trunk - Feature #11814] String#valid_encoding? without force_encoding

Issue #11814 has been updated by Martin Dürst.

Akinori MUSHA wrote:
> Suppose you have a list of byte arrays which you don't know which encoding they are encoded in, like when you want to guess the encoding of the file names stored in a zip file.
>
> So, if you had String#valid_encoding?(enc) you could achieve it like this without modifying, copying or concatenating strings:
>
> ~~~
> POSSIBLE_ENCODINGS = [Encoding::UTF_8, Encoding::Windows_31J, Encoding::ISO_8859_1, Encoding::ASCII_8BIT]
>
> encoding = byte_arrays.inject(POSSIBLE_ENCODINGS) { |encs, b|
>   encs.select { |enc| b.valid_encoding?(enc) }
> }.first
> ~~~

A few comments on this program:

- Encoding::ASCII_8BIT will pick up garbage. Encoding::US_ASCII is much better.
- Encoding::ISO_8859_1 is always valid, for all bytes, so ASCII-8BIT (or US-ASCII) never gets used.
- There are many more encodings, but distinguishing them is difficult/impossible with this method.

----------------------------------------
Feature #11814: String#valid_encoding? without force_encoding
https://bugs.ruby-lang.org/issues/11814#change-55525

* Author: Usaku NAKAMURA
* Status: Rejected
* Priority: Normal
* Assignee:
----------------------------------------
Currently we have to set an encoding on a string to validate it, like this:

```ruby
str.force_encoding('euc-jp').valid_encoding? # => true or false
```

But modifying the string is not so smart.
knu-san needs a way to validate a string without modifying it [*1].

So I propose adding an optional encoding parameter to `String#valid_encoding?`.

```ruby
str.valid_encoding?('euc-jp') # => true or false
```

A patch is attached.

[*1] https://twitter.com/knu/status/676009662655934465 (in Japanese)

---Files--------------------------------
valid_encoding.patch (4.4 KB)

--
https://bugs.ruby-lang.org/
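
Since the proposal was rejected, `String#valid_encoding?` takes no argument. The following is a minimal sketch of how the same check could be approximated without modifying the receiver, by calling `force_encoding` on a duplicate. The names `valid_in_encoding?`, `guess_encoding` and `CANDIDATE_ENCODINGS` are hypothetical and appear neither in the patch nor in Ruby core. The candidate list uses US-ASCII rather than ASCII-8BIT and puts ISO-8859-1 last, following the comments above.

```ruby
# Hypothetical helper, not part of Ruby core: reports whether the bytes of
# +str+ form a valid sequence in +enc+. The receiver is left untouched
# because force_encoding is applied to a duplicate.
def valid_in_encoding?(str, enc)
  str.dup.force_encoding(enc).valid_encoding?
end

# Assumed candidate list. ISO-8859-1 accepts every byte sequence, so it goes
# last and acts only as a fallback.
CANDIDATE_ENCODINGS = [
  Encoding::US_ASCII,
  Encoding::UTF_8,
  Encoding::Windows_31J,
  Encoding::ISO_8859_1
].freeze

# Returns the first candidate in which every given byte string is valid,
# mirroring the inject/select approach quoted in the message.
def guess_encoding(byte_strings, candidates = CANDIDATE_ENCODINGS)
  byte_strings.inject(candidates) { |encs, b|
    encs.select { |enc| valid_in_encoding?(b, enc) }
  }.first
end
```

A call such as `guess_encoding(file_names)` would then return one of the candidates. As the third comment above notes, validity alone cannot distinguish many encodings, so the result is only a guess.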