From: mail@...
Date: 2017-03-08T13:14:17+00:00
Subject: [ruby-core:79966] [Ruby trunk Bug#13292] Invalid encodings in UTF-32

Issue #13292 has been reported by Jan Lelis.

----------------------------------------
Bug #13292: Invalid encodings in UTF-32
https://bugs.ruby-lang.org/issues/13292

* Author: Jan Lelis
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
Ruby is very strict about valid UTF-8 encodings, which is great: strings that encode surrogates or code points above U+10FFFF are not valid. In UTF-32, however, it is possible to encode such values, and Ruby treats them as valid:

Example 1 (value too large)

```
a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true
```

Example 2 (surrogate)

```
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true
```

The behaviour should be changed so that `String#valid_encoding?` reports `false` in both cases.

For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)
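
Until the behaviour changes, the check can be done by hand. The following is only a minimal sketch: `utf32le_scalar_values?` is a hypothetical helper name, not part of Ruby's API, and it assumes the string holds UTF-32LE data. It decodes each 4-byte unit and rejects surrogates and values above U+10FFFF, which is the result I would expect from `String#valid_encoding?`.

```ruby
# Hypothetical helper (not a Ruby core method): returns true only if every
# 32-bit little-endian unit in the string is a Unicode scalar value.
def utf32le_scalar_values?(str)
  bytes = str.b # binary copy, so the encoding tag plays no role here
  return false unless (bytes.bytesize % 4).zero?

  # "V*" unpacks unsigned 32-bit little-endian integers
  bytes.unpack("V*").all? do |cp|
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

too_large = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE")  # U+110000
surrogate = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # U+D800
ok        = "a".encode("UTF-32LE")

utf32le_scalar_values?(too_large) # => false
utf32le_scalar_values?(surrogate) # => false
utf32le_scalar_values?(ok)        # => true
```

The helper works on the raw bytes, so its result does not depend on how a given Ruby version implements `valid_encoding?` for UTF-32.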