From: nagachika00@...
Date: 2017-03-27T17:34:15+00:00
Subject: [ruby-core:80406] [Ruby trunk Bug#13292] Invalid encodings in UTF-32

Issue #13292 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 2.2: DONE, 2.3: REQUIRED, 2.4: DONE to 2.2: DONE, 2.3: DONE, 2.4: DONE

ruby_2_3 r58183 merged revision(s) 57816,57817.

----------------------------------------
Bug #13292: Invalid encodings in UTF-32
https://bugs.ruby-lang.org/issues/13292#change-63899

* Author: rbjl (Jan Lelis)
* Status: Closed
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
* Backport: 2.2: DONE, 2.3: DONE, 2.4: DONE
----------------------------------------
Ruby is very strict about valid UTF-8 encodings, which is great. Strings that encode surrogates or codepoints that are too large are not valid.

However, in UTF-32 it is possible to encode such values, and Ruby treats them as valid:

Example 1 (too large value)

```
a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true
```

Example 2 (surrogate)

```
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true
```

The behaviour should be changed so that `String#valid_encoding?` reports `false` for such strings.

For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)
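For illustration, the rule the report asks for can be sketched in plain Ruby. This is a minimal sketch, assuming the Unicode definition that every UTF-32 code unit must be a Unicode scalar value (0x0000..0xD7FF or 0xE000..0x10FFFF). The helper `utf32le_scalar_values_valid?` is hypothetical and is not part of Ruby's `String` API; it does not reproduce how Ruby's encoding layer implements the check.

```ruby
# Hypothetical helper (not part of Ruby's String API): checks a UTF-32LE
# byte string against the Unicode rule that every code unit must be a
# scalar value, i.e. in 0x0000..0xD7FF or 0xE000..0x10FFFF.
def utf32le_scalar_values_valid?(str)
  bytes = str.b # binary (ASCII-8BIT) copy of the raw bytes

  # Every UTF-32 code unit is exactly 4 bytes long.
  return false unless (bytes.bytesize % 4).zero?

  # "V*" unpacks unsigned 32-bit little-endian integers.
  bytes.unpack("V*").all? do |cp|
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

too_large = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE")  # U+110000
surrogate = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # U+D800
ok        = [0x41, 0, 0, 0].pack("C*").force_encoding("UTF-32LE") # U+0041 "A"

p utf32le_scalar_values_valid?(too_large) # => false
p utf32le_scalar_values_valid?(surrogate) # => false
p utf32le_scalar_values_valid?(ok)        # => true
```

Comparing this helper's result with `String#valid_encoding?` on a given Ruby build is one way to check whether that build rejects the two strings from the examples above.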