From: duerst@...
Date: 2015-04-24T10:11:36+00:00
Subject: [ruby-core:68982] [Ruby trunk - Feature #11094] [Open] Remove traces of 6-byte UTF-8

Issue #11094 has been reported by Martin Dürst.

----------------------------------------
Feature #11094: Remove traces of 6-byte UTF-8
https://bugs.ruby-lang.org/issues/11094

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
----------------------------------------
UTF-8 was originally defined with a codespace of up to 31 bits, and therefore with up to 6 bytes per character. For quite a few years now, all the relevant definitions (ISO, Unicode, IETF) have restricted it to a codespace ending at 0x10FFFF and a maximum of 4 bytes per character.

Many places in the Ruby code base have been updated to this 4-byte limit (e.g. EncLen_UTF8 in enc/utf_8.c). Other places have not yet been updated (e.g. code_to_mbclen in enc/utf_8.c). This should be fixed.

[I have classified this as a feature because I wasn't able to find a way to expose this problem from Ruby code, but it should be reclassified as a bug if such a problem can be found.]

--
https://bugs.ruby-lang.org/
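For illustration, here is a minimal sketch of what a 4-byte-limited length computation looks like. It is not the code in enc/utf_8.c. The function names `utf8_len_for_code` and `utf8_len_for_lead` are hypothetical. The sketch assumes the RFC 3629 limits: code points above 0x10FFFF are rejected instead of being mapped to 5- or 6-byte lengths, and the old 5- and 6-byte lead bytes (0xF8 to 0xFD) are treated as invalid. Surrogates (U+D800 to U+DFFF) are deliberately not checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to encode code point cp in
 * UTF-8 as currently defined (max 4 bytes), or -1 if cp lies outside the
 * Unicode codespace. The original 31-bit UTF-8 would instead return 5 or 6
 * for values up to 0x7FFFFFFF. Surrogates are NOT rejected in this sketch. */
int utf8_len_for_code(uint32_t cp)
{
    if (cp < 0x80)      return 1;  /* 0xxxxxxx */
    if (cp < 0x800)     return 2;  /* 110xxxxx 10xxxxxx */
    if (cp < 0x10000)   return 3;  /* 1110xxxx + 2 continuation bytes */
    if (cp <= 0x10FFFF) return 4;  /* 11110xxx + 3 continuation bytes */
    return -1;                     /* beyond U+10FFFF: no valid encoding */
}

/* Hypothetical helper: sequence length implied by a lead byte, or -1 for
 * bytes that can never start a valid sequence under the 4-byte limit:
 * continuation bytes 0x80-0xBF, overlong leads 0xC0/0xC1, and 0xF5-0xFF
 * (0xF8-0xFD were the old 5- and 6-byte leads). Note that a 0xF4 lead can
 * still encode values above U+10FFFF if the second byte is >= 0x90, so a
 * full validator must also check the continuation bytes. */
int utf8_len_for_lead(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b < 0xC2) return -1;
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF5) return 4;
    return -1;
}
```

With these limits, `utf8_len_for_code(0x10FFFF)` yields 4, while `utf8_len_for_code(0x110000)` and `utf8_len_for_lead(0xFC)` both yield -1, which is the behavior the report asks the remaining 6-byte-era code paths to adopt.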