From: mail@...
Date: 2018-01-16T07:52:29+00:00
Subject: [ruby-core:84887] [Ruby trunk Bug#14363] each_grapheme_cluster.size returns the wrong size

Issue #14363 has been reported by sos4nt (Stefan Schüßler).

----------------------------------------
Bug #14363: each_grapheme_cluster.size returns the wrong size
https://bugs.ruby-lang.org/issues/14363

* Author: sos4nt (Stefan Schüßler)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin15]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Ruby 2.5 adds `String#each_grapheme_cluster` to enumerate the string's grapheme clusters:

```ruby
str = "a\u0300i\u0301" #=> "àí"
str.each_grapheme_cluster.to_a #=> ["à", "í"]
```

Unfortunately, the enumerator's `size` doesn't work as expected:

```ruby
str.each_grapheme_cluster.size #=> 4
```

The source code reveals that it invokes `rb_str_each_char_size`, so it is equivalent to `each_char.size`:

```c
static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}
```

If the grapheme enumerator's size cannot be calculated lazily, `each_grapheme_cluster.size` should return `nil` to indicate that.
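
For comparison, here is a minimal sketch of how the number of clusters can be obtained by actually enumerating them instead of relying on the enumerator's `size`. It assumes Ruby 2.5 or later; the variable names are only illustrative, and this is a possible workaround rather than a proposed fix.

```ruby
str = "a\u0300i\u0301"

# Enumerable#count without arguments walks the enumerator,
# so it counts the clusters that are actually yielded.
cluster_count = str.each_grapheme_cluster.count

# String#grapheme_clusters builds the array of clusters first;
# Array#size then gives the same number.
cluster_array_size = str.grapheme_clusters.size

# each_char.size is the number of code points, which is the
# value the report sees from each_grapheme_cluster.size.
char_count = str.each_char.size

puts cluster_count
puts cluster_array_size
puts char_count
```

Both approaches iterate over the whole string, which is the cost a lazily computed `size` is meant to avoid.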