From: naruse@...
Date: 2017-08-05T10:50:17+00:00
Subject: [ruby-core:82258] [Ruby trunk Feature#13780] String#each_grapheme

Issue #13780 has been updated by naruse (Yui NARUSE).

shan (Shannon Skipper) wrote:
> shevegen (Robert A. Heiler) wrote:
> > My only concern is about the name "grapheme".
> >
> > I don't know how it is for others but ... this is the first time that I even heard the
> > term.
>
> I think the term is correct and it complements #codepoints and #each_codepoint. In Elixir for example:

Elixir's `grapheme` and Swift's `Character` refer to the "Grapheme Cluster" of Unicode® Standard Annex #29.
http://unicode.org/reports/tr29/

The document says grapheme clusters are "user-perceived characters".

----------------------------------------
Feature #13780: String#each_grapheme
https://bugs.ruby-lang.org/issues/13780#change-66041

* Author: rbjl (Jan Lelis)
* Status: Assigned
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
* Target version: 2.5
----------------------------------------
Ruby's regex engine has support for graphemes via `\X`:

https://github.com/k-takata/Onigmo/blob/791140951eefcf17db4e762e789eb046ea8a114c/doc/RE#L117-L124

This is really useful when working with Unicode strings. However, code like `string.scan(/\X/)` is not very readable, which might lead people to use String#each_char when they really should split by graphemes.

What I propose is two new methods:

- String#each_grapheme which returns an Enumerator of graphemes (in the same way as `\X`) and
- String#graphemes which returns an Array of graphemes (in the same way as `\X`)

What do you think?

Resources

- Unicode® Standard Annex #29: Unicode Text Segmentation: http://unicode.org/reports/tr29/
- Related issue: https://bugs.ruby-lang.org/issues/12831

-- 
https://bugs.ruby-lang.org/
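
To make the proposal in the issue description concrete, here is a minimal sketch of how the two methods could behave if they were built on `string.scan(/\X/)`. The module name `GraphemeSketch` and the helper signatures are assumptions made for illustration; they are not part of the issue or of Ruby itself, and the sketch uses module functions rather than patching String.

```ruby
# Hypothetical helpers mirroring the proposed String#graphemes and
# String#each_grapheme. Both rely only on the existing `\X` regex support.
module GraphemeSketch
  module_function

  # Returns an Array of graphemes, i.e. the same result as scan(/\X/).
  def graphemes(string)
    string.scan(/\X/)
  end

  # Yields each grapheme to the block and returns the string.
  # Without a block, returns an Enumerator over the graphemes.
  def each_grapheme(string, &block)
    return enum_for(:each_grapheme, string) unless block

    string.scan(/\X/, &block)
    string
  end
end

# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a":
# three codepoints, but two user-perceived characters.
sample = "e\u0301a"

per_codepoint = sample.chars                       # 3 elements
per_grapheme  = GraphemeSketch.graphemes(sample)   # 2 elements

puts per_codepoint.length
puts per_grapheme.length
GraphemeSketch.each_grapheme(sample) { |g| puts g.codepoints.inspect }
```

The sample string shows why String#each_char can be the wrong tool: it separates the combining accent from its base letter, while splitting on `\X` keeps them together.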