From: mail@...
Date: 2017-02-22T09:22:54+00:00
Subject: [ruby-core:79673] [Ruby trunk Feature#13241] Method(s) to access Unicode properties for characters/strings

Issue #13241 has been updated by Jan Lelis.

I think prefixing such methods with `unicode_` would be no problem. While it's a little verbose, it still reads well:

- `"bla".unicode_scripts`
- `"blubb".unicode_properties(:general_categories)`

and so on. It is also consistent with the `unicode_normalize` API.

----------------------------------------
Feature #13241: Method(s) to access Unicode properties for characters/strings
https://bugs.ruby-lang.org/issues/13241#change-63093

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
[This is currently an exploratory proposal.]

Onigmo allows Unicode properties in regular expressions. With this, it's e.g. possible to check whether a string contains some Hiragana:

```
"ABC あ DEF" =~ /\p{hiragana}/
```

However, it is currently impossible to ask for e.g. the script of a character. I propose to add a method (or some methods) to String to be able to get such properties. Various (to some extent conflicting) examples:

```
"Aあア".script  => :latin  # returns script of first character only
"Aあア".script  => [:latin, :hiragana, :katakana]  # returns array of property values
"Aあア".property(:script)  => :latin  # returns specified property of first character only
"Aあア".property(:script)  => [:latin, :hiragana, :katakana]  # returns array of specified properties' values
"Aあア".properties([:script, :general_category])  => [[:latin, :Lu], [:hiragana, :Lo], [:katakana, :Lo]]  # returns arrays of property values, one array per character
```

The interface is still in flux, comments welcome!

Implementation depends on #13240.
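
For comparison, something close to the array-returning variant can be approximated today by testing each character against a fixed set of `\p{...}` patterns. The following is only a rough sketch: `scripts_of` and `SCRIPT_PATTERNS` are hypothetical names made up for illustration, not part of Ruby or of this proposal, and the four listed scripts are an arbitrary selection.

```ruby
# Hypothetical workaround, not an existing Ruby API: map a hand-picked
# set of script names to the Onigmo property patterns that match them.
SCRIPT_PATTERNS = {
  latin:    /\p{latin}/,
  hiragana: /\p{hiragana}/,
  katakana: /\p{katakana}/,
  han:      /\p{han}/
}.freeze

# Returns one symbol per character, or nil when none of the listed
# scripts matches (e.g. digits and punctuation, which are not covered).
def scripts_of(string)
  string.each_char.map do |char|
    found = SCRIPT_PATTERNS.find { |_name, pattern| pattern.match?(char) }
    found&.first
  end
end

p scripts_of("A\u3042\u30A2") # => [:latin, :hiragana, :katakana]
```

The obvious weakness is that the list of scripts has to be enumerated by hand and every character is tested against each pattern in turn, which is the kind of gap a built-in method could close.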

In Python, such functionality (however, quite limited in property coverage, and not directly on String) is available in the standard library (see https://docs.python.org/3/library/unicodedata.html).

-- 
https://bugs.ruby-lang.org/