From: shevegen@...
Date: 2017-02-22T17:21:55+00:00
Subject: [ruby-core:79693] [Ruby trunk Feature#13241] Method(s) to access Unicode properties for characters/strings

Issue #13241 has been updated by Robert A. Heiler.

Jan Lelis wrote:
> I think, it should be always plural methods which return a list of properties used in the
> string, since Ruby does not distinguish between single characters and strings. The first
> example would then rather be: "Aあア".scripts => [:hiragana, :katakana, :latin] (like the
> fourth example).

I agree, in the sense that your example makes more sense than the first example:

"Aあア".script => :latin # returns script of first character only

which returned only one result. I understand it was just an example, but it confused me because I wondered what happened to the other characters.

I like the name "property" or "properties" more than "script" - script sounds a bit non-descript (pun intended!).

Since matz said that it should be indicative of Unicode, e.g. with a "unicode_" prefix, the example by Jan Lelis would seem good:

"string here".unicode_properties(optional_args)

Other name suggestions:

.unicode_category
.unicode_categories
.unicode_tokenset
.unicode_token_set
.unicode_tokens

And similar perhaps.

PS: By the way, what should it return for an empty string like ""? Or for numbers or similar semi-common tokens?

----------------------------------------
Feature #13241: Method(s) to access Unicode properties for characters/strings
https://bugs.ruby-lang.org/issues/13241#change-63114

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
[This is currently an exploratory proposal.]

Onigmo allows Unicode properties in regular expressions. With this, it's e.g. possible to check whether a string contains some Hiragana:

```
"ABC あ DEF" =~ /\p{hiragana}/
```

However, it is currently impossible to ask for e.g. the script of a character.
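
As a rough illustration of that gap (a sketch only, not part of the proposal): a script lookup can be approximated today by probing each character against a hand-picked list of `\p{...}` classes. The names `SCRIPT_PATTERNS`, `script_of` and `scripts_of` are made up for this example, and only four scripts are covered.

```ruby
# Hypothetical workaround, not an existing Ruby API: probe each character
# against a small, hand-picked table of Onigmo script properties.
SCRIPT_PATTERNS = {
  latin:    /\p{Latin}/,
  hiragana: /\p{Hiragana}/,
  katakana: /\p{Katakana}/,
  han:      /\p{Han}/
}.freeze

# Returns the name of the first listed script that matches +char+,
# or nil if none of the listed scripts match (e.g. digits, punctuation).
def script_of(char)
  SCRIPT_PATTERNS.each { |name, pattern| return name if pattern.match?(char) }
  nil
end

# Returns the distinct scripts found in +string+, in order of first appearance.
def scripts_of(string)
  string.each_char.map { |char| script_of(char) }.compact.uniq
end

# "A" + HIRAGANA LETTER A (U+3042) + KATAKANA LETTER A (U+30A2)
scripts_of("A\u3042\u30A2")  # => [:latin, :hiragana, :katakana]
scripts_of("")               # => []
```

This has to enumerate the candidate scripts by hand and tries up to one regexp match per script per character, which is the kind of overhead a built-in lookup would avoid. With this sketch an empty string simply yields an empty array, and digits yield nothing because they do not belong to any of the four listed scripts.
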
I propose to add a method (or some methods) to String to be able to get such properties. Various (to some extent conflicting) examples:

```
"Aあア".script => :latin # returns script of first character only
"Aあア".script => [:latin, :hiragana, :katakana] # returns array of property values
"Aあア".property(:script) => :latin # returns specified property of first character only
"Aあア".property(:script) => [:latin, :hiragana, :katakana] # returns array of specified properties' values
"Aあア".properties([:script, :general_category]) => [[:latin, :Lu], [:hiragana, :Lo], [:katakana, :Lo]] # returns arrays of property values, one array per character
```

The interface is still in flux, comments welcome!

Implementation depends on #13240.

In Python, such functionality (however, quite limited in property coverage, and not directly on String) is available in the standard library (see https://docs.python.org/3/library/unicodedata.html).

--
https://bugs.ruby-lang.org/

Unsubscribe: