[ruby-core:65635] [CommonRuby - Feature #10084] Add Unicode String Normalization to String class

From: duerst@...
Date: 2014-10-13 00:49:13 UTC
List: ruby-core #65635
Issue #10084 has been updated by Martin Dürst.

Assignee changed from Yukihiro Matsumoto to Martin Dürst

Not getting any feedback on implementation details, I'm assuming that nobody cares too much, and will therefore proceed. I have tried a refinement (proposal 5); I didn't see any effects on performance. But using a refinement would make it more difficult to backport this to earlier versions or make it available as a gem.

I'm therefore going to take the easiest way forward and use solution 1), with a module name of UnicodeNormalize (exactly corresponding to primary method name on string). If anybody still has comments, please don't hesitate to add them here, so that we can discuss them.

----------------------------------------
Feature #10084: Add Unicode String Normalization to String class
https://bugs.ruby-lang.org/issues/10084#change-49367

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.2.0
----------------------------------------
Unicode string normalization is a frequent operation when comparing or normalizing strings.

This should be available directly on the String class.

The proposed syntax is:

   'string'.normalize       # normalize 'string' according to NFC (most frequent on the Web)
   'string'.normalize :nfc  # normalize 'string' according to NFC; :nfd, :nfkc, :nfkd also usable
   'string'.nfc             # shorter variant, but maybe too many methods

There are several "unofficial" but convenient normalization variants that could be offered, e.g.:
                           
   'string'.normalize :mac  # use Macintosh file system normalization variant
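
As a rough illustration of what the four standard forms do, here is a minimal sketch. It assumes a Ruby that ships String#unicode_normalize (2.2 or later); the module name NormalizeSketch and its normalize function are hypothetical and only mirror the calls proposed above. The :mac variant is not covered.

```ruby
# Hypothetical sketch, not the proposed implementation: a thin wrapper
# that mimics the proposed normalize(form) call on top of
# String#unicode_normalize (assumed available, Ruby 2.2+).
module NormalizeSketch
  FORMS = [:nfc, :nfd, :nfkc, :nfkd].freeze

  # Normalize +string+ to +form+; NFC is the default, as in the proposal.
  def self.normalize(string, form = :nfc)
    unless FORMS.include?(form)
      raise ArgumentError, "unknown normalization form: #{form.inspect}"
    end
    string.unicode_normalize(form)
  end
end

decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed   = "\u00E9"   # precomposed U+00E9

puts decomposed == composed                              # false
puts NormalizeSketch.normalize(decomposed) == composed   # true
puts NormalizeSketch.normalize(composed, :nfd).length    # 2
puts NormalizeSketch.normalize("\uFB01", :nfkc)          # fi
```

The first two lines of output show why normalization matters for comparison: the two strings render identically but only compare equal after both are brought to the same form.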

Implementations are already available in pure Ruby (easy for other Ruby implementations; e.g. eprun: https://github.com/duerst/eprun) and in C (unf, …, http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/).

---Files--------------------------------
Normalization.pdf (576 KB)


-- 
https://bugs.ruby-lang.org/
