[ruby-core:79666] [Ruby trunk Feature#13240] Change Unicode property implementation in Onigmo from inversion lists to direct lookup
From: duerst@...
Date: 2017-02-22 08:01:22 UTC
List: ruby-core #79666
Issue #13240 has been reported by Martin Dürst.
----------------------------------------
Feature #13240: Change Unicode property implementation in Onigmo from inversion lists to direct lookup
https://bugs.ruby-lang.org/issues/13240
* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
For Unicode property checks (e.g. `/\p{hiragana}/`), Onigmo currently uses inversion lists. See enc/unicode/9.0.0/name2ctype.h; the roughly 500 arrays starting with `CR_NEWLINE`, currently on line 39, are all inversion lists.
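To make the term concrete, here is a minimal Ruby sketch of a membership check against a sorted list of ranges. The flat `[start, end, start, end, ...]` layout, the constant `SAMPLE_RANGES`, and the method `in_ranges?` are assumptions for illustration; they are not copied from name2ctype.h, and the two ranges are the Hiragana and CJK Unified Ideographs blocks rather than real property tables.

```ruby
# Illustrative data only: sorted, non-overlapping, inclusive ranges,
# flattened as [start0, end0, start1, end1, ...].
SAMPLE_RANGES = [
  0x3040, 0x309F, # Hiragana block
  0x4E00, 0x9FFF  # CJK Unified Ideographs block
].freeze

# Binary search over the range pairs: O(log n) in the number of ranges.
def in_ranges?(ranges, codepoint)
  lo = 0
  hi = ranges.length / 2 - 1
  while lo <= hi
    mid = (lo + hi) / 2
    if codepoint < ranges[2 * mid]
      hi = mid - 1          # code point lies before this range
    elsif codepoint > ranges[2 * mid + 1]
      lo = mid + 1          # code point lies after this range
    else
      return true           # start <= code point <= end
    end
  end
  false
end
```

For example, `in_ranges?(SAMPLE_RANGES, 0x3042)` checks HIRAGANA LETTER A against the list.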
I propose to change this to use direct lookup. Takumi Koyama, a student of mine, has implemented direct lookup. Our new implementation uses less memory (213,920 vs. 240,976 bytes) while supporting more properties (76 vs. 62) and more property values (1009 vs. 554).
We are also faster when checking single properties: up to 9 times faster for the actual check, depending on the property value. This is because inversion lists use binary search, so the cost of a check depends on the length of the inversion list (O(log n); Age3.0 is the longest), whereas direct lookup is a constant-time operation. We are also somewhat faster even for very short inversion lists, i.e. blocks (which by definition have only one range).
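As a rough illustration of why direct lookup is constant-time, the following Ruby sketch stores one bitmask per code point and tests a single bit. This is only one possible table layout, assumed here for illustration; it is not a description of the implementation proposed above. The names `HIRAGANA`, `HAN`, `build_table`, and `property?` are hypothetical, the ranges are Unicode block ranges rather than real property data, and the table covers only the BMP to keep the sketch small.

```ruby
# Hypothetical property bits, one bit per property.
HIRAGANA = 1 << 0
HAN      = 1 << 1

# Build a flat table with one bitmask per code point (BMP only here).
def build_table(property_ranges, size = 0x10000)
  table = Array.new(size, 0)
  property_ranges.each do |bit, ranges|
    ranges.each do |first, last|
      (first..last).each { |cp| table[cp] |= bit }
    end
  end
  table
end

TABLE = build_table({
  HIRAGANA => [[0x3040, 0x309F]], # Hiragana block
  HAN      => [[0x4E00, 0x9FFF]]  # CJK Unified Ideographs block
})

# One array index and one bit test, regardless of how many ranges
# the property has. Code points outside the table count as "no".
def property?(table, codepoint, bit)
  (table.fetch(codepoint, 0) & bit) != 0
end
```

The cost of `property?` does not grow with the number of ranges in a property, which is the contrast with binary search drawn above. The price is a table sized by code point rather than by range count.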
Where we may get slower is for character classes with multiple properties (e.g. `/[\p{han}\p{hiragana}\p{katakana}...]/`). This is because inversion lists are easily mergeable (when compiling the regular expression), and can also be combined with character class ranges. Direct lookup, on the other hand, isn't easily mergeable. This may need further investigation into which uses of Unicode properties in Ruby regular expressions are popular or frequent.
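The mergeability of range lists can be sketched in a few lines of Ruby. This is an illustrative union of sorted range lists, not Onigmo's actual compile-time code; `merge_ranges` is a hypothetical helper, and the ranges are Unicode block ranges plus an ordinary `a-z` character class range.

```ruby
# Union of several lists of inclusive [first, last] ranges.
# Overlapping or adjacent ranges are coalesced into one.
def merge_ranges(*lists)
  sorted = lists.flatten(1).sort_by(&:first)
  sorted.each_with_object([]) do |(first, last), merged|
    if merged.any? && first <= merged.last[1] + 1
      # Extends (or is contained in) the previous range.
      merged.last[1] = [merged.last[1], last].max
    else
      merged << [first, last]
    end
  end
end

hiragana = [[0x3040, 0x309F]] # Hiragana block
katakana = [[0x30A0, 0x30FF]] # Katakana block
han      = [[0x4E00, 0x9FFF]] # CJK Unified Ideographs block
a_to_z   = [[0x61, 0x7A]]     # plain character class range a-z

combined = merge_ranges(han, hiragana, katakana, a_to_z)
```

After merging, the whole character class is a single sorted list, so one binary search answers the membership question for all properties at once. With separate direct-lookup tables, each property would presumably need its own check unless the tables are combined in some other way.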
--
https://bugs.ruby-lang.org/