[ruby-core:26153] [Feature #2034] Consider the ICU Library for Improving and Expanding Unicode Support

From: Yui NARUSE <redmine@...>
Date: 2009-10-18 19:42:16 UTC
List: ruby-core #26153
Issue #2034 has been updated by Yui NARUSE.


> needs to translate EBCDIC encoded Japanese characters

What is the encoding, and do you think a converter for it should be included?
I guess a converter could translate that encoding to EUC-JP algorithmically.

> I'm assuming a few things here.  One is that this:
> http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html
> is accurate for the most part.

I wrote it.

> It could be that "aaa" + "bbb" yields String that is a list of
> SubStrings.  I'll write as x = [ "aaa", "bbb" ].  That would have many
> useful concepts: length would be the sum of the length of all the
> SubStrings. x[1] would be "a".  x[4] would be "b".  x[2,2] would yield
> a String with two SubStrings (again, this is just how I'm representing
> it) [ "a", "b" ].  x.encoding would return Mixed in these cases.
> Encoding would be a concept attached to a SubString rather than
> String.  x.each would return a sequence of "a", "a", "a", "b", "b",
> "b" each with an encoding of A for the "a"s and B for the "b"s.  String
> would still be what most applications use.  Rarely would they need to
> know about the SubStrings.

That concept is sometimes introduced as a rope.

http://jp.rubyist.net/magazine/?0022-RubyConf2007Report#l13
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.9450
http://www.kmonos.net/wlog/39.php#_1841040529 in Japanese
http://d.hatena.ne.jp/ku-ma-me/20070730/p1 in Japanese

A rope offers:
* fast string concatenation
* fast substring extraction
* no in-place modification of substrings
* slow index access to a character

But Ruby's strings are mutable.
This seems a critical issue for a rope.
Moreover, Ruby users often apply regexp matching to strings.
I don't think a rope has enough merit to be worth implementing in such a tough environment.
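
To make the trade-offs above concrete, here is a minimal sketch of a rope in Ruby. It is only an illustration, not a proposal for Ruby's String: the class name Rope and its methods (leaf, encodings) are hypothetical, and each leaf keeps its own encoding in the way the quoted SubString idea suggests.

```ruby
# A toy rope: a binary tree whose leaves are frozen Ruby strings.
# Rope, Rope.leaf and Rope#encodings are hypothetical names for this sketch.
class Rope
  attr_reader :length

  def self.leaf(str)
    new(nil, nil, str.dup.freeze)
  end

  def initialize(left, right, leaf = nil)
    @left = left
    @right = right
    @leaf = leaf
    @length = leaf ? leaf.length : left.length + right.length
  end

  # Concatenation allocates one new node; no characters are copied.
  def +(other)
    Rope.new(self, other)
  end

  # Index access walks down the tree, so it costs more than a flat string.
  def [](index)
    return nil if index < 0 || index >= @length
    return @leaf[index] if @leaf
    index < @left.length ? @left[index] : @right[index - @left.length]
  end

  # Encodings of the leaves, from left to right.
  def encodings
    @leaf ? [@leaf.encoding] : @left.encodings + @right.encodings
  end
end

x = Rope.leaf("aaa".encode("EUC-JP")) + Rope.leaf("bbb".encode("Shift_JIS"))
x.length     # 6
x[1]         # "a"
x[4]         # "b"
x.encodings  # [Encoding::EUC_JP, Encoding::Shift_JIS]
```

The leaves are frozen because nodes are shared between ropes. Destructive methods such as String#<< or String#gsub! have no cheap equivalent here, and a regexp engine would need to walk the tree instead of scanning one contiguous buffer.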

> I believe that if Ruby wants to hold strongly to the
> CSI model that encoding agnostic string manipulations should be
> implemented.

Ruby is a practical language, even though it uses the CSI model :-)
In the current situation, such a concept is hard to realize in Ruby
because of performance, implementation difficulty, and actual need.
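
As a small illustration of what the CSI model means in practice in Ruby 1.9, here is a sketch assuming only the standard String and Encoding APIs. Each string carries its own encoding, and Ruby refuses to combine incompatible ones instead of converting through a common character set. The variable names are just for this example.

```ruby
# "\u..." escapes always produce a UTF-8 string, whatever the source encoding.
utf8  = "\u65E5\u672C\u8A9E"    # three kanji
eucjp = utf8.encode("EUC-JP")   # same characters, different bytes

utf8.encoding    # Encoding::UTF_8
eucjp.encoding   # Encoding::EUC_JP
utf8.length      # 3
eucjp.length     # 3
utf8.bytesize    # 9
eucjp.bytesize   # 6

# No implicit conversion: mixing the two raises an error.
error = begin
  utf8 + eucjp
  nil
rescue Encoding::CompatibilityError => e
  e
end
error.class      # Encoding::CompatibilityError
```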

> Sorting not only depends upon the encoding but also the language.
> Sorting could be done with routines specific to an encoding plus
> language but I believe that is impractical to implement.

Yes, a String needs language information.
This is an open problem.
We may have to implement a rope to handle languages.
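
A short sketch of why sorting needs language information: String#<=> compares bytes, so the result reflects the encoding rather than any dictionary order. The german_key helper below is hypothetical and handles a single letter; it is a toy rule for illustration, not real collation.

```ruby
words = ["z", "\u00E4", "a"]   # "\u00E4" is a-umlaut, written as an escape

# Plain sort compares bytes: in UTF-8 a-umlaut is 0xC3 0xA4, which sorts after "z".
byte_order = words.sort        # ["a", "z", "\u00E4"]

# Hypothetical toy rule: treat a-umlaut like "a", with the original string
# as a tie-breaker so the order is deterministic.
def german_key(word)
  [word.tr("\u00E4", "a"), word]
end

dictionary_like = words.sort_by { |w| german_key(w) }   # ["a", "\u00E4", "z"]
```

Other languages order the same letter differently, so no single rule per encoding is enough.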

> It is more portable than any iconv implementation (because iconv has
> been stuffed into the libc implementation and pulling it back apart
> looked really hard to me).

For String, which is at the core of Ruby, iconv is out of the question.
The core library and its dependencies must be as portable as Ruby itself.

> The fact that it is huge is just a reflection of the size of the problem.

I think the problem is too heavy for current Ruby to handle.
And Ruby 1.9 uses the CSI model; that goes beyond ICU, which uses the UCS model.
----------------------------------------
http://redmine.ruby-lang.org/issues/show/2034

----------------------------------------
http://redmine.ruby-lang.org
