[ruby-core:26278] [Feature #2034] Consider the ICU Library for Improving and Expanding Unicode Support

From: Yui NARUSE <redmine@...>
Date: 2009-10-24 17:58:01 UTC
List: ruby-core #26278
Issue #2034 has been updated by Yui NARUSE.

Target version set to 2.0

> If you think this is necessary, please start implementing. In my 
> opinion, it will take you a lot of time, with very little advantage over 
> a single-encoding sorting implementation.

Unicode strings need language information to decide which glyph to use.
This problem is a consequence of Unicode's Han unification.
For example, U+9AA8 has different glyphs in China and in Japan.
http://www.atmarkit.co.jp/fxml/rensai/xmlwomanabou11/learning-xml11.html in Japanese but images are shown in

> When I fetch text from the legacy system, it has a two byte CCSID in
> front of it.  I have a table that translates the CCSID to the name of
> the encoding.  It is much like:
>
> http://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.jsp

CCSID is part of IBM's encoding framework, CDRA.
IBM's "Code Page" is a CCS (Coded Character Set),
and a CCSID ties Code Pages to encoding schemes.
So IBM's CCSID is the same kind of thing as an encoding, a charset, or Microsoft's Code Page.

So a CCSID can be treated as an encoding if needed.
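
A minimal sketch of treating a CCSID as an encoding in Ruby. The table below is a hypothetical two-entry stand-in for the translation table described in the quote, the name decode_with_ccsid is invented for illustration, and the two-byte prefix is assumed to be big-endian (the original message does not say).

```ruby
# Hypothetical lookup table: CCSID number => Ruby encoding name.
# Only two well-known entries are listed; a real table would be much larger.
CCSID_TO_ENCODING = {
  819  => "ISO-8859-1",
  1208 => "UTF-8",
}.freeze

# Split off the two-byte CCSID prefix and tag the remaining bytes with the
# matching encoding. Assumption: the prefix is an unsigned big-endian integer.
def decode_with_ccsid(bytes)
  ccsid = bytes.byteslice(0, 2).unpack1("n")
  name = CCSID_TO_ENCODING.fetch(ccsid) do
    raise ArgumentError, "unknown CCSID #{ccsid}"
  end
  # force_encoding only relabels the bytes; it does not transcode them.
  bytes.byteslice(2..-1).force_encoding(name)
end
```

The result is an ordinary String whose encoding is known, so it can then be passed to String#encode if a single internal encoding is wanted.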

> There is one thing that confused me at the end of Martin's post.  To
> me, data never has a language.  Perhaps I'm mistaken.  The data only
> have a language when viewed by a user.  As he points out, a sort can
> only be properly done when the language of the user is taken into
> account.  At least, that is how I would rephrase what he said.

Unicode unifies characters across some languages, for example U+9AA8 above.
Another critical example is the capital letter of i: in Turkish it is not I.
http://unicode.org/Public/UNIDATA/SpecialCasing.txt
(This is one reason why String#upcase is not Unicode-sensitive.)
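
The Turkish case can be illustrated with a short sketch. It assumes Ruby 2.4 or later, where String#upcase and String#downcase accept a :turkic option; the Ruby 1.9 discussed in this thread only maps ASCII letters.

```ruby
# Assumes Ruby 2.4+ (Unicode case mapping with the :turkic option).
plain   = "i".upcase            # language-neutral mapping: "I"
turkish = "i".upcase(:turkic)   # Turkic tailoring: dotted capital I (U+0130)
dotless = "I".downcase(:turkic) # Turkic tailoring: dotless small i (U+0131)

puts plain == "I"
puts turkish == "\u0130"
puts dotless == "\u0131"
```

The string itself carries no language, so the caller has to say which tailoring applies.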

More examples follow:

* http://unicode.org/reports/tr10/ UNICODE COLLATION ALGORITHM; affects sort
* http://unicode.org/reports/tr11/ East Asian Width; affects String#center
* http://unicode.org/reports/tr18/ UNICODE REGULAR EXPRESSIONS; Tailored Support: Level 3 affects String#upcase and /i/i
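
As a rough sketch of the East Asian Width point: String#center pads by character count, not by display columns. The helpers display_width and center_by_columns below are hypothetical, and the width test (Han, Hiragana, and Katakana count as two columns) is a crude approximation, not the real East Asian Width property from TR11. Assumes Ruby 2.4 or later.

```ruby
# Approximate display width: treat Han, Hiragana and Katakana as two columns
# wide and everything else as one. This is NOT the full TR11 property.
def display_width(str)
  str.each_char.sum { |ch| ch.match?(/[\p{Han}\p{Hiragana}\p{Katakana}]/) ? 2 : 1 }
end

# Like String#center, but pads to a number of display columns.
# Any odd leftover column goes on the right, as String#center does.
def center_by_columns(str, columns, pad = " ")
  missing = columns - display_width(str)
  return str.dup if missing <= 0
  left = missing / 2
  (pad * left) + str + (pad * (missing - left))
end

word = "\u65E5\u672C"                   # two Han characters
by_chars   = word.center(8, "*")        # six pad characters, ten columns on screen
by_columns = center_by_columns(word, 8, "*") # four pad characters, eight columns
```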
----------------------------------------
http://redmine.ruby-lang.org/issues/show/2034

----------------------------------------
http://redmine.ruby-lang.org
