Re: Ruby Changes Its Mind About Non-Word Characters

From: "Vincent Isambart" <vincent.isambart@...>
Date: 2007-06-17 10:22:37 UTC
List: ruby-core #11496
> I agree that « and » are logically quotes (after all I'm French
> and they are used a lot in French), but what I said is that the
> Unicode support for Ruby 1.8 is poor (in fact except split(//) you
> can't do much) so I am not that surprised by the fact it does not work.

I just checked what ruby 1.8.6's code does, and it is indeed quite
simple. Everything is based on whether a character "is a letter" or
not, and "is a letter" is defined as follows (I simplified it a bit):
- if the character is an ASCII character (ASCII code <= 127) or the
multibyte character mode is ASCII (the default if you did not change
it), the result is whatever the standard C function isalnum ("is
alphanumeric") returns
- if the character's code is greater than 127 and the multibyte
character mode is not ASCII, it "is a letter" if the character takes
more than one byte. And in UTF-8, _all_ characters with a code greater
than 127 take at least 2 bytes.

So in ruby 1.8 in UTF-8 mode, all non-ASCII characters are considered letters.
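
To make the rule concrete, here is a rough model of it in Ruby. It is
only a sketch of the simplified description above, not the actual
1.8.6 source: the names letter_1_8?, ascii_alnum? and mbc_mode are
made up for illustration, isalnum is approximated by its behaviour in
the default "C" locale, and only the first byte of the character is
inspected on the ASCII path.

```ruby
# Hypothetical model of the simplified ruby 1.8.6 "is a letter" rule.
# None of these names come from the ruby source.

# Approximates C's isalnum in the default "C" locale:
# true only for the bytes of 0-9, A-Z and a-z.
def ascii_alnum?(byte)
  byte.between?(0x30, 0x39) ||   # 0-9
    byte.between?(0x41, 0x5A) || # A-Z
    byte.between?(0x61, 0x7A)    # a-z
end

# char:     a one-character String
# mbc_mode: :ascii (assumed default) or :utf8
def letter_1_8?(char, mbc_mode = :ascii)
  bytes = char.bytes
  first = bytes.first
  if first <= 127 || mbc_mode == :ascii
    # ASCII character, or ASCII mode: defer to isalnum.
    ascii_alnum?(first)
  else
    # Non-ASCII character in a multibyte mode: any character that
    # takes more than one byte counts as a letter.
    bytes.length > 1
  end
end

puts letter_1_8?("a")               # true
puts letter_1_8?("-")               # false
puts letter_1_8?("\u00AB")          # false («, ASCII mode)
puts letter_1_8?("\u00AB", :utf8)   # true  («, UTF-8 mode)
```

The last line is the case from this thread: under this rule, the
quote « counts as a letter as soon as the mode is UTF-8.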

Yes, this is not a good thing, but keep in mind that it is not so
simple: knowing whether a Unicode character is a letter or not requires
a better regexp engine, including tables with the attributes of all
Unicode characters. For that you will need to use ruby 1.9. Tim Bray
had reasons to say that Unicode support in ruby 1.8 sucked. And Matz
knows it; that's why ruby 1.9 will be better on this point.

