[#12372] Release compatibility/train — Prashant Srinivasan <Prashant.Srinivasan@...>

Hello all,

28 messages 2007/10/03
[#12373] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12374] Re: Release compatibility/train — David Flanagan <david@...> 2007/10/03

Yukihiro Matsumoto wrote:

[#12376] Re: Release compatibility/train — Prashant Srinivasan <Prashant.Srinivasan@...> 2007/10/03

[#12377] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12382] Re: Release compatibility/train — Charles Oliver Nutter <charles.nutter@...> 2007/10/03

Yukihiro Matsumoto wrote:

[#12385] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12388] Re: Release compatibility/train — Charles Oliver Nutter <charles.nutter@...> 2007/10/03

Yukihiro Matsumoto wrote:

[#12389] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12406] Re: Release compatibility/train — "David A. Black" <dblack@...> 2007/10/03

Hi --

[#12383] Include Rake in Ruby 1.9 — "NAKAMURA, Hiroshi" <nakahiro@...>

-----BEGIN PGP SIGNED MESSAGE-----

20 messages 2007/10/03

[#12539] Ordered Hashes in 1.9? — Michael Neumann <mneumann@...>

Hi all,

17 messages 2007/10/08
[#12542] Re: Ordered Hashes in 1.9? — Yukihiro Matsumoto <matz@...> 2007/10/08

Hi,

[#12681] Unicode: Progress? — murphy <murphy@...>

Hello!

17 messages 2007/10/15

[#12693] retry: revised 1.9 http patch — Hugh Sasse <hgs@...>

I'm reposting this because I've had little response to this version

11 messages 2007/10/15

[#12697] Range.first is incompatible with Enumerable.first — David Flanagan <david@...>

The new Enumerable.first method is a generalization of Array.first to

11 messages 2007/10/16

[#12754] Improving 'syntax error, unexpected $end, expecting kEND'? — Hugh Sasse <hgs@...>

I've had a look at this, but can't see how to do it: When I get

17 messages 2007/10/18
[#12886] Re: Improving 'syntax error, unexpected $end, expecting kEND'? — David Flanagan <david@...> 2007/10/23

The patch below changes this message to:

[#12758] Encoding::primary_encoding — David Flanagan <david@...>

Hi,

25 messages 2007/10/18
[#12763] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/19

Hi,

[#12802] Re: Encoding::primary_encoding — Wolfgang N疆asi-Donner <ed.odanow@...> 2007/10/21

Nobuyoshi Nakada schrieb:

[#12803] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/21

Hi,

[#12804] Re: Encoding::primary_encoding — Wolfgang N疆asi-Donner <ed.odanow@...> 2007/10/21

Nobuyoshi Nakada schrieb:

[#12808] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/22

Hi,

[#12818] Re: Encoding::primary_encoding — Wolfgang N疆asi-Donner <ed.odanow@...> 2007/10/22

Nobuyoshi Nakada schrieb:

[#12820] Re: Encoding::primary_encoding — "Michal Suchanek" <hramrach@...> 2007/10/22

T24gMjIvMTAvMjAwNywgV29sZmdhbmcgTsOhZGFzaS1Eb25uZXIgPGVkLm9kYW5vd0B3b25hZG8u

[#12823] Re: Encoding::primary_encoding — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/22

Michal Suchanek schrieb:

[#12824] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/22

Hi,

[#12767] \u escapes in string literals: proof of concept implementation — David Flanagan <david@...>

Back at the end of August, Matz wrote (see

45 messages 2007/10/19
[#12769] Re: \u escapes in string literals: proof of concept implementation — "Nobuyoshi Nakada" <nobu@...> 2007/10/19

Hi,

[#12782] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/20

Nobuyoshi Nakada wrote:

[#12831] Re: \u escapes in string literals: proof of concept implementation — Yukihiro Matsumoto <matz@...> 2007/10/22

Hi,

[#12841] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/22

Yukihiro Matsumoto wrote:

[#12862] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/10/23

At 04:19 07/10/23, David Flanagan wrote:

[#12864] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/23

Martin Duerst wrote:

[#12870] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/10/23

At 13:10 07/10/23, David Flanagan wrote:

[#12872] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/23

Martin Duerst wrote:

[#12936] Re: \u escapes in string literals: proof of concept implementation — Yukihiro Matsumoto <matz@...> 2007/10/25

Hi,

[#12980] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/26

Yukihiro Matsumoto wrote:

[#13028] Re: \u escapes in string literals: proof of concept implementation — Nobuyoshi Nakada <nobu@...> 2007/10/29

Hi,

[#13032] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/29

Nobuyoshi Nakada wrote:

[#13034] Re: \u escapes in string literals: proof of concept implementation — Nobuyoshi Nakada <nobu@...> 2007/10/29

Hi,

[#13082] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/10/30

At 16:46 07/10/29, Nobuyoshi Nakada wrote:

[#13231] Re: \u escapes in string literals: proof of concept implementation — Nobuyoshi Nakada <nobu@...> 2007/11/06

Hi,

[#13234] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/11/06

At 11:29 07/11/06, Nobuyoshi Nakada wrote:

[#12825] clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...>

Hi,

53 messages 2007/10/22
[#12830] Re: clarification of ruby libraries installation paths? — Ben Bleything <ben@...> 2007/10/22

On Mon, Oct 22, 2007, Lucas Nussbaum wrote:

[#12833] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/22

On 23/10/07 at 00:13 +0900, Ben Bleything wrote:

[#12835] Re: clarification of ruby libraries installation paths? — "Austin Ziegler" <halostatue@...> 2007/10/22

On 10/22/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:

[#12836] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/22

On 23/10/07 at 01:55 +0900, Austin Ziegler wrote:

[#12888] Re: clarification of ruby libraries installation paths? — Gonzalo Garramu <ggarra@...> 2007/10/23

Lucas Nussbaum wrote:

[#12894] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/24

On 24/10/07 at 05:14 +0900, Gonzalo Garramu wrote:

[#13057] Re: clarification of ruby libraries installation paths? — Gonzalo Garramu <ggarra@...> 2007/10/29

Lucas Nussbaum wrote:

[#13058] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/29

On 30/10/07 at 07:28 +0900, Gonzalo Garramu wrote:

[#12848] Re: clarification of ruby libraries installation paths? — Sam Roberts <sroberts@...> 2007/10/22

On Tue, Oct 23, 2007 at 01:55:29AM +0900, Austin Ziegler wrote:

[#12855] Re: clarification of ruby libraries installation paths? — "Austin Ziegler" <halostatue@...> 2007/10/23

On 10/22/07, Sam Roberts <sroberts@uniserve.com> wrote:

[#13016] Re: clarification of ruby libraries installation paths? — bob@... (Bob Proulx) 2007/10/28

Austin Ziegler wrote:

[#13029] Re: clarification of ruby libraries installation paths? — "Austin Ziegler" <halostatue@...> 2007/10/29

On 10/28/07, Bob Proulx <bob@proulx.com> wrote:

[#13054] Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — Lucas Nussbaum <lucas@...> 2007/10/29

Austin,

[#13055] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Luis Lavena" <luislavena@...> 2007/10/29

On 10/29/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:

[#13064] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Austin Ziegler" <halostatue@...> 2007/10/30

On 10/29/07, Luis Lavena <luislavena@gmail.com> wrote:

[#13066] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Luis Lavena" <luislavena@...> 2007/10/30

On 10/30/07, Austin Ziegler <halostatue@gmail.com> wrote:

[#13094] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Rick Bradley" <rick@...> 2007/10/30

Do we think that maybe, just maybe, things went off the rails when the

[#13095] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Luis Lavena" <luislavena@...> 2007/10/30

On 10/30/07, Rick Bradley <rick@rickbradley.com> wrote:

[#12900] Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...>

Dear Ruby 1.9 architects, developers, and testers!

31 messages 2007/10/24
[#12905] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Yukihiro Matsumoto <matz@...> 2007/10/24

Hi,

[#12907] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/24

Yukihiro Matsumoto schrieb:

[#12909] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Yukihiro Matsumoto <matz@...> 2007/10/24

Hi,

[#12940] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/25
[#12942] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/25

I have a (hopefully) final question before testing all

[#12948] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Nobuyoshi Nakada <nobu@...> 2007/10/26

Hi,

[#12951] Fluent programming in Ruby — David Flanagan <david@...>

From the ChangeLog:

16 messages 2007/10/26

[#12996] General hash keys for colon notation — murphy <murphy@...>

Dear language designer(s) and parser wizards,

16 messages 2007/10/28

[#13027] Implementation of "guessUTF" method - final questions — Wolfgang Nádasi-Donner <ed.odanow@...>

Dear Ruby designers, developers, and testers!

22 messages 2007/10/29

[#13069] new Enumerable.butfirst method — David Flanagan <david@...>

Matz,

17 messages 2007/10/30

Re: \u escapes in string literals: proof of concept implementation

From: Martin Duerst <duerst@...>
Date: 2007-10-26 08:40:32 UTC
List: ruby-core #12962
At 06:31 07/10/26, David Flanagan wrote:
>Yukihiro Matsumoto wrote:
>> Hi,
>> In message "Re: \u escapes in string literals: proof of concept   implementation"
>>     on Tue, 23 Oct 2007 16:53:57 +0900, David Flanagan <david@davidflanagan.com> writes:
>> |I like the \Uxxxxxx escape instead of \u{}.  Would you consider this, Matz?
>> Actually I hate counting digits.  When I am forced to put sufficient
>> number of preceding zeros to specify non-BMP character, I'd go mad.
>> Is there any reason \U<8ditits> is better than \u{}?  If it's
>> sufficient reason, it's OK to allow \U as well.
>>                                                      matz.
>
>I've been meaning to ask you about the 8 digits.  Unicode only uses 6 digits currently: the highest allowed codepoint is 10FFFF.  So even if Unicode grew to have 16 times then number of codepoints 6 hex digits would still be enough.

As for \U, I think we should stay with the forms we have currently,
and we can still introduce new ones if there is a great demand.

Please note that although the codepoints with five or six hex digits
are in the majority when compared to those with only four digits,
most of that area is completely empty, and the characters assinged
are extremely rare, so the chance of actually using one of these
escapes is extremely low.

Also, please note that at the speed the Unicode consortium and
the corresponding ISO WG are working, it will take centuries to
fill up all that space, if ever. In my opinion, this space will
only ever be filled up in case we get invaded by some extraterrestrial
beings that happen to have a writing system with way more characters
than the most complex writing system on earth (Han ideographs).


>What I was proposing was \U with exactly 6 digits after it.  And you'd only use it for those rare codepoints with 5 or 6 digits.  Without the curly braces it is shorter.  I don't feel actually feel strongly about \u{} versus \U however.  And reducing the number of special characters after slash is probably a good thing.
>
>Unless I'm missing the point, however, I don't think there is any reason to allow 4-byte codepoints.  I read somewhere that although the UTF-8 encoding scheme can be extended to encode 32 bits in 6 bytes, this is actually forbidden by the UTF-8 spec.  (I haven't verified that, but I think I saw it on Wikipedia.)  So if Ruby allows \u{xxxxxxxx} (8 hex digits) it will generate invalid codepoints in an illegal extension of UTF-8.

Yes indeed. You don't need to go to Wikipedia for this, you can
go straight to the source. As I posted earlier, it's in Section 3.9
of the Unicode Standard. Look for page 103 and 104 at
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf.

I'm very sure that ISO 10646 will adjust to this also,
and probably already has done so. I can contact the
Convener of the responsible WG or the editor of the
spec directly if you want, I  know both of them well.

The IETF also has RFC 3629, and again, the definition is
the same as in Unicode, although it's worded differently.
(http://www.ietf.org/rfc/rfc3629.txt; please note that
this is a full IETF Standard, which is very rare in the
IETF.)

In Ruby, I have found relevant code in pack.c and in enc/utf8.c.
The decoding code in pack.c is slightly better than the rest
in that it checks and rejects overlong sequences, but otherwise,
all the code is in pretty bad shape with respect to the standards.

I'd gladly provide patches for both the above files, and anything
else if necessary. pack.c is really simple. enc/utf8.c is a bit
more complex, because this essentially belongs to Oniguruma, and
because it even has #defines to pass through 0xFE and 0xFF bytes
if one wishes to do so (I hope Ruby doesn't allow these).

I think we should go ahead and tighten these pieces of code, because
some of these limitations are related to security issues, and we
should make sure we follow the standards. If bytes not conforming
to the standards accidentally creep in, then this is a mistake anyway.
If somebody wants to include such bytes on purpose, they have to
use the binary encoding.

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     


In This Thread