[#12372] Release compatibility/train — Prashant Srinivasan <Prashant.Srinivasan@...>

Hello all,

28 messages 2007/10/03
[#12373] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12374] Re: Release compatibility/train — David Flanagan <david@...> 2007/10/03

Yukihiro Matsumoto wrote:

[#12376] Re: Release compatibility/train — Prashant Srinivasan <Prashant.Srinivasan@...> 2007/10/03

[#12377] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12382] Re: Release compatibility/train — Charles Oliver Nutter <charles.nutter@...> 2007/10/03

Yukihiro Matsumoto wrote:

[#12385] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12388] Re: Release compatibility/train — Charles Oliver Nutter <charles.nutter@...> 2007/10/03

Yukihiro Matsumoto wrote:

[#12389] Re: Release compatibility/train — Yukihiro Matsumoto <matz@...> 2007/10/03

Hi,

[#12406] Re: Release compatibility/train — "David A. Black" <dblack@...> 2007/10/03

Hi --

[#12383] Include Rake in Ruby 1.9 — "NAKAMURA, Hiroshi" <nakahiro@...>

-----BEGIN PGP SIGNED MESSAGE-----

20 messages 2007/10/03

[#12539] Ordered Hashes in 1.9? — Michael Neumann <mneumann@...>

Hi all,

17 messages 2007/10/08
[#12542] Re: Ordered Hashes in 1.9? — Yukihiro Matsumoto <matz@...> 2007/10/08

Hi,

[#12681] Unicode: Progress? — murphy <murphy@...>

Hello!

17 messages 2007/10/15

[#12693] retry: revised 1.9 http patch — Hugh Sasse <hgs@...>

I'm reposting this because I've had little response to this version

11 messages 2007/10/15

[#12697] Range.first is incompatible with Enumerable.first — David Flanagan <david@...>

The new Enumerable.first method is a generalization of Array.first to

11 messages 2007/10/16

[#12754] Improving 'syntax error, unexpected $end, expecting kEND'? — Hugh Sasse <hgs@...>

I've had a look at this, but can't see how to do it: When I get

17 messages 2007/10/18
[#12886] Re: Improving 'syntax error, unexpected $end, expecting kEND'? — David Flanagan <david@...> 2007/10/23

The patch below changes this message to:

[#12758] Encoding::primary_encoding — David Flanagan <david@...>

Hi,

25 messages 2007/10/18
[#12763] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/19

Hi,

[#12802] Re: Encoding::primary_encoding — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/21

Nobuyoshi Nakada schrieb:

[#12803] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/21

Hi,

[#12804] Re: Encoding::primary_encoding — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/21

Nobuyoshi Nakada schrieb:

[#12808] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/22

Hi,

[#12818] Re: Encoding::primary_encoding — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/22

Nobuyoshi Nakada schrieb:

[#12820] Re: Encoding::primary_encoding — "Michal Suchanek" <hramrach@...> 2007/10/22

On 22/10/2007, Wolfgang Nádasi-Donner <ed.odanow@...>

[#12823] Re: Encoding::primary_encoding — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/22

Michal Suchanek schrieb:

[#12824] Re: Encoding::primary_encoding — Nobuyoshi Nakada <nobu@...> 2007/10/22

Hi,

[#12767] \u escapes in string literals: proof of concept implementation — David Flanagan <david@...>

Back at the end of August, Matz wrote (see

45 messages 2007/10/19
[#12769] Re: \u escapes in string literals: proof of concept implementation — "Nobuyoshi Nakada" <nobu@...> 2007/10/19

Hi,

[#12782] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/20

Nobuyoshi Nakada wrote:

[#12831] Re: \u escapes in string literals: proof of concept implementation — Yukihiro Matsumoto <matz@...> 2007/10/22

Hi,

[#12841] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/22

Yukihiro Matsumoto wrote:

[#12862] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/10/23

At 04:19 07/10/23, David Flanagan wrote:

[#12864] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/23

Martin Duerst wrote:

[#12870] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/10/23

At 13:10 07/10/23, David Flanagan wrote:

[#12872] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/23

Martin Duerst wrote:

[#12936] Re: \u escapes in string literals: proof of concept implementation — Yukihiro Matsumoto <matz@...> 2007/10/25

Hi,

[#12980] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/26

Yukihiro Matsumoto wrote:

[#13028] Re: \u escapes in string literals: proof of concept implementation — Nobuyoshi Nakada <nobu@...> 2007/10/29

Hi,

[#13032] Re: \u escapes in string literals: proof of concept implementation — David Flanagan <david@...> 2007/10/29

Nobuyoshi Nakada wrote:

[#13034] Re: \u escapes in string literals: proof of concept implementation — Nobuyoshi Nakada <nobu@...> 2007/10/29

Hi,

[#13082] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/10/30

At 16:46 07/10/29, Nobuyoshi Nakada wrote:

[#13231] Re: \u escapes in string literals: proof of concept implementation — Nobuyoshi Nakada <nobu@...> 2007/11/06

Hi,

[#13234] Re: \u escapes in string literals: proof of concept implementation — Martin Duerst <duerst@...> 2007/11/06

At 11:29 07/11/06, Nobuyoshi Nakada wrote:

[#12825] clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...>

Hi,

53 messages 2007/10/22
[#12830] Re: clarification of ruby libraries installation paths? — Ben Bleything <ben@...> 2007/10/22

On Mon, Oct 22, 2007, Lucas Nussbaum wrote:

[#12833] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/22

On 23/10/07 at 00:13 +0900, Ben Bleything wrote:

[#12835] Re: clarification of ruby libraries installation paths? — "Austin Ziegler" <halostatue@...> 2007/10/22

On 10/22/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:

[#12836] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/22

On 23/10/07 at 01:55 +0900, Austin Ziegler wrote:

[#12888] Re: clarification of ruby libraries installation paths? — Gonzalo Garramuño <ggarra@...> 2007/10/23

Lucas Nussbaum wrote:

[#12894] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/24

On 24/10/07 at 05:14 +0900, Gonzalo Garramuño wrote:

[#13057] Re: clarification of ruby libraries installation paths? — Gonzalo Garramuño <ggarra@...> 2007/10/29

Lucas Nussbaum wrote:

[#13058] Re: clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...> 2007/10/29

On 30/10/07 at 07:28 +0900, Gonzalo Garramuño wrote:

[#12848] Re: clarification of ruby libraries installation paths? — Sam Roberts <sroberts@...> 2007/10/22

On Tue, Oct 23, 2007 at 01:55:29AM +0900, Austin Ziegler wrote:

[#12855] Re: clarification of ruby libraries installation paths? — "Austin Ziegler" <halostatue@...> 2007/10/23

On 10/22/07, Sam Roberts <sroberts@uniserve.com> wrote:

[#13016] Re: clarification of ruby libraries installation paths? — bob@... (Bob Proulx) 2007/10/28

Austin Ziegler wrote:

[#13029] Re: clarification of ruby libraries installation paths? — "Austin Ziegler" <halostatue@...> 2007/10/29

On 10/28/07, Bob Proulx <bob@proulx.com> wrote:

[#13054] Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — Lucas Nussbaum <lucas@...> 2007/10/29

Austin,

[#13055] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Luis Lavena" <luislavena@...> 2007/10/29

On 10/29/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:

[#13064] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Austin Ziegler" <halostatue@...> 2007/10/30

On 10/29/07, Luis Lavena <luislavena@gmail.com> wrote:

[#13066] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Luis Lavena" <luislavena@...> 2007/10/30

On 10/30/07, Austin Ziegler <halostatue@gmail.com> wrote:

[#13094] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Rick Bradley" <rick@...> 2007/10/30

Do we think that maybe, just maybe, things went off the rails when the

[#13095] Re: Austin Ziegler's behaviour (Was: clarification of ruby libraries installation paths?) — "Luis Lavena" <luislavena@...> 2007/10/30

On 10/30/07, Rick Bradley <rick@rickbradley.com> wrote:

[#12900] Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...>

Dear Ruby 1.9 architects, developers, and testers!

31 messages 2007/10/24
[#12905] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Yukihiro Matsumoto <matz@...> 2007/10/24

Hi,

[#12907] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/24

Yukihiro Matsumoto schrieb:

[#12909] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Yukihiro Matsumoto <matz@...> 2007/10/24

Hi,

[#12940] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/25
[#12942] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...> 2007/10/25

I have a (hopefully) final question before testing all

[#12948] Re: Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Nobuyoshi Nakada <nobu@...> 2007/10/26

Hi,

[#12951] Fluent programming in Ruby — David Flanagan <david@...>

From the ChangeLog:

16 messages 2007/10/26

[#12996] General hash keys for colon notation — murphy <murphy@...>

Dear language designer(s) and parser wizards,

16 messages 2007/10/28

[#13027] Implementation of "guessUTF" method - final questions — Wolfgang Nádasi-Donner <ed.odanow@...>

Dear Ruby designers, developers, and testers!

22 messages 2007/10/29

[#13069] new Enumerable.butfirst method — David Flanagan <david@...>

Matz,

17 messages 2007/10/30

Re: \u escapes in string literals: proof of concept implementation

From: David Flanagan <david@...>
Date: 2007-10-20 00:50:58 UTC
List: ruby-core #12782
Nobuyoshi Nakada wrote:

> If the current encoding is not based on the Unicode character
> set, I think \u should:
> 
> a. raise compile error,
> b. be converted from Unicode to the encoding, or
> c. make the whole string UTF-8 encoding (and raise compile error
>    if other non-ascii characters are there).

The new patch attached here does (c): any string containing a \u escape
is forced to UTF-8 encoding, unless every \u escape in it encodes a
character in the ASCII range, in which case the encoding is left alone.
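
As a rough illustration of that rule (not code from the patch), the
decision can be sketched as a standalone C function. The name
forces_utf8 and the array-of-codepoints interface are assumptions made
for this example; the patch itself tracks the same condition with the
has_utf8 flag while it scans each escape.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: given the codepoints named
 * by the \u escapes in one string literal, report whether the literal
 * would be forced to UTF-8 under option (c).
 * Returns 1 if any escape is outside the ASCII range, 0 otherwise.
 */
static int forces_utf8(const unsigned long *codepoints, size_t n)
{
    size_t i;

    assert(codepoints != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (codepoints[i] >= 0x80)
            return 1;   /* non-ASCII escape: whole literal becomes UTF-8 */
    }
    return 0;           /* only ASCII escapes: source encoding is kept */
}
```

Under this sketch a literal containing only \u0041 keeps the source
encoding, while one containing \u00e9 anywhere is treated as UTF-8.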

>> +                       int maxbytes = rb_enc_mbmaxlen(parser->enc);
>> +                       UChar buf[maxbytes];
> 
> C99 feature can't compile with C90 compilers.

This code is gone.

\u{...} no longer accepts codepoints > 10FFFF.
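
A minimal sketch of that check, assuming a hypothetical helper named
parse_braced_codepoint that receives the text following "\u{". It is
not the patch's scan_hex() call, only an illustration of the same
three error cases: no hex digits, a value above 0x10FFFF, and a missing
closing brace.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the parser's own code: parse the text that
 * follows "\u{", for example "20AC}". At most six hex digits are read.
 * Returns 0 and stores the codepoint in *out on success, -1 on error.
 */
static int parse_braced_codepoint(const char *s, unsigned long *out)
{
    unsigned long cp = 0;
    int digits = 0;

    assert(s != NULL && out != NULL);
    while (digits < 6) {
        char c = s[digits];
        unsigned long v;

        if (c >= '0' && c <= '9')      v = (unsigned long)(c - '0');
        else if (c >= 'a' && c <= 'f') v = (unsigned long)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v = (unsigned long)(c - 'A' + 10);
        else break;                     /* first non-hex character */
        cp = (cp << 4) | v;
        digits++;
    }
    if (digits == 0)      return -1;    /* no hex digits at all */
    if (cp > 0x10FFFFUL)  return -1;    /* beyond the Unicode range */
    if (s[digits] != '}') return -1;    /* closing brace is missing */
    *out = cp;
    return 0;
}
```

With this sketch "10FFFF}" is accepted and "110000}" is rejected.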

This patch seems to work with regexps and %w{} as well as double-quoted 
strings and here docs.

	David

Attachments (1)

unicode_patch3 (3.79 KB, text/x-diff)
Index: parse.y
===================================================================
--- parse.y	(revision 13739)
+++ parse.y	(working copy)
@@ -237,6 +237,8 @@
     int has_shebang;
     int parser_ruby_sourceline;	/* current line no. */
     rb_encoding *enc;
+    rb_encoding *utf8;
+    int has_utf8; /* if a string contains \u escape, force encoding to utf-8*/
 
 #ifndef RIPPER
     /* Ruby core only */
@@ -264,7 +266,7 @@
 #define STR_NEW0() rb_enc_str_new(0,0,rb_enc_from_index(0))
 #define STR_NEW2(p) rb_enc_str_new((p),strlen(p),parser->enc)
 #define STR_NEW3(p,n,m) parser_str_new((p),(n),STR_ENC(!ENC_SINGLE(m)),(m))
-#define STR_ENC(m) ((m)?parser->enc:rb_enc_from_index(0))
+#define STR_ENC(m) (parser->has_utf8?parser->utf8:((m)?parser->enc:rb_enc_from_index(0)))
 #define ENC_SINGLE(cr) ((cr)==ENC_CODERANGE_SINGLE)
 #define TOK_INTERN(mb) rb_intern3(tok(), toklen(), STR_ENC(mb))
 
@@ -4675,6 +4677,7 @@
     }
 
     parser->enc = rb_enc_get(lex_input);
+    parser->utf8 = rb_enc_find("utf-8");
     ruby_sourcefile = rb_source_filename(f);
     ruby_sourceline = line - 1;
     parser_prepare(parser);
@@ -5152,7 +5155,7 @@
 #define STR_FUNC_INDENT 0x20
 
 enum string_type {
-    str_squote = (0),
+    str_squote= (0),
     str_dquote = (STR_FUNC_EXPAND),
     str_xquote = (STR_FUNC_EXPAND),
     str_regexp = (STR_FUNC_REGEXP|STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
@@ -5180,6 +5183,7 @@
     } while (--len > 0 && (c = nextc()) != -1);
 }
 
+
 #define tokadd_mbchar(c) parser_tokadd_mbchar(parser, c)
 
 static int
@@ -5219,6 +5223,72 @@
 		if (func & STR_FUNC_ESCAPE) tokadd(c);
 		break;
 
+	      case 'u':
+		if ((func & STR_FUNC_EXPAND) == 0) {
+		    tokadd('\\');
+		    break;
+		}
+		else {
+		    int numlen, brace, codepoint;
+		    brace = nextc();
+		    if (brace == '{') {  /* handle \u{...} form */
+			codepoint = scan_hex(lex_p, 6, &numlen);
+			if (numlen == 0)  {
+			    yyerror("Invalid Unicode escape");
+			    return 0;
+			}
+			if (codepoint > 0x10ffff) {
+			    yyerror("Illegal Unicode codepoint (too large)");
+			    return 0;
+			}
+			lex_p += numlen;
+			
+			if ((brace = nextc()) != '}') {
+			    pushback(brace);
+			    yyerror("Unterminated Unicode escape");
+			    return 0;
+			}
+		    }
+		    else {                /* handle \uxxxx form */
+			pushback(brace);
+			codepoint = scan_hex(lex_p, 4, &numlen);
+			if (numlen < 4) {
+			    yyerror("Invalid Unicode escape");
+			    return 0;
+			}
+			lex_p += 4;
+		    }
+		
+		    if (codepoint >= 0x80) {
+			if (mb) *mb = ENC_CODERANGE_UNKNOWN;
+			parser->has_utf8 = 1;
+		    }
+		    
+		    if (codepoint < 0x80) { /* ASCII: emit the byte as-is, encoding unchanged */
+			tokadd(codepoint);
+		    }
+		    else if (codepoint < 0x800) {
+			tokadd(((codepoint >> 6)&0x1f) | 0xC0);
+			tokadd((codepoint & 0x3F) | 0x80);
+		    }
+		    else if (codepoint < 0x10000) {
+			tokadd(((codepoint >> 12) & 0x0f) | 0xe0);
+			tokadd(((codepoint >> 6)&0x3f) | 0x80);
+			tokadd((codepoint & 0x3F) | 0x80);
+		    }
+		    else if (codepoint < 0x110000) {
+			tokadd(((codepoint >> 18) & 0x07) | 0xf0);
+			tokadd(((codepoint >> 12) & 0x3f) | 0x80);
+			tokadd(((codepoint >> 6)&0x3f) | 0x80);
+			tokadd((codepoint & 0x3F) | 0x80);
+		    }
+		    else {
+			/* should not happen */
+			yyerror("Invalid Unicode codepoint");  
+		    }
+		    continue;
+		}
+
 	      default:
 		if (func & STR_FUNC_REGEXP) {
 		    pushback(c);
@@ -5301,6 +5371,7 @@
 	tokadd('#');
     }
     pushback(c);
+    parser->has_utf8 = 0;
     if (tokadd_string(func, term, paren, &quote->nd_nest, &mb) == -1) {
 	if (func & STR_FUNC_REGEXP) {
 	    ruby_sourceline = nd_line(quote);
@@ -5493,6 +5564,7 @@
 	}
 	do {
 	    pushback(c);
+	    parser->has_utf8 = 0;
 	    if ((c = tokadd_string(func, '\n', 0, NULL, mbp)) == -1) goto error;
 	    if (c != '\n') {
 		set_yylval_str(STR_NEW3(tok(), toklen(), mb));
