[#12312] Need Japanese Help - VRuby & new One-Click Ruby Installer with patch 110 — "Curt Hibbs" <curt.hibbs@...>
I'm trying to build a new release of the One-Click Ruby Installer for
Hello,
Hello,
[#12328] Dir.chdir patch for MS Windows — "Berger, Daniel" <Daniel.Berger@...>
Hi,
[#12344] patch to implement Array.permutation — David Flanagan <david@...>
Hi,
[#12372] Release compatibility/train — Prashant Srinivasan <Prashant.Srinivasan@...>
Hello all,
Hi,
Yukihiro Matsumoto wrote:
Hi,
Yukihiro Matsumoto wrote:
Hi,
Yukihiro Matsumoto wrote:
Hi,
Hi --
On 10/3/07, David A. Black <dblack@rubypal.com> wrote:
Rick DeNatale wrote:
[#12383] Include Rake in Ruby 1.9 — "NAKAMURA, Hiroshi" <nakahiro@...>
-----BEGIN PGP SIGNED MESSAGE-----
On 10/3/07, NAKAMURA, Hiroshi <nakahiro@sarion.co.jp> wrote:
On Oct 3, 2007, at 08:59 , Jacob Fugal wrote:
-----BEGIN PGP SIGNED MESSAGE-----
On 10/15/07, NAKAMURA, Hiroshi <nakahiro@sarion.co.jp> wrote:
[#12539] Ordered Hashes in 1.9? — Michael Neumann <mneumann@...>
Hi all,
Hi,
Yukihiro Matsumoto wrote:
[#12568] $" and require — "Tim Morgan" <tmorgan99@...>
Hello!
[#12578] Possible memory leak in ruby-1.8.6-p110?? — "M. Edward (Ed) Borasky" <znmeb@...>
I haven't had a chance to narrow this down in enough detail yet, but
M. Edward (Ed) Borasky wrote:
On Thu, 11 Oct 2007, M. Edward (Ed) Borasky wrote:
[#12579] iconv enhancement in Ruby 1.9 — "Eugene Ossintsev" <eugoss@...>
Hi,
[#12587] Confusion about arities — Charles Oliver Nutter <charles.nutter@...>
It seems like a number of methods have unexpected arities. For example,
On Oct 10, 2007, at 22:44 , Charles Oliver Nutter wrote:
Eric Hodel wrote:
[#12588] MatchData#select rdoc and arity incorrect — Charles Oliver Nutter <charles.nutter@...>
Rdoc is here:
[#12617] Question about heap_slots in gc.c — Hongli Lai <h.lai@...>
I'm trying to modify the Ruby interpreter's garbage collector. At the
[#12618] StringIO is not IO? — Hongli Lai <h.lai@...>
According to irb,
[#12629] file encoding comments and a patch to parse.y — David Flanagan <david@...>
Matz, Nobu:
[#12632] Defining unicode methods — "Daniel Berger" <djberg96@...>
Hi all,
[#12670] Bug in Numeric#divmod — "Dirk Traulsen" <dirk.traulsen@...>
Hi all!
[#12681] Unicode: Progress? — murphy <murphy@...>
Hello!
murphy wrote:
Hi,
Yukihiro Matsumoto wrote:
[#12693] retry: revised 1.9 http patch — Hugh Sasse <hgs@...>
I'm reposting this because I've had little response to this version
On Tue, Oct 16, 2007 at 01:32:42AM +0900, Hugh Sasse wrote:
Would this require that zlib be installed? I know that it's possible to
On Wed, 31 Oct 2007, Roger Pack wrote:
-----BEGIN PGP SIGNED MESSAGE-----
[#12697] Range.first is incompatible with Enumerable.first — David Flanagan <david@...>
The new Enumerable.first method is a generalization of Array.first to
Hi,
[#12703] Long encoding names with -K and bad error message — David Flanagan <david@...>
I noticed the following line in the change log:
Hi,
Nobuyoshi Nakada wrote:
Nobu,
At 16:04 07/10/17, David Flanagan wrote:
[#12706] Re: A couple of bugs? — "Gavin Kistner" <gavin.kistner@...>
From: John Lam (DLR) [mailto:jflam@microsoft.com]
On Wed, Oct 17, 2007 at 03:10:07AM +0900, Gavin Kistner wrote:
Well, that's interesting. Then this seems to be the only assignment that ha
[#12710] enum.c patch: fixes Enumerable.cycle and rdoc bugs — David Flanagan <david@...>
The attached patch fixes:
Hi,
[#12714] Re: A couple of bugs? — "Gavin Kistner" <gavin.kistner@...>
> Well, that's interesting. Then this seems to be the only
[#12754] Improving 'syntax error, unexpected $end, expecting kEND'? — Hugh Sasse <hgs@...>
I've had a look at this, but can't see how to do it: When I get
On Fri, Oct 19, 2007 at 03:01:55AM +0900, Hugh Sasse wrote:
The patch below changes this message to:
At 04:15 07/10/24, David Flanagan wrote:
Thanks for filling these in Martin. I worry that this is such a simple
At 16:57 07/10/24, David Flanagan wrote:
Martin Duerst wrote:
Hi,
[#12758] Encoding::primary_encoding — David Flanagan <david@...>
Hi,
Hi,
Nobuyoshi Nakada wrote:
Hi,
Nobuyoshi Nakada wrote:
Hi,
Nobuyoshi Nakada wrote:
On 22/10/2007, Wolfgang Nádasi-Donner <ed.odanow@wonado.de> wrote:
Michal Suchanek wrote:
Hi,
Nobuyoshi Nakada wrote:
I made some tests with UTF-8, option "-Ku", option "-Ka" and both types of magic
[#12767] \u escapes in string literals: proof of concept implementation — David Flanagan <david@...>
Back at the end of August, Matz wrote (see
Hi,
Nobuyoshi Nakada wrote:
Hi,
Yukihiro Matsumoto wrote:
At 04:19 07/10/23, David Flanagan wrote:
Martin Duerst wrote:
Hi,
At 13:10 07/10/23, David Flanagan wrote:
Martin Duerst wrote:
Hi,
Yukihiro Matsumoto wrote:
Hi,
Nobuyoshi Nakada wrote:
Hi,
At 16:46 07/10/29, Nobuyoshi Nakada wrote:
Hi,
At 11:29 07/11/06, Nobuyoshi Nakada wrote:
Hi,
Yukihiro Matsumoto wrote:
[#12787] How to specify in Ruby 1.9 the expected file encoding — Wolfgang Nádasi-Donner <ed.odanow@...>
Dear Ruby developers!
Wolfgang Nádasi-Donner wrote:
Gonzalo Garramuño wrote:
Hi,
Yukihiro Matsumoto wrote:
I wouldn't want a program to write a BOM at the start of a file
[#12795] patch for String.concat — David Flanagan <david@...>
I don't think that String.<< currently handles appending codepoints
[#12825] clarification of ruby libraries installation paths? — Lucas Nussbaum <lucas@...>
Hi,
On Mon, Oct 22, 2007, Lucas Nussbaum wrote:
On 23/10/07 at 00:13 +0900, Ben Bleything wrote:
On 10/22/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:
On 23/10/07 at 01:55 +0900, Austin Ziegler wrote:
Lucas Nussbaum wrote:
On 24/10/07 at 05:14 +0900, Gonzalo Garramuño wrote:
Lucas Nussbaum wrote:
On 30/10/07 at 07:28 +0900, Gonzalo Garramuño wrote:
On 10/29/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:
Austin Ziegler wrote:
On 10/30/07, Mathieu Blondel <mblondel@rubyforge.org> wrote:
On Tue, Oct 23, 2007 at 01:55:29AM +0900, Austin Ziegler wrote:
On 10/22/07, Sam Roberts <sroberts@uniserve.com> wrote:
Austin Ziegler wrote:
On 10/28/07, Bob Proulx <bob@proulx.com> wrote:
Austin,
On 10/29/07, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:
On 10/29/07, Luis Lavena <luislavena@gmail.com> wrote:
On 10/30/07, Austin Ziegler <halostatue@gmail.com> wrote:
Do we think that maybe, just maybe, things went off the rails when the
On 10/30/07, Rick Bradley <rick@rickbradley.com> wrote:
On Tue, 30 Oct 2007 22:52:29 +0900, "Luis Lavena" <luislavena@gmail.com> wrote:
[#12849] Problem reported in Rdoc (Ruby 1.9) Rdoc for Ruby 1.8 works — Wolfgang Nádasi-Donner <ed.odanow@...>
Hi!
[#12867] constant lookup rules in 1.9 — David Flanagan <david@...>
Hi,
[#12895] OSX patches — "Laurent Sansonetti" <laurent.sansonetti@...>
Hi ruby-core,
[#12900] Hopefully Complete List of Possible Encoding Specifications - Existing Ones — Wolfgang Nádasi-Donner <ed.odanow@...>
Dear Ruby 1.9 architects, developers, and testers!
Hi,
Yukihiro Matsumoto wrote:
Hi,
Yukihiro Matsumoto wrote:
I have a (hopefully) final question before testing all
Hi,
Wolfgang Nádasi-Donner wrote:
David Flanagan wrote:
At 10:30 07/10/26, Nobuyoshi Nakada wrote:
Yukihiro Matsumoto wrote:
On 10/25/07, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:
[#12951] Fluent programming in Ruby — David Flanagan <david@...>
From the ChangeLog:
At 14:01 07/10/26, David Flanagan wrote:
Martin Duerst wrote:
[#12971] Re: Fluent programming in Ruby — Brent Roman <brent@...>
I suppose you could have irb require a terminating ';'
> -----Original Message-----
On 10/26/07, Berger, Daniel <Daniel.Berger@qwest.com> wrote:
[#12996] General hash keys for colon notation — murphy <murphy@...>
Dear language designer(s) and parser wizards,
On 10/28/07, murphy <murphy@rubychan.de> wrote:
On 10/28/07, Rick DeNatale <rick.denatale@gmail.com> wrote:
Rick DeNatale wrote:
[#13027] Implementation of "guessUTF" method - final questions — Wolfgang Nádasi-Donner <ed.odanow@...>
Dear Ruby designers, developers, and testers!
On 10/29/07, Wolfgang Nádasi-Donner <ed.odanow@wonado.de> wrote:
Nikolai Weibull wrote:
On 10/29/07, Wolfgang Nádasi-Donner <ed.odanow@wonado.de> wrote:
Nikolai Weibull wrote:
Hello Wolfgang,
At 17:50 07/10/29, Nikolai Weibull wrote:
On 10/29/07, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:
[#13069] new Enumerable.butfirst method — David Flanagan <david@...>
Matz,
Hi,
Yukihiro Matsumoto wrote:
Hi,
[#13083] Didn't find String#subseq — Wolfgang Nádasi-Donner <ed.odanow@...>
Hi!
[#13096] 1.8.6 gc.c thoughts — "Roger Pack" <rogerpack2005@...>
After examining how the 1.8.6 gc works, I had a few thoughts:
[#13107] %s and utf8 ? — hadmut@... (Hadmut Danisch)
Hi,
[#13135] patch for lib/net/http.rb, self['User-Agent'] ||= 'Ruby' — Stephen Bannasch <stephen.bannasch@...>
I posted this patch before in the middle of another thread and didn't
Hi Stephen,
In article <9079DC13-476F-4C12-922E-E197BD5AAA5C@loveruby.net>,
[#13139] Required Space for Unicode Character Attribute Tables — Wolfgang Nádasi-Donner <ed.odanow@...>
Hi!
[#13143] Two Issues (open-uri's respond_to? and autoload's require) — Trans <transfire@...>
Hi--
-----BEGIN PGP SIGNED MESSAGE-----
Re: \u escapes in string literals: proof of concept implementation
Hi,
At Tue, 23 Oct 2007 13:10:40 +0900,
David Flanagan wrote in [ruby-core:12864]:
> > That things are simpler is quite clear. But option a wouldn't be
> > difficult to implement, either, I guess. My suggestion is to stay
> > with option a until we have a better idea of which of b and c is
> > really needed/implementable/...
>
> The appeal, to me, of my current version is that it is independent of
> the primary encoding. \u escapes work no matter what -K option you
> specify, and they always translate to a specific byte sequence. On the
> other hand, I really don't know how to handle strings that mix
> sjis-encoded Kanji characters with \u escapes. What should the encoding
> of the resulting string be?
That's garbage.
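
As a rough illustration of how the patch below appears to treat that
mix (this is one reading of it, not a statement of its exact behavior):
a \u escape for a codepoint >= 0x80 marks the literal as UTF-8, and
combining such an escape with raw multibyte characters in a non-UTF-8
source is reported as an error. The names literal_encoding, lit_enc and
source_is_utf8 are invented for this sketch. It only looks at the
\uXXXX form and scans byte by byte, so it ignores real multibyte
boundaries (an sjis trail byte can be 0x5C).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type: not part of parse.y. */
enum lit_enc { LIT_SOURCE, LIT_UTF8, LIT_MIXED_ERROR };

/* Decide the encoding of one string literal body.
 * source_is_utf8 is nonzero when the file itself is UTF-8. */
enum lit_enc literal_encoding(const char *body, int source_is_utf8)
{
    const unsigned char *p = (const unsigned char *)body;
    int has_mb = 0;   /* saw a raw non-ASCII byte */
    int has_u = 0;    /* saw a \u escape for a codepoint >= 0x80 */

    assert(body != NULL);
    while (*p) {
        if (*p >= 0x80) {
            has_mb = 1;
            p++;
        }
        else if (p[0] == '\\' && p[1] == 'u') {
            unsigned long cp = 0;
            int n = 0;

            p += 2;
            /* read up to four hex digits; no validation in this sketch */
            while (n < 4 && isxdigit(*p)) {
                int c = tolower(*p);
                cp = cp * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
                p++;
                n++;
            }
            if (cp >= 0x80) has_u = 1;
        }
        else if (p[0] == '\\' && p[1] != '\0') {
            p += 2;   /* skip any other escaped character */
        }
        else {
            p++;
        }
    }
    if (has_u && has_mb && !source_is_utf8) return LIT_MIXED_ERROR;
    if (has_u) return LIT_UTF8;
    return LIT_SOURCE;
}
```

With an EUC-style source, a literal holding only "\u3042" would come
out as UTF-8, while the same escape next to raw EUC bytes would be
rejected.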
> So choosing option a might be the best bet: \u escapes cause an error
> with -Ks or -Ke. They are only allowed when the primary encoding is
> ascii or utf-8. I think that would mean that no transcoding would be
> necessary. I also think that my current patch doesn't need much
> modification: just the addition of errors when the encoding does not
> allow \u.
Another option is option c of [ruby-core:12769].
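
For readers skimming the patch: it accepts two escape forms, \uXXXX
with exactly four hex digits and \u{...} with up to eight. The
standalone sketch below models just that parsing step. parse_u_escape
is a made-up name, and the function works on a plain C string instead
of the lexer's lex_p pointer, so treat it as an approximation of
parser_tok_utf8 rather than a copy of it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse the text that follows "\u".
 * On success, store the codepoint in *cp and return the number of
 * characters consumed. Return 0 for a malformed escape. */
size_t parse_u_escape(const char *s, unsigned long *cp)
{
    size_t i = 0, digits = 0, max_digits;
    unsigned long v = 0;
    int braced;

    assert(s != NULL && cp != NULL);
    braced = (s[0] == '{');
    max_digits = braced ? 8 : 4;
    if (braced) i++;

    while (digits < max_digits && isxdigit((unsigned char)s[i])) {
        int c = tolower((unsigned char)s[i]);
        v = v * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
        i++;
        digits++;
    }

    if (braced) {
        /* \u{...}: at least one digit and a closing brace */
        if (digits == 0 || s[i] != '}') return 0;
        i++;
    }
    else if (digits != 4) {
        /* \uXXXX: exactly four digits */
        return 0;
    }
    if (v > 0x7fffffffUL) return 0;   /* same upper bound as the patch */

    *cp = v;
    return i;
}
```

Given the text after the backslash-u, "3042" and "{3042}" would both
yield codepoint 0x3042, while "30" and "{3042" would be rejected.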
Index: parse.y
===================================================================
--- parse.y (revision 13774)
+++ parse.y (working copy)
@@ -238,4 +238,5 @@ struct parser_params {
int parser_ruby_sourceline; /* current line no. */
rb_encoding *enc;
+ rb_encoding *utf8;
#ifndef RIPPER
@@ -261,8 +262,11 @@ struct parser_params {
};
+#define UTF8_ENC() (parser->utf8 ? parser->utf8 : \
+ (parser->utf8 = rb_enc_find("utf-8")))
#define STR_NEW(p,n) rb_enc_str_new((p),(n),parser->enc)
#define STR_NEW0() rb_str_new(0,0)
#define STR_NEW2(p) rb_enc_str_new((p),strlen(p),parser->enc)
#define STR_NEW3(p,n,m) parser_str_new((p),(n),STR_ENC(!ENC_SINGLE(m)),(m))
+#define STR_NEW4(p,n,e,m) parser_str_new((p),(n), (e), (m))
#define STR_ENC(m) ((m)?parser->enc:rb_enc_from_index(0))
#define ENC_SINGLE(cr) ((cr)==ENC_CODERANGE_SINGLE)
@@ -4488,5 +4492,5 @@ none : /* none */
static int parser_regx_options(struct parser_params*);
-static int parser_tokadd_string(struct parser_params*,int,int,int,long*,int*);
+static int parser_tokadd_string(struct parser_params*,int,int,int,long*,int*,rb_encoding**);
static int parser_parse_string(struct parser_params*,NODE*);
static int parser_here_document(struct parser_params*,NODE*);
@@ -4497,8 +4501,10 @@ static int parser_here_document(struct p
# define tokspace(n) parser_tokspace(parser, n)
# define tokadd(c) parser_tokadd(parser, c)
-# define read_escape(m) parser_read_escape(parser, m)
-# define tokadd_escape(t,m) parser_tokadd_escape(parser, t, m)
+# define tok_hex(numlen) parser_tok_hex(parser, numlen)
+# define tok_utf8(numlen,e) parser_tok_utf8(parser, numlen, e)
+# define read_escape(flags,m,e) parser_read_escape(parser, flags, m, e)
+# define tokadd_escape(t,m,e) parser_tokadd_escape(parser, t, m, e)
# define regx_options() parser_regx_options(parser)
-# define tokadd_string(f,t,p,n,m) parser_tokadd_string(parser,f,t,p,n,m)
+# define tokadd_string(f,t,p,n,m,e) parser_tokadd_string(parser,f,t,p,n,m, e)
# define parse_string(n) parser_parse_string(parser,n)
# define here_document(n) parser_here_document(parser,n)
@@ -4938,5 +4944,73 @@ parser_tokadd(struct parser_params *pars
static int
-parser_read_escape(struct parser_params *parser, int *mb)
+parser_tok_hex(struct parser_params *parser, int *numlen)
+{
+ int c;
+
+ if (peek('{')) {
+ nextc();
+ c = scan_hex(lex_p, 8, numlen);
+ if (!*numlen) goto invalid;
+ if (!peek('}')) {
+ yyerror("unterminated hex escape");
+ return 0;
+ }
+ nextc();
+ *numlen += 2;
+ }
+ else {
+ c = scan_hex(lex_p, 2, numlen);
+ if (!*numlen) {
+ invalid:
+ yyerror("invalid hex escape");
+ return 0;
+ }
+ }
+ return c;
+}
+
+static int
+parser_tok_utf8(struct parser_params *parser, int *numlen, rb_encoding **encp)
+{
+ int codepoint;
+
+ if (peek('{')) { /* handle \u{...} form */
+ nextc();
+ codepoint = scan_hex(lex_p, 8, numlen);
+ if (*numlen == 0) {
+ yyerror("invalid Unicode escape");
+ return 0;
+ }
+ if (codepoint > 0x7fffffff) {
+ yyerror("illegal Unicode codepoint (too large)");
+ return 0;
+ }
+ lex_p += *numlen;
+ if (!peek('}')) {
+ yyerror("unterminated Unicode escape");
+ return 0;
+ }
+ nextc();
+ }
+ else { /* handle \uxxxx form */
+ codepoint = scan_hex(lex_p, 4, numlen);
+ if (*numlen < 4) {
+ yyerror("invalid Unicode escape");
+ return 0;
+ }
+ lex_p += 4;
+ }
+ if (codepoint >= 0x80) {
+ *encp = UTF8_ENC();
+ }
+
+ return codepoint;
+}
+
+#define ESCAPE_CONTROL 1
+#define ESCAPE_META 2
+
+static int
+parser_read_escape(struct parser_params *parser, int flags, int *mb, rb_encoding **encp)
{
int c;
@@ -4969,4 +5043,5 @@ parser_read_escape(struct parser_params
case '0': case '1': case '2': case '3': /* octal constant */
case '4': case '5': case '6': case '7':
+ if (flags & (ESCAPE_CONTROL|ESCAPE_META)) goto eof;
{
int numlen;
@@ -4980,13 +5055,21 @@ parser_read_escape(struct parser_params
case 'x': /* hex constant */
+ if (flags & (ESCAPE_CONTROL|ESCAPE_META)) goto eof;
{
int numlen;
- c = scan_hex(lex_p, 2, &numlen);
- if (numlen == 0) {
- yyerror("Invalid escape character syntax");
- return 0;
- }
- lex_p += numlen;
+ c = tok_hex(&numlen);
+ if (numlen == 0) goto eof;
+ }
+ if (mb && (c >= 0x80)) *mb = ENC_CODERANGE_UNKNOWN;
+ return c;
+
+ case 'u': /* hex constant */
+ if (flags & (ESCAPE_CONTROL|ESCAPE_META)) goto eof;
+ {
+ int numlen;
+
+ c = tok_utf8(&numlen, encp);
+ if (numlen == 0) goto eof;
}
if (mb && (c >= 0x80)) *mb = ENC_CODERANGE_UNKNOWN;
@@ -5000,12 +5083,12 @@ parser_read_escape(struct parser_params
case 'M':
+ if (flags & ESCAPE_META) goto eof;
if ((c = nextc()) != '-') {
- yyerror("Invalid escape character syntax");
pushback(c);
- return '\0';
+ goto eof;
}
if ((c = nextc()) == '\\') {
if (mb) *mb = ENC_CODERANGE_UNKNOWN;
- return read_escape(0) | 0x80;
+ return read_escape(flags|ESCAPE_META, 0, encp) | 0x80;
}
else if (c == -1) goto eof;
@@ -5017,11 +5100,11 @@ parser_read_escape(struct parser_params
case 'C':
if ((c = nextc()) != '-') {
- yyerror("Invalid escape character syntax");
pushback(c);
- return '\0';
+ goto eof;
}
case 'c':
+ if (flags & ESCAPE_CONTROL) goto eof;
if ((c = nextc())== '\\') {
- c = read_escape(mb);
+ c = read_escape(flags|ESCAPE_CONTROL, mb, encp);
}
else if (c == '?')
@@ -5040,9 +5123,13 @@ parser_read_escape(struct parser_params
}
+#define tokcopy(n) memcpy(tokspace(n), lex_p - (n), (n))
+
static int
-parser_tokadd_escape(struct parser_params *parser, int term, int *mb)
+parser_tokadd_escape(struct parser_params *parser, int term, int *mb, rb_encoding **encp)
{
int c;
+ int flags = 0;
+ first:
switch (c = nextc()) {
case '\n':
@@ -5051,17 +5138,13 @@ parser_tokadd_escape(struct parser_param
case '0': case '1': case '2': case '3': /* octal constant */
case '4': case '5': case '6': case '7':
+ if (flags & (ESCAPE_CONTROL|ESCAPE_META)) goto eof;
{
int numlen;
int oct;
- tokadd('\\');
- pushback(c);
- oct = scan_oct(lex_p, 3, &numlen);
- if (numlen == 0) {
- yyerror("Invalid escape character syntax");
- return -1;
- }
- while (numlen--)
- tokadd(nextc());
+ oct = scan_oct(--lex_p, 3, &numlen);
+ if (numlen == 0) goto eof;
+ lex_p += numlen;
+ tokcopy(numlen + 1);
if (mb && (oct >= 0200)) *mb = ENC_CODERANGE_UNKNOWN;
}
@@ -5069,45 +5152,59 @@ parser_tokadd_escape(struct parser_param
case 'x': /* hex constant */
+ if (flags & (ESCAPE_CONTROL|ESCAPE_META)) goto eof;
{
int numlen;
int hex;
- tokadd('\\');
- tokadd(c);
- hex = scan_hex(lex_p, 2, &numlen);
- if (numlen == 0) {
- yyerror("Invalid escape character syntax");
- return -1;
- }
- while (numlen--)
- tokadd(nextc());
+ hex = tok_hex(&numlen);
+ if (numlen == 0) goto eof;
+ lex_p += numlen;
+ tokcopy(numlen + 2);
if (mb && (hex >= 0x80)) *mb = ENC_CODERANGE_UNKNOWN;
}
return 0;
+ case 'u': /* Unicode constant */
+ if (flags & (ESCAPE_CONTROL|ESCAPE_META)) goto eof;
+ {
+ int numlen;
+ int uc;
+
+ uc = tok_utf8(&numlen, encp);
+ if (numlen == 0) goto eof;
+ lex_p += numlen;
+ tokcopy(numlen + 2);
+ if (mb && (uc >= 0x80)) *mb = ENC_CODERANGE_MULTI;
+ if (uc >= 0x80) return 1;
+ }
+ return 0;
+
case 'M':
+ if (flags & ESCAPE_META) goto eof;
if ((c = nextc()) != '-') {
- yyerror("Invalid escape character syntax");
pushback(c);
- return 0;
+ goto eof;
}
- tokadd('\\'); tokadd('M'); tokadd('-');
+ tokcopy(3);
if (mb) *mb = ENC_CODERANGE_UNKNOWN;
+ flags |= ESCAPE_META;
goto escaped;
case 'C':
+ if (flags & ESCAPE_CONTROL) goto eof;
if ((c = nextc()) != '-') {
- yyerror("Invalid escape character syntax");
pushback(c);
- return 0;
+ goto eof;
}
- tokadd('\\'); tokadd('C'); tokadd('-');
+ tokcopy(3);
goto escaped;
case 'c':
- tokadd('\\'); tokadd('c');
+ if (flags & ESCAPE_CONTROL) goto eof;
+ tokcopy(2);
+ flags |= ESCAPE_CONTROL;
escaped:
if ((c = nextc()) == '\\') {
- return tokadd_escape(term, mb);
+ goto first;
}
else if (c == -1) goto eof;
@@ -5190,16 +5287,47 @@ parser_tokadd_mbchar(struct parser_param
{
int len = parser_mbclen();
- do {
- tokadd(c);
- } while (--len > 0 && (c = nextc()) != -1);
+ tokadd(c);
+ lex_p += --len;
+ if (len > 0) tokcopy(len);
}
#define tokadd_mbchar(c) parser_tokadd_mbchar(parser, c)
+static void
+parser_tokaddmbc(struct parser_params *parser, int c, rb_encoding *enc)
+{
+ int len = rb_enc_codelen(c, enc);
+ rb_enc_mbcput(c, tokspace(len), enc);
+}
+
+#define tokaddmbc(c, enc) parser_tokaddmbc(parser, c, enc)
+
static int
parser_tokadd_string(struct parser_params *parser,
- int func, int term, int paren, long *nest, int *mb)
+ int func, int term, int paren, long *nest,
+ int *mb, rb_encoding **encp)
{
int c;
+ int has_mb = 0;
+ rb_encoding *enc = *encp;
+ char *errbuf = 0;
+ static const char mixed_msg[] = "%s mixed within %s source";
+
+#define mixed_error(enc1, enc2) if (!errbuf) { \
+ int len = sizeof(mixed_msg) - 4; \
+ len += strlen(rb_enc_name(enc1)); \
+ len += strlen(rb_enc_name(enc2)); \
+ errbuf = ALLOCA_N(char, len); \
+ snprintf(errbuf, len, mixed_msg, \
+ rb_enc_name(enc1), \
+ rb_enc_name(enc2)); \
+ yyerror(errbuf); \
+ }
+#define mixed_escape(beg, enc1, enc2) do { \
+ const char *pos = lex_p; \
+ lex_p = beg; \
+ mixed_error(enc1, enc2); \
+ lex_p = pos; \
+ } while (0)
while ((c = nextc()) != -1) {
@@ -5222,4 +5350,5 @@ parser_tokadd_string(struct parser_param
}
else if (c == '\\') {
+ const char *beg = lex_p - 1;
c = nextc();
switch (c) {
@@ -5237,6 +5366,9 @@ parser_tokadd_string(struct parser_param
if (func & STR_FUNC_REGEXP) {
pushback(c);
- if (tokadd_escape(term, mb) < 0)
+ if ((c = tokadd_escape(term, mb, &enc)) < 0)
return -1;
+ if (has_mb && enc != *encp) {
+ mixed_escape(beg, enc, *encp);
+ }
continue;
}
@@ -5244,5 +5376,13 @@ parser_tokadd_string(struct parser_param
pushback(c);
if (func & STR_FUNC_ESCAPE) tokadd('\\');
- c = read_escape(mb);
+ c = read_escape(0, mb, &enc);
+ if (has_mb && enc != *encp) {
+ mixed_escape(beg, enc, *encp);
+ continue;
+ }
+ if (c >= 0x80) {
+ tokaddmbc(c, enc);
+ continue;
+ }
}
else if ((func & STR_FUNC_QWORDS) && ISSPACE(c)) {
@@ -5255,4 +5395,9 @@ parser_tokadd_string(struct parser_param
}
else if (parser_ismbchar()) {
+ has_mb = 1;
+ if (enc != *encp) {
+ mixed_error(enc, *encp);
+ continue;
+ }
tokadd_mbchar(c);
if (mb) *mb = ENC_CODERANGE_MULTI;
@@ -5270,4 +5415,5 @@ parser_tokadd_string(struct parser_param
tokadd(c);
}
+ *encp = enc;
return c;
}
@@ -5283,4 +5429,5 @@ parser_parse_string(struct parser_params
int paren = nd_paren(quote);
int c, space = 0, mb = ENC_CODERANGE_SINGLE;
+ rb_encoding *enc = parser->enc;
if (func == -1) return tSTRING_END;
@@ -5316,12 +5463,11 @@ parser_parse_string(struct parser_params
}
pushback(c);
- if (tokadd_string(func, term, paren, &quote->nd_nest, &mb) == -1) {
+ if (tokadd_string(func, term, paren, &quote->nd_nest, &mb, &enc) == -1) {
+ ruby_sourceline = nd_line(quote);
if (func & STR_FUNC_REGEXP) {
- ruby_sourceline = nd_line(quote);
compile_error(PARSER_ARG "unterminated regexp meets end of file");
return tREGEXP_END;
}
else {
- ruby_sourceline = nd_line(quote);
compile_error(PARSER_ARG "unterminated string meets end of file");
return tSTRING_END;
@@ -5330,5 +5476,5 @@ parser_parse_string(struct parser_params
tokfix();
- set_yylval_str(STR_NEW3(tok(), toklen(), mb));
+ set_yylval_str(STR_NEW4(tok(), toklen(), enc, mb));
return tSTRING_CONTENT;
}
@@ -5494,4 +5640,5 @@ parser_here_document(struct parser_param
else {
int mb = ENC_CODERANGE_SINGLE, *mbp = &mb;
+ rb_encoding *enc = parser->enc;
newtok();
if (c == '#') {
@@ -5508,7 +5655,7 @@ parser_here_document(struct parser_param
do {
pushback(c);
- if ((c = tokadd_string(func, '\n', 0, NULL, mbp)) == -1) goto error;
+ if ((c = tokadd_string(func, '\n', 0, NULL, mbp, &enc)) == -1) goto error;
if (c != '\n') {
- set_yylval_str(STR_NEW3(tok(), toklen(), mb));
+ set_yylval_str(STR_NEW4(tok(), toklen(), enc, mb));
return tSTRING_CONTENT;
}
@@ -5517,5 +5664,5 @@ parser_here_document(struct parser_param
if ((c = nextc()) == -1) goto error;
} while (!whole_match_p(eos, len, indent));
- str = STR_NEW3(tok(), toklen(), mb);
+ str = STR_NEW4(tok(), toklen(), enc, mb);
}
heredoc_restore(lex_strterm);
@@ -5778,4 +5925,5 @@ parser_yylex(struct parser_params *parse
enum lex_state_e last_state;
int mb;
+ rb_encoding *enc;
#ifdef RIPPER
int fallthru = Qfalse;
@@ -6099,4 +6247,5 @@ parser_yylex(struct parser_params *parse
}
newtok();
+ enc = parser->enc;
if (parser_ismbchar()) {
mb = ENC_CODERANGE_MULTI;
@@ -6107,8 +6256,7 @@ parser_yylex(struct parser_params *parse
goto ternary;
}
- else if (c == '\\' && (c = read_escape(0)) >= 0x80) {
- rb_encoding *enc = parser->enc;
+ else if (c == '\\' && (c = read_escape(0, 0, &enc)) >= 0x80) {
mb = ENC_CODERANGE_UNKNOWN;
- rb_enc_mbcput(c, tokspace(rb_enc_codelen(c, enc)), enc);
+ tokaddmbc(c, enc);
}
else {
@@ -6117,5 +6265,5 @@ parser_yylex(struct parser_params *parse
}
tokfix();
- set_yylval_str(STR_NEW3(tok(), toklen(), mb));
+ set_yylval_str(parser_str_new(tok(), toklen(), enc, mb));
lex_state = EXPR_ENDARG;
return tCHAR;
@@ -7187,4 +7335,15 @@ list_concat_gen(struct parser_params *pa
}
+static void
+literal_concat0(struct parser_params *parser, VALUE head, VALUE tail)
+{
+ if (!rb_enc_compatible(head, tail)) {
+ compile_error(PARSER_ARG "string literal encodings differ (%s / %s)",
+ rb_enc_name(rb_enc_get(head)),
+ rb_enc_name(rb_enc_get(tail)));
+ }
+ rb_str_buf_append(head, tail);
+}
+
/* concat two string literals */
static NODE *
@@ -7204,5 +7363,5 @@ literal_concat_gen(struct parser_params
case NODE_STR:
if (htype == NODE_STR) {
- rb_str_concat(head->nd_lit, tail->nd_lit);
+ literal_concat0(parser, head->nd_lit, tail->nd_lit);
rb_gc_force_recycle((VALUE)tail);
}
@@ -7214,5 +7373,5 @@ literal_concat_gen(struct parser_params
case NODE_DSTR:
if (htype == NODE_STR) {
- rb_str_concat(head->nd_lit, tail->nd_lit);
+ literal_concat0(parser, head->nd_lit, tail->nd_lit);
tail->nd_lit = head->nd_lit;
rb_gc_force_recycle((VALUE)head);
--
Nobu Nakada