[#18985] Encodings::default_internal patch — "Michael Selig" <michael.selig@...>
[ruby-core:18985] Encodings::default_internal patch
Hi,
Attached is a patch that implements "Encoding::default_internal". The aim
of this patch is to allow a Ruby programmer to specify a default encoding
that all IO (and I hope other methods) will return strings in, so that you
can deal with the strings without worrying (too much) about encoding
compatibility.
In doing this patch I have attempted to take everyone's input into account.
I have spent quite a bit of time on this (mainly thinking about the
features that will make it useful for everyone), so I hope that it is
accepted, even if it is a bit late!
A summary of what this patch does:
1) Extends the "-E" command line flag to accept either "ext_enc",
"ext_enc:int_enc" or ":int_enc". If the form "ext_enc:int_enc" is used,
the value of BOTH default_external & default_internal is set. If the form
":int_enc" is used, only the default_internal is set - the
default_external is left at its default value. Similarly, if the form
"ext_enc" is used, default_internal is left at its default value.
2) Extends the "magic comment" feature to also look for
"internal_encoding: XXXX". If this is found in the main Ruby file,
both the source encoding and default_internal are set to the given
encoding. If it is found in an included source file (e.g. a library), it is
treated the same way as "encoding:" and sets the source encoding only. (You
shouldn't use it in a library anyhow.)
3) The above 2 features give the Ruby programmer 2 ways of specifying the
value of default_internal in their main program:
- By using "internal_encoding: XXX" magic comment, or
- By using -E:XXX in the ruby "shebang" on the first line (note the ":"
before the XXX to indicate "internal encoding").
The second form allows the programmer to specify a different value for
"default_internal" than the source encoding (not sure if that is
particularly useful).
Both of these are ignored outside the main program, so that
"default_internal" can only be set once and only under the control of the
programmer or user.
Note: If the default_internal is specified on the ruby command line, it
overrides both of these.
4) If for some misguided reason default_internal is set to US-ASCII, it
is set to UTF-8 instead, as this is much more likely to cope well.
5) The new method Encoding.default_internal works just like
Encoding.default_external - it is read-only. If it is not set, its value
is nil. There is no "default_internal=" method.
6) When setting the encodings of an IO, if no internal encoding is
specified, the default_internal value will be used if it is set. If it is
not set, there is no transcoding.
7) There is one exception to the use of "default_internal": when opening a
file with an external encoding of ASCII-8BIT (aka BINARY). In this case
"default_internal" is ignored, and data is returned untranscoded because the
user probably wants to do byte or bit processing rather than character
processing.
8) The "int_enc" passed in the file mode or to IO#set_encoding may also be
"-". This will force the internal encoding to nil for the file, so no
transcoding is done on input.
For example: open(file, "r:EUC-JP:-") or f.set_encoding("EUC-JP", "-").
This gives the programmer the ability to prevent transcoding even if
"default_internal" is set.
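To make rules 6) to 8) concrete, here is a rough pure-Ruby model of how the
external and internal encodings of an IO end up being chosen. It is only a
sketch and not the C code in the patch: resolve_io_encodings and NO_TRANSCODE
are made-up names, and the two defaults are passed in as arguments rather than
read from the interpreter.

```ruby
# Hypothetical model of rules 6-8; the names below are illustrative only.
# NO_TRANSCODE stands in for an internal encoding given as "-".
NO_TRANSCODE = :none

# Returns [external, internal]. internal is nil when no transcoding happens.
def resolve_io_encodings(ext, int, default_external:, default_internal:)
  ext ||= default_external
  if int.nil? && ext != Encoding::ASCII_8BIT
    # Rule 6: fall back to default_internal, but not for binary files (rule 7).
    int = default_internal
  end
  # Rule 8 ("-") and identical encodings both mean "no transcoding".
  int = nil if int == NO_TRANSCODE || int == ext
  [ext, int]
end

# Equivalent of open(file, "r:EUC-JP:-") while default_internal is UTF-8:
ext, int = resolve_io_encodings(Encoding::EUC_JP, NO_TRANSCODE,
                                default_external: Encoding::UTF_8,
                                default_internal: Encoding::UTF_8)
```

In this model an unset default_internal (nil) simply falls through, so the
result is the external encoding with no transcoding, as in 1.9.0.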
Notes:
- When default_internal is not set Ruby works exactly as it does now in
1.9.0
- "Unicode whiners" (not my expression!) should probably set
"default_internal" to UTF-8.
- "default_internal" does *not* guarantee that all strings in your program
will be in that encoding. What it does (currently) is work on IO when
reading files or sockets etc. You can still use String#encode,
String#force_encoding etc to change them. It would be nice to actually
have a way of *forcing* all strings (and Regexps) to be a particular
encoding in order to guarantee no "encoding compatibility" problems.
However this is quite a bit more complicated to implement. So when writing
libraries & modules, you may still have to check the encoding of strings
passed to your methods.
- Using the "internal_encoding:" magic comment in your main program is the
preferred way of setting default_internal. This is because both your
source encoding and data read from files will be the same encoding, so at
least your program's string literals should match too.
- There are probably other libraries that should be patched to support
"default_internal" to ensure that strings are returned to the user in that
encoding. However ones like OpenURI *should* just work, because of the way
"default_internal" is supported transparently in the IO class.
- [I have not implemented this, because it would probably be rejected.
However I'll say it because I still think it is a good idea] I think if a
file is opened in "b" mode, it should default to ASCII-8BIT with no
transcoding. This would probably get rid of the need for "bin_read" too.
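As a small illustration of the note above about libraries still having to
check the encoding of strings passed to them, something along these lines
could be used. This is only a sketch: to_internal is a made-up helper name,
and falling back to UTF-8 when no default_internal is set is my assumption,
not something the patch does.

```ruby
# Hypothetical helper: bring an incoming string into a known encoding.
# Binary (ASCII-8BIT) strings are returned untouched, since they are
# probably byte data rather than text.
def to_internal(str, target = Encoding.default_internal || Encoding::UTF_8)
  return str if str.encoding == target
  return str if str.encoding == Encoding::ASCII_8BIT
  str.encode(target)   # raises if a character cannot be converted
end

# "caf" followed by e-acute, tagged as ISO-8859-1.
latin1 = [99, 97, 102, 0xE9].pack("C*").force_encoding(Encoding::ISO_8859_1)
utf8 = to_internal(latin1, Encoding::UTF_8)
```

String#encode transcodes the bytes, whereas String#force_encoding only
changes the label, so a helper like this should use encode.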
Implementation notes:
- encoding.c/encoding.h - added functions to get & set "default_internal"
- parse.y - added code to support the "internal_encoding" magic comment
- ruby.c - added code to extend -E/--encoding option
- io.c - added code to support default_internal. Note that I almost
completely rewrote parse_mode_enc(). The other main mod is a new function
rb_io_ext_int_to_encs() which converts the external & internal encodings
to values for "enc" & "enc2" in convconfig_t. It is a shame that "enc" &
"enc2" don't map directly to "internal" and "external" - it would have
made the code much simpler!
- Testing: not extensively tested. Passes "make test". Some "make
test-all" tests fail (esp rdoc), but I think that they may be due to other
issues (not certain).
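For anyone who would rather not read the C, the -E argument handling and the
"set once" behaviour of default_internal described in the summary can be
modelled roughly as follows. This is a sketch in plain Ruby, not the patch
itself: split_encoding_option and DefaultInternal are made-up names.

```ruby
# Hypothetical model of how the extended -E option splits its argument.
# Returns [external_name, internal_name]; nil means "leave at its default".
def split_encoding_option(arg)
  ext, sep, int = arg.partition(":")
  return [arg, nil] if sep.empty?                    # "ext_enc"
  [ext.empty? ? nil : ext, int.empty? ? nil : int]   # "ext:int" or ":int"
end

# Hypothetical model of the set-once default_internal value.
class DefaultInternal
  attr_reader :value

  def initialize
    @set = false
    @value = nil
  end

  # The first call wins and later calls are ignored. Passing nil freezes
  # the value at nil. US-ASCII is replaced by UTF-8 (item 4 of the summary).
  def set(encoding)
    return if @set
    encoding = Encoding::UTF_8 if encoding == Encoding::US_ASCII
    @value = encoding
    @set = true
  end
end
```

Because the first call wins, a value given on the command line is not
overridden by a later magic comment or shebang line.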
I'm sure you'll tell me if I have broken anything!
Cheers
Mike
Attachments (1)
Index: encoding.c
===================================================================
--- encoding.c (revision 19561)
+++ encoding.c (working copy)
@@ -1027,8 +1027,57 @@
default_external = 0;
}
+/* -2 => not yet set, -1 => nil */
+static int default_internal_index = -2;
+static rb_encoding *default_internal;
+
+rb_encoding *
+rb_default_internal_encoding(void)
+{
+ if (!default_internal && default_internal_index >= 0) {
+ default_internal = rb_enc_from_index(default_internal_index);
+ }
+ return default_internal;
+}
+
+VALUE
+rb_enc_default_internal(void)
+{
+ /* Note: These functions cope with default_internal not being set */
+ return rb_enc_from_encoding(rb_default_internal_encoding());
+}
+
/*
* call-seq:
+ * Encoding.default_internal => enc
+ *
+ * Returns default internal encoding.
+ *
+ * It is initialized by the source internal_encoding or -E option,
+ * and can't be modified after that.
+ */
+static VALUE
+get_default_internal(VALUE klass)
+{
+ return rb_enc_default_internal();
+}
+
+void
+rb_enc_set_default_internal(VALUE encoding)
+{
+ if (default_internal_index != -2)
+ /* Already set */
+ return;
+ default_internal_index = encoding == Qnil ?
+ -1 :rb_enc_to_index(rb_to_encoding(encoding));
+ /* Convert US-ASCII => UTF-8 */
+ if (default_internal_index == rb_usascii_encindex())
+ default_internal_index = rb_utf8_encindex();
+ default_internal = 0;
+}
+
+/*
+ * call-seq:
* Encoding.locale_charmap => string
*
* Returns the locale charmap name.
@@ -1212,6 +1261,7 @@
rb_define_singleton_method(rb_cEncoding, "_load", enc_load, 1);
rb_define_singleton_method(rb_cEncoding, "default_external", get_default_external, 0);
+ rb_define_singleton_method(rb_cEncoding, "default_internal", get_default_internal, 0);
rb_define_singleton_method(rb_cEncoding, "locale_charmap", rb_locale_charmap, 0);
list = rb_ary_new2(enc_table.count);
Index: include/ruby/encoding.h
===================================================================
--- include/ruby/encoding.h (revision 19561)
+++ include/ruby/encoding.h (working copy)
@@ -168,11 +168,14 @@
rb_encoding *rb_locale_encoding(void);
rb_encoding *rb_filesystem_encoding(void);
rb_encoding *rb_default_external_encoding(void);
+rb_encoding *rb_default_internal_encoding(void);
int rb_ascii8bit_encindex(void);
int rb_utf8_encindex(void);
int rb_usascii_encindex(void);
VALUE rb_enc_default_external(void);
+VALUE rb_enc_default_internal(void);
void rb_enc_set_default_external(VALUE encoding);
+void rb_enc_set_default_internal(VALUE encoding);
VALUE rb_locale_charmap(VALUE klass);
long rb_memsearch(const void*,long,const void*,long,rb_encoding*);
Index: io.c
===================================================================
--- io.c (revision 19561)
+++ io.c (working copy)
@@ -2183,10 +2183,8 @@
}
newline = (unsigned char)rsptr[rslen - 1];
- if (fptr->encs.enc2)
- enc = fptr->encs.enc;
- else
- enc = io_input_encoding(fptr);
+ /* MS - Optimisation */
+ enc = io_read_encoding(fptr);
while ((c = appendline(fptr, newline, &str, &limit)) != EOF) {
const char *s, *p, *pp, *e;
@@ -3746,52 +3744,87 @@
return NULL; /* not reached */
}
+/*
+ * Convert external/internal encodings to enc/enc2
+ * NULL => use default encoding
+ * Qnil => no encoding specified (internal only)
+ */
static void
+rb_io_ext_int_to_encs(rb_encoding *ext, rb_encoding *intern, rb_encoding **enc, rb_encoding **enc2)
+{
+ int default_ext = 0;
+
+ if (ext == NULL) {
+ ext = rb_default_external_encoding();
+ default_ext = 1;
+ }
+ if (intern == NULL && ext != rb_ascii8bit_encoding())
+ /* If external is ASCII-8BIT, no default transcoding */
+ intern = rb_default_internal_encoding();
+ if (intern == NULL || intern == (rb_encoding *)Qnil || intern == ext) {
+ /* No internal encoding => use external + no transcoding */
+ *enc = default_ext ? NULL : ext;
+ *enc2 = NULL;
+ }
+ else {
+ *enc = intern;
+ *enc2 = ext;
+ }
+}
+
+static void
parse_mode_enc(const char *estr, rb_encoding **enc_p, rb_encoding **enc2_p)
{
- const char *p0, *p1;
- char *enc2name;
+ const char *p;
+ char encname[ENCODING_MAXNAMELEN+1];
int idx, idx2;
+ rb_encoding *ext_enc, *int_enc;
- /* parse estr as "enc" or "enc2:enc" */
+ /* parse estr as "enc" or "enc2:enc" or "enc:-" */
- *enc_p = 0;
- *enc2_p = 0;
+ p = strrchr(estr, ':');
+ if (p) {
+ int len = (p++) - estr;
+ if (len == 0 || len > ENCODING_MAXNAMELEN)
+ idx = -1;
+ else {
+ memcpy(encname, estr, len);
+ encname[len] = '\0';
+ estr = encname;
+ idx = rb_enc_find_index(encname);
+ }
+ }
+ else
+ idx = rb_enc_find_index(estr);
- p0 = strrchr(estr, ':');
- if (!p0) p1 = estr;
- else p1 = p0 + 1;
- idx = rb_enc_find_index(p1);
- if (idx >= 0) {
- *enc_p = rb_enc_from_index(idx);
- }
+ if (idx >= 0)
+ ext_enc = rb_enc_from_index(idx);
else {
- rb_warn("Unsupported encoding %s ignored", p1);
+ if (idx != -2)
+ rb_warn("Unsupported encoding %s ignored", estr);
+ ext_enc = NULL;
}
- if (*enc_p && p0) {
- int n = p0 - estr;
- if (n > ENCODING_MAXNAMELEN) {
- idx2 = -1;
+ int_enc = NULL;
+ if (p) {
+ if (*p == '-' && *(p+1) == '\0') {
+ /* Special case - "-" => no transcoding */
+ int_enc = (rb_encoding *)Qnil;
}
else {
- enc2name = ALLOCA_N(char, n+1);
- memcpy(enc2name, estr, n);
- enc2name[n] = '\0';
- estr = enc2name;
- idx2 = rb_enc_find_index(enc2name);
+ idx2 = rb_enc_find_index(p);
+ if (idx2 < 0)
+ rb_warn("Unsupported encoding %s ignored", p);
+ else if (idx2 == idx) {
+ rb_warn("Ignoring internal encoding %s: it is identical to external encoding %s", p, estr);
+ int_enc = (rb_encoding *)Qnil;
+ }
+ else
+ int_enc = rb_enc_from_index(idx2);
}
- if (idx2 < 0) {
- rb_warn("Unsupported encoding %.*s ignored", n, estr);
- }
- else if (idx2 == idx) {
- rb_warn("Ignoring internal encoding %.*s: it is identical to external encoding %s",
- n, estr, p1);
- }
- else {
- *enc2_p = rb_enc_from_index(idx2);
- }
}
+
+ rb_io_ext_int_to_encs(ext_enc, int_enc, enc_p, enc2_p);
}
static void
@@ -3827,28 +3860,32 @@
}
if (!NIL_P(extenc)) {
rb_encoding *extencoding = rb_to_encoding(extenc);
+ rb_encoding *intencoding = NULL;
extracted = 1;
- *enc_p = 0;
- *enc2_p = 0;
if (!NIL_P(encoding)) {
rb_warn("Ignoring encoding parameter '%s': external_encoding is used",
RSTRING_PTR(encoding));
}
if (!NIL_P(intenc)) {
- rb_encoding *intencoding = rb_to_encoding(intenc);
+ if (!NIL_P(encoding = rb_check_string_type(intenc))) {
+ char *p = StringValueCStr(encoding);
+ if (*p == '-' && *(p+1) == '\0') {
+ /* Special case - "-" => no transcoding */
+ intencoding = (rb_encoding *)Qnil;
+ }
+ else
+ intencoding = rb_to_encoding(intenc);
+ }
+ else
+ intencoding = rb_to_encoding(intenc);
if (extencoding == intencoding) {
rb_warn("Ignoring internal encoding '%s': it is identical to external encoding '%s'",
RSTRING_PTR(rb_inspect(intenc)),
RSTRING_PTR(rb_inspect(extenc)));
+ intencoding = (rb_encoding *)Qnil;
}
- else {
- *enc_p = intencoding;
- *enc2_p = extencoding;
- }
}
- else {
- *enc_p = extencoding;
- }
+ rb_io_ext_int_to_encs(extencoding, intencoding, enc_p, enc2_p);
}
else {
if (!NIL_P(intenc)) {
@@ -3888,8 +3925,8 @@
vmode = *vmode_p;
- enc = NULL;
- enc2 = NULL;
+ /* Set to defaults */
+ rb_io_ext_int_to_encs(NULL, NULL, &enc, &enc2);
if (NIL_P(vmode)) {
fmode = FMODE_READABLE;
@@ -4076,8 +4113,8 @@
rb_io_t *fptr;
convconfig_t cc;
if (!convconfig) {
- cc.enc = NULL;
- cc.enc2 = NULL;
+ /* Set to default encodings */
+ rb_io_ext_int_to_encs(NULL, NULL, &cc.enc, &cc.enc2);
cc.ecflags = 0;
cc.ecopts = Qnil;
convconfig = &cc;
@@ -4105,8 +4142,8 @@
parse_mode_enc(p+1, &convconfig.enc, &convconfig.enc2);
}
else {
- convconfig.enc = NULL;
- convconfig.enc2 = NULL;
+ /* Set to default encodings */
+ rb_io_ext_int_to_encs(NULL, NULL, &convconfig.enc, &convconfig.enc2);
convconfig.ecflags = 0;
convconfig.ecopts = Qnil;
}
@@ -6677,29 +6714,40 @@
{
rb_encoding *enc, *enc2;
int ecflags;
- VALUE ecopts;
+ VALUE ecopts, tmp;
if (!NIL_P(v2)) {
enc2 = rb_to_encoding(v1);
- enc = rb_to_encoding(v2);
+ tmp = rb_check_string_type(v2);
+ if (!NIL_P(tmp)) {
+ char *p = StringValueCStr(tmp);
+ if (*p == '-' && *(p+1) == '\0') {
+ /* Special case - "-" => no transcoding */
+ enc = enc2;
+ enc2 = NULL;
+ }
+ else
+ enc = rb_to_encoding(v2);
+ }
+ else
+ enc = rb_to_encoding(v2);
ecflags = rb_econv_prepare_opts(opt, &ecopts);
}
else {
if (NIL_P(v1)) {
- enc = NULL;
- enc2 = NULL;
+ /* Set to default encodings */
+ rb_io_ext_int_to_encs(NULL, NULL, &enc, &enc2);
ecflags = 0;
ecopts = Qnil;
}
else {
- VALUE tmp = rb_check_string_type(v1);
+ tmp = rb_check_string_type(v1);
if (!NIL_P(tmp)) {
parse_mode_enc(StringValueCStr(tmp), &enc, &enc2);
ecflags = rb_econv_prepare_opts(opt, &ecopts);
}
else {
- enc = rb_to_encoding(v1);
- enc2 = NULL;
+ rb_io_ext_int_to_encs(rb_to_encoding(v1), NULL, &enc, &enc2);
ecflags = 0;
ecopts = Qnil;
}
Index: parse.y
===================================================================
--- parse.y (revision 19561)
+++ parse.y (working copy)
@@ -6068,6 +6068,8 @@
if (parser->line_count != (parser->has_shebang ? 2 : 1))
return;
parser_set_encode(parser, val);
+ if (strcmp(name, "internal_encoding") == 0)
+ rb_enc_set_default_internal(rb_enc_from_encoding(parser->enc));
}
struct magic_comment {
@@ -6079,6 +6081,7 @@
static const struct magic_comment magic_comments[] = {
{"coding", magic_comment_encoding, parser_encode_length},
{"encoding", magic_comment_encoding, parser_encode_length},
+ {"internal_encoding", magic_comment_encoding, parser_encode_length},
};
#endif
@@ -6207,6 +6210,8 @@
{
int sep = 0;
const char *beg = str;
+ const char *name;
+ int name_len;
VALUE s;
for (;;) {
@@ -6229,6 +6234,11 @@
}
if (STRNCASECMP(str-6, "coding", 6) == 0) break;
}
+ /* Search for the start of the keyword */
+ for (name = str-6; name >= beg; name--)
+ if (!ISALNUM(*name) && *name != '_')
+ break;
+ name_len = str - ++name;
for (;;) {
do {
if (++str >= send) return;
@@ -6243,6 +6253,8 @@
s = rb_str_new(beg, parser_encode_length(parser, beg, str - beg));
parser_set_encode(parser, RSTRING_PTR(s));
rb_str_resize(s, 0);
+ if (name_len == 17 && STRNCASECMP(name, "internal_encoding", 17) == 0)
+ rb_enc_set_default_internal(rb_enc_from_encoding(parser->enc));
}
static void
Index: ruby.c
===================================================================
--- ruby.c (revision 19561)
+++ ruby.c (working copy)
@@ -94,7 +94,7 @@
VALUE name;
int index;
} enc;
- } src, ext;
+ } src, ext, intern;
VALUE req_list;
};
@@ -869,6 +869,7 @@
ruby_each_words(s, disable_option, &opt->disable);
}
else if (strncmp("encoding", s, n = 8) == 0 && (!s[n] || s[n] == '=')) {
+ char *p;
s += n;
if (!*s++) {
next_encoding:
@@ -877,7 +878,15 @@
}
}
encoding:
- opt->ext.enc.name = rb_str_new2(s);
+ p = strchr(s, ':');
+ if (p) {
+ if (p > s)
+ opt->ext.enc.name = rb_str_new(s, p-s);
+ if (*++p)
+ opt->intern.enc.name = rb_str_new2(p);
+ }
+ else
+ opt->ext.enc.name = rb_str_new2(s);
}
else if (strcmp("version", s) == 0)
opt->version = 1;
@@ -980,6 +989,7 @@
rb_safe_level() == 0 && (s = getenv("RUBYOPT"))) {
VALUE src_enc_name = opt->src.enc.name;
VALUE ext_enc_name = opt->ext.enc.name;
+ VALUE int_enc_name = opt->intern.enc.name;
while (ISSPACE(*s))
s++;
@@ -1019,6 +1029,8 @@
opt->src.enc.name = src_enc_name;
if (ext_enc_name)
opt->ext.enc.name = ext_enc_name;
+ if (int_enc_name)
+ opt->intern.enc.name = int_enc_name;
}
if (opt->version) {
@@ -1087,6 +1099,9 @@
if (opt->ext.enc.name != 0) {
opt->ext.enc.index = opt_enc_index(opt->ext.enc.name);
}
+ if (opt->intern.enc.name != 0) {
+ opt->intern.enc.index = opt_enc_index(opt->intern.enc.name);
+ }
if (opt->src.enc.name != 0) {
opt->src.enc.index = opt_enc_index(opt->src.enc.name);
src_encoding_index = opt->src.enc.index;
@@ -1098,6 +1113,11 @@
enc = lenc;
}
rb_enc_set_default_external(rb_enc_from_encoding(enc));
+ if (opt->intern.enc.index >= 0) {
+ enc = rb_enc_from_index(opt->intern.enc.index);
+ rb_enc_set_default_internal(rb_enc_from_encoding(enc));
+ opt->intern.enc.index = -1;
+ }
rb_set_safe_level_force(safe);
if (opt->e_script) {
@@ -1119,6 +1139,15 @@
tree = load_file(parser, opt->script, 1, opt);
}
+ if (opt->intern.enc.index >= 0) {
+ /* Set in the shebang line */
+ enc = rb_enc_from_index(opt->intern.enc.index);
+ rb_enc_set_default_internal(rb_enc_from_encoding(enc));
+ }
+ else
+ /* Freeze default_internal */
+ rb_enc_set_default_internal(Qnil);
+
if (!tree) return Qfalse;
process_sflag(opt);
@@ -1189,6 +1218,7 @@
char *p;
int no_src_enc = !opt->src.enc.name;
int no_ext_enc = !opt->ext.enc.name;
+ int no_int_enc = !opt->intern.enc.name;
enc = rb_usascii_encoding();
rb_funcall(f, rb_intern("set_encoding"), 1, rb_enc_from_encoding(enc));
@@ -1275,6 +1305,9 @@
if (no_ext_enc && opt->ext.enc.name) {
opt->ext.enc.index = opt_enc_index(opt->ext.enc.name);
}
+ if (no_int_enc && opt->intern.enc.name) {
+ opt->intern.enc.index = opt_enc_index(opt->intern.enc.name);
+ }
}
else if (!NIL_P(c)) {
rb_io_ungetbyte(f, c);
@@ -1538,6 +1571,7 @@
args.argv = argv;
args.opt = cmdline_options_init(&opt);
opt.ext.enc.index = -1;
+ opt.intern.enc.index = -1;
tree = (NODE *)rb_vm_call_cfunc(rb_vm_top_self(),
process_options, (VALUE)&args,
0, rb_progname);