[ruby-core:18852] Re: Character encodings - a less radical suggestion

From: Martin Duerst <duerst@...>
Date: 2008-09-24 11:02:14 UTC
List: ruby-core #18852
At 12:25 08/09/22, Michael Selig wrote:
>On Mon, 22 Sep 2008 12:35:49 +1000, Martin Duerst <duerst@it.aoyama.ac.jp>  
>wrote:
>
>>
>> Therefore, I think we should seriously consider this proposal,
>> and hopefully implement it before Sept. 25th. In terms of
>> implementation, I don't think it should be that difficult,
>> but it may be quite a bit of work to check
>> Encoding::default_internal in all the affected methods.
>
>Wow, that is rather ambitious - 3 days?

Well, that's the deadline for feature changes for 1.9.1.
It would be a real pity to wait for 2.0 for this.
The feature freeze wiki at
http://redmine.ruby-lang.org/wiki/ruby/DevelopersMeeting20080922
says that default_internal is currently pending, but that
this should be discussed/settled this week.

Anyhow, I had a look at the code, and it doesn't seem to be that
difficult. The function io_extract_encoding_option in io.c
seems to be central. I'm attaching a patch, which I hope is
a good start. I'm also writing to ruby-dev (in Japanese)
because that's where the real experts are.
The patch isn't as strict as your proposal with respect
to re-setting, but I'm fine either way.
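
To make the intent of the io.c change easier to follow, here is a
rough Ruby model of the decision it adds to io_extract_encoding_option.
It is only a sketch: the method name resolve_read_encodings and its
keyword arguments are hypothetical and not part of Ruby or of the
patch, and it assumes that "enc" is the encoding of the strings
handed back to the caller and "enc2" is the encoding to transcode
from (nil meaning no transcoding).

```ruby
# Hypothetical model of the io.c hunk; not real Ruby API.
# enc  : encoding named in the mode string, or nil if none was given
# enc2 : source encoding if transcoding was already requested, else nil
# Returns [enc, enc2] as the patched C code would leave them.
def resolve_read_encodings(enc, enc2, readable:, default_internal:, default_external:)
  # The patch only acts on readable streams without an explicit
  # internal encoding, and only when default_internal is set.
  return [enc, enc2] unless readable && enc2.nil? && default_internal

  binary = Encoding::ASCII_8BIT
  if enc.nil?
    # No encoding in the mode string: fall back to default_external,
    # unless that is binary or already equal to default_internal.
    if default_external != default_internal && default_external != binary
      enc, enc2 = default_internal, default_external
    end
  elsif enc != default_internal && enc != binary
    # An external encoding was given: transcode it to default_internal.
    enc, enc2 = default_internal, enc
  end
  [enc, enc2]
end

utf8 = Encoding::UTF_8
sjis = Encoding::Shift_JIS

# 'r:shift_jis' with default_internal = UTF-8: read Shift_JIS, return UTF-8.
p resolve_read_encodings(sjis, nil, readable: true,
                         default_internal: utf8, default_external: utf8)

# 'r:ASCII-8BIT' is left alone, so binary reads are never transcoded.
p resolve_read_encodings(Encoding::ASCII_8BIT, nil, readable: true,
                         default_internal: utf8, default_external: utf8)
```

In this model an explicit internal encoding in the mode string (a
non-nil enc2) always wins, and streams that are not readable are
not touched at all.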

I have tested this patch with code like the following
(called with -Eutf-8, -Eshift_jis, -Eeuc-jp, and without the -E
option, in all combinations):

>>>>
Encoding.default_internal = 'utf-8'
      # tested with 'utf-8', 'shift_jis', and 'euc-jp'

s = "\u3042\u3044\u3046\u3048\u304A"
File.open('testout1.txt', 'w:shift_jis') do |f| f.write s end
File.open('testout2.txt', 'w:euc-jp') do |f| f.write s end
File.open('testout3.txt', 'w:utf-8') do |f| f.write s end

File.open('testout1.txt', 'r:shift_jis') do |f| s = f.read; p s.encoding end
File.open('testout2.txt', 'r:euc-jp') do |f| s = f.read; p s.encoding end
File.open('testout3.txt', 'r:utf-8') do |f| s = f.read; p s.encoding end
File.open('testout3.txt', 'r:ASCII-8BIT') do |f| s = f.read; p s.encoding end

# for next line, change file number to pick up default_internal
File.open('testout3.txt', 'r') do |f| s = f.read; p s.encoding end
>>>>

>The bulk of the implementation will be in the libraries, and I think many  
>of them need updating to cope with non-ASCII encodings anyhow.

Yes. I'm not sure how libraries are affected by the feature
freeze, but they have to be fixed anyhow, completely independently
of default_internal. And I agree that this cannot be done in 3 days.

Regards,    Martin.

>> - We should think through various scenarios for output.
>>   I can't think of any problems just now, I just noticed
>>   the absence of considerations for output below.
>
>I did think about output to a certain extent, and one good thing is that  
>IO already seems to automatically transcode to the "external" encoding at  
>the moment. As for other classes, again I think most need updating to  
>support multiple encodings anyhow. They will at a minimum need a way of  
>having the user pass the "external" encoding (defaulting to  
>"default_external"), and do the transcode as necessary, based on the  
>encoding of the data to be output. However, as with IO, this behaviour  
>probably should happen no matter whether "default_internal" is implemented  
>or not.
>
>Cheers
>Mike
>
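
The output behaviour described in the quoted text above (IO
transcoding to the external encoding on write) can be checked with
a short script. This is a minimal sketch, assuming a Ruby with
String#b (2.0 or later); the file name out_sjis.txt is arbitrary
and the script does not depend on default_internal at all.

```ruby
require 'tmpdir'

# Three hiragana characters; a "\u" string literal is UTF-8 in memory.
s = "\u3042\u3044\u3046"

raw = Dir.mktmpdir do |dir|
  path = File.join(dir, 'out_sjis.txt')

  # Writing through a 'w:shift_jis' stream transcodes the UTF-8
  # string to Shift_JIS on the way out.
  File.open(path, 'w:shift_jis') { |f| f.write s }

  # Read the bytes back untouched so the on-disk form can be inspected.
  File.binread(path)
end

# Shift_JIS uses 2 bytes per hiragana, UTF-8 uses 3.
puts "bytes in memory (UTF-8): #{s.bytesize}"
puts "bytes on disk (Shift_JIS): #{raw.bytesize}"
puts "matches String#encode: #{raw == s.encode('Shift_JIS').b}"
```

Because the conversion happens inside IO, the writer needs no
explicit String#encode call; only the mode string names the
external encoding.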


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Attachments (1)

patch_default_internal.txt (3 KB, text/x-diff)
Index: encoding.c
===================================================================
--- encoding.c	(revision 19510)
+++ encoding.c	(working copy)
@@ -1062,6 +1062,67 @@
 #endif
 }
 
+static int default_internal_index = -1;
+static rb_encoding *default_internal = 0;
+
+
+rb_encoding *
+rb_default_internal_encoding(void)
+{
+    return default_internal;
+}
+
+VALUE
+rb_enc_default_internal(void)
+{
+    return default_internal==0 ? Qnil : 
+	rb_enc_from_encoding(default_internal);
+}
+
+/*
+ * call-seq:
+ *   Encoding.default_internal => enc
+ *
+ * Returns default internal encoding (nil if unused).
+ *
+ */
+static VALUE
+get_default_internal(VALUE klass)
+{
+    return rb_enc_default_internal();
+}
+
+void
+rb_enc_set_default_internal(VALUE encoding)
+{
+    if (default_internal)
+	rb_warn("Resetting Encoding.default_internal");
+    if (encoding == Qnil) {
+        default_internal = 0;
+        default_internal_index = -1;
+    }
+    else {
+	default_internal = rb_to_encoding(encoding);
+	default_internal_index = rb_enc_to_index(default_internal);
+    }
+}
+
+/*
+ * call-seq:
+ *   Encoding.default_internal= enc => enc
+ *
+ * Sets default internal encoding (default is nil, i.e. unused).
+ * For use in main application; never use in a library!
+ * Returns nil. Produces a warning if reset.
+ *
+ */
+static VALUE
+set_default_internal(VALUE klass, VALUE encoding)
+{
+    rb_enc_set_default_internal(encoding);
+    return Qnil;
+}
+
 static void
 set_encoding_const(const char *name, rb_encoding *enc)
 {
@@ -1214,6 +1275,9 @@
     rb_define_singleton_method(rb_cEncoding, "default_external", get_default_external, 0);
     rb_define_singleton_method(rb_cEncoding, "locale_charmap", rb_locale_charmap, 0);
 
+    rb_define_singleton_method(rb_cEncoding, "default_internal",   get_default_internal, 0);
+    rb_define_singleton_method(rb_cEncoding, "default_internal=",  set_default_internal, 1);
+
     list = rb_ary_new2(enc_table.count);
     RBASIC(list)->klass = 0;
     rb_encoding_list = list;
Index: io.c
===================================================================
--- io.c	(revision 19510)
+++ io.c	(working copy)
@@ -3885,6 +3885,7 @@
     VALUE ecopts;
     int has_enc = 0, has_vmode = 0;
     VALUE intmode;
+    rb_encoding *def_internal;
 
     vmode = *vmode_p;
 
@@ -3972,6 +3973,20 @@
 
     *oflags_p = oflags;
     *fmode_p = fmode;
+    if (fmode&FMODE_READABLE && !enc2 && (def_internal=rb_default_internal_encoding())) {
+	rb_encoding *def_external = rb_default_external_encoding();
+	rb_encoding *ascii_8bit = rb_enc_find("ASCII-8BIT");
+	if (!enc) {
+	    if (def_external!=def_internal && def_external!=ascii_8bit) {
+	        enc  = def_internal;
+	        enc2 = def_external;
+	    }
+	}
+	else if (enc!=def_internal && enc!=ascii_8bit) {
+	    enc2 = enc;
+	    enc = def_internal;
+	}
+    }
     convconfig_p->enc = enc;
     convconfig_p->enc2 = enc2;
     convconfig_p->ecflags = ecflags;
