[#34923] open() and encodings — "NARUSE, Yui" <naruse@...>

This is Naruse.

53 messages 2008/06/03

[ruby-dev:34923] open() and encodings

From: "NARUSE, Yui" <naruse@...>
Date: 2008-06-03 23:08:54 UTC
List: ruby-dev #34923
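This is Naruse.

In connection with Dir.open/entries, here is a proposed change concerning open().

First, regarding file paths: at present the given byte sequence is passed to the OS as-is,
regardless of the filesystem encoding.
It would be friendlier to convert the path to the filesystem encoding before passing it on.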
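
As a rough illustration of the path handling proposed here, the Ruby sketch below applies the same rule the attached patch uses in C: a string tagged US-ASCII or ASCII-8BIT, or one already in the filesystem encoding, passes through unchanged, and anything else is transcoded. The helper name `path_for_os` and the choice of Shift_JIS as the filesystem encoding are assumptions made for the example; they are not part of the patch.

```ruby
# Hypothetical helper (not part of the patch): decide what to hand to the OS.
def path_for_os(path, fs_enc)
  enc = path.encoding
  # ASCII-only, binary, or already-matching strings are passed through as-is.
  return path if enc == Encoding::US_ASCII ||
                 enc == Encoding::ASCII_8BIT ||
                 enc == fs_enc
  # Everything else is transcoded to the filesystem encoding.
  path.encode(fs_enc)
end

utf8_name  = "\u65E5\u672C\u8A9E.txt"            # a UTF-8 file name
ascii_name = "readme.txt".encode("US-ASCII")     # an ASCII-only file name

converted = path_for_os(utf8_name, Encoding::Shift_JIS)
untouched = path_for_os(ascii_name, Encoding::Shift_JIS)

puts converted.encoding
puts untouched.encoding
```

The pass-through cases matter because a binary string carries no character information to convert from, so transcoding it could only guess.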

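Second, open() currently cannot take the encodings as separate options.
Wouldn't it be more convenient if it accepted them, for example:
open("filename", internal_encoding:"EUC-JP", external_encoding:"Shift_JIS")
For now I have simply used straightforward, longish key names.
These should leave no room for misunderstanding, and I thought they would be easy to change later if we come up with better names.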
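
The proposed call can be tried from Ruby roughly as below. This is a sketch that assumes a Ruby release whose File.open accepts `external_encoding:` and `internal_encoding:` keys (later releases document IO options with these names); whether it behaves exactly like this patch is not confirmed here. The temporary file and the sample text are made up for the demonstration.

```ruby
require "tmpdir"

text = "\u65E5\u672C\u8A9E"   # sample text, held as UTF-8

data = Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  # Store the text on disk as raw Shift_JIS bytes.
  File.binwrite(path, text.encode("Shift_JIS"))

  # Read it back: bytes on disk are Shift_JIS (external),
  # strings handed to the program are EUC-JP (internal).
  File.open(path, "r",
            external_encoding: "Shift_JIS",
            internal_encoding: "EUC-JP") do |io|
    io.read
  end
end

puts data.encoding
```

Naming both encodings as separate keys keeps the direction of the conversion explicit, which is the point of the longer key names.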

How does something along these lines look?

-- 
NARUSE, Yui  <naruse@airemix.jp>

Attachments (1)

open_and_encoding.patch (10.1 KB, text/x-diff)
--- dir.c	(revision 16799)
+++ dir.c	(working copy)
@@ -12,6 +12,7 @@
 **********************************************************************/
 
 #include "ruby/ruby.h"
+#include "ruby/encoding.h"
 
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -342,6 +343,8 @@ VALUE rb_cDir;
 struct dir_data {
     DIR *dir;
     char *path;
+    rb_encoding *intenc;
+    rb_encoding *extenc;
 };
 
 static void
@@ -364,6 +367,8 @@ dir_s_alloc(VALUE klass)
 
     dirp->dir = NULL;
     dirp->path = NULL;
+    dirp->intenc = NULL;
+    dirp->extenc = NULL;
 
     return obj;
 }
@@ -375,16 +380,52 @@ dir_s_alloc(VALUE klass)
  *  Returns a new directory object for the named directory.
  */
 static VALUE
-dir_initialize(VALUE dir, VALUE dirname)
+dir_initialize(int argc, VALUE *argv, VALUE dir)
 {
     struct dir_data *dp;
-
+    static rb_encoding *fs_enc;
+    rb_encoding  *dirname_enc, *intenc, *extenc;
+    VALUE dirname, opt;
+    static VALUE sym_intenc, sym_extenc;
+
+    if (!sym_intenc) {
+	sym_intenc = ID2SYM(rb_intern("internal_encoding"));
+	sym_extenc = ID2SYM(rb_intern("external_encoding"));
+	fs_enc = rb_filesystem_encoding();
+    }
+
+    intenc = NULL;
+    extenc = fs_enc;
+    rb_scan_args(argc, argv, "11", &dirname, &opt);
+    if (!NIL_P(opt)) {
+        VALUE v;
+        opt = rb_check_convert_type(opt, T_HASH, "Hash", "to_hash");
+        v = rb_hash_aref(opt, sym_intenc);
+        if (!NIL_P(v)) intenc = rb_to_encoding(v);
+        v = rb_hash_aref(opt, sym_extenc);
+        if (!NIL_P(v)) extenc = rb_to_encoding(v);
+    }
+
+    dirname_enc = rb_enc_get(dirname);
+    if (intenc);
+    else if (rb_usascii_encoding() == dirname_enc
+            || rb_ascii8bit_encoding() == dirname_enc
+            || extenc == dirname_enc) {
+        intenc = NULL;
+    }
+    else {
+	intenc = dirname_enc;
+        dirname = rb_str_transcode(dirname, rb_enc_from_encoding(extenc));
+    }
     FilePathValue(dirname);
+
     Data_Get_Struct(dir, struct dir_data, dp);
     if (dp->dir) closedir(dp->dir);
     if (dp->path) free(dp->path);
     dp->dir = NULL;
     dp->path = NULL;
+    dp->intenc = intenc;
+    dp->extenc = extenc;
     dp->dir = opendir(RSTRING_PTR(dirname));
     if (dp->dir == NULL) {
 	if (errno == EMFILE || errno == ENFILE) {
@@ -412,12 +453,12 @@ dir_initialize(VALUE dir, VALUE dirname)
  *  block.
  */
 static VALUE
-dir_s_open(VALUE klass, VALUE dirname)
+dir_s_open(int argc, VALUE *argv, VALUE klass)
 {
     struct dir_data *dp;
     VALUE dir = Data_Make_Struct(klass, struct dir_data, 0, free_dir, dp);
 
-    dir_initialize(dir, dirname);
+    dir_initialize(argc, argv, dir);
     if (rb_block_given_p()) {
 	return rb_ensure(rb_yield, dir, dir_close, dir);
     }
@@ -445,6 +486,16 @@ dir_check(VALUE dir)
     if (dirp->dir == NULL) dir_closed();\
 } while (0)
 
+static VALUE
+dir_enc_str(VALUE str, struct dir_data *dirp)
+{
+    rb_enc_associate(str, dirp->extenc);
+    if (dirp->intenc) {
+        str = rb_str_transcode(str, rb_enc_from_encoding(dirp->intenc));
+    }
+    return str;
+}
+
 /*
  *  call-seq:
  *     dir.inspect => string
@@ -483,7 +534,7 @@ dir_path(VALUE dir)
 
     Data_Get_Struct(dir, struct dir_data, dirp);
     if (!dirp->path) return Qnil;
-    return rb_str_new2(dirp->path);
+    return dir_enc_str(rb_str_new2(dirp->path), dirp);
 }
 
 /*
@@ -508,7 +559,7 @@ dir_read(VALUE dir)
     errno = 0;
     dp = readdir(dirp->dir);
     if (dp) {
-	return rb_tainted_str_new(dp->d_name, NAMLEN(dp));
+	return dir_enc_str(rb_tainted_str_new(dp->d_name, NAMLEN(dp)), dirp);
     }
     else if (errno == 0) {	/* end of stream */
 	return Qnil;
@@ -546,7 +597,7 @@ dir_each(VALUE dir)
     GetDIR(dir, dirp);
     rewinddir(dirp->dir);
     for (dp = readdir(dirp->dir); dp != NULL; dp = readdir(dirp->dir)) {
-	rb_yield(rb_tainted_str_new(dp->d_name, NAMLEN(dp)));
+	rb_yield(dir_enc_str(rb_tainted_str_new(dp->d_name, NAMLEN(dp)), dirp));
 	if (dirp->dir == NULL) dir_closed();
     }
     return dir;
@@ -1685,9 +1736,9 @@ dir_s_glob(int argc, VALUE *argv, VALUE 
 }
 
 static VALUE
-dir_open_dir(VALUE path)
+dir_open_dir(int argc, VALUE *argv)
 {
-    VALUE dir = rb_funcall(rb_cDir, rb_intern("open"), 1, path);
+    VALUE dir = rb_funcall2(rb_cDir, rb_intern("open"), argc, argv);
 
     if (TYPE(dir) != T_DATA ||
 	RDATA(dir)->dfree != (RUBY_DATA_FUNC)free_dir) {
@@ -1716,12 +1767,12 @@ dir_open_dir(VALUE path)
  *
  */
 static VALUE
-dir_foreach(VALUE io, VALUE dirname)
+dir_foreach(int argc, VALUE *argv, VALUE io)
 {
     VALUE dir;
 
-    RETURN_ENUMERATOR(io, 1, &dirname);
-    dir = dir_open_dir(dirname);
+    RETURN_ENUMERATOR(io, argc, argv);
+    dir = dir_open_dir(argc, argv);
     rb_ensure(dir_each, dir, dir_close, dir);
     return Qnil;
 }
@@ -1738,11 +1789,11 @@ dir_foreach(VALUE io, VALUE dirname)
  *
  */
 static VALUE
-dir_entries(VALUE io, VALUE dirname)
+dir_entries(int argc, VALUE *argv, VALUE io)
 {
     VALUE dir;
 
-    dir = dir_open_dir(dirname);
+    dir = dir_open_dir(argc, argv);
     return rb_ensure(rb_Array, dir, dir_close, dir);
 }
 
@@ -1867,11 +1918,11 @@ Init_Dir(void)
     rb_include_module(rb_cDir, rb_mEnumerable);
 
     rb_define_alloc_func(rb_cDir, dir_s_alloc);
-    rb_define_singleton_method(rb_cDir, "open", dir_s_open, 1);
+    rb_define_singleton_method(rb_cDir, "open", dir_s_open, -1);
-    rb_define_singleton_method(rb_cDir, "foreach", dir_foreach, 1);
+    rb_define_singleton_method(rb_cDir, "foreach", dir_foreach, -1);
-    rb_define_singleton_method(rb_cDir, "entries", dir_entries, 1);
+    rb_define_singleton_method(rb_cDir, "entries", dir_entries, -1);
 
-    rb_define_method(rb_cDir,"initialize", dir_initialize, 1);
+    rb_define_method(rb_cDir,"initialize", dir_initialize, -1);
     rb_define_method(rb_cDir,"path", dir_path, 0);
     rb_define_method(rb_cDir,"inspect", dir_inspect, 0);
     rb_define_method(rb_cDir,"read", dir_read, 0);
--- encoding.c	(revision 16799)
+++ encoding.c	(working copy)
@@ -964,6 +964,23 @@ rb_locale_encoding(void)
     return rb_enc_from_index(idx);
 }
 
+rb_encoding *
+rb_filesystem_encoding(void)
+{
+    static rb_encoding *enc;
+    if (!enc) {
+#if defined __APPLE__
+	enc = rb_enc_find("UTF8-MAC");
+#elif defined _WIN32
+        /* still use ANSI encoding */
+	enc = rb_locale_encoding();
+#else
+	enc = rb_locale_encoding();
+#endif
+    }
+    return enc;
+}
+
 static int default_external_index;
 
 rb_encoding *
--- include/ruby/encoding.h	(revision 16799)
+++ include/ruby/encoding.h	(working copy)
@@ -169,6 +169,7 @@ rb_encoding *rb_ascii8bit_encoding(void)
 rb_encoding *rb_utf8_encoding(void);
 rb_encoding *rb_usascii_encoding(void);
 rb_encoding *rb_locale_encoding(void);
+rb_encoding *rb_filesystem_encoding(void);
 rb_encoding *rb_default_external_encoding(void);
 int rb_usascii_encindex(void);
 int rb_ascii8bit_encindex(void);
--- io.c	(revision 16799)
+++ io.c	(working copy)
@@ -125,6 +125,7 @@ VALUE rb_default_rs;
 static VALUE argf;
 
 static ID id_write, id_read, id_getc, id_flush, id_encode, id_readpartial;
+static VALUE sym_mode, sym_perm, sym_extenc, sym_intenc, sym_encoding, sym_open_args;
 
 struct timeval rb_time_interval(VALUE);
 
@@ -4015,11 +4016,34 @@ rb_io_s_popen(int argc, VALUE *argv, VAL
 static VALUE
 rb_open_file(int argc, VALUE *argv, VALUE io)
 {
-    VALUE fname, vmode, perm;
+    VALUE fname, vmode, perm, extenc=Qnil, intenc=Qnil, opt;
     const char *mode;
     int flags, fmode;
+    rb_encoding *fname_enc;
+
+    opt = rb_check_convert_type(argv[argc-1], T_HASH, "Hash", "to_hash");
+    if (!NIL_P(opt)) {
+	VALUE v;
+	v = rb_hash_aref(opt, sym_mode);
+	if (!NIL_P(v)) vmode = v;
+	v = rb_hash_aref(opt, sym_perm);
+	if (!NIL_P(v)) perm = v;
+	v = rb_hash_aref(opt, sym_extenc);
+	if (!NIL_P(v)) extenc = v;
+	v = rb_hash_aref(opt, sym_intenc);
+	if (!NIL_P(v)) intenc = v;
+	argc -= 1;
+    }
 
     rb_scan_args(argc, argv, "12", &fname, &vmode, &perm);
+    fname_enc = rb_enc_get(fname);
+    if (rb_usascii_encoding() == fname_enc
+	|| rb_ascii8bit_encoding() == fname_enc
+        || rb_filesystem_encoding() == fname_enc) {
+    }
+    else {
+	fname = rb_str_transcode(fname, rb_enc_from_encoding(rb_filesystem_encoding()));
+    }
     FilePathValue(fname);
 
     if (FIXNUM_P(vmode) || !NIL_P(perm)) {
@@ -4035,10 +4059,16 @@ rb_open_file(int argc, VALUE *argv, VALU
 	rb_file_sysopen_internal(io, RSTRING_PTR(fname), flags, fmode);
     }
     else {
-
 	mode = NIL_P(vmode) ? "r" : StringValueCStr(vmode);
 	rb_file_open_internal(io, RSTRING_PTR(fname), mode);
     }
+
+    if (!NIL_P(extenc)) {
+	rb_io_t *fptr;
+	GetOpenFile(io, fptr);
+	fptr->enc = rb_to_encoding(extenc);
+	if (!NIL_P(intenc)) fptr->enc2 = rb_to_encoding(intenc);
+    }
     return io;
 }
 
@@ -6145,7 +6175,6 @@ static void
 open_key_args(int argc, VALUE *argv, struct foreach_arg *arg)
 {
     VALUE opt, v;
-    static VALUE encoding, mode, open_args;
 
     FilePathValue(argv[0]);
     arg->io = 0;
@@ -6160,17 +6189,8 @@ open_key_args(int argc, VALUE *argv, str
     if (NIL_P(opt)) goto no_key;
     if (argc > 2) arg->argc = 1;
     else arg->argc = 0;
-    if (!encoding) {
-	ID id;
 
-	id = rb_intern("encoding");
-	encoding = ID2SYM(id);
-	id = rb_intern("mode");
-	mode = ID2SYM(id);
-	id = rb_intern("open_args");
-	open_args = ID2SYM(id);
-    }
-    v = rb_hash_aref(opt, open_args);
+    v = rb_hash_aref(opt, sym_open_args);
     if (!NIL_P(v)) {
 	VALUE args;
 
@@ -6183,7 +6203,7 @@ open_key_args(int argc, VALUE *argv, str
 	arg->io = rb_io_open_with_args(RARRAY_LEN(args), RARRAY_PTR(args));
 	return;
     }
-    v = rb_hash_aref(opt, mode);
+    v = rb_hash_aref(opt, sym_mode);
     if (!NIL_P(v)) {
 	arg->io = rb_io_open(RSTRING_PTR(argv[0]), StringValueCStr(v));
     }
@@ -6191,7 +6211,7 @@ open_key_args(int argc, VALUE *argv, str
 	arg->io = rb_io_open(RSTRING_PTR(argv[0]), "r");
     }
 
-    v = rb_hash_aref(opt, encoding);
+    v = rb_hash_aref(opt, sym_encoding);
     if (!NIL_P(v)) {
 	rb_io_t *fptr;
 	GetOpenFile(arg->io, fptr);
@@ -7740,4 +7760,11 @@ Init_IO(void)
 #ifdef O_SYNC
     rb_file_const("SYNC", INT2FIX(O_SYNC));
 #endif
+
+    sym_mode = ID2SYM(rb_intern("mode"));
+    sym_perm = ID2SYM(rb_intern("perm"));
+    sym_extenc = ID2SYM(rb_intern("external_encoding"));
+    sym_intenc = ID2SYM(rb_intern("internal_encoding"));
+    sym_encoding = ID2SYM(rb_intern("encoding"));
+    sym_open_args = ID2SYM(rb_intern("open_args"));
 }
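
For readers who prefer Ruby to C, the `dir_enc_str` helper in the dir.c part of the patch can be paraphrased roughly as follows: the raw entry name is first tagged with the external encoding, then transcoded only if an internal encoding was requested. This is a sketch of that idea, not the patch's API; the Ruby method name, its keyword-free signature, and the sample bytes are assumptions for illustration.

```ruby
# Hypothetical Ruby paraphrase of the C helper dir_enc_str.
def dir_enc_str(raw_name, extenc, intenc = nil)
  # Label the bytes returned by the OS with the external encoding.
  name = raw_name.dup.force_encoding(extenc)
  # Transcode only when an internal encoding was asked for.
  intenc ? name.encode(intenc) : name
end

original = "\u65E5\u672C\u8A9E"
raw      = original.encode("Shift_JIS").b   # stand-in for bytes from readdir

tagged     = dir_enc_str(raw, Encoding::Shift_JIS)
transcoded = dir_enc_str(raw, Encoding::Shift_JIS, Encoding::UTF_8)

puts tagged.encoding
puts transcoded.encoding
```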
