From: Bill Kelly
Date: 2010-05-06T19:39:27+09:00
Subject: [ruby-core:30052] Re: [Bug #1685] Some windows unicode path issues remain

U.Nakamura wrote:
>
> In message "[ruby-core:30012] Re: [Bug #1685] Some windows unicode path issues remain"
>     on May.05,2010 15:35:11, wrote:
> |
> | It seems rb_stat in file.c calls stat(), but stat does
> | not map to the unicode version.
>
> Oops, thank you!

Thanks, the test gets much further now. It now fails at the last line:

  Dir.chdir DNAME_CHINESE
  cwd = Dir.pwd
  ( cwd[(-DNAME_CHINESE.length)..-1] == DNAME_CHINESE ) or raise "cwd check fail"

Currently there is only rb_w32_getcwd, so I have added a Unicode version, rb_w32_ugetcwd:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index: include/ruby/win32.h
===================================================================
--- include/ruby/win32.h	(revision 27644)
+++ include/ruby/win32.h	(working copy)
@@ -254,6 +254,7 @@
 extern struct servent *WSAAPI rb_w32_getservbyport(int, const char *);
 extern int rb_w32_socketpair(int, int, int, int *);
 extern char * rb_w32_getcwd(char *, int);
+extern char * rb_w32_ugetcwd(char *, int);
 extern char * rb_w32_getenv(const char *);
 extern int rb_w32_rename(const char *, const char *);
 extern int rb_w32_urename(const char *, const char *);
@@ -611,7 +612,7 @@
 #define get_osfhandle(h)	rb_w32_get_osfhandle(h)
 
 #undef getcwd
-#define getcwd(b, s)		rb_w32_getcwd(b, s)
+#define getcwd(b, s)		rb_w32_ugetcwd(b, s)
 
 #undef getenv
 #define getenv(n)		rb_w32_getenv(n)
Index: win32/win32.c
===================================================================
--- win32/win32.c	(revision 27644)
+++ win32/win32.c	(working copy)
@@ -3692,6 +3692,57 @@
     return p;
 }
 
+char *
+rb_w32_ugetcwd(char *buffer, int size)
+{
+    char *p;
+    WCHAR *wp;
+    long len, wlen;
+
+    wlen = GetCurrentDirectoryW(0, NULL); /* wlen includes null terminating character */
+    if (!wlen) {
+        errno = map_errno(GetLastError());
+        return NULL;
+    }
+
+    wp = malloc(wlen * sizeof(WCHAR));
+    if (!wp) {
+        errno = ENOMEM;
+        return NULL;
+    }
+
+    if (!GetCurrentDirectoryW(wlen, wp)) {
+        errno = map_errno(GetLastError());
+        free(wp);
+        return NULL;
+    }
+
+    p = wstr_to_utf8(wp, &len);
+    free(wp);
+    len += 1; /* len now includes null terminating character */
+
+    if (!p) {
+        errno = ENOMEM;
+        return NULL;
+    }
+
+    if (buffer) {
+        if (size < len) {
+            free(p);
+            errno = ERANGE;
+            return NULL;
+        }
+
+        memcpy(buffer, p, len);
+        free(p);
+        p = buffer;
+    }
+
+    translate_char(p, '\\', '/');
+
+    return p;
+}
+
 int
 chown(const char *path, int owner, int group)
 {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This works in the sense that it returns a UTF-8 path string; however, rb_dir_getwd then calls rb_enc_associate(cwd, rb_filesystem_encoding()) on the result, which tags it as WINDOWS-1252 instead of UTF-8.

So I would like to ask: is there a reason enc_set_filesystem_encoding() should not return UTF-8 on Windows now?

  static int
  enc_set_filesystem_encoding(void)
  {
      int idx;
  #if defined NO_LOCALE_CHARMAP
      idx = rb_enc_to_index(rb_default_external_encoding());
  #elif defined _WIN32 || defined __CYGWIN__
      char cp[sizeof(int) * 8 / 3 + 4];
      snprintf(cp, sizeof cp, "CP%d", AreFileApisANSI() ? GetACP() : GetOEMCP());
      idx = rb_enc_find_index(cp);
      if (idx < 0) idx = rb_ascii8bit_encindex();
  #else
      idx = rb_enc_to_index(rb_default_external_encoding());
  #endif

      enc_alias_internal("filesystem", idx);
      return idx;
  }

It seems strange that it still selects non-Unicode encodings.

* * *

Also, my bootstraptest run hit one more problem: mktmpdir can't delete the Unicode directory entries created by my test:

  P:/code/ruby-svn/trunk/lib/fileutils.rb:1307:in `unlink': Invalid argument - C:/temp/bootstraptest20100505-1016-1lvss6a.tmpwd/????
      (Errno::EINVAL)
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1307:in `block in remove_file'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1315:in `platform_support'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1306:in `remove_file'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1295:in `remove'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:761:in `block in remove_entry'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1345:in `block (2 levels) in postorder_traverse'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1349:in `postorder_traverse'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1344:in `block in postorder_traverse'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1343:in `each'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:1343:in `postorder_traverse'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:759:in `remove_entry'
  	from P:/code/ruby-svn/trunk/lib/fileutils.rb:688:in `remove_entry_secure'
  	from P:/code/ruby-svn/trunk/lib/tmpdir.rb:85:in `ensure in mktmpdir'
  	from P:/code/ruby-svn/trunk/lib/tmpdir.rb:85:in `mktmpdir'
  	from ./bootstraptest/runner.rb:375:in `in_temporary_working_directory'
  	from ./bootstraptest/runner.rb:126:in `main'
  	from ./bootstraptest/runner.rb:398:in `<main>'

I don't have a patch for this yet. However, it looks like routines in win32.c such as rb_w32_opendir and rb_w32_readdir_with_enc already use WCHAR internally! For example:

  DIR *
  rb_w32_opendir(const char *filename)
  {
      struct stati64 sbuf;
      WIN32_FIND_DATAW fd;
      HANDLE fh;
      WCHAR *wpath;

      if (!(wpath = filecp_to_wstr(filename, NULL)))
          return NULL;
      ...

So it seems that if the filesystem encoding were considered UTF-8 instead of WINDOWS-1252, opendir might just work. The same goes, more or less, for rb_w32_readdir_with_enc. (At least, it calls readdir_internal, which uses WCHAR.)

So I *think* these are very close to working with UTF-8, but, again, I don't understand why enc_set_filesystem_encoding() still uses WINDOWS-1252.

Thanks,

Regards,
Bill
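The code-page name built in enc_set_filesystem_encoding() quoted above is simply "CP" followed by the number returned by GetACP() or GetOEMCP(); on a Western-European system that gives CP1252, which is where WINDOWS-1252 comes from, while Windows' UTF-8 code page is 65001. Below is a minimal, portable sketch of just that formatting step and of the buffer-size reasoning behind sizeof(int) * 8 / 3 + 4, assuming the code page number is passed in as a parameter rather than queried from the Windows API. format_codepage_name and codepage_buffer_is_large_enough are hypothetical helpers, not Ruby functions.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Same buffer size as in enc_set_filesystem_encoding().
 * For a 32-bit int this is 14 bytes: "CP", a sign, 10 digits and the NUL. */
#define CP_NAME_SIZE (sizeof(int) * 8 / 3 + 4)

/* Hypothetical helper (not a Ruby API): writes e.g. "CP1252" into buf.
 * The code page number is a parameter here; the real code calls
 * GetACP() or GetOEMCP() depending on AreFileApisANSI().
 * Returns 1 on success, 0 if the name would not fit in size bytes. */
int
format_codepage_name(char *buf, size_t size, int codepage)
{
    /* snprintf returns the length it wanted to write, so a result
     * >= size means the name was truncated. */
    int n = snprintf(buf, size, "CP%d", codepage);
    return n >= 0 && (size_t)n < size;
}

/* Hypothetical sanity check: the longest possible name, for INT_MIN,
 * must still fit in a CP_NAME_SIZE buffer. */
int
codepage_buffer_is_large_enough(void)
{
    char cp[CP_NAME_SIZE];
    return format_codepage_name(cp, sizeof cp, INT_MIN)
        && strlen(cp) < sizeof cp;
}
```

For example, format_codepage_name(cp, sizeof cp, 65001) leaves "CP65001" in cp; whether Ruby's encoding table maps that name to UTF-8 is a separate question this sketch does not try to answer.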