Re: Win32 Non-ASCII Filename Access

From: Austin Ziegler <halostatue@...>
Date: 2005-03-09 17:09:16 UTC
List: ruby-core #4538
On Thu, 10 Mar 2005 01:38:12 +0900, Berger, Daniel
<Daniel.Berger@qwest.com> wrote:
>> -----Original Message-----
>> From: Austin Ziegler [mailto:halostatue@gmail.com]
>> Sent: Wednesday, March 09, 2005 8:52 AM
>> To: ruby-core@ruby-lang.org
>> Subject: Win32 Non-ASCII Filename Access

>> I have been working on stuff at work that involves non-ASCII
>> filenames on Windows, with differing character sets (such as
>> "日本語" and "jalapeño"). Windows stores these filenames as UCS-2
>> entries on all modern filesystems (FAT32 and NTFS).
>> 
>> The win32 directory and filename handling is using FindFirstFile
>> instead of FindFirstFileW; this means that it will never be
>> possible to handle certain filenames in Ruby.
> Actually, it depends on how Ruby was built. FindFirstFile() and
> FindFirstFileEx() will use the wide character versions
> automatically IF the UNICODE macro is set. Basically, most Windows
> functions look like this:

I know.

UNFORTUNATELY, to get that to work, you also have to use TCHAR as
your character type. That is, instead of:

  char* spec = "C:\\Foo\\Bar\\*.*";

you need:

  TCHAR* spec = _TEXT("C:\\Foo\\Bar\\*.*");

This may cause *other* problems with Ruby, since it seems to be
written around the assumption that a character is a single byte
wide.
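To make the TCHAR point concrete, here is a small portable C sketch of the generic-text mechanism. The `TCHAR` and `_TEXT` definitions are stand-ins modeled on the Windows headers (the real ones are Windows-only), so treat them as illustrative assumptions: with `UNICODE` defined, `TCHAR` becomes `wchar_t`, string literals become wide literals, and the same path no longer occupies one byte per character. On Windows `wchar_t` is a 2-byte UTF-16 unit; with GCC on Linux it is 4 bytes, so the exact byte counts differ by platform.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for the generic-text mappings in the Windows
   headers, so the idea compiles anywhere. On Windows these come from
   <windows.h>/<tchar.h> and must not be redefined like this. */
#define UNICODE   /* remove this line to get the narrow (one-byte) mapping */

#ifdef UNICODE
typedef wchar_t TCHAR;      /* real headers: TCHAR is WCHAR */
#define _TEXT(s) L##s       /* "abc" becomes L"abc" */
/* real headers: #define FindFirstFile FindFirstFileW */
#else
typedef char TCHAR;
#define _TEXT(s) s
/* real headers: #define FindFirstFile FindFirstFileA */
#endif

/* The path from the message, written the TCHAR way. */
static const TCHAR spec[] = _TEXT("C:\\Foo\\Bar\\*.*");

/* Characters in spec, counting the terminating NUL. */
static size_t spec_chars(void) { return sizeof spec / sizeof spec[0]; }

/* Bytes occupied by spec: equal to spec_chars() only when TCHAR is char,
   i.e. only under the one-byte-per-character assumption. */
static size_t spec_bytes(void) { return sizeof spec; }
```

Any code that treats a byte count as a character count, as byte-oriented string handling does, would be off by a factor of `sizeof(wchar_t)` under this mapping.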

Ultimately, the only acceptable way to do this is to NOT use TCHAR,
but to explicitly use the wide versions of functions and call
MultiByteToWideChar and WideCharToMultiByte as necessary. The best
choice for this will be, of course, UTF-8 (CP_UTF8), but if we're
not in UTF-8 mode, we can always use ANSI (CP_ACP) and get exactly
the same behaviour as the current narrow calls. Better still, we get
to choose the behaviour at run-time.
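The explicit-wide approach hinges on one conversion before each call into a W function. On Windows that step would be `MultiByteToWideChar(CP_UTF8, 0, path, -1, wbuf, wlen)` (or `CP_ACP` when not in UTF-8 mode), usually called once with `cchWideChar` set to 0 to learn the required length, with the result then passed to `FindFirstFileW`. Because those calls exist only on Windows, the sketch below is a portable stand-in for the `CP_UTF8` case: a hypothetical `utf8_to_utf16` helper that decodes UTF-8 into UTF-16 code units, emits surrogate pairs above U+FFFF, and rejects malformed input. It illustrates the conversion; it is not what Ruby or Windows actually ships.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a portable stand-in for what
   MultiByteToWideChar(CP_UTF8, ...) does on Windows.
   Decodes the NUL-terminated UTF-8 string `in` into UTF-16 code units
   in `out`, which holds `cap` units including the terminating 0.
   Returns the number of units written (excluding the 0), or -1 on
   malformed UTF-8 or when `out` is too small. */
static long utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(out != NULL && cap > 0);
    while (*p != 0) {
        uint32_t cp;
        int extra;                          /* continuation bytes to read */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;                     /* stray continuation or bad lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;                  /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong encodings, surrogate code points, and values past U+10FFFF. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;

        if (cp >= 0x10000) {                /* outside the BMP: emit a surrogate pair */
            if (n + 2 >= cap)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = (uint16_t)cp;
        }
    }
    out[n] = 0;
    return (long)n;
}
```

For example, the UTF-8 bytes of "jalapeño" decode to eight UTF-16 units (U+00F1 at index 6), and "日本語" decodes to three units, one per character.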

I do NOT recommend the use of TCHAR and _TEXT; they are
Microsoftisms, and I don't think they would be compatible with
standard Ruby.

Going the route I'm proposing does present its own problems in
terms of memory use and/or processing time, although converting to
and from UTF-8 will be quick.
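The return trip, turning a wide name from `FindNextFileW` (the `cFileName` field of `WIN32_FIND_DATAW`) back into a byte string, is the `WideCharToMultiByte(CP_UTF8, ...)` direction. The portable sketch below uses a hypothetical helper, `utf16_to_utf8`, to show why the UTF-8 leg is cheap: it is a single pass over the input, producing at most four bytes per code point with no table lookups. It assumes a 0-terminated UTF-16 input and treats unpaired surrogates as errors, which is a simplification; the Windows function's own handling of invalid input may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a portable stand-in for what
   WideCharToMultiByte(CP_UTF8, ...) does on Windows.
   Reads the 0-terminated UTF-16 string `in` and writes UTF-8 to `out`,
   which holds `cap` bytes including the terminating NUL.
   Returns the byte count (excluding the NUL), or -1 for an unpaired
   surrogate or a too-small buffer. Each input unit is visited once. */
static long utf16_to_utf8(const uint16_t *in, char *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; in[i] != 0; i++) {
        uint32_t cp = in[i];
        unsigned char buf[4];
        size_t len;

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            uint32_t lo = in[i + 1];                 /* may be the 0 terminator */
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;                           /* not followed by a low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                               /* stray low surrogate */
        }

        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }

        if (n + len >= cap)
            return -1;                               /* keep room for the NUL */
        for (size_t k = 0; k < len; k++)
            out[n++] = (char)buf[k];
    }
    out[n] = '\0';
    return (long)n;
}
```

A name like "jalapeño" comes back as 9 bytes and "日本語" as 9 bytes (three per character), so the extra cost over the narrow calls is a linear copy per path.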

-austin
-- 
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca
