From: Martin Duerst <duerst@...>
Date: 2008-10-27T19:37:58+09:00
Subject: [ruby-core:19541] Re: String literal encoding (Was: Default source encoding (Was: [Bug #680] csv.rb: CSV.parse is too late when encoding is mismatch))

At 07:28 08/10/27, Michael Selig wrote:

>I thought one of your points was that you would like to be able to write  
>Japanese (or other non-ascii) comments which is otherwise only ascii  
>(which may use "\u" in literals, and want default_internal to be UTF-8).  
>This means that the source encoding should be Japanese. Your suggestion of  
>defaulting default_internal to the source encoding means that it will be  
>set to Japanese. I am not sure that this is always desirable. (This is  
>very minor - you can always override it)

I'm not sure what you mean by "Japanese". It's no problem at all
to use UTF-8 to write Japanese. And I guess if somebody uses
\u literals and wants default_internal to be UTF-8, they'll
in most cases use UTF-8 for the source encoding (comments or
whatever else).

If you mean Japanese legacy encodings (such as Shift_JIS and
EUC-JP), then you are correct, but it would be very rare
for somebody to use Shift_JIS or EUC-JP for comments when
the program is otherwise supposed to run all-UTF-8.
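
To make that rare case concrete, here is a minimal sketch, assuming
Ruby 2.3 or later (for the squiggly heredoc). The temporary script, its
file name, and the global variables are hypothetical and exist only for
illustration. The script declares EUC-JP as its source encoding but is
otherwise ASCII-only, and it shows that a \u literal still comes out as
UTF-8 while a plain literal follows the source encoding:

```ruby
require 'tempfile'

# Hypothetical script: the magic comment declares a legacy source encoding
# (EUC-JP). The body is ASCII-only, so the file is valid in either encoding.
script = <<~'RUBY'
  # -*- coding: euc-jp -*-
  $source_encoding  = __ENCODING__                     # the declared source encoding
  $plain_encoding   = "abc".encoding                   # plain literal: follows the source
  $escaped_encoding = "\u65E5\u672C\u8A9E".encoding    # \u literal: always UTF-8
RUBY

# Write the script to a temporary file and load it, so that the magic
# comment really is the first line of a source file.
Tempfile.create(['legacy_source', '.rb']) do |file|
  file.write(script)
  file.flush
  load file.path
end

puts "source encoding: #{$source_encoding}"
puts "plain literal:   #{$plain_encoding}"
puts "\\u literal:      #{$escaped_encoding}"
```

So in such a file the \u strings and the source encoding disagree, which
is the situation in which a default_internal derived from the source
encoding would have to be overridden by hand.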


>Isn't backward compatibility with 1.8 scripts more important?
>You are now forcing anyone with 1.8 scripts containing non-ascii string  
>literals to put in a magic comment, otherwise you get "invalid multibyte  
>char (US-ASCII)" error in 1.9.

Well, yes, that's actually the point of it. Wherever necessary,
get everybody to declare their encoding. It may be somewhat suboptimal
in the transition phase, but after that, we know what we're dealing
with.
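
As a rough illustration of what the declaration buys us (a sketch only;
the variable names are made up, and String#valid_encoding? is used here
as a stand-in for the check the parser performs, not as its actual
implementation): the same bytes are invalid while they are assumed to be
US-ASCII and become well-formed text once the encoding is stated.

```ruby
# -*- coding: utf-8 -*-
# The magic comment above is the kind of declaration under discussion.
# It has to be on the first line of the file (or the second, after a shebang).

# UTF-8 bytes of a two-character Japanese word, written with \x escapes
# so that this example itself stays ASCII-only.
bytes = "\xE6\x97\xA5\xE6\x9C\xAC"

# Without a declaration, 1.9 assumes US-ASCII; bytes above 0x7F are invalid.
undeclared = bytes.dup.force_encoding(Encoding::US_ASCII)

# With the encoding declared, the very same bytes are well-formed text.
declared = bytes.dup.force_encoding(Encoding::UTF_8)

puts undeclared.valid_encoding?   # false
puts declared.valid_encoding?     # true
puts declared.length              # 2 (characters, not bytes)
```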

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp