From: Martin Duerst
Date: 2008-10-27T19:37:58+09:00
Subject: [ruby-core:19541] Re: String literal encoding (Was: Default source encoding (Was: [Bug #680] csv.rb: CSV.parse is too late when encoding is mismatch))

At 07:28 08/10/27, Michael Selig wrote:

>I thought one of your points was that you would like to be able to write
>Japanese (or other non-ascii) comments which is otherwise only ascii
>(which may use "\u" in literals, and want default_internal to be UTF-8).
>This means that the source encoding should be Japanese. Your suggestion of
>defaulting default_internal to the source encoding means that it will be
>set to Japanese. I am not sure that this is always desirable. (This is
>very minor - you can always override it)

I'm not sure what you mean by "Japanese". It's no problem at all
to use UTF-8 to write Japanese. And I guess if somebody uses \u literals
and wants default_internal to be UTF-8, they'll in most cases use
UTF-8 for the source encoding (comments or whatever else).

If you mean Japanese legacy encodings (such as Shift_JIS and EUC-JP),
then you are correct, but it would be very rare for somebody to
use Shift_JIS or EUC-JP for comments when the program is otherwise
supposed to run all-UTF-8.

>Isn't backward compatibility with 1.8 scripts more important?
>You are now forcing anyone with 1.8 scripts containing non-ascii string
>literals to put in a magic comment, otherwise you get "invalid multibyte
>char (US-ASCII)" error in 1.9.

Well, yes, that's actually the point of it. Wherever necessary,
get everybody to declare their encoding. It may be somewhat
suboptimal in the transition phase, but after that, we know
what we're dealing with.

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
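
To illustrate the point about declaring the encoding, here is a minimal sketch in Ruby. It is not code from this thread: the helper name probe_source_encoding and the global $probe are hypothetical, chosen only for this illustration. The helper writes a tiny script that begins with a given magic comment, loads it, and reports which source encoding Ruby assumed, which encoding an ordinary literal received, and which encoding a literal written with a \u escape received.

```ruby
require "tempfile"

# Hypothetical helper (an assumption for this sketch, not part of Ruby):
# writes a small script starting with the given magic comment, loads it,
# and returns the encodings Ruby assigned while parsing that script.
def probe_source_encoding(magic_comment)
  script = [
    magic_comment,                   # must be the first line of the file
    '$probe = {',
    '  source:  __ENCODING__,',      # source encoding of the loaded file
    '  plain:   "abc".encoding,',    # ordinary literal follows the source encoding
    '  escaped: "\u3042".encoding',  # literal written with a \u escape
    '}'
  ].join("\n") + "\n"

  Tempfile.create(["probe", ".rb"]) do |f|
    f.write(script)
    f.flush
    load f.path                      # the magic comment is honoured on load
  end
  $probe
end

euc = probe_source_encoding("# -*- coding: euc-jp -*-")
utf = probe_source_encoding("# -*- coding: utf-8 -*-")

puts "EUC-JP source: #{euc.inspect}"
puts "UTF-8 source:  #{utf.inspect}"
```

In the EUC-JP case the plain literal takes the declared source encoding, while the \u literal is UTF-8 regardless, which is the situation discussed above: a legacy source encoding combined with \u literals gives strings in two different encodings within one program.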