From: Michael Selig
Date: 2008-10-27T15:57:03+09:00
Subject: [ruby-core:19535] Re: String literal encoding (Was: Default source encoding (Was: [Bug #680] csv.rb: CSV.parse is toolate when encoding is mismatch))

On Mon, 27 Oct 2008 17:27:57 +1100, Nobuyoshi Nakada wrote:

> Even in 1.8 or prior, -Ks has been mandatory for Shift_JIS
> sources, so they have had -K in the shebang lines already.

Why then can I write a ruby 1.8 script which does a "puts" of a Shift_JIS string (no shebang or magic comment), and have it run fine without -Ks?

ruby1.8 t1.rb | od -c
0000000 S h i f t _ J I S s t r i n g
0000020 : 202 240 , 202 242 \n
0000030

ruby1.8 -Ks t1.rb | od -c
0000000 S h i f t _ J I S s t r i n g
0000020 : 202 240 , 202 242 \n
0000030

But on 1.9 it only works with -Ks:

ruby -v
ruby 1.9.0 (2008-10-27 revision 19961) [i686-linux]

ruby t1.rb
t1.rb:2: invalid multibyte char (US-ASCII)
t1.rb:2: invalid multibyte char (US-ASCII)

ruby -Ks t1.rb
0000000 S h i f t _ J I S s t r i n g
0000020 : 202 240 , 202 242 \n
0000030

>
>> Defaulting source encoding to locale encoding (like -e does) should fix
>> this (as long as the end-user's locale is correct), right?
>
> Yes if they match.
>
>> I guess if necessary James can put "-KU" in the RUBYOPT environment
>> variable to save having to add multiple magic comments, but I feel this
>> shouldn't be necessary.
>
> -U option would be better.

I don't think that will work: t2.rb is a single line script which does a puts of a short UTF-8 multibyte string.

ruby t2.rb
t2.rb:2: invalid multibyte char (US-ASCII)
t2.rb:2: invalid multibyte char (US-ASCII)

ruby -U t2.rb
ruby: "\xD8" on US-ASCII (Encoding::InvalidByteSequenceError)

ruby -KU t2.rb | od -c
0000000 U n i c o d e s t r i n g :
0000020 a b 330 265 330 271 \n
0000030

ruby1.8 t2.rb | od -c
0000000 U n i c o d e s t r i n g :
0000020 a b 330 265 330 271 \n
0000030

Cheers
Mike
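
P.S. A minimal sketch of why those bytes are rejected when the source encoding is US-ASCII. It assumes a later Ruby with String#b (2.0 or newer), not the 1.9.0 snapshot above, and the helper name valid_in? is made up for illustration. The byte values are the octal ones from the od dumps (202 240 , 202 242 and 330 265 330 271). It only checks byte-level validity per encoding; it is not the parser's own code path.

```ruby
# Bytes copied from the od dumps, converted from octal to hex.
SJIS_BYTES = "\x82\xA0,\x82\xA2".b  # multibyte payload printed by t1.rb
UTF8_BYTES = "\xD8\xB5\xD8\xB9".b   # multibyte payload printed by t2.rb

# Hypothetical helper: are these bytes well-formed in the given encoding?
def valid_in?(bytes, encoding)
  # dup so the binary original is left untouched by force_encoding
  bytes.dup.force_encoding(encoding).valid_encoding?
end

puts valid_in?(SJIS_BYTES, Encoding::US_ASCII)   # bytes >= 0x80 are not ASCII
puts valid_in?(SJIS_BYTES, Encoding::Shift_JIS)
puts valid_in?(UTF8_BYTES, Encoding::US_ASCII)
puts valid_in?(UTF8_BYTES, Encoding::UTF_8)
```

The US-ASCII checks fail and the Shift_JIS and UTF-8 checks pass, which is consistent with the scripts only loading once -Ks or -KU sets the source encoding.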