From: Tom Link
Date: 2009-02-04T18:22:24+09:00
Subject: [ruby-core:21830] [Feature #1106] Script encoding vs. default_internal: Implicitly transcode strings/regexps

Feature #1106: Script encoding vs. default_internal: Implicitly transcode strings/regexps
http://redmine.ruby-lang.org/issues/show/1106

Author: Tom Link
Status: Open, Priority: Normal

If I'm not mistaken, a related issue was discussed in the past (e.g. [1]). Anyway, please take a moment to consider the following scripts and input file:

FILE: test2.rb

    # encoding: UTF-8

    Encoding.default_internal = Encoding::UTF_8
    Encoding.default_external = Encoding::UTF_8

    require 'test2a'

    File.readlines('test2.txt').each do |line|
      p line, test2a(line)
    end

FILE: test2a.rb

    # encoding: ISO-8859-1

    p __ENCODING__

    def test2a(x)
      x =~ /[��������������]/
    end

FILE: test2.txt (UTF-8 byte sequences; the second line reads "weiß" and the third "Bär" when decoded as UTF-8)

    foo
    weiß
    Bär
    bar

If I run

    $ ruby -v
    ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
    $ ruby test2.rb
    #<Encoding:ISO-8859-1>
    "foo\n"
    nil
    /home/t/src/tmp/test2a.rb:6:in `test2a': invalid byte sequence in UTF-8 (ArgumentError)
            from test2.rb:9:in `block in <main>'
            from test2.rb:8:in `each'
            from test2.rb:8:in `<main>'

It seems the ISO-8859-1 encoded regexp in test2a.rb is not transcoded to UTF-8. But since default_internal is set to UTF-8, Ruby seems to expect a valid UTF-8 string. Please forgive me if my interpretation of that error message is wrong.

It is quite possible that I missed something and that an easy solution to this problem already exists which I don't know of. If that is the case, I kindly ask you to tell me about it.

If this is how Ruby 1.9.1 is currently supposed to work, I would humbly suggest silently transcoding all strings (and regexps) found in scripts to default_internal if it is non-nil. IMHO not transcoding them doesn't make any sense and drives users who work with heterogeneous files to madness. If a string cannot be transcoded to default_internal, an error should be raised.

Thanks.

[1] http://groups.google.com/group/ruby-core-google/browse_frm/thread/d6474429dd112926?hl=en

----------------------------------------
http://redmine.ruby-lang.org
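
For illustration, here is a minimal sketch of what the requested transcoding amounts to when done by hand: the pattern source is re-encoded to the target encoding before the regexp is compiled. The helper name transcoded_regexp and the character class [äß] are assumptions made for this sketch and are not taken from the scripts above. It also assumes a current Ruby, where the mismatch is reported as Encoding::CompatibilityError rather than the ArgumentError that 1.9.1p0 printed.

```ruby
# Bytes of "[äß]" as they would be stored in an ISO-8859-1 script.
# (Illustrative character class, not the one from the report.)
LATIN1_SOURCE = [0x5B, 0xE4, 0xDF, 0x5D].pack("C*")
                                        .force_encoding(Encoding::ISO_8859_1)

# A line as it would be read from a UTF-8 file: "weiß\n".
UTF8_LINE = "wei\u00DF\n"

# Hypothetical helper: re-encode the pattern source, then compile it.
# String#encode raises if a character cannot be converted to the target,
# which corresponds to the "raise an error" part of the proposal.
def transcoded_regexp(source, target = Encoding::UTF_8)
  Regexp.new(source.encode(target))
end

# Without transcoding, the pattern keeps the encoding of its source.
latin1_re = Regexp.new(LATIN1_SOURCE)
p latin1_re.encoding
p(latin1_re =~ "foo\n") # ASCII-only input is still accepted

begin
  latin1_re =~ UTF8_LINE # non-ASCII UTF-8 input against a Latin-1 pattern
rescue Encoding::CompatibilityError => e
  puts "mismatch: #{e.class}"
end

# With explicit transcoding, pattern and input agree on UTF-8.
utf8_re = transcoded_regexp(LATIN1_SOURCE)
p utf8_re.encoding
p(utf8_re =~ UTF8_LINE)
```

The feature request asks for this conversion to happen implicitly for literals whenever default_internal is non-nil. The sketch only shows the manual equivalent for a pattern built at run time; it does not change how literals in an ISO-8859-1 script are treated.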