From: Suraj Kurapati
Date: 2009-11-29T18:26:29+09:00
Subject: [ruby-core:26941] [Bug #2411] String#encode fails but eval("#coding:") works

Bug #2411: String#encode fails but eval("#coding:") works
http://redmine.ruby-lang.org/issues/show/2411

Author: Suraj Kurapati
Status: Open, Priority: Normal
ruby -v: ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

Hello,

[Summary]
String#encode() should internally try the eval() approach shown below
before giving up hope and raising Encoding::UndefinedConversionError.

I found a surprising (POLS please!) workaround for encoding conversion
errors in Ruby 1.9 while trying to understand why some Chinese text
returned by screen scraping (via Net::HTTP) was appearing in escaped
form when it was written to a file.

$ irb
## ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

>> Encoding.default_external
=> #<Encoding:UTF-8>

>> s = "%s\xE5\x92\x8C%s"
=> "%s和%s"

>> s.encoding
=> #<Encoding:UTF-8>

This works because my IRB session began in UTF-8 mode, and anything I
enter there is naturally treated as UTF-8 encoded.

To simulate the actual problem I faced when Net::HTTP returned that
string to me with ASCII-8BIT encoding, I tried the following
conversion:

>> ascii_8bit = s.encode('ascii-8bit')
Encoding::UndefinedConversionError: "\xE5\x92\x8C" from UTF-8 to ASCII-8BIT
        from (irb):4:in `encode'
        from (irb):4
        from /usr/bin/irb:12:in `<main>'

No luck. Let us try eval() because Ruby 1.9 has per-file encoding:

>> ascii_8bit = eval("# encoding: ascii-8bit\n#{ s.inspect }")
=> "%s\xE5\x92\x8C%s"

>> ascii_8bit.encoding
=> #<Encoding:ASCII-8BIT>

That worked! Surprising! I wonder immediately: couldn't
String#encode() fall back to the eval() approach internally instead of
giving up and raising Encoding::UndefinedConversionError?

And now, if we take that ASCII-8BIT string and try to convert it into
UTF-8 (just like the problem I faced with the result of Net::HTTP), we
face a similar, but opposite problem:

>> utf_8 = ascii_8bit.encode('utf-8')
Encoding::UndefinedConversionError: "\xE5" from ASCII-8BIT to UTF-8
        from (irb):7:in `encode'
        from (irb):7
        from /usr/bin/irb:12:in `<main>'

No luck again. Let us try eval():

>> utf_8 = eval("# encoding: utf-8\n#{ ascii_8bit.inspect }")
=> "%s和%s"

>> utf_8.encoding
=> #<Encoding:UTF-8>

Thanks for your consideration.

----------------------------------------
http://redmine.ruby-lang.org
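
The session in this report can be condensed into a standalone script. What follows is only a minimal sketch, not a proposed implementation of String#encode: the helper name `reinterpret_via_eval` is hypothetical, and setting Encoding.default_external to UTF-8 is an assumption made to mirror the UTF-8 IRB session, since String#inspect escapes characters differently under other external encodings.

```ruby
# encoding: utf-8

# Assumption: mirror the UTF-8 IRB session from the report so that
# String#inspect emits the character itself rather than an escape.
Encoding.default_external = Encoding::UTF_8

# Hypothetical helper (not part of Ruby or the report): re-parses the
# string's literal form under the given source encoding via eval.
def reinterpret_via_eval(str, encoding_name)
  eval("# encoding: #{encoding_name}\n#{str.inspect}")
end

s = "%s\xE5\x92\x8C%s" # the UTF-8 bytes from the report

# Transcoding fails because the multibyte character has no
# mapping into ASCII-8BIT.
encode_failed =
  begin
    s.encode('ascii-8bit')
    false
  rescue Encoding::UndefinedConversionError
    true
  end

# The eval() workaround, in both directions.
ascii_8bit = reinterpret_via_eval(s, 'ascii-8bit')
utf_8      = reinterpret_via_eval(ascii_8bit, 'utf-8')

# For comparison: String#force_encoding relabels the same bytes
# without transcoding them.
relabeled = s.dup.force_encoding('ascii-8bit')

puts encode_failed
puts ascii_8bit.encoding
puts utf_8.encoding
puts ascii_8bit.bytes == s.bytes
puts relabeled == ascii_8bit
```

In this sketch the bytes never change; only the encoding attached to them does. That would suggest the eval() approach relabels the string rather than transcoding it, which is also what String#force_encoding does.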