From: meta@...
Date: 2014-08-15T22:28:14+00:00
Subject: [ruby-core:64405] [ruby-trunk - Bug #10132] unpack() ignores default encoding when generating strings, always uses ASCII-8BIT

Issue #10132 has been updated by mathew murphy.

Now that I read [the documentation on encodings](http://ruby-doc.org/core-2.0/Encoding.html) more carefully, I think the real problem is more fundamental: `__ENCODING__` doesn't determine the encoding of *all* created strings; it only affects strings created using string constants in the source code.

    String.new.encoding
    => #<Encoding:ASCII-8BIT>
    "".encoding
    => #<Encoding:UTF-8>

So:

    > String.new == ""
    => true
    > String.new.encoding == "".encoding
    => false

So Ruby is actually behaving as documented; it's just that I find the behavior surprising. Maybe I'm alone in that, though.

Any chance we could have a way to specify a default encoding for *all* created strings?

----------------------------------------
Bug #10132: unpack() ignores default encoding when generating strings, always uses ASCII-8BIT
https://bugs.ruby-lang.org/issues/10132#change-48364

* Author: mathew murphy
* Status: Rejected
* Priority: Normal
* Assignee:
* Category:
* Target version:
* ruby -v: ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
New strings are generated in the default encoding:

    irb> __ENCODING__.name
    => "UTF-8"
    irb> "ünicode".encoding.name
    => "UTF-8"

...but not if they're generated by unpack:

    irb> "ünicode".split.pack('M*').unpack('M*').first
    => "\xC3\xBCnicode"
    irb> "ünicode".split.pack('M*').unpack('M*').first.encoding.name
    => "ASCII-8BIT"

Workaround is to force the encoding on every string unpack generates:

    irb> "ünicode".split.pack('M*').unpack('M*').first.force_encoding(__ENCODING__.name)
    => "ünicode"

--
https://bugs.ruby-lang.org/
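
For readers who want to apply the workaround from the report in more than one place, it could be wrapped in a small helper along these lines. This is only a sketch: `unpack_as` is a hypothetical name, not part of Ruby, and the `Encoding::UTF_8` default is an assumption standing in for the `__ENCODING__` used in the report.

```ruby
# Hypothetical helper (not part of Ruby): unpack, then relabel every String
# in the result with the requested encoding. Non-String items (for example
# integers from numeric directives) are passed through unchanged.
def unpack_as(packed, format, encoding = Encoding::UTF_8)
  packed.unpack(format).map do |item|
    item.is_a?(String) ? item.force_encoding(encoding) : item
  end
end

# "\u00FC" is U+00FC (u with diaeresis); the escape keeps this file ASCII-only.
packed  = ["\u00FCnicode"].pack('M*')

raw     = packed.unpack('M*').first     # binary-tagged string, as in the report
decoded = unpack_as(packed, 'M*').first # same bytes, tagged as UTF-8

puts raw.encoding
puts decoded.encoding
puts decoded.valid_encoding?
```

Note that `force_encoding` only changes the encoding label on the existing bytes; it does not transcode them. The helper is therefore only appropriate when the packed data is already known to be in the target encoding, and checking `valid_encoding?` afterwards is a cheap way to catch a wrong guess.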