From: Eric Wong
Date: 2016-02-07T06:37:44+00:00
Subject: [ruby-core:73727] Re: [Ruby trunk - Bug #12034] RegExp does not respect file encoding directive

Eric Wong wrote:
> nobu@ruby-lang.org wrote:
> > That encoding has never changed since 1.9.
> > It seems because `File.readlink` and `File.realpath` return locale strings.
>
> Ugh, that isn't right to me since filesystem names (on *nix) can have
> any byte besides "\0".

How about fall back to ASCII-8BIT if we detect broken code range?
We try to be helpful by respecting FS encoding, but we need to
acknowledge symlinks can have any byte value from 1-0xFF

http://80x24.org/spew/20160207063040.31341-1-e%4080x24.org/raw

Subject: [PATCH v2] file.c (rb_file_s_readlink): do not set invalid encoding

With the exception of '\0', POSIX allows arbitrary bytes in a symlink.
So we should not assume rb_filesystem_encoding() is correct,
and fall back to ASCII-8BIT if we detect strange characters.

* file.c (rb_file_s_readlink): fall back to ASCII-8BIT
* test/ruby/test_file_exhaustive.rb (test_readlink_binary): add
  [ruby-core:73582] [Bug #12034]
---
 file.c                            | 10 +++++++++-
 test/ruby/test_file_exhaustive.rb | 16 ++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/file.c b/file.c
index 9f430a3..f880411 100644
--- a/file.c
+++ b/file.c
@@ -2768,7 +2768,15 @@ rb_file_s_symlink(VALUE klass, VALUE from, VALUE to)
 static VALUE
 rb_file_s_readlink(VALUE klass, VALUE path)
 {
-    return rb_readlink(path, rb_filesystem_encoding());
+    VALUE str = rb_readlink(path, rb_filesystem_encoding());
+    int cr = rb_enc_str_coderange(str);
+
+    /* POSIX allows arbitrary bytes with the exception of '\0' */
+    if (cr == ENC_CODERANGE_BROKEN) {
+        rb_enc_associate(str, rb_ascii8bit_encoding());
+    }
+
+    return str;
 }
 
 #ifndef _WIN32
diff --git a/test/ruby/test_file_exhaustive.rb b/test/ruby/test_file_exhaustive.rb
index 53b867e..730000b 100644
--- a/test/ruby/test_file_exhaustive.rb
+++ b/test/ruby/test_file_exhaustive.rb
@@ -549,6 +549,22 @@ def test_readlink
   rescue NotImplementedError
   end
 
+  def test_readlink_binary
+    return unless symlinkfile
+    bug12034 = '[ruby-core:73582] [Bug #12034]'
+    Dir.mktmpdir('rubytest-file-readlink') do |tmpdir|
+      Dir.chdir(tmpdir) do
+        link = "\xde\xad\xbe\xef".b
+        File.symlink(link, 'foo')
+        str = File.readlink('foo')
+        assert_predicate str, :valid_encoding?, bug12034
+        assert_equal link, str, bug12034
+      end
+    end
+  rescue NotImplementedError => e
+    skip "#{e.message} (#{e.class})"
+  end
+
   def test_readlink_long_path
     return unless symlinkfile
     bug9157 = '[ruby-core:58592] [Bug #9157]'
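
For illustration only, the fallback in the patch can be approximated at the
Ruby level. This is a rough sketch, not the patched C code: `binary_fallback`
is a hypothetical helper name, and tagging the bytes as UTF-8 merely stands in
for whatever the filesystem encoding happens to be on a given system.

```ruby
# Hypothetical helper (not part of the patch): mirrors the C-level fallback.
# If the bytes are invalid in the string's current encoding, retag the same
# bytes as ASCII-8BIT; otherwise keep the encoding as it is.
def binary_fallback(str)
  str = str.dup
  str.force_encoding(Encoding::ASCII_8BIT) unless str.valid_encoding?
  str
end

# Same bytes as the link target used in test_readlink_binary.
raw = "\xde\xad\xbe\xef".b

# Assumption: simulate a string tagged with a UTF-8 filesystem encoding.
tagged = raw.dup.force_encoding(Encoding::UTF_8)

fixed = binary_fallback(tagged)   # invalid as UTF-8, so retagged as binary
kept  = binary_fallback("caf\u00e9".encode(Encoding::UTF_8))  # valid, left alone
```

Only the encoding tag changes; the bytes themselves are left untouched, which
is why the test can compare the result against the original binary link name.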