From: naruse@...
Date: 2014-04-11T08:42:26+00:00
Subject: [ruby-core:61961] [ruby-trunk - Bug #9712] Dir.entries replace Unicode character with questionmarks

Issue #9712 has been updated by Yui NARUSE.

Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED

Thomas Thomassen wrote:
> Usaku NAKAMURA wrote:
> > check Dir.entries('Foo', encoding: 'utf-8')
>
> Ah, well that worked. I'd been referring to the Ruby 2.0.0 docs where this argument is missing:
> http://www.ruby-doc.org/core-2.0/Dir.html#method-c-entries
>
> But why is this needed?
> On my machine it returns the strings by default in Windows-1252 - which is the same as Encoding.find('filesystem'). I guess it returns it based on that?

Yes.

> But for Windows this is really awkward. Windows-1252 is the compatibility codepage - but the file system itself is perfectly capable of handling Unicode characters.
>
> I see Ruby explicitly calls the W versions of the Windows file functions instead of declaring the UNICODE flag - this makes all system calls treat Ruby with compatibility handling.
>
> The Windows file system isn't actually Windows-1252 encoded - or any other encoding Ruby currently reports. It's all Unicode - I can use any character I like, so why isn't Ruby just returning the result from file functions as Unicode?

* Ruby side: many parts of the Ruby implementation already use the W version of the API, but some parts do not. Therefore, for consistency, it is still ANSI based.
* User side: there is a lot of legacy code which assumes ANSI strings.

Ruby must migrate to Unicode some day in the future, but we haven't done so yet.

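
For readers who want to try the workaround quoted above, here is a minimal sketch. It is not taken from the thread: the helper name utf8_entries and the temporary directory layout are hypothetical, and what the default call returns depends on the platform and its code page.

```ruby
require 'tmpdir'

# Hypothetical helper: list a directory and ask Ruby to return the
# entry names as UTF-8 instead of the filesystem encoding.
def utf8_entries(path)
  Dir.entries(path, encoding: 'utf-8')
end

# Recreate the layout from the report inside a throwaway directory.
Dir.mktmpdir do |root|
  foo = File.join(root, 'Foo')
  Dir.mkdir(foo)
  Dir.mkdir(File.join(foo, "\u3066\u3059\u3068"))

  # Encoding Ruby assumes for file names on this machine.
  puts Encoding.find('filesystem')

  # Each name is printed with the encoding it is tagged with.
  utf8_entries(foo).each do |name|
    puts "#{name.inspect} (#{name.encoding})"
  end
end
```

On the English Windows setup described in the report, the default Dir.entries("Foo") call is the one that yields "???"; passing encoding: 'utf-8' is the variant that was confirmed to work in the quoted exchange.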
----------------------------------------
Bug #9712: Dir.entries replace Unicode character with questionmarks
https://bugs.ruby-lang.org/issues/9712#change-46156

* Author: Thomas Thomassen
* Status: Assigned
* Priority: Normal
* Assignee: Zachary Scott
* Category: doc
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0dev (2014-04-07 trunk 45528) [i386-mswin32_100]
* Backport: 2.0.0: DONTNEED, 2.1: DONTNEED
----------------------------------------
My basis when testing this is that I have a computer with an English OS - codepage Windows-1252. The tests might yield different results if the Windows codepage is different - so please pay attention to that if you are unable to reproduce.

Given a folder named "Foo" which contains a sub-folder "てすと" ("\u3066\u3059\u3068"),

Dir.entries("Foo") will return:

[".", "..", "???"]

The characters that don't fit my filesystem codepage are translated into question marks.

I would have expected the strings returned to be in some Unicode format.

--
https://bugs.ruby-lang.org/