[ruby-core:63958] [ruby-trunk - Bug #7267] Dir.glob on Mac OS X returns unexpected string encodings for unicode file names

From: duerst@...
Date: 2014-07-23 10:11:07 UTC
List: ruby-core #63958
Issue #7267 has been updated by Martin Dürst.

Related to Feature #10084: Add Unicode String Normalization to String class added

----------------------------------------
Bug #7267: Dir.glob on Mac OS X returns unexpected string encodings for unicode file names
https://bugs.ruby-lang.org/issues/7267#change-47979

* Author: Kenny Grant
* Status: Closed
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: next minor
* ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]
* Backport: 
----------------------------------------
Tested on Ruby 1.9.3-p194 and ruby-2.0.0-preview1 on Mac OS X 10.7.5

When calling file system methods with Ruby on Mac OS X, it is not possible to manipulate the resulting file name as a normal UTF-8 string, even though it reports its encoding as UTF-8. It seems to be a UTF-8-MAC string, even when the default encoding is set to UTF-8. This leads to confusion, because the string can be manipulated normally except for any non-ASCII characters, which seem to be decomposed. So a regexp containing such characters won't match the string unless it is first converted from UTF-8-MAC. I'd expect the string to be normal UTF-8, or at least to report that it is not a normal UTF-8 string if it has to be UTF-8-MAC for some reason.
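To make the difference concrete without needing a Mac file system, here is a small sketch that builds the two forms by hand. The decomposed string is only an assumption about what Dir.glob hands back for such a file name, and `describe` is a hypothetical helper, not part of Ruby:

```ruby
# Two strings that look identical on screen but differ in code points.
composed   = "Test\u00E9.txt"    # "é" as the single code point U+00E9
decomposed = "Teste\u0301.txt"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical helper: summarise how a string is actually stored.
def describe(s)
  {
    encoding:   s.encoding.name,
    chars:      s.length,
    bytes:      s.bytesize,
    codepoints: s.codepoints.map { |c| format("U+%04X", c) }
  }
end

p describe(composed)
p describe(decomposed)

# Both report UTF-8, yet they are not equal, and a pattern written with
# the composed character only matches the composed string.
p composed == decomposed
p composed   =~ /\u00E9/
p decomposed =~ /\u00E9/
```

Both strings report UTF-8 as their encoding, which is why the mismatch is easy to miss.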

Example, run with a file called Testé.txt in the same folder:

def transform_string s
  puts "Testing string #{s}"
  puts s.gsub(/é/,'TEST')
end

Dir.glob("./*.txt").each do |f|
  puts "Inline string works as expected"
  s = "./Testé.txt"
  puts transform_string s

  puts "File name from Dir.glob does not"
  puts transform_string f

  puts "Encoded file name works as expected, though it is reported as UTF-8, not UTF-8-MAC"
  f.encode!('UTF-8','UTF-8-MAC')
  puts transform_string f
end
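A possible workaround, sketched under some assumptions: normalise file names to composed form before matching against them. `to_nfc` is a hypothetical helper name. It uses String#unicode_normalize where that method exists (Ruby 2.2 and later) and otherwise falls back to the UTF-8-MAC conversion used above. The two paths may not agree for every character, since UTF-8-MAC follows Apple's variant of decomposition rather than plain NFD.

```ruby
# Hypothetical helper: return a composed (NFC) copy of a file name.
def to_nfc(name)
  if name.respond_to?(:unicode_normalize)
    # Available in Ruby 2.2 and later.
    name.unicode_normalize(:nfc)
  else
    # Older Rubies: reinterpret the bytes as UTF-8-MAC and transcode.
    name.encode("UTF-8", "UTF-8-MAC")
  end
end

# Stand-in for a name returned by Dir.glob on Mac OS X (assumed decomposed).
from_glob = "./Teste\u0301.txt"

fixed = to_nfc(from_glob)
puts fixed.gsub(/\u00E9/, "TEST")
```

With a real directory this could be applied as `Dir.glob("./*.txt").map { |f| to_nfc(f) }`. Note that the normalised string is a copy for matching or display and no longer has the same bytes as the name the file system returned.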

---Files--------------------------------
test.rb (926 Bytes)
Testé.txt (21 Bytes)
results.txt (1.09 KB)
writer.rb (221 Bytes)


-- 
https://bugs.ruby-lang.org/
