[#48745] [ruby-trunk - Bug #7267][Open] Dir.glob on Mac OS X returns unexpected string encodings for unicode file names — "kennygrant (Kenny Grant)" <kennygrant@...>

17 messages 2012/11/02

[#48773] [ruby-trunk - Bug #7269][Open] Refinement doesn't work if using locate after method — "ko1 (Koichi Sasada)" <redmine@...>

12 messages 2012/11/03

[#48847] [ruby-trunk - Bug #7274][Open] UnboundMethods should be bindable to any object that is_a?(owner of the UnboundMethod) — "rits (First Last)" <redmine@...>

21 messages 2012/11/04

[#48854] [ruby-trunk - Bug #7276][Open] TestFile#test_utime failure — "jonforums (Jon Forums)" <redmine@...>

14 messages 2012/11/04

[#48988] [ruby-trunk - Feature #7292][Open] Enumerable#to_h — "marcandre (Marc-Andre Lafortune)" <ruby-core@...>

40 messages 2012/11/06

[#48997] [ruby-trunk - Feature #7297][Open] map_to alias for each_with_object — "nathan.f77 (Nathan Broadbent)" <nathan.f77@...>

19 messages 2012/11/06

[#49001] [ruby-trunk - Bug #7298][Open] Behavior of Enumerator.new different between 1.9.3 and 2.0.0 — "ayumin (Ayumu AIZAWA)" <ayumu.aizawa@...>

12 messages 2012/11/06

[#49018] [ruby-trunk - Feature #7299][Open] Ruby should not completely ignore blocks. — "marcandre (Marc-Andre Lafortune)" <ruby-core@...>

13 messages 2012/11/07

[#49044] [ruby-trunk - Bug #7304][Open] Random test failures around test_autoclose_true_closed_by_finalizer — "luislavena (Luis Lavena)" <luislavena@...>

11 messages 2012/11/07

[#49196] [ruby-trunk - Feature #7322][Open] Add a new operator name #>< for bit-wise "exclusive or" — "alexeymuranov (Alexey Muranov)" <redmine@...>

18 messages 2012/11/10

[#49211] [ruby-trunk - Feature #7328][Open] Move ** operator precedence under unary + and - — "boris_stitnicky (Boris Stitnicky)" <boris@...>

20 messages 2012/11/11

[#49229] [ruby-trunk - Bug #7331][Open] Set the precedence of unary `-` equal to the precedence `-`, same for `+` — "alexeymuranov (Alexey Muranov)" <redmine@...>

17 messages 2012/11/11

[#49256] [ruby-trunk - Feature #7336][Open] Flexiable OPerator Precedence — "trans (Thomas Sawyer)" <transfire@...>

18 messages 2012/11/12

[#49354] review open pull requests on github — Zachary Scott <zachary@...>

Could we get a review on any open pull requests on github before the

12 messages 2012/11/15
[#49355] Re: review open pull requests on github — "NARUSE, Yui" <naruse@...> 2012/11/15

2012/11/15 Zachary Scott <zachary@zacharyscott.net>:

[#49356] Re: review open pull requests on github — Zachary Scott <zachary@...> 2012/11/15

Ok, I was hoping one of the maintainers might want to.

[#49451] [ruby-trunk - Bug #7374][Open] File.expand_path resolving to first file/dir instead of absolute path — mdube@... (Martin Dubé) <mdube@...>

12 messages 2012/11/16

[#49463] [ruby-trunk - Feature #7375][Open] embedding libyaml in psych for Ruby 2.0 — "tenderlovemaking (Aaron Patterson)" <aaron@...>

21 messages 2012/11/16
[#49494] [ruby-trunk - Feature #7375] embedding libyaml in psych for Ruby 2.0 — "vo.x (Vit Ondruch)" <v.ondruch@...> 2012/11/17

[#49467] [ruby-trunk - Feature #7377][Open] #indetical? as an alias for #equal? — "aef (Alexander E. Fischer)" <aef@...>

13 messages 2012/11/17

[#49558] [ruby-trunk - Bug #7395][Open] Negative numbers can't be primes by definition — "zzak (Zachary Scott)" <zachary@...>

10 messages 2012/11/19

[#49566] [ruby-trunk - Feature #7400][Open] Incorporate OpenSSL tests from JRuby. — "zzak (Zachary Scott)" <zachary@...>

11 messages 2012/11/19

[#49770] [ruby-trunk - Feature #7414][Open] Now that const_get supports "Foo::Bar" syntax, so should const_defined?. — "robertgleeson (Robert Gleeson)" <rob@...>

9 messages 2012/11/20

[#49950] [ruby-trunk - Feature #7427][Assigned] Update Rubygems — "mame (Yusuke Endoh)" <mame@...>

17 messages 2012/11/24

[#50043] [ruby-trunk - Bug #7429][Open] Provide options for core collections to customize behavior — "headius (Charles Nutter)" <headius@...>

10 messages 2012/11/24

[#50092] [ruby-trunk - Feature #7434][Open] Allow caller_locations and backtrace_locations to receive negative params — "sam.saffron (Sam Saffron)" <sam.saffron@...>

21 messages 2012/11/25

[#50094] [ruby-trunk - Bug #7436][Open] Allow for a "granularity" flag for backtrace_locations — "sam.saffron (Sam Saffron)" <sam.saffron@...>

11 messages 2012/11/25

[#50207] [ruby-trunk - Bug #7445][Open] strptime('%s %z') doesn't work — "felipec (Felipe Contreras)" <felipe.contreras@...>

19 messages 2012/11/27

[#50424] [ruby-trunk - Bug #7485][Open] ruby cannot build on mingw32 due to missing __sync_val_compare_and_swap — "drbrain (Eric Hodel)" <drbrain@...7.net>

15 messages 2012/11/30

[#50429] [ruby-trunk - Feature #7487][Open] Cutting through the issues with Refinements — "trans (Thomas Sawyer)" <transfire@...>

13 messages 2012/11/30

[ruby-core:48766] [ruby-trunk - Bug #7267] Dir.glob on Mac OS X returns unexpected string encodings for unicode file names

From: "shyouhei (Shyouhei Urabe)" <shyouhei@...>
Date: 2012-11-02 19:18:16 UTC
List: ruby-core #48766
Issue #7267 has been updated by shyouhei (Shyouhei Urabe).


meta (mathew murphy) wrote:
> Seems to me Ruby should pick one of the standard normalization forms

No, Ruby's M17N design is that it never picks one standard form of strings.  Strings have various encodings and it's you, not ruby, who should pick a right thing.

I admit this is not the one only solution for this mess.  Other langages like Perl have different opinions.  But the way ruby works is this.  So please, do it for yourself.

And I admit it should be much more easier for you to do the choice.  We need some brush-up.
----------------------------------------
Bug #7267: Dir.glob on Mac OS X returns unexpected string encodings for unicode file names
https://bugs.ruby-lang.org/issues/7267#change-32249

Author: kennygrant (Kenny Grant)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 2.0.0
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]


Tested on Ruby 1.9.3-p194 and ruby-2.0.0-preview1 on Mac OS X 10. 7.5

When calling file system methods with Ruby on Mac OS X, it is not possible to manipulate the resulting file name as a normal UTF-8 string, even though it reports the encoding as UTF-8. It seems to be a UTF-8-MAC string, even when the default encoding is set to UTF-8. This leads to confusion as the string can be manipulated normally except for any unicode characters, which seem to be decomposed. So a regexp using utf-8 characters won't work on the string, unless it is first converted from UTF-8-MAC. I'd expect the string encoding to be UTF-8, or at least to report that it is not a normal UTF-8 string if it has to be UTF-8-MAC for some reason. 

Example, run with a file called Test辿.txt in the same folder:

def transform_string s
   puts "Testing string #{s}"
   puts s.gsub(/辿/,'TEST')
end

Dir.glob("./*.txt").each do |f|  
  puts "Inline string works as expected" 
   s = "./Test辿.txt" 
   puts transform_string s

   puts "File name from Dir.glob does not" 
   puts transform_string f
   
   puts "Encoded file name works as expected, though it is reported as UTF-8, not UTF-8-MAC" 
   f.encode!('UTF-8','UTF-8-MAC')
   puts transform_string f
end


-- 
http://bugs.ruby-lang.org/

In This Thread