[#48745] [ruby-trunk - Bug #7267][Open] Dir.glob on Mac OS X returns unexpected string encodings for unicode file names — "kennygrant (Kenny Grant)" <kennygrant@...>

17 messages 2012/11/02

[#48773] [ruby-trunk - Bug #7269][Open] Refinement doesn't work if using locate after method — "ko1 (Koichi Sasada)" <redmine@...>

12 messages 2012/11/03

[#48847] [ruby-trunk - Bug #7274][Open] UnboundMethods should be bindable to any object that is_a?(owner of the UnboundMethod) — "rits (First Last)" <redmine@...>

21 messages 2012/11/04

[#48854] [ruby-trunk - Bug #7276][Open] TestFile#test_utime failure — "jonforums (Jon Forums)" <redmine@...>

14 messages 2012/11/04

[#48988] [ruby-trunk - Feature #7292][Open] Enumerable#to_h — "marcandre (Marc-Andre Lafortune)" <ruby-core@...>

40 messages 2012/11/06

[#48997] [ruby-trunk - Feature #7297][Open] map_to alias for each_with_object — "nathan.f77 (Nathan Broadbent)" <nathan.f77@...>

19 messages 2012/11/06

[#49001] [ruby-trunk - Bug #7298][Open] Behavior of Enumerator.new different between 1.9.3 and 2.0.0 — "ayumin (Ayumu AIZAWA)" <ayumu.aizawa@...>

12 messages 2012/11/06

[#49018] [ruby-trunk - Feature #7299][Open] Ruby should not completely ignore blocks. — "marcandre (Marc-Andre Lafortune)" <ruby-core@...>

13 messages 2012/11/07

[#49044] [ruby-trunk - Bug #7304][Open] Random test failures around test_autoclose_true_closed_by_finalizer — "luislavena (Luis Lavena)" <luislavena@...>

11 messages 2012/11/07

[#49196] [ruby-trunk - Feature #7322][Open] Add a new operator name #>< for bit-wise "exclusive or" — "alexeymuranov (Alexey Muranov)" <redmine@...>

18 messages 2012/11/10

[#49211] [ruby-trunk - Feature #7328][Open] Move ** operator precedence under unary + and - — "boris_stitnicky (Boris Stitnicky)" <boris@...>

20 messages 2012/11/11

[#49229] [ruby-trunk - Bug #7331][Open] Set the precedence of unary `-` equal to the precedence `-`, same for `+` — "alexeymuranov (Alexey Muranov)" <redmine@...>

17 messages 2012/11/11

[#49256] [ruby-trunk - Feature #7336][Open] Flexiable OPerator Precedence — "trans (Thomas Sawyer)" <transfire@...>

18 messages 2012/11/12

[#49354] review open pull requests on github — Zachary Scott <zachary@...>

Could we get a review on any open pull requests on github before the

12 messages 2012/11/15
[#49355] Re: review open pull requests on github — "NARUSE, Yui" <naruse@...> 2012/11/15

2012/11/15 Zachary Scott <zachary@zacharyscott.net>:

[#49356] Re: review open pull requests on github — Zachary Scott <zachary@...> 2012/11/15

Ok, I was hoping one of the maintainers might want to.

[#49451] [ruby-trunk - Bug #7374][Open] File.expand_path resolving to first file/dir instead of absolute path — mdube@... (Martin Dubé) <mdube@...>

12 messages 2012/11/16

[#49463] [ruby-trunk - Feature #7375][Open] embedding libyaml in psych for Ruby 2.0 — "tenderlovemaking (Aaron Patterson)" <aaron@...>

21 messages 2012/11/16
[#49494] [ruby-trunk - Feature #7375] embedding libyaml in psych for Ruby 2.0 — "vo.x (Vit Ondruch)" <v.ondruch@...> 2012/11/17

[#49467] [ruby-trunk - Feature #7377][Open] #indetical? as an alias for #equal? — "aef (Alexander E. Fischer)" <aef@...>

13 messages 2012/11/17

[#49558] [ruby-trunk - Bug #7395][Open] Negative numbers can't be primes by definition — "zzak (Zachary Scott)" <zachary@...>

10 messages 2012/11/19

[#49566] [ruby-trunk - Feature #7400][Open] Incorporate OpenSSL tests from JRuby. — "zzak (Zachary Scott)" <zachary@...>

11 messages 2012/11/19

[#49770] [ruby-trunk - Feature #7414][Open] Now that const_get supports "Foo::Bar" syntax, so should const_defined?. — "robertgleeson (Robert Gleeson)" <rob@...>

9 messages 2012/11/20

[#49950] [ruby-trunk - Feature #7427][Assigned] Update Rubygems — "mame (Yusuke Endoh)" <mame@...>

17 messages 2012/11/24

[#50043] [ruby-trunk - Bug #7429][Open] Provide options for core collections to customize behavior — "headius (Charles Nutter)" <headius@...>

10 messages 2012/11/24

[#50092] [ruby-trunk - Feature #7434][Open] Allow caller_locations and backtrace_locations to receive negative params — "sam.saffron (Sam Saffron)" <sam.saffron@...>

21 messages 2012/11/25

[#50094] [ruby-trunk - Bug #7436][Open] Allow for a "granularity" flag for backtrace_locations — "sam.saffron (Sam Saffron)" <sam.saffron@...>

11 messages 2012/11/25

[#50207] [ruby-trunk - Bug #7445][Open] strptime('%s %z') doesn't work — "felipec (Felipe Contreras)" <felipe.contreras@...>

19 messages 2012/11/27

[#50424] [ruby-trunk - Bug #7485][Open] ruby cannot build on mingw32 due to missing __sync_val_compare_and_swap — "drbrain (Eric Hodel)" <drbrain@...7.net>

15 messages 2012/11/30

[#50429] [ruby-trunk - Feature #7487][Open] Cutting through the issues with Refinements — "trans (Thomas Sawyer)" <transfire@...>

13 messages 2012/11/30

[ruby-core:49165] Re: [ruby-trunk - Bug #7267] Dir.glob on Mac OS X returns unexpected string encodings for unicode file names

From: "NARUSE, Yui" <naruse@...>
Date: 2012-11-09 12:38:15 UTC
List: ruby-core #49165
2012/11/9 kennygrant (Kenny Grant) <kennygrant@gmail.com>:
> Thanks for the comments on this issue. I'm not clear on what the UTF8-MAC encoding represents, are there docs on this Ruby behaviour and the problems involved somewhere?

see several lines at the end of enc/utf_8.c.

> It may return a filename marked UTF-8 which is NFD, or NFC, depending on the glob pattern you call it with (see writer.rb attachment to this issue). That's a small issue though and just indicates a wider complex problem.

writer.rb's two puts output the same result.
What do you mean?

>> An issue is people may write decomposed filename. A imaginary use case is a program which make a filename from the name of a music output from iTunes. iTunes manages texts with UTF8-MAC. So the people will confuse.
>
> OK, so in this case someone is unwittingly using a mix of UTF-8 NFC (any strings they create in ruby with legible accents) and UTF-8 NFD (any strings they get from itunes say) in their script, which could lead to issues even before writing file names. If they get NFD from itunes, then try to match on a track name with a regexp, it won't work unless they convert to NFC or explicitly create an NFD string will it?

It will work unless the regexp highly depends composed string.

> One thing I don't understand though, is that you say there are both in normal use - in use of Ruby ignoring file systems, if you create a string or regexp, NFC is the default isn't it?

No, NFC is not default.
The fact is that many IMEs outputs composed characters.
Once a decomposed characters is mixed in a string, the character lives as is.
It won't normalized.

> So Ruby has chosen one default for UTF-8 strings created in Ruby (as it must), but has to interact with lots of systems which might or might not be using NFC. At present we seem to have a de-facto default normalization of NFC, but nothing is translated to it when it comes from the OS. That might be a a very hard problem, but in principle it would be nice to have one normalization blessed as the default so that all strings in a given encoding are comparable. The results of leaving them as they are supplied are really unexpected, and people using Ruby are not going to want to manually convert every string they touch from outside Ruby to NFC in case it was touched by HFS or created as NFD.

Ruby don't normalize characters.
It treat them as they are.
Windows, Linux, and other file systems also don't normalize.

Moreover NFC/NFD lost information.
If a filename is decomposed characters on Windows or Linux, NFC for
the filename lost it.

>> First Ruby 1.9.0 set strings derived from filenames UTF8-MAC.
>> But some reported that if filenames is UTF8-MAC, it is hard to compare
>> with normal UTF-8 strings.
>
> This is interesting as it's exactly the behaviour I expected (if it's not possible to cleanly translate to NFC) - if strings are coming through as UTF-8 NFD, I'd expect them to be marked as such somehow (for example by being marked as encoding UTF8-MAC) - is there any indication?

A no so simple point is UTF8-MAC string is valid as UTF-8.

> Then at least it is clear that they are not comparable or compatible with the NFC ruby strings I get when creating a string s = "d騁ente".

Even if the string is accidentally composed, there are no guarantee
that a string is always composed.

>> If the translation from UTF8-MAC -> UTF-8 is entirely non-lossy and would do no harm to other UTF-8 strings
>> Yes until all part of the converting string is truly UTF8-MAC.
>
> I assumed from others' comments that UTF8-MAC was purely a sub-encoding used to indicate the use of decomposed strings, but would appreciate some more detail (if anyone has a link) on what exactly it involves, and if translation from UTF8-MAC to UTF8 can lose information that implies other differences. If the only difference is the decomposition (patterns which do not occur in NFC), I'd expect re-encoding to be idempotent and not affect NFC strings and thus harmless to apply to NFC strings or strings containing a mix. Re the file-system example, I had assumed that if you ask HFS to write to a file on a mounted file system HFS would normalize all names to NFD (as it does for any HFS files), but perhaps that is incorrect.

A UTF-8 string is not always NFCed.

> I suppose the above boils down to this question:
>
> Is there a correct way to handle this situation, and never fail when comparing a default Ruby string (NFC) against a file from any file system which may be NFD?

No way.
And again, Ruby string is not NFC.

-- 
NARUSE, Yui  <naruse@airemix.jp>

In This Thread