[ruby-core:35525] Re: [Feature #2350](Rejected) Unicode specific functionality on String in 1.9

From: Nikolai Weibull <now@...>
Date: 2011-03-18 12:52:27 UTC
List: ruby-core #35525
On Fri, Mar 18, 2011 at 11:53, Magnus Holm <judofyr@gmail.com> wrote:
> The problem is that the definition of #upcase doesn't only depend on the
> encoding used, but also the language of the encoded text. For instance, if
> you're writing in Turkish, you would expect "i".upcase to return a dotted
> uppercase I: http://www.i18nguy.com/unicode/turkish-i18n.html

I know.  The same goes for ‘i’ in Lithuanian.

> Doing this properly is *really* hard and needs to have a lot of flexibility,
> especially when it comes to non-Western languages.

This is simply not true.  Unicode defines how to deal with case
conversions.  I’m not saying that the Unicode standard is infallible,
but we can at least adhere to it.  I’m not saying that Unicode is the
only encoding that we should care about, but if we support the Unicode
transfer formats, why not support other interesting parts of the
standard?
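
As a rough illustration of the language-dependent rules the standard
describes, here is a minimal sketch.  It assumes Ruby 2.4 or later,
where String#upcase accepts a :turkic option; the 1.9 series under
discussion has no such option.  The helper name upcase_for and its
language argument are hypothetical.

```ruby
# -*- coding: utf-8 -*-
# Assumption: Ruby 2.4 or later. Ruby 1.9 does not accept these options.

# Hypothetical helper: upcase `text` using the rules for `language`.
def upcase_for(text, language = nil)
  case language
  when "tr", "az" then text.upcase(:turkic) # Turkic tailoring: i -> U+0130
  else                 text.upcase          # default Unicode mapping
  end
end

puts upcase_for("i")       # default rules give a dotless capital I
puts upcase_for("i", "tr") # Turkish rules give a dotted capital I
```

The language is passed explicitly here only to keep the sketch
self-contained; a real implementation would presumably read it from
the locale.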

> It's far easier for everyone that the built-in #upcase is
> simple and fast and you'll have to be explicit about any
> other I18n stuff IMO.

Easy, perhaps, but hardly useful.

My point is that the current #upcase (and similar methods) is
basically useless for anything other than ASCII.  I was looking for an
actual solution to this problem.  I have a library
(character-encodings) that does support these conversions, based on
locale and the Unicode character database (UCD).  How do we make it
easy for the user to deal with m17n?  I mean, if I say

# -*- coding: utf-8 -*-

puts "äbc".upcase

I expect this to do the right thing for Unicode under the current locale.
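
To make the difference concrete, the following sketch compares an
ASCII-only mapping with a full Unicode mapping.  It assumes Ruby 2.4
or later, where the :ascii option reproduces the behavior complained
about here; compare_upcase is a hypothetical name used only for this
example.

```ruby
# -*- coding: utf-8 -*-
# Assumption: Ruby 2.4 or later.

# Hypothetical helper: return both mappings of `text` side by side.
def compare_upcase(text)
  {
    ascii_only: text.upcase(:ascii), # only a-z change, as in Ruby 1.9
    unicode:    text.upcase          # full Unicode case mapping
  }
end

result = compare_upcase("äbc")
puts result[:ascii_only] # the "ä" is left untouched
puts result[:unicode]    # the "ä" is upcased as well
```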

As Unicode defines how to deal with case conversions, if I tell Ruby
that “this String is encoded as UTF-8” (or, in this case, “strings in
this file are encoded as UTF-8”), I expect Ruby to respond “OK, I’ll
use the Unicode rules that govern methods like #upcase for that
String”.

The UCD requires a lot of memory, so I suggested that a library, such
as character-encodings, should be able to seamlessly add this kind of
behavior without requiring the user to write "äbc".unicodify.upcase,
if the UCD can’t be included in standard Ruby runtime.

But, come to think of it, doesn’t Oniguruma need most of the UCD
information, so isn’t most of it already included in the Ruby runtime?
Adding casing information perhaps wouldn’t require much additional
space.

If this isn’t of interest, then I’m still looking for a way to
override #upcase for Strings that use the UTF-8 encoding without
resorting to alias_method or extend (as shown earlier in this
discussion).  This seems impossible to do at the moment, as Encoding
is a completely opaque object.
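
For comparison, later Ruby versions offer one possible route.  The
sketch below assumes Ruby 2.4 or later: it relies on Module#prepend
(added in Ruby 2.0) and on the :turkic option of String#upcase (added
in Ruby 2.4), neither of which exists in 1.9.  The module name
LocaleAwareUpcase, its language setting, and the TAILORINGS table are
hypothetical and do not belong to any existing library.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical sketch, not an existing library API. Assumes Ruby 2.4+.
module LocaleAwareUpcase
  class << self
    # Language code such as "tr", or nil for the default rules.
    attr_accessor :language
  end

  # Languages that need a tailored mapping, and the option to use.
  TAILORINGS = { "tr" => :turkic, "az" => :turkic }.freeze

  def upcase(*options)
    tailoring = TAILORINGS[LocaleAwareUpcase.language]
    # Leave non-UTF-8 strings and explicit caller options untouched.
    return super unless encoding == Encoding::UTF_8 && options.empty? && tailoring
    super(tailoring)
  end
end

# Insert the module in front of String so its #upcase runs first.
String.prepend(LocaleAwareUpcase)

LocaleAwareUpcase.language = "tr"
puts "i".upcase # dotted capital I under the Turkic tailoring
LocaleAwareUpcase.language = nil
puts "i".upcase # plain capital I under the default rules
```

The override is keyed on the String's encoding, so strings in other
encodings fall through to the built-in method unchanged.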
