[ruby-core:18681] Re: Character encodings - a radical suggestion

From: "Michael Selig" <michael.selig@...>
Date: 2008-09-18 00:03:35 UTC
List: ruby-core #18681
Hi,

Thanks for all the replies - I am not an expert on all these encodings,  
and I (obviously mistakenly!) assumed that all other encodings could be  
converted to Unicode.

When I first looked at Ruby 1.9's encoding support I thought "that's neat  
- I think it will solve my m17n problems". However as I got into it I soon  
discovered that it wasn't nearly this easy!

Here is a summary of my issues:

- Non "ASCII-compatible" data is almost impossible to work with. Just take  
a look at what James Gray was proposing to do for CSV.

- When developing standard classes & mixins that could be installed in any  
country, virtually all methods that handle more than 1 string are going to  
have to worry about the possibility of dealing with incompatible  
encodings. This is a major overhead to a programmer - it may not be  
acceptable to let it raise an error.

- Alternative languages to Ruby that represent all strings as Unicode  
don't have this problem. Although they may not be a 100% solution in  
Japan & China, they would certainly be fine for me to use.

- As my application is under my control, I can make the decision to  
transcode everything to UTF-8 if I want to. I was hoping not to, but I  
think the extra code I would have to write to test encoding compatibility  
would not be worthwhile as it would be in so many places. And yes, I could  
write a

- For people like James who are trying to modify a standard library like  
CSV, which on the surface looks like a simple task, it is really quite  
daunting.
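The incompatibility behind these issues can be reproduced in a few lines. This is only a minimal sketch: the sample strings and variable names are made up for illustration, and it assumes a Ruby with 1.9-style string encodings.

```ruby
# Two strings with the same text but different encodings (illustrative data).
utf8   = "caf\u00E9"                       # UTF-8, contains a non-ASCII character
latin1 = "caf\u00E9".encode("ISO-8859-1")  # same text transcoded to ISO-8859-1

# Encoding.compatible? returns nil when two strings cannot be combined.
compatible = Encoding.compatible?(utf8, latin1)

# Combining them raises instead of converting.
error = nil
begin
  utf8 + latin1
rescue Encoding::CompatibilityError => e
  error = e
end

# Strings that contain only ASCII characters combine without any error,
# even though their encodings differ.
ascii_ok = "abc".encode("ISO-8859-1") + utf8
```

The last line is the case where the encodings are compatible; the error only appears once both operands carry non-ASCII data in different encodings.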


My "ideal" would be that Ruby automatically converted to a common encoding  
rather than raising an Encoding Compatibility Error. And although Unicode  
apparently may not cope with every character on the planet at present, I  
guess it will one day, and it seems to me to be the sensible thing to use  
as the "common encoding" - or UTF-8 to be precise.

That way, in the 99% of cases where the encodings ARE compatible, Ruby  
would work exactly as it does now.

But it also means that I can write methods and not have to worry about  
them blowing up because of encoding incompatibility.

It *does* mean that strings may "magically" be converted to UTF-8, but I  
don't see this as a big deal as long as when they are output they are  
converted back to the necessary encoding (which I think Ruby does with  
files now). If the "magic" conversion is a problem, maybe there should be  
a switch to turn it on & off.

This auto-convert policy should also be used with non-destructive methods  
like String#== etc so the programmer needn't worry whether the same  
character has a different representation on each side of the "==".

The ASCII-8BIT encoding should be reserved as a "special case" and not be  
subject to auto-conversion, because it is going to be mainly used for  
"byte strings".

Yes, there may be a performance overhead doing this. But is this a big  
deal if it only happens in 1% of cases?
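What such a policy might look like can be approximated in user code today. The helpers below are hypothetical (concat_with_fallback and same_text? are names invented here, not part of Ruby), and they are only a rough sketch of the idea, not a proposal for how the interpreter would implement it.

```ruby
# Hypothetical helper, not part of Ruby: concatenate two strings, falling
# back to a common encoding only when their encodings are incompatible.
def concat_with_fallback(a, b, common = Encoding::UTF_8)
  # Byte strings are exempt from conversion; a + b keeps its normal
  # behaviour here and may still raise.
  if a.encoding == Encoding::ASCII_8BIT || b.encoding == Encoding::ASCII_8BIT
    return a + b
  end
  # Compatible operands take the ordinary path, with no transcoding.
  return a + b if Encoding.compatible?(a, b)
  # Incompatible operands are both transcoded to the common encoding.
  a.encode(common) + b.encode(common)
end

# Hypothetical helper: compare text rather than bytes by transcoding both
# sides first. Text that cannot be converted is treated as unequal.
def same_text?(a, b, common = Encoding::UTF_8)
  a.encode(common) == b.encode(common)
rescue EncodingError
  false
end

utf8   = "caf\u00E9"
latin1 = "caf\u00E9".encode("ISO-8859-1")

joined = concat_with_fallback(utf8, latin1)  # transcoded to UTF-8, no error
equal  = same_text?(utf8, latin1)            # plain == would say false here
```

Text that cannot be represented in the common encoding still raises from String#encode in concat_with_fallback, which is the open question mentioned at the end of this mail.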

Sure there are issues with this, like what to do with text that cannot be  
encoded to Unicode (now that I know it exists!), and also the  
implementation of these suggestions may not be easy, but I think *not*  
doing something about these issues may make the dev community have a  
negative impression of Ruby, which would be a great, great shame.

Cheers
Mike

On Thu, 18 Sep 2008 00:28:03 +1000, Yukihiro Matsumoto  
<matz@ruby-lang.org> wrote:

> Hi,
>
> In message "Re: [ruby-core:18640] Character encodings - a radical suggestion"
>     on Wed, 17 Sep 2008 10:20:13 +0900, "Michael Selig" <michael.selig@fs.com.au> writes:
>
> |So my radical suggestion is this:
> |
> |Remove internal support for non-ASCII encodings completely, and when
> |reading/writing UTF-16 (and UTF-32) files automatically transcode to/from
> |UTF-8.
>
> What happens with non Unicode text under your suggestion?
>
> My conservative suggestion is that:
>
> Put "r:UTF-16BE:UTF-8" for mode when you open a UTF-16 file to read,
> so that your internal strings are all UTF-8 encoding.
>
> |My reasons:
> |
> |- String & Regexp operations should just "work" without the programmer
> |worrying about encoding compatibility (I think!)
> |- The programmer only has to think about character encodings at the
> |"interfaces" (files, network interfaces) not throughout the program logic
>
> My "suggestion" satisfies the above two.
>
> |- To my knowledge UTF-16 & UTF-32 are the only "non-ASCII compatible"
> |encodings as Ruby defines it
>
> As akr stated this is wrong.
>
> |- To my knowledge no one actually uses UTF-16 or UTF-32 as a locale
>
> Yes.
>
> |- I would avoid having to use ugly modes to open a file like
> |"r:UTF-16LE:UTF-8" (very minor)
>
> This is ugly indeed.  We might add more Unicode support in the
> future.  But we are in no hurry.
>
> |- Ruby's internal code would be simpler & cleaner and therefore probably
> |faster and easier to maintain
>
> Dropping UTF-{16,32} is not enough.  Unless we abandon non-Unicode
> encoding support altogether, it won't be THAT simple.  And I am not
> going to remove their support.  I use them everyday.
>
> 							matz.
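
The mode string suggested in the quoted reply can be tried out with a throwaway file. This is a minimal sketch: the temporary file and its contents are made up for illustration, and the part before the second colon is the file's external encoding while the part after it is the internal encoding the program sees.

```ruby
require "tempfile"

text = nil
Tempfile.create("utf16-demo") do |tmp|
  tmp.close
  # Write raw UTF-16BE bytes so there is a UTF-16 file to read back.
  File.binwrite(tmp.path, "h\u00E9llo\n".encode("UTF-16BE"))

  # "r:UTF-16BE:UTF-8" reads UTF-16BE from disk and transcodes to UTF-8,
  # so the string handed to the program is already UTF-8.
  text = File.open(tmp.path, "r:UTF-16BE:UTF-8") { |f| f.read }
end

text.encoding  # => #<Encoding:UTF-8>
```

With this mode the encoding decision stays at the file boundary, and the rest of the program only ever handles UTF-8 strings.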

