[ruby-core:18776] Re: Character encodings - a less radical suggestion

From: Martin Duerst <duerst@...>
Date: 2008-09-22 02:35:49 UTC
List: ruby-core #18776

Hello Michael,

Many thanks for your proposal. Earlier, when I proposed some
general "encoding policies" to deal with this and similar
problems, the main objection raised was that such policies
would interoperate badly with libraries. But looking at your
concrete proposal, it seems to me that overall, the problems
wouldn't actually be that serious.

Therefore, I think we should seriously consider this proposal,
and hopefully implement it before Sept. 25th. In terms of
implementation, I don't think it should be that difficult,
but it may be quite a bit of work to check
Encoding::default_internal in all the affected methods.

In terms of potential problems, I see the following:
- A library sets Encoding::default_internal. That would lead
  to serious problems, and should be clearly advised against
  in the documentation. Libraries either have to be written
  in a general way, or have to document that they only work
  with certain values of Encoding::default_internal
  (so this proposal would help you, but not, e.g.,
  James Gray with the CSV library).
- Encoding::default_internal is set to some dummy or
  non-ASCII-compatible encoding, which may lead to some
  hiccups. We may want to make that impossible or advise
  against it (the main use is UTF-8 anyway).
- We should think through various scenarios for output.
  I can't think of any problems just now; I just noticed
  that output isn't considered in your proposal below.
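
On the second point, a guard along these lines could refuse unsuitable values. This is a sketch of a hypothetical check (suitable_default_internal? is not a Ruby method); it only relies on the real Encoding#dummy? and Encoding#ascii_compatible? predicates.

```ruby
# Hypothetical guard: accept only real (non-dummy), ASCII-compatible
# encodings as candidates for default_internal.
def suitable_default_internal?(enc)
  enc = Encoding.find(enc) unless enc.is_a?(Encoding)  # accept names or Encoding objects
  !enc.dummy? && enc.ascii_compatible?
end

%w[UTF-8 EUC-JP UTF-16LE UTF-16 ISO-2022-JP].each do |name|
  puts "#{name}: #{suitable_default_internal?(name)}"
end
# UTF-8 and EUC-JP pass; UTF-16LE is ASCII-incompatible;
# UTF-16 and ISO-2022-JP are dummy encodings.
```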

The advantages that I see with this proposal are:
- It gets rid of the poor usability of "r:UTF-16LE:UTF-8"
  (matz, ruby-core:18666)
- It clearly helps "Unicode inside" applications, but is
  not limited to any encoding and may be helpful for other
  encodings as well.
- It fits well within the rest of the naming scheme and the
  overall idea of having several specific encodings to make
  the work of the user easier. If we didn't have
  Encoding::default_external, using Ruby with a single
  local encoding would be a big pain. Introducing
  Encoding::default_internal makes using Ruby with
  "Unicode inside" much less of a pain.
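
For reference, the explicit mode string names both the external and the internal encoding on every open. A minimal sketch of that explicit form as it works in current Ruby; the file, its contents, and the helper name read_utf16le_as_utf8 are made up for illustration. Under the proposal, setting Encoding::default_internal to UTF-8 would let a program write just "r:UTF-16LE" and still receive UTF-8 strings.

```ruby
require "tmpdir"

# Hypothetical helper: read a UTF-16LE file, transcoding to UTF-8 on the way
# in via the explicit "external:internal" part of the mode string.
def read_utf16le_as_utf8(path)
  File.open(path, "r:UTF-16LE:UTF-8") { |io| io.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  # Write raw UTF-16LE bytes (no BOM) so there is something to read back.
  File.binwrite(path, "h\u00E9llo\n".encode(Encoding::UTF_16LE))
  text = read_utf16le_as_utf8(path)
  p text.encoding   # prints #<Encoding:UTF-8>
end
```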


At 08:56 08/09/22, Michael Selig wrote:
>On Sun, 21 Sep 2008 02:05:30 +1000, Yukihiro Matsumoto  
><matz@ruby-lang.org> wrote:
>
>> |- How a Japanese programmer would handle the situation of dealing with a
>> |combination of a Japanese non-Unicode compatible character set, and say a
>> |UTF-8 encoding which included non-ascii characters, and non-Japanese ones.
>> |ie: Is there a reasonable alternative to encoding both to Unicode &
>> |somehow dealing with the "difficult characters" as special cases?
>>
>> Unicode is getting better each day.  So it now covers almost all
>> day-to-day problems.  Some cellphone problems are covered by using
>> private area.
>
>I infer from this that really Unicode is the only (imperfect) solution for  
>true m17n where we have a mixture of completely different character sets
>(eg: Japanese & Arabic)?
>What I think this means is that there is no "one size fits all" solution,  
>unfortunately.

Yes. Unicode fits most of the time, some local encoding fits in many
cases (in particular small scripts), and for some very special jobs,
you may have to use something else (a special encoding such as Mojikyo,
the Unicode private use areas, an additional level of markup, ...).
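
The choice of encoding matters because transcoding into a narrower local encoding can fail outright. A short illustration with current Ruby's String#encode, using ISO-8859-1 as an assumed target: strict conversion raises, while the undef: :replace option trades the error for lossy "?" replacements, which is the "seat-belt" trade-off mentioned in the quoted proposal.

```ruby
text = "caf\u00E9 \u65E5\u672C"   # "café 日本"

# Strict transcoding: é exists in ISO-8859-1, but the kanji do not.
strict_failed =
  begin
    text.encode(Encoding::ISO_8859_1)
    false
  rescue Encoding::UndefinedConversionError
    true
  end
p strict_failed                    # prints true

# Opting in to loss: unmappable characters become "?".
lossy = text.encode(Encoding::ISO_8859_1, undef: :replace)
p lossy.bytesize                   # prints 7
```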

>So I have an alternate suggestion. Maybe I should rename this thread  
>"Character encodings - a less radical suggestion" :-)

I just did :-).

Regards,    Martin.

>Ruby already has "Encoding::default_external", so why not also have  
>"default_internal"? This option would either be left unset (or nil, I
>guess) or set to an encoding, likely to be UTF-8 in practice, but maybe  
>there would be a use for it to choose say one of the Japanese encodings if  
>you have a variety of Japanese encodings to handle.
>
>When "default_internal" is nil, Ruby will work as it does now:
>- Ruby libraries such as I/O & network libraries will by default return  
>character data in the external encoding
>- No transcoding will take place unless specifically requested by the Ruby  
>program
>- The Ruby program is responsible for ensuring that the encodings are what  
>it expects, that strings passed to & from Ruby libraries are in the  
>encoding the library expects, and that "Encoding Compatibility Errors"  
>will occur if it is not careful etc.
>
>When "default_internal" is set to an encoding "E":
>- Ruby libraries such as I/O & networking libraries will by default  
>transcode to/from internal encoding E (unless specifically overridden by  
>an option to the class)
>- A Ruby program can then be confident that all strings it handles will be  
>in encoding E, so it doesn't have to worry about encoding compatibility.  
>For example it can be sure that if "s" is "abc" then "s == 'abc'" is true,  
>no matter where the string "s" originated from.
>- Assuming that E is an "ascii-compatible" encoding, the Ruby programmer  
>doesn't have to face issues like "The value is #{val}" substitution  
>failing because "val" is not ASCII-compatible.
>- The "downside" as pointed out by a number of people is that not all  
>characters may be transcoded cleanly or even be supported (driving without  
>a seat-belt? :-)), but then programs requiring this level of control  
>should probably not use this feature.
>
>Consequences of this suggestion:
>- Don't have to change the current implementation of encodings, String or  
>Regexp
>- Avoids "automagical transcoding" within String & Regexp methods
>- Responsibility of implementing "default_internal" lies with a certain  
>set of Ruby libraries like IO & networking
>
>Hope this makes sense.
>Mike
>
>
>


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

