[ruby-talk:7443] Re: Unicode Issues (was: "A Java Developer's Wish List for Ruby")

From: matz@... (Yukihiro Matsumoto)
Date: 2000-12-16 16:55:55 UTC
List: ruby-talk #7443

Hi,

In message "[ruby-talk:7436] Unicode Issues (was: "A Java Developer's Wish List for Ruby")"
    on 00/12/17, Richard A.Schulman <RichardASchulman@att.net> writes:

|>|Do you mean that it has been superseded by UTF-16? Or what?
|>That's what I mean.
|
|Good. Both UCS-2 and UTF-16 have the same 16-bit encoding
|for the 49,194 presently defined characters used in most of
|the languages of the world. UTF-16 is a superset of UCS-2,
|adding in the possibility of surrogates. Just out of
|curiosity, though, how important is the surrogate extension
|to users in Japan?

Not important yet.  But future additions to the JIS standard will
probably be covered by the surrogate extension (yes, the KANJI set is
still growing).  So ignoring surrogates will cause great trouble in
the future.

|Matz:
|>|>But I'm going to add the M17N feature to the next version of Ruby.
|>|>The future Ruby should handle Unicode as well as other encodings.
|
|What exactly is the "M17N feature" that you plan to add? 

Each string and regex object will be able to carry information about
its encoding.  Matching, indexing, etc. will be based on that
information.
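
For readers coming to this later: the Ruby of this message did not yet have the feature, but the String encoding API in modern Ruby (1.9 and later) behaves much as described. A minimal sketch, with sample text and variable names chosen only for illustration:

```ruby
# Sketch using the String encoding API of modern Ruby (1.9 and later).
# The sample text and variable names are illustrative assumptions.
text = "\u65E5\u672C\u8A9E"   # three kanji, written as escapes

utf8 = text.encode("UTF-8")
euc  = text.encode("EUC-JP")

# Each string object carries its own encoding.
puts utf8.encoding
puts euc.encoding

# Length and indexing count characters, not bytes, in either encoding.
puts utf8.length
puts euc.length
second = euc[1]               # second character, still tagged EUC-JP

# The byte sizes differ because the encodings differ.
puts utf8.bytesize
puts euc.bytesize

# Regexp matching reports a character index, not a byte index.
idx = (utf8 =~ /\u672C/)
puts idx
```

The same three characters have the same length and index positions in both encodings, and only the byte size differs.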

|Matz:
|>Unicode 3.0 is really an improvement.  Most Japanese can accept it
|>except for time and space efficiency.
|>...
|>    By using UTF-8, most Japanese characters take 3 bytes each.  It
|>    would be 1.5 times bigger than current.  Imagine all of your text
|>    data grows 50% bigger.
|
|I agree. I'm not partial to UTF-8 either. In my earlier
|post, I recommended UCS-2, which is a two byte encoding for
|both the Western languages and the CJK languages. As far as
|DBCS Japanese goes, UCS-2 introduces no changes in storage
|or processing requirements. The same is true for the
|superset UTF-16, assuming surrogates are not required.

It doubles the space needed for ASCII, though (multibyte text is often
a mixture of ASCII and KANJI characters).
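
The storage trade-off can be checked with a small sketch in modern Ruby (1.9 and later). The sample strings are assumptions picked for illustration, and EUC-JP stands in here for "current" Japanese double-byte encodings:

```ruby
# Compare storage for mixed ASCII + kanji text in three encodings.
# Sample strings are illustrative; EUC-JP stands in for a legacy DBCS.
mixed = "Ruby\u8A00\u8A9E"   # 4 ASCII letters + 2 kanji
kanji = "\u8A00\u8A9E"       # the 2 kanji alone

sizes = {
  "EUC-JP"   => mixed.encode("EUC-JP").bytesize,
  "UTF-8"    => mixed.encode("UTF-8").bytesize,
  "UTF-16BE" => mixed.encode("UTF-16BE").bytesize
}
sizes.each { |name, bytes| puts "#{name}: #{bytes} bytes" }

# Growth factor for pure kanji text when moving from EUC-JP to UTF-8.
ratio = kanji.encode("UTF-8").bytesize.to_f / kanji.encode("EUC-JP").bytesize
puts ratio
```

For this sample, UTF-8 costs more than EUC-JP on the kanji, while UTF-16 costs more than either because every ASCII letter takes two bytes.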

|In converting to UTF-16, it's the Western languages that
|would suffer a "hit" in terms of storage and processing
|time. UTF-8, accordingly, will probably remain common in
|Western end-user shops for some time to come but not, I
|hope, as the internal encoding of system software.

Why not?  Despite its variable-length nature, I think UTF-8 is good
for internal encoding too.  E.g.

  * ASCII superset
  * no NUL (\0) bytes in strings
  * no endian problems
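
The three properties listed above can be observed with a short sketch in modern Ruby (1.9 and later). The sample text is an assumption for illustration, and the NUL-byte check applies to text that does not itself contain U+0000:

```ruby
# Check the three UTF-8 properties on a sample string.
# The sample text is illustrative and contains no U+0000.
ascii = "Ruby"
text  = "Ruby \u8A00\u8A9E"

# 1. ASCII superset: ASCII text has identical bytes in both encodings.
same_bytes = ascii.encode("US-ASCII").bytes.to_a ==
             ascii.encode("UTF-8").bytes.to_a

# 2. No NUL bytes in UTF-8, whereas UTF-16 pads ASCII with zero bytes.
utf8_has_nul  = text.encode("UTF-8").bytes.to_a.include?(0)
utf16_has_nul = text.encode("UTF-16BE").bytes.to_a.include?(0)

# 3. No endian problems: UTF-16 has two byte orders, UTF-8 has one.
be = text.encode("UTF-16BE").bytes.to_a
le = text.encode("UTF-16LE").bytes.to_a
byte_order_matters = (be != le)

puts same_bytes
puts utf8_has_nul
puts utf16_has_nul
puts byte_order_matters
```

The absence of zero bytes is what lets UTF-8 strings pass through C APIs that treat \0 as a terminator.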

Plus, UTF-16 is variable length anyway (as I mentioned above, we can't
ignore surrogates).
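
To make the variable-length point concrete, here is a hedged sketch in modern Ruby (1.9 and later). The helper `surrogate_pair` and the sample characters are hypothetical names and choices for illustration; the arithmetic follows the standard UTF-16 surrogate scheme:

```ruby
# Split a code point above U+FFFF into a UTF-16 surrogate pair.
# surrogate_pair is a hypothetical helper written for this example.
def surrogate_pair(code_point)
  offset = code_point - 0x10000
  high = 0xD800 + (offset >> 10)     # top 10 bits
  low  = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
  [high, low]
end

bmp_char  = "\u65E5"      # inside the Basic Multilingual Plane
supp_char = "\u{20BB7}"   # a CJK character outside it

# unpack("n*") reads the bytes as 16-bit big-endian code units.
bmp_units  = bmp_char.encode("UTF-16BE").unpack("n*")
supp_units = supp_char.encode("UTF-16BE").unpack("n*")

puts bmp_units.length
puts supp_units.length
puts(supp_units == surrogate_pair(0x20BB7))

# Code points reachable through surrogates: 1024 high x 1024 low.
surrogate_space = 0x400 * 0x400
puts surrogate_space
```

A character outside the Basic Multilingual Plane takes two 16-bit units, so code that assumes one unit per character breaks as soon as such characters appear.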

I think that's the reason Perl and Python chose UTF-8 as their
internal encoding.  I'd choose UTF-8 too if I could stick with Unicode.

|My own experience in developing international software is
|that it is MUCH easier to work in an environment in which
|UCS-2 or UTF-16 is the internal storage norm rather than
|UTF-8. Accordingly, I seek out operating systems, databases,
|and language providers that standardize on either of these
|as their normative, internal coding.

If the following conditions can be fulfilled, it's easy to develop
I18N software using UTF-16, as you said.

  * surrogates can be ignored
  * all characters can be converted into Unicode

These conditions are fine for many applications.  But I cannot FORCE
these conditions on ALL applications written in Ruby.

|>Using Unicode as an internal universal character
|>set covers 98% of M17N, but I want to cover ALL of the cases, and
|>from my personal experience (Ruby Japanization), I think it can be
|>done efficiently.
|
|What is the 2% that isn't covered by Unicode's UTF-16
|encoding (which provides for about 1 million code points, if
|one includes the surrogate facility)?

Don't take numbers literally.  It's a synonym for "almost all". ;-)

I was thinking of applications that process a big character set
(e.g. the Mojikyo set) which is not covered by Unicode.  I don't know
exactly how many code points it has.  But I've heard it's pretty big,
possibly consuming half of the surrogate space.  And they want to
process them now.  I think they don't want to wait for the Unicode
Consortium to assign code points for their characters.

							matz.
