[ruby-core:18770] Re: Character encodings - a radical suggestion
At 01:05 08/09/21, Yukihiro Matsumoto wrote:
>Hi,
>
>In message "Re: [ruby-core:18751] Re: Character encodings - a radical
>suggestion"
> on Sat, 20 Sep 2008 10:00:24 +0900, "Michael Selig"
><michael.selig@fs.com.au> writes:
>
>|Perhaps we need to go back to basics with this discussion. As a mere
>|English speaker, I do not fully understand the issues that are faced by
>|Japanese and other encodings. What I have gathered from this discussion is
>|(please tell me if I am wrong):
>|
>|- There are characters that Ruby needs to support which cannot be uniquely
>|mapped to Unicode
>
>Yes, even though they are minor.
>
>|- In fact there are entire character sets that we want to support in Ruby
>|that are not supported in Unicode
>
>Yes, I know two of them: Mojikyo, which refusing character
>unification. The character set contains 170,000 characters.
Just for general information, this doesn't specifically refer to
CJK unification (i.e. unification of the same ideograph from
China, Japan, Korea, and so on) but is more about general glyph
(dis)unification. This means that minor differences in how exactly
to write a character are given separate codepoints. This may help
in historical research (some variants are used more by some writers
or in some centuries than others), but in general it isn't helpful;
on the contrary, it makes data processing more difficult.
However, even in daily life, there is some need to distinguish
some (ideographic) glyph variants in certain cases. For this,
Unicode contains variation selectors (U+FE00-FE0F and U+E0100-E01EF).
These are used after a base character, based on a registration in the
Ideographic Variation Database (http://www.unicode.org/ivd/).
There is currently only the Adobe-Japan1 collection registered, see
http://www.unicode.org/ivd/data/2007-12-14/IVD_Charts.pdf.
For glyph variants, it would be no problem (although quite some work,
of course) for Mojikyo to register them as Ideographic Variations
in this database. This would make all these Variations usable
in Unicode.
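
As a rough illustration of the variation selector mechanism described
above, the following Ruby sketch pairs each base character with a
selector that follows it. The helper names (variation_selector?,
variation_sequences) and the sample string are assumptions made for
this example only; whether a particular base/selector pair is actually
registered has to be checked against the Ideographic Variation Database.

```ruby
# Variation selector ranges as quoted in the message above.
VS_RANGES = [0xFE00..0xFE0F, 0xE0100..0xE01EF].freeze

# True if the code point is one of the variation selectors.
def variation_selector?(codepoint)
  VS_RANGES.any? { |range| range.cover?(codepoint) }
end

# Split a string into [base, selector-or-nil] pairs of code points.
# A selector attaches to the immediately preceding base character.
def variation_sequences(str)
  pairs = []
  str.each_codepoint do |cp|
    if variation_selector?(cp) && pairs.last && pairs.last[1].nil?
      pairs.last[1] = cp
    else
      pairs << [cp, nil]
    end
  end
  pairs
end

# Illustrative input only: an ideograph followed by a selector from the
# supplementary range. Registration of this pair is not claimed here.
sample = "\u845B\u{E0100}"

p sample.codepoints.map { |cp| format("U+%04X", cp) }
p variation_sequences(sample)
```

The point of the sketch is that the variant is still the same base
codepoint; software that does not care about the glyph distinction can
drop the selector and compare the base characters directly.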
From http://www.mojikyo.com/info/konjaku/index.html, we can also
see the following:
Mojikyo                              Unicode

漢字 (kanji)              150,366    A bit more than double of what
                                     Unicode has. In my guess mostly
                                     glyph variants, but there sure are
                                     a few not yet encoded characters, too.

非漢字 (non-kanji)          2,256    Kana variants could be encoded
                                     with variation selectors

梵字 (bonji)                1,875    Don't know, but because these are
                                     of Indic origin, my guess is that
                                     Unicode would use a different encoding
                                     model with much less characters

甲骨文字 (oracle bone)      3,364    space tentatively allocated (U+32000-327FF),
                                     see http://unicode.org/roadmaps/tip/
  (http://www.internationalscientific.org/CharacterASP/why_study.aspx#oracle)

西夏文字/Tangut             6,000    under consideration for encoding

水族文字                      145    did not find any info, but I'm
                                     quite sure a well-written proposal
                                     would be accepted

篆書 (seal characters)     10,969    Very old style, but most of them
                                     with clear equivalents to modern
                                     ideographs. Still used on seals.
                                     To unify or not to unify is the
                                     big question.
  (http://www.internationalscientific.org/CharacterASP/why_study.aspx#seal)

It seems that Mojikyo is currently handled from two sides: www.mojikyo.org
for the non-commercial side, and www.mojikyo.com for the commercial side
(with various products published by Kinokuniya, a big Japanese publisher).
That leads to somewhat complicated usage conditions (you can use some
fonts for free for yourself, but have to pay if you use them in a paper
you publish, and so on), not only for the fonts (which would be quite
understandable) but also for some of the data.
>At the
>time I first heard that number was huge, but Unicode is approaching
>pretty close (it now has more than 100,000 characters).
Conclusion: If the Mojikyo people wanted, they could get most if
not all of their material into Unicode in one way or another. But
as with all serious character encoding work, it would take a lot
of effort.
>GB18030, defined by Chinese government. I don't know the detail, but
>I've heard it officially contains Unicode as its subset. But encoding
>scheme for GB18030 is upto 4bytes per codepoint, so I am not sure how
>it can holds 21bit Unicode codepoint in it.
4 bytes raw would be 32 bits, so that should be enough to hold 21 bits.
Because some characters use only one or two bytes, the overall code space
is smaller, about 1,600,000 codepoints. This is still larger than Unicode
(around 1,100,000 codepoints), but the difference is currently not used
at all.
For more details, please see
http://www.icu-project.org/docs/papers/unicode-gb18030-faq.html
and http://unicode.org/faq/han_cjk.html#23.
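
The code space figures above can be checked with a little arithmetic.
The byte ranges in this Ruby sketch are not taken from the message;
they are the usual description of the GB 18030 byte structure (one-,
two- and four-byte forms) and should be read as an assumption of the
example. The variable and constant names are made up for illustration.

```ruby
# Assumed GB 18030 byte structure (not stated in the message itself):
#   1 byte : 0x00..0x7F
#   2 bytes: lead 0x81..0xFE, trail 0x40..0x7E or 0x80..0xFE
#   4 bytes: 0x81..0xFE, 0x30..0x39, 0x81..0xFE, 0x30..0x39
single = (0x00..0x7F).size                        # 128
lead   = (0x81..0xFE).size                        # 126
trail  = (0x40..0x7E).size + (0x80..0xFE).size    # 63 + 127 = 190
digit  = (0x30..0x39).size                        # 10

two_byte  = lead * trail                          # 23,940
four_byte = lead * digit * lead * digit           # 1,587,600

GB18030_CODE_SPACE = single + two_byte + four_byte   # 1,611,668
UNICODE_CODE_SPACE = 17 * 0x10000                    # 17 planes = 1,114,112

puts "GB 18030 code space: #{GB18030_CODE_SPACE}"
puts "Unicode code space:  #{UNICODE_CODE_SPACE}"
puts "Unused difference:   #{GB18030_CODE_SPACE - UNICODE_CODE_SPACE}"

# The highest Unicode codepoint, U+10FFFF, needs 21 bits.
puts "Bits for U+10FFFF:   #{0x10FFFF.bit_length}"
```

Under these assumptions the totals come out at roughly 1.6 million
and 1.1 million, which matches the approximate numbers given above.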
(I was under the impression that GB 18030 contains a few characters
similar to the Japanese せ゜ and friends in JIS X 0213, but I can no
longer find any such information, so it may not be true.)
So I don't think there is any real problem for GB 18030 and Unicode.
>|- There are ambiguous characters in some character sets - same code for
>|different characters
>
>Yes.
>
>|I think it would be a benefit if we all got to understand a bit more:
>|
>|- How the character ambiguity (eg: Yen/ backslash) issue is handled at the
>|moment - generally, not just with Ruby. ie: how do you know that a printer
>|or screen is going to show the right character?
>
>Either avoiding conversion (operation based on bytes), or selecting
>proper encoding scheme (out of many very similar encodings, such as
>Shift_JIS, CP932, Windows-31J for example). Conversion table from
>unicode.org is carefully designed to ensure roundtrip, although that
>is the very reason we have so many similar encoding. If we can choose
>(or negotiate) to use same conversion table at both ends, it is
>unlikely to have mojibake problems.
Yes, roundtrip is easy if you use the same conversion tables, but
unfortunately, the major vendors (Microsoft, Apple, IBM,...) messed
this up with minor variations (usually just a few codepoints out of
several thousand).
As for how you know that a printer or screen is going to show the
right character, you simply don't, in particular e.g. on the Web.
0x5C will show as a Yen sign on Japanese systems with fonts tweaked
for Japanese, but will show as a backslash otherwise. Japanese
IT professionals have to just learn about this.
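
To make the 0x5C problem concrete, here is a toy Ruby sketch. The two
tables are deliberately tiny, hand-written stand-ins (plain ASCII, and
a JIS X 0201 Roman style table where 0x5C is the yen sign and 0x7E the
overline); they are assumptions for illustration, not the real vendor
conversion tables, and the helper names are made up.

```ruby
# Build a 7-bit table: identity mapping except for the given overrides.
def build_table(overrides = {})
  table = {}
  (0x00..0x7F).each { |byte| table[byte] = overrides.fetch(byte, byte) }
  table
end

ASCII_TABLE     = build_table
JIS_ROMAN_TABLE = build_table(0x5C => 0x00A5, 0x7E => 0x203E)

# Bytes -> UTF-8 string, using the given table.
def decode(bytes, table)
  bytes.map { |byte| table.fetch(byte) }.pack("U*")
end

# String -> bytes, using the inverse of the given table.
# nil marks a character the table cannot represent.
def encode(str, table)
  inverse = table.invert
  str.codepoints.map { |cp| inverse[cp] }
end

path = [0x43, 0x3A, 0x5C]  # the bytes for "C:" followed by 0x5C

as_ascii = decode(path, ASCII_TABLE)      # 0x5C read as a backslash
as_jis   = decode(path, JIS_ROMAN_TABLE)  # 0x5C read as a yen sign

p as_ascii == as_jis                            # same bytes, different text
p encode(as_jis, JIS_ROMAN_TABLE) == path       # same table at both ends
p encode(as_jis, ASCII_TABLE)                   # mismatched tables
```

With the same table at both ends the bytes survive the round trip;
with mismatched tables the last character has no mapping at all, which
is the kind of mojibake risk discussed above.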
>|- How the various "non-ascii compatible" encodings are used in practice.
>|eg: it is my understanding that UTF-7 is really only used in email, and
>|that it would be straightforward to immediately transcode it to/from UTF-8
>|in an POP/IMAP library, so UTF-7 could be avoided completely as an
>|"internal" encoding in Ruby. It's as if were were treating UTF-7 like
>|base64 - just a transformation of a "real" encoding. (In fact UTF-16 & 32
>|could be considered the same sort of thing, except they may be used more
>|widely.)
>
>UTF-{16,32}{BE,LE} are non-ascii compatible, but they are safe to
>convert into UTF-8 since their difference only lies in encoding
>scheme. They represent same character set anyway. ISO-2022 is used
>often in mails and web.
That would be iso-2022-JP. ISO 2022 is a standard that defines a set
of tools to create encodings, not an encoding in and by itself.
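
A short Ruby sketch may help show what "non-ASCII-compatible but
convertible" means in practice. It assumes a Ruby build whose standard
transcoders include ISO-2022-JP and UTF-16BE; the sample text and the
variable names are chosen for illustration only.

```ruby
# Two kanji, written with escapes so the source file encoding does not matter.
text = "\u65E5\u672C"

# ISO-2022-JP is a 7-bit encoding built with ISO 2022 escape sequences:
# the byte stream switches character sets with ESC-prefixed markers.
jis = text.encode("ISO-2022-JP")
puts jis.bytes.map { |b| format("%02X", b) }.join(" ")
puts "7-bit only:  #{jis.bytes.all? { |b| b < 0x80 }}"
puts "round trip:  #{jis.encode("UTF-8") == text}"

# UTF-16BE differs from UTF-8 only in the encoding scheme; the character
# set is the same, so conversion in either direction is lossless.
utf16 = text.encode("UTF-16BE")
puts utf16.bytes.map { |b| format("%02X", b) }.join(" ")
puts "round trip:  #{utf16.encode("UTF-8") == text}"
```

Neither byte stream can be scanned as if it were ASCII, but both
convert cleanly to UTF-8 for text like this that contains no ambiguous
characters.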
Regards, Martin.
>The situation is little bit more complicated,
>but basically it can be converted into Unicode as well (with slight
>risk of yen sign problem). You can ignore UTF-7.
>
>|- How a Japanese programmer would handle the situation of dealing with a
>|combination of a Japanese non-Unicode compatible character set, and say a
>|UTF-8 encoding which included non-ascii characters, and non-Japanese ones.
>|ie: Is there a reasonable alternative to encoding both to Unicode &
>|somehow dealing with the "difficult characters" as special cases?
>
>Unicode is getting better each day. So it now covers almost all
>day-to-day problems. Some cellphone problems are covered by using
>private area.
>
> matz.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp