[#19064] Fwd: [ruby-dev:36523] Re: Encoding.default_internal — Martin Duerst <duerst@...>
[ruby-core:19064] Fwd: [ruby-dev:36523] Re: Encoding.default_internal
There has been some disconnect lately between ruby-dev and ruby-core
about default_internal. For a while, I planned to help out a bit,
but didn't get around to it until today.
Basically, what happened was that in [ruby-dev:36523], Matz made a new,
far-reaching proposal re. default_internal. Since then, various details
(as well as the proposal itself) have been discussed extensively on
ruby-dev.
I'll try to translate the salient points of the proposal below.
Please try to read the whole mail before responding; the discussion
has moved ahead considerably on details that are not covered here.
>Date: Thu, 25 Sep 2008 00:49:10 +0900
>From: Yukihiro Matsumoto <matz@ruby-lang.org>
>Subject: [ruby-dev:36523] Re: Encoding.default_internal のためのパッチ
>To: ruby-dev@ruby-lang.org (ruby developers list)
>まつもと ゆきひろです
>
>In message "Re: [ruby-dev:36517] Re: Encoding.default_internal のためのパッチ"
> on Wed, 24 Sep 2008 21:23:48 +0900, "NARUSE, Yui" <naruse@airemix.jp> writes:
>
>|Martin Duerst wrote:
>|> [ruby-core:18774] に Michael Selig から Encoding::default_internal
>|> の提案がありました。
[Martin:]
In ruby-core:18774, there was a proposal from Michael Selig re.
Encoding::default_internal.
>|まつもとさんに昼ごろ聞いたところ、まだ思案中のようです。
[Yui:]
I asked matz around noon, but he was still thinking about it.
>|結局 default_internal ってのは multi-locale を指向する
>|Ruby からすると逆行する存在なので、難しいところではあります。
[Yui:]
Because default_internal goes against Ruby's basic multi-locale
orientation, it may be difficult [to introduce default_internal].
[from here on, Matz; comments in [] by the translator]
>今日一日考えて、導入することにしました。
I thought about this for a full day, and decided to introduce it.
>以下のような仕様を考えています。
I'm thinking about the following spec:
>* default_internalはIOでinternalを指定しなかった場合のエンコー
> ディングである
default_internal is used as the internal encoding for an IO
if the internal encoding is not specified.
>* これが指定されている時IOからの入力は(バイナリでない限り)、
> このエンコーディングを持つ(必要なら変換する)。
If default_internal is specified, input from an IO (unless it
is binary) will carry the default_internal encoding (transcoded
if necessary).
>* default_internalも-Eオプションで指定する。指定書式は
default_internal can be specified with the -E option, as follows:
> -E iso-2022-jp:utf-8
>
> のように「:」で区切ったものとする。default_externalとして
> localeを指定したい時には、空のdefault_externalを指定するた
> めに
Two encodings are separated by a colon. In the case of using the
locale as default_external, default_external can be left empty:
> -E :utf-8
>
> のように書く。逆にdefault_internalへの変換を抑制するために
> は、空のdefault_internalを指定して
If you want to avoid using default_internal, you can leave it blank:
> -E iso-2022-jp:
>
> のように書く。旧来の「-E euc-jp」は「-E euc-jp:」と同じ意
> 味になる。
The old-style -E euc-jp will have the same meaning as -E euc-jp:.
>* (重要)なにも指定しない場合のdefault_externalはlocale。
> default_internalは「UTF-8」。
IMPORTANT: The default for default_external is the locale.
The default for default_internal is UTF-8.
>* 新設の-Lオプションを指定するとdefault_externalはlocale、
> default_internalは空になる。
With a new option -L, default_external is locale, and
default_internal is empty [i.e. no conversion].
>* (未定)Dirなどが返すパス名はdefault_externalから
> default_internalへの変換が行われる。ただし、変換エラーが発
> 生した場合、文字が化けるのは避けたい(バイナリで返すか)。
(Undecided) Data such as path names returned from Dir will
be converted from default_external to default_internal.
However, if there is a conversion error, to avoid 'mojibake',
the data will be returned as binary.
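[Illustration: a minimal sketch of the transcoding-on-input behaviour
in the first two points of the spec, written against
Encoding.default_internal and the "r:ext:int" open mode as they exist
in released Ruby. The temporary file and variable names are made up
for the example, and the sketch sets default_internal explicitly
rather than relying on any default value.]

```ruby
require "tempfile"

text = "\u65E5\u672C\u8A9E"   # three kanji written with escapes, so the literal is UTF-8
explicit = implicit = raw = nil

Tempfile.create("default_internal_demo") do |f|
  f.binmode
  f.write(text.encode("EUC-JP"))   # store EUC-JP bytes on disk
  f.close

  # Internal encoding given on the IO itself: external EUC-JP, internal UTF-8.
  explicit = File.open(f.path, "r:EUC-JP:UTF-8", &:read)

  saved = Encoding.default_internal
  begin
    # No internal encoding on the IO, so default_internal is consulted.
    Encoding.default_internal = Encoding::UTF_8
    implicit = File.open(f.path, "r:EUC-JP", &:read)

    # Binary reads are left untouched, even with default_internal set.
    raw = File.binread(f.path)
  ensure
    Encoding.default_internal = saved   # do not leak the setting
  end
end

puts explicit.encoding, implicit.encoding, raw.encoding
```

[Both text reads come back transcoded to UTF-8, while the binary read
keeps the original EUC-JP bytes.]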
>背景
Background
>ここ数週間真剣に考えてきましたが、内部コードにUnicodeを使わな
>い理由はかなり減っていると思います。
I have been thinking about this seriously for several weeks,
and I think that the reasons for not using Unicode as an internal
code have decreased considerably.
>各種変換テーブルはラウン
>ドトリップするように設計されていますし、各種コストも開発当初
>(8年以上前)はともかく現在では無視できるレベルになっています。
The various conversion tables are designed so that round-trip
conversion is possible, and the various costs, whatever they were
when [the current multilingual architecture] was first developed
(more than eight years ago), have now dropped to a negligible level.
>そこで、内部コードUnicode(UTF-8)を支援しつつ、必要であれば無
>変換テキスト処理やより広い文字集合(の実験)も可能である仕様を
>模索しました。
Therefore, I was considering a specification that would support
Unicode (UTF-8) as an internal code, while making possible
text processing without conversion and (experiments with)
wider character sets.
>結果、デフォルトでは外部エンコーディングをロケー
>ルから取得、内部エンコーディングはUTF-8としました。
As a result, I made the default for the external encoding
dependent on the locale, and the default for the internal
encoding UTF-8.
>しかし、ロ
>ケールに従ったテキストデータを変換なしに処理したいニーズもそ
>れなりにあるでしょうから(特に書き捨てのプログラム)、そのため
>に「-L」オプションを新設しました。
However, (especially for "write and forget" programs), there
should be a considerable need for processing text data without
conversion, and for that, I introduced the -L option.
>Unicode以外のエンコーディン
>グを内部エンコーディングに使いたい場合には明示的に-Eを使えば
>よいわけです。
In case you want to use a non-Unicode encoding as the internal encoding,
you can use -E explicitly.
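[Illustration: a small sketch that starts a child interpreter with the
two-part -E form and prints the resulting defaults. The helper name
encoding_defaults is hypothetical; only the full "external:internal"
form is exercised here, and the other forms described above are not
checked by this sketch.]

```ruby
require "rbconfig"

# Hypothetical helper: run a child Ruby with the given -E value and
# return "<default_external> <default_internal or none>".
def encoding_defaults(e_option)
  script = 'print Encoding.default_external.name, " ", ' \
           '(Encoding.default_internal ? Encoding.default_internal.name : "none")'
  IO.popen([RbConfig.ruby, "-E", e_option, "-e", script], &:read).strip
end

puts encoding_defaults("euc-jp:utf-8")
```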
>この結果、各種ライブラリは
As a result, each library
> * 基本UTF-8を返せばよい、入力もUTF-8を期待
- basically can return UTF-8, and can expect UTF-8 as input
> * より親切なライブラリはdefault_internalで返す。入力は
> encodingを見て対処
- more friendly libraries will return default_internal,
and will check the encoding on input
>という二段階対応ができます。段階的に前者(レベル1)から、後者
>(レベル2)に移行すればよいのではないでしょうか。
This will allow a two-step approach. We can move step-by-step
from the former (level 1) to the latter (level 2).
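[Illustration: one way a "level 2" library function might look. The
name to_internal is hypothetical, and falling back to UTF-8 when
default_internal is unset is an assumption of this sketch (it matches
the "level 1" behaviour), not something the proposal prescribes.]

```ruby
# Hypothetical level-2 helper: inspect the encoding of the input and
# return the string in default_internal (UTF-8 if none is set).
def to_internal(str)
  target = Encoding.default_internal || Encoding::UTF_8
  return str if str.encoding == target
  return str if str.encoding == Encoding::BINARY   # leave binary data alone
  str.encode(target)
end

latin1 = "caf\u00E9".encode("ISO-8859-1")
result = to_internal(latin1)
puts result.encoding
```

[A level-1 library would skip the lookup and always hand back UTF-8.]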
>現在では基本Unicodeで構わないとは思いますが、パス名については
>変換に伴うデータロスでファイルが見つけられない事態を避けるこ
>とを考えなければなりません。
Currently, I don't think there should be problems with using
Unicode as a base, but we have to consider how to avoid not
finding a file when there is data loss due to a conversion
of a path name.
>上では「未定」と書いていますが、
>「変換に失敗した時にはバイナリ」というのが現実的な対処ではな
>いかと思います。
I wrote "undecided" above, but I think "use binary if
conversion fails" may be a realistic approach.
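[Illustration: a sketch of the "use binary if conversion fails" idea
for path names. The helper path_to_internal and its keyword arguments
are hypothetical; this is not how Dir is implemented, only one way the
fallback could be expressed in Ruby.]

```ruby
# Hypothetical helper: try to convert the raw bytes of a path name from
# the external to the internal encoding; on any failure return the
# untouched bytes tagged as binary instead of a garbled string.
def path_to_internal(bytes, external:, internal: Encoding::UTF_8)
  labelled = bytes.dup.force_encoding(external)
  return bytes.b unless labelled.valid_encoding?   # malformed input
  labelled.encode(internal)
rescue EncodingError                               # no mapping or no converter
  bytes.b
end

good = "\u65E5\u672C\u8A9E".encode("EUC-JP").b     # valid EUC-JP bytes
bad  = [0xFF, 0xFF].pack("C*")                     # not valid EUC-JP

converted = path_to_internal(good, external: Encoding::EUC_JP)
fallback  = path_to_internal(bad,  external: Encoding::EUC_JP)
puts converted.encoding, fallback.encoding
```

[Because the fallback keeps the original bytes, the file can still be
opened with the returned name.]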
>もちろんこれらの変更は本日のfeature freezeには間に合いません
>が、悲鳴をあげたものだと見なしてください。
Of course, this change doesn't meet today's feature freeze
deadline, but please consider this as a request for a delay.
>できるだけ早急に
>1.9.1に取り込もうと思います。
I will try to integrate this as quickly as possible into 1.9.1.
>急いで考えたものなので見落とし
>があるかもしれません。
Because I was thinking about this in a hurry, there may be
some things that I overlooked.
>その場合は遠慮なく指摘してください。
In that case, please point this out without hesitation.
>これにともない、UTF-{16,32}対応はencから外した方がよいかもし
>れないと思うようになりました。
In connection with this, I started to think it might be a good
idea to remove UTF-{16,32} from enc.
[That proposal was later abandoned when U. Nakamura brought
up a case where he had to lightly process megabytes of UTF-16LE
data, where conversion, if used, extended processing time by
a factor of two.]
>これらはUTF-8とは符号化方式の
>違いしかありません。
They only differ from UTF-8 in terms of encoding scheme.
>まったくデータロスなしに変換できる以上、
>対応するメリットは大きくなく、混乱のデメリット(たとえばCSVラ
>イブラリがUTF-16対応に苦労した)の方が大きそうです。
They can be converted without any data loss, and therefore
the merit of supporting them isn't big compared to the
demerit of confusion (e.g. how the CSV library has had
problems dealing with UTF-16).
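[Illustration: a short check of the lossless-conversion point using
String#encode. The sample text is arbitrary and includes a character
outside the BMP so that UTF-16 surrogate pairs are covered.]

```ruby
text = "Ruby \u65E5\u672C\u8A9E \u{1F600}"   # ASCII, kanji, and a non-BMP character

utf16 = text.encode("UTF-16LE")
utf32 = text.encode("UTF-32BE")

# Converting back yields the original string: only the encoding
# scheme differs, not the character repertoire.
back16 = utf16.encode("UTF-8")
back32 = utf32.encode("UTF-8")

puts back16 == text
puts back32 == text
```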
> 直前で申し訳ない
Sorry [for sending this out] just before [the feature freeze]
> まつもと ゆきひろ /:|)
Matz
[Regards, Martin.]
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp