[#19064] Fwd: [ruby-dev:36523] Re: Encoding.default_internal — Martin Duerst <duerst@...>
There has been some disconnect lately between ruby-dev and ruby-core
On Oct 1, 2008, at 5:09 AM, Martin Duerst wrote:
On Wed, Oct 1, 2008 at 9:46 AM, James Gray <james@grayproductions.net> wrote:
[#19075] Request For Removal: No Operator Concatenation — James Gray <james@...>
I'm disappointed that Ruby still supports this goofy syntax:
On Wed, Oct 1, 2008 at 1:58 PM, James Gray <james@grayproductions.net> wrote:
On Wed, Oct 1, 2008 at 1:08 PM, Gregory Brown <gregory.t.brown@gmail.com> wrote:
On Oct 1, 2008, at 1:15 PM, Jim Freeze wrote:
On Wed, Oct 1, 2008 at 1:29 PM, James Gray <james@grayproductions.net> wrote:
On Oct 1, 2008, at 1:37 PM, Jim Freeze wrote:
On Oct 1, 2008, at 11:42 AM, James Gray wrote:
On Wed, Oct 1, 2008 at 2:45 PM, Eric Hodel <drbrain@segment7.net> wrote:
On Wed, Oct 1, 2008 at 2:10 PM, Gregory Brown <gregory.t.brown@gmail.com> wrote:
On Oct 1, 2008, at 2:17 PM, Jim Freeze wrote:
On Wed, Oct 1, 2008 at 2:25 PM, James Gray <james@grayproductions.net> wrote:
On Oct 1, 2008, at 12:30 PM, Jim Freeze wrote:
Hi,
On Oct 1, 2008, at 10:33 PM, Yusuke ENDOH wrote:
[#19127] Autoload and class definition — Tomas Matousek <Tomas.Matousek@...>
I've found an interesting corner case of autoload behavior, which I think i
[#19132] [Feature #615] "with" operator — Lavir the Whiolet <redmine@...>
Feature #615: "with" operator
Hi,
On Mon, Oct 06, 2008 at 10:46:49AM +0900, Nobuyoshi Nakada wrote:
On Mon, Oct 06, 2008 at 10:56:23PM +0900, Paul Brannan wrote:
On Mon, Oct 6, 2008 at 3:34 PM, Trans <transfire@gmail.com> wrote:
Hi --
On Tue, Oct 07, 2008 at 05:47:23AM +0900, David A. Black wrote:
Hi --
[#19168] [Bug:1.9] rubygems depend on test/unit/ui/console/testrunner — "Yusuke ENDOH" <mame@...>
Hi,
On Oct 7, 2008, at 07:43 AM, Yusuke ENDOH wrote:
Eric Hodel wrote:
[#19225] Module.freeze vs Object.freeze — Curt Hagenlocher <curth@...>
What's the difference between Module.freeze and Object.freeze? They seem t
[#19242] Regexp Order Matters in 1.9 — James Gray <james@...>
I'm just curious, why does this work:
[#19250] default_internal encoding — Dave Thomas <dave@...>
I'm documenting default_internal for the PickAxe, and have a couple of
Hi,
On Oct 9, 2008, at 6:06 PM, Michael Selig wrote:
On Fri, 10 Oct 2008 13:09:31 +1100, James Gray <james@grayproductions.net>
On Wed, Oct 8, 2008 at 3:52 PM, Paul Brannan <pbrannan / atdesk.com> wrote:
On Fri, Oct 10, 2008 at 10:30:31AM +0900, Michael Selig wrote:
Paul Brannan wrote:
Charles Oliver Nutter wrote:
[#19294] [Bug #634] Time parsing works in 1.8 but not 1.9 — Aaron Patterson <redmine@...>
Bug #634: Time parsing works in 1.8 but not 1.9
Issue #634 has been updated by tadayoshi funaba.
[#19298] [Feature #639] New String#encode_internal method — Michael Selig <redmine@...>
Feature #639: New String#encode_internal method
Hi,
[#19304] 1.9, encoding & win32 wide char support — Lloyd Hilaiel <lloyd@...>
hello,
[#19315] [Feature #643] __DIR__ — Thomas Sawyer <redmine@...>
Feature #643: __DIR__
[#19332] Can I confirm a change to source file encoding — Dave Thomas <dave@...>
A month ago, if I had
[#19342] [Bug #649] Memory leak in a array assignment? — Henri Suur-Inkeroinen <redmine@...>
Bug #649: Memory leak in a array assignment?
On Tue, Feb 3, 2009 at 8:44 PM, Brent Roman <brent@mbari.org> wrote:
[#19343] Yet another block semantic/syntax question — "David A. Black" <dblack@...>
Hi --
[#19350] Net::HTTP.post_form bug : can't post form to correct uri which contains QueryString(QueryString part are lost) and revise — Klesh <kleshwong@...>
Hi,
You are trying to use GET-style query params instead of POSTing the
Dear Matt
From my experience, it's simply easier to process requests that way,
Thanks,
2008/10/17 Matt Todd <chiology@gmail.com>:
On Oct 19, 2008, at 8:55 AM, mathew wrote:
[#19373] capture_io in minitest — Tanaka Akira <akr@...>
capture_io changes $stdout.fileno.
[#19378] Constant names in 1.9 — Dave Thomas <dave@...>
When Ruby makes the tIDENTIFIER/tCONSTANT test, it looks to see if the
Hi,
On Oct 18, 2008, at 8:32 AM, Yukihiro Matsumoto wrote:
Hi,
[#19385] [Bug #657] Thread.new { fork } — "James M. Lawrence" <redmine@...>
Bug #657: Thread.new { fork }
[#19388] [Bug #663] Benchmark.measure outputs different result when executed using command line "ruby -e ..." — Artem Vorozhtsov <redmine@...>
Bug #663: Benchmark.measure outputs different result when executed using command line "ruby -e ..."
[#19397] [Feature #666] Enumerable::to_hash — Marc-Andre Lafortune <redmine@...>
Feature #666: Enumerable::to_hash
Issue #666 has been updated by Yukihiro Matsumoto.
Hi,
Thank you for this explanation. If I understand correctly, you want methods
Hi,
Thank you for your response
On Wed, 22 Apr 2009 05:45:06 +0900
[#19410] rb_errinfo() vs rb_rubylevel_errinfo() — Paul Brannan <pbrannan@...>
What is the difference between these two functions?
Hi,
On Wed, Oct 22, 2008 at 12:34:19AM +0900, SASADA Koichi wrote:
[#19413] Is this expected, or should I report it? — Dave Thomas <dave@...>
Given
[#19422] Now that lambda has more powerful arguments... — Dave Thomas <dave@...>
is there anything that
Dave Thomas schrieb:
On Wed, Oct 22, 2008 at 04:01:45AM +0900, Dave Thomas wrote:
Hi --
On Wed, Oct 22, 2008 at 04:38:19AM +0900, David A. Black wrote:
Hi --
On Oct 21, 2008, at 4:24 PM, David A. Black wrote:
Hi --
[#19446] confused by this catch table — Paul Brannan <pbrannan@...>
irb(main):001:0> require 'internal/proc'
[#19458] Should Method@instance_methods reveal protected methods? — Dave Thomas <dave@...>
The RDoc says it just returns public methods, but
[#19465] [Bug #680] csv.rb: CSV.parse is too late when encoding is mismatch — Takeyuki Fujioka <redmine@...>
Bug #680: csv.rb: CSV.parse is too late when encoding is mismatch
Hi,
A default for the source encoding has been discussed quite a long
Hi,
Hi,
Hi,
Hi,
On Sun, 26 Oct 2008 17:26:32 +1100, Nobuyoshi Nakada <nobu@ruby-lang.org>
Hi,
On Sun, 26 Oct 2008 23:34:26 +1100, Nobuyoshi Nakada <nobu@ruby-lang.org>
Hi,
On Mon, 27 Oct 2008 16:07:54 +1100, Nobuyoshi Nakada <nobu@ruby-lang.org>
Hi,
On Mon, 27 Oct 2008 17:27:57 +1100, Nobuyoshi Nakada <nobu@ruby-lang.org>
Hi,
On Mon, 27 Oct 2008 20:55:32 +1100, Nobuyoshi Nakada <nobu@ruby-lang.org>
Hi,
On Oct 27, 2008, at 7:07 AM, Nobuyoshi Nakada wrote:
Hi,
On Oct 24, 2008, at 1:52 AM, Martin Duerst wrote:
On Oct 24, 2008, at 8:06 AM, James Gray wrote:
On Sat, 25 Oct 2008 00:07:13 +1100, James Gray <james@grayproductions.net>
On Oct 26, 2008, at 6:48 PM, Michael Selig wrote:
[#19468] [Bug:1.9] failures of test/minitest — Nobuyoshi Nakada <nobu@...>
Hi,
[#19478] Ruby 1.8.7 Throwing "Too many open files" Exception lately??? — "C.E. Thornton" <admin@...>
Group,
[#19487] [ANN] Sipper 1.1.3 Released — "Nasir Khan" <rubylearner@...>
1.1.3 of SIPr pronounced as Sipper has been released earlier this month.
[#19504] Is the stabby proc gone? broken? — "David A. Black" <dblack@...>
Hi --
[#19523] Too Many Files Error -- Test Case Produced. — "C.E. Thornton" <admin@...>
Core,
[#19555] Managing 1.9 threads in extensions — Dave Thomas <dave@...>
I'm trying to pin down the rules for folks who write extensions for
[#19561] Was there a feature freeze on October 25th? — Dave Thomas <dave@...>
Curious authors want to know... :)
[#19564] Ruby 1.9.1 preview1 is out — "Yugui (Yuki Sonoda)" <yugui@...>
Hi all,
[#19566] GC thought — "Roger Pack" <roger.pack@...>
Here is a recent patch I've been experimenting with--for any advice. [1]
On Tue, 28 Oct 2008 17:02:17 +0900, Roger Pack wrote:
> Letting the program continue execution during the mark phase could cause
On Wed, Oct 29, 2008 at 01:04:52AM +0900, Roger Pack wrote:
2008/10/28 Paul Brannan <pbrannan@atdesk.com>:
Robert Klemme wrote:
Robert Klemme wrote:
[#19578] [Bug #691] Time::zone_utc? does not follow rfc2822 — Chun Wang <redmine@...>
Bug #691: Time::zone_utc? does not follow rfc2822
[#19583] [Bug #694] eof? call on a pty IO object causes application to exit — Dave Thomas <redmine@...>
Bug #694: eof? call on a pty IO object causes application to exit
[#19590] [Feature #695] More flexibility when combining ASCII-8BIT strings with other encodings — Michael Selig <redmine@...>
Feature #695: More flexibility when combining ASCII-8BIT strings with other encodings
Hi,
At 07:14 08/10/31, Michael Selig wrote:
Hi
[#19599] Future of Continuations — "r. schempp" <ruben.schempp@...>
Hi,
On Wed, Oct 29, 2008 at 06:54:06PM +0900, r. schempp wrote:
r. schempp schrieb:
[#19604] test failure in r20022 — Mike Stok <mike@...>
I noticed this failure in my morning build of ruby trunk on my laptop:
[#19610] [Bug 1.9] gem_prelude.rb always require rubygems — Yukihiro Matsumoto <matz@...>
Hi,
[#19618] Result of backticks — Jim Deville <jdeville@...>
`echo disc world` returns "disc world\n"
[#19634] performance issues with --enable-pthread on Solaris. — Paul van den Bogaard <Paul.Vandenbogaard@...>
Introduction
[#19660] Odd TypeError in inject (1.9.1 preview 1) — "David A. Black" <dblack@...>
Hi --
On Fri, Oct 31, 2008 at 5:20 AM, David A. Black <dblack@rubypal.com> wrote:
Hi,
On Fri, Oct 31, 2008 at 8:40 AM, Nobuyoshi Nakada <nobu@ruby-lang.org> wrote:
[#19668] [Bug #703] string output duplication occurs if the same file descriptor written to in different threads — Roger Pack <redmine@...>
Bug #703: string output duplication occurs if the same file descriptor written to in different threads
Hi,
[ruby-core:19658] Re: [Feature #695] More flexibility when combining ASCII-8BIT strings with other encodings
At 13:57 08/10/31, Michael Selig wrote:
>Hi
>
>On Fri, 31 Oct 2008 13:51:53 +1100, Martin Duerst <duerst@it.aoyama.ac.jp>
>wrote:
>
>>> Feature #695 was closed & marked done, but unfortunately it does not
>>> seem to have been implemented :-(
>>
>> I think it should have been marked part done, part rejected,
>> I guess.
>
>Some sort of explanation would also have been nice.
Sometimes things just happen. Often, that's enough, and
if not, it's always possible to ask (as you did).
Bug tracking systems give the impression of perfection,
but one always has to remember that they are only an
attempt.
>But at least we are now discussing it - I was expecting this to happen
>before implementation :-)
>
>> I don't think it is by chance that most programming languages I
>> know, even if they have a somewhat different internationalization
>> model, more focused on Unicode than Ruby, make a clear distinction
>> between characters and bytes. It also isn't by chance that one
>> of the first things people have to learn when they learn about
>> internationalization is "bytes are not characters".
>
>Yes, I agree with you, and I have raised this "ambiguity" before - in Ruby
>ASCII-8BIT can either be a byte string or a character string of uncertain
>encoding.
>The problem I am trying to address here is for simple scripts which don't
>care about internationalisation.
Well, we could make some simple scripts simpler, but only at the
expense of making bigger scripts much more brittle. In my opinion,
once you use \x string escapes or pack, you have to know about the
distinction between bytes and characters, and should be able to
add the necessary force-encoding (or whatever else is needed).
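
As a minimal sketch of that last step, assuming the programmer knows the packed bytes are UTF-8 (the variable names are illustrative only):

```ruby
# Bytes produced by pack("C*") carry no character encoding (ASCII-8BIT).
bytes = [0xE3, 0x81, 0x82].pack("C*")   # the UTF-8 byte sequence for U+3042
puts bytes.encoding == Encoding::ASCII_8BIT   # true

# Only the programmer knows these bytes are UTF-8, so the label is explicit.
# force_encoding changes the label, not the bytes; dup keeps the original intact.
text = bytes.dup.force_encoding("UTF-8")
puts text.valid_encoding?   # true
puts text.length            # 1 (one character)
puts bytes.bytesize         # 3 (three bytes)
```

If the label were wrong, valid_encoding? would often (though not always) reveal it.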
>>> My feature request would mean that "pack" and "\x" string literals could
>>> be left as ASCII-8BIT, and be "forced" to another encoding transparently
>>> depending on how the programmer uses it.
>>
>> I think this is totally the wrong way. The problems are with
>> pack and \x in string literals, and it would be a bad idea to
>> try and solve them by introducing a general "bytes become characters"
>> feature.
>
>"default_internal" has gone a long way to help solve M17N issues, but
>there still remains "encoding compatibility" issues even in simple, single
>encoding scripts, ie: between the locale's encoding and ASCII-8BIT. The
>motivation behind this feature request was to address this latter point.
>
>I agree with you that there is a problem with "\x" in string literals.
>
>However I am not sure I agree that the problem is in pack. The root of the
>problem is this ambiguity with ASCII-8BIT between bytes and characters -
>the way I think it should work is really like a "wild card" encoding.
Well, I think there is a problem in pack. It has so many different
template characters that it's impossible in general to say what
encoding the result should be. Matz did some followup work on
your proposal at revision 20057
(see http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/pack.c?view=log),
which tries to get the best result possible for simple cases.
For cases that use many different template characters at the
same time, it's simply impossible to figure out what the intent
of the programmer is, so the programmer will have to tell.
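
A small sketch of the simple cases, as I understand the behaviour of current Ruby versions (the exact rules at the revision mentioned above may differ, and the mixed case is shown without a claimed result on purpose):

```ruby
# "U" packs a code point as a UTF-8 character, so the intent is clear.
utf8 = [0x3042].pack("U")

# "C" packs a raw 8-bit value, so the result can only be labelled as bytes.
binary = [0xE3].pack("C")

# With mixed directives no encoding is obviously right;
# inspect what your Ruby chooses rather than relying on a guess.
mixed = [0x3042, 0xE3].pack("UC")

{ "U" => utf8, "C" => binary, "UC" => mixed }.each do |template, str|
  puts "pack(#{template.inspect}) -> #{str.encoding.name}"
end
```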
>Pack is one simple example of a bunch of methods that return strings, but
>cannot easily determine what encoding to return them in.
I'd guess pack is one of the more complex ones. If you know others,
please tell us; I think nobody is claiming that all i's are dotted
and all t's crossed in this area.
>Other examples are decryption and uncompression methods where often the
>original encoding is not known. In many cases there is no alternative
>other than to return them as ASCII-8BIT and let the application worry
>about interpreting the contents.
>
>This is *forcing* the programmer to use "force_encoding()"
Or whatever else is appropriate.
>where in 1.8 it
>was not necessary, and in 1.9 it can seem rather annoying.
It can seem annoying until you realize that it's necessary.
>There is even a weird exception to this - if the ASCII-8BIT string happens
>to be all 7-bit chars, then it CAN be combined with other ASCII-compatible
>encodings.
Yes, that's one point where it may make sense to split ASCII-8BIT
and BINARY.
>This probably allows some 1.8 legacy scripts to work, but only ones
>working in ASCII.
>I do not think this sort of thing - one that works in some cases, but not
>in others - is desirable at all.
Yes, but in my view, you are just proposing to go down the slippery
slope a bit further. The chances that ASCII is ASCII (and that otherwise,
you'll find out pretty quickly when looking at the data) are much
higher than the chances that any more specific encoding will be
'guessed' right.
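
To illustrate where that line currently sits, here is a hedged sketch of the existing rule (variable names are mine): an ASCII-8BIT string is accepted next to a UTF-8 string only while it contains nothing above 0x7F.

```ruby
utf8 = "caf\u00E9"          # UTF-8 string with one non-ASCII character

ascii_bytes = "abc".b       # ASCII-8BIT, but only 7-bit bytes
high_bytes  = "ab\xE0".b    # ASCII-8BIT with a byte above 0x7F

# 7-bit bytes are unambiguous, so the result takes the other encoding.
combined = ascii_bytes + utf8
puts combined.encoding == Encoding::UTF_8   # true

# A high byte could mean anything, so Ruby refuses to guess.
begin
  high_bytes + utf8
rescue Encoding::CompatibilityError => e
  puts "rejected: #{e.message}"
end
```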
>So in fact Ruby already has what you describe as a "bytes become
>characters feature", but it only works in certain circumstances!
>
>
>>> You can liken this feature to the transparent conversion of an integer
>>> to
>>> a float when doing arithmetic.
>>
>> Well, it's not very similar. The conversion of an integer to a float
>> is very predictable, but the 'conversion' of ASCII-8BIT to some
>> real encoding is just a wild guess.
>
>A "wild guess" is overstating it. If a program attempts to combine an
>ASCII-8BIT string with another encoded string, AND it happens to be a
>valid encoding, I think that the chances are very high that the program is
>expecting the byte string to be in the other encoding. I think that a
>heuristic like this is reasonable as it keeps the language backward
>compatible & neat.
>
>Furthermore as I said, this conversion already happens with ASCII-8BIT
>character strings consisting only of 7 bit chars,
Well, yes, but then that's clearly reflected in the name "ASCII-8BIT".
>so extending it to all
>encodings seems an obvious thing to do. Look at:
>
>a) 7-bit char strings work, irrespective of encoding:
>ruby -e 'p ("abc".force_encoding("ASCII-8BIT") +
>"abc".force_encoding("UTF-8")).encoding'
>=> #<Encoding:UTF-8>
>
>but:
>b) Legal 8-bit encoding string:
>ruby -e 'p ("ab\xE0".force_encoding("ASCII-8BIT") +
>"ab\xE0".force_encoding("ISO-8859-8")).encoding'
>=> -e:1:in `<main>': incompatible character encodings: ASCII-8BIT and
>ISO-8859-8 (Encoding::CompatibilityError)
>
>c) Legal multibyte encoding string:
>ruby -ve 'p ("ab\u0635".force_encoding("ASCII-8BIT") +
>"ab\u0635".force_encoding("UTF-8")).encoding'
>=> -e:1:in `<main>': incompatible character encodings: ASCII-8BIT and
>UTF-8 (Encoding::CompatibilityError)
I think you have to come up with much more realistic examples
than these.
>Certainly I don't see the downside in the conversion to a single-byte
>encoding (eg: example (b)) above. Even if it converted when it shouldn't
>have, the indexing and "codepoint values" are the same as if the result
>were ASCII-8BIT.
The bytes are of course the same. But what counts is whether we have
the right characters.
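
A short sketch of the difference, using two single-byte encodings picked purely for illustration: the same byte 0xE0 is a different character depending on the label it is given.

```ruby
byte = "\xE0".b   # one byte, no character meaning yet

# Same byte, two different labels, converted to UTF-8 for comparison.
as_latin1 = byte.dup.force_encoding("ISO-8859-1").encode("UTF-8")
as_hebrew = byte.dup.force_encoding("ISO-8859-8").encode("UTF-8")

printf("ISO-8859-1: U+%04X\n", as_latin1.ord)   # U+00E0, LATIN SMALL LETTER A WITH GRAVE
printf("ISO-8859-8: U+%04X\n", as_hebrew.ord)   # U+05D0, HEBREW LETTER ALEF
```

Indexing and byte values agree in both cases, but any later transcoding, comparison, or display depends on which label was chosen.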
>One other idea: maybe we should distinguish between 2 encodings "BINARY"
>and "ASCII-8BIT", which are currently aliases. Essentially they are the
>same, but "BINARY" would mean "bytestring" and will generate an error if
>you try to combine it with any other encoding, while "ASCII-8BIT" would
>mean "unknown encoding", which can be combined transparently with other
>encodings.
See separate mail on this topic.
Regards, Martin.
>Maybe there is a better solution - any ideas?
>
>Cheers
>Mike
>
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp