[ruby-core:19658] Re: [Feature #695] More flexibility when combining ASCII-8BIT strings with other encodings

From: Martin Duerst <duerst@...>
Date: 2008-10-31 10:24:34 UTC
List: ruby-core #19658
At 13:57 08/10/31, Michael Selig wrote:
>Hi
>
>On Fri, 31 Oct 2008 13:51:53 +1100, Martin Duerst <duerst@it.aoyama.ac.jp>  
>wrote:
>
>>> Feature #695 was closed & marked done, but unfortunately it does not  
>>> seem to have been implemented :-(
>>
>> I think it should have been marked part done, part rejected,
>> I guess.
>
>Some sort of explanation would also have been nice.

Sometimes things just happen. Often, that's enough, and
if not, it's always possible to ask (as you did).

Bug tracking systems give the impression of perfection,
but one always has to remember that they are only an
attempt.

>But at least we are now discussing it - I was expecting this to happen  
>before implementation :-)
>
>> I don't think it is by chance that most programming languages I
>> know, even if they have a somewhat different internationalization
>> model, more focused on Unicode than Ruby, make a clear distinction
>> between characters and bytes. It also isn't by chance that one
>> of the first things people have to learn when they learn about
>> internationalization is "bytes are not characters".
>
>Yes, I agree with you, and I have raised this "ambiguity" before - in Ruby  
>ASCII-8BIT can either be a byte string or a character string of uncertain  
>encoding.



>The problem I am trying to address here is for simple scripts which don't  
>care about internationalisation.

Well, we could make some simple scripts simpler, but only at the
expense of making bigger scripts much more brittle. In my opinion,
once you use \x string escapes or pack, you have to know about the
distinction between bytes and characters, and should be able to
add the necessary force-encoding (or whatever else is needed).
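
As an illustration of the point above, here is a minimal sketch of what "adding the necessary force_encoding" can look like. The variable names and byte values are chosen only for this example, and it assumes a current Ruby where a "C*" pack template yields an ASCII-8BIT string.

```ruby
# Array#pack with a byte-oriented template returns a binary (ASCII-8BIT) string.
packed = [0xE3, 0x81, 0x82].pack("C*")   # the UTF-8 bytes of U+3042
packed_encoding = packed.encoding

# The programmer states the intent explicitly. force_encoding only relabels
# the string; it does not change any bytes.
text = packed.dup.force_encoding("UTF-8")
text_valid = text.valid_encoding?

# Once labelled, the string combines with other UTF-8 text without error.
combined = "kana: " + text
```

The call is one line, but it has to come from the programmer, because only the programmer knows which encoding the bytes are meant to be in.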


>>> My feature request would mean that "pack" and "\x" string literals could
>>> be left as ASCII-8BIT, and be "forced" to another encoding transparently
>>> depending on how the programmer uses it.
>>
>> I think this is totally the wrong way. The problems are with
>> pack and \x in string literals, and it would be a bad idea to
>> try and solve them by introducing a general "bytes become characters"
>> feature.
>
>"default_internal" has gone a long way to help solve M17N issues, but  
>there still remains "encoding compatibility" issues even in simple, single  
>encoding scripts, ie: between the locale's encoding and ASCII-8BIT. The  
>motivation behind this feature request was to address this latter point.
>
>I agree with you that there is a problem with "\x" in string literals.
>
>However I am not sure I agree that the problem is in pack. The root of the  
>problem is this ambiguity with ASCII-8BIT between bytes and characters -  
>the way I think it should work is really like a "wild card" encoding.

Well, I think there is a problem in pack. It has so many different
template characters that it's impossible in general to say what
encoding the result should be. Matz did some followup work on
your proposal at revision 20057
(see http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/pack.c?view=log),
which tries to get the best result possible for simple cases.
For cases that use many different template characters at the
same time, it's simply impossible to figure out what the intent
of the programmer is, so the programmer will have to tell.
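
To make the "simple cases" concrete, the following sketch shows the result encoding for three single-directive templates as observed in current Ruby versions. Whether this matches revision 20057 in every detail is an assumption; templates that mix directives are deliberately left out.

```ruby
# Result encodings for three single-directive pack templates.
bytes_enc  = [65, 66].pack("C*").encoding   # raw 8-bit values
utf8_enc   = [0x3042].pack("U").encoding    # one UTF-8 encoded code point
base64_enc = ["abc"].pack("m").encoding     # base64 text, plain ASCII output

puts "C* -> #{bytes_enc.name}"
puts "U  -> #{utf8_enc.name}"
puts "m  -> #{base64_enc.name}"
```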


>Pack is one simple example of a bunch of methods that return strings, but  
>cannot easily determine what encoding to return them in.

I'd guess pack is one of the more complex ones. If you know others,
please tell us, I think nobody is claiming that all i's are dotted
and all t's crossed in this area.

>Other examples are decryption and uncompression methods where often the  
>original encoding is not known. In many cases there is no alternative  
>other than to return them as ASCII-8BIT and let the application worry  
>about interpreting the contents.
>
>This is *forcing* the programmer to use "force_encoding()"

Or whatever else is appropriate.

>where in 1.8 it  
>was not necessary, and in 1.9 it can seem rather annoying.

It can seem annoying until you realize that it's necessary.

>There is even a weird exception to this - if the ASCII-8BIT string happens  
>to be all 7-bit chars, then it CAN be combined with other ASCII-compatible  
>encodings.

Yes, that's one point where it may make sense to split ASCII-8BIT
and BINARY.
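
The 7-bit exception can be checked without triggering an error by asking Encoding.compatible?. This is a small sketch with made-up sample strings; it also shows that BINARY and ASCII-8BIT are currently one and the same encoding object.

```ruby
utf8        = "caf\u00E9"   # UTF-8 text containing a non-ASCII character
ascii_bytes = "abc".b       # ASCII-8BIT, 7-bit bytes only
high_bytes  = "ab\xE0".b    # ASCII-8BIT, contains a byte >= 0x80

# Returns the encoding a concatenation would have, or nil if incompatible.
compat_low  = Encoding.compatible?(ascii_bytes, utf8)
compat_high = Encoding.compatible?(high_bytes, utf8)

joined = ascii_bytes + utf8   # works, because ascii_bytes is ASCII only

error_raised = begin
  high_bytes + utf8
  false
rescue Encoding::CompatibilityError
  true
end

# BINARY is an alias, not a separate encoding.
same_object = Encoding::BINARY.equal?(Encoding::ASCII_8BIT)
```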

>This probably allows some 1.8 legacy scripts to work, but only ones  
>working in ASCII.
>I do not think this sort of thing - one that works in some cases, but not  
>in others - is desirable at all.

Yes, but in my view, you are just proposing to go down the slippery
slope a bit further. The chances that ASCII is ASCII (and that otherwise,
you'll find out pretty quickly when looking at the data) are much
higher than the chances that any more specific encoding will be
'guessed' right.

>So in fact Ruby already has what you describe as a "bytes become  
>characters feature", but it only works in certain circumstances!
>
>
>>> You can liken this feature to the transparent conversion of an integer to
>>> a float when doing arithmetic.
>>
>> Well, it's not very similar. The conversion of an integer to a float
>> is very predictable, but the 'conversion' of ASCII-8BIT to some
>> real encoding is just a wild guess.
>
>A "wild guess" is overstating it. If a program attempts to combine an  
>ASCII-8BIT string with another encoded string, AND it happens to be a  
>valid encoding, I think that the chances are very high that the program is  
>expecting the byte string to be in the other encoding. I think that a  
>heuristic like this is reasonable as it keeps the language backward  
>compatible & neat.
>
>Furthermore as I said, this conversion already happens with ASCII-8BIT  
>character strings consisting only of 7 bit chars,

Well, yes, but then that's clearly reflected in the name "ASCII-8BIT".

>so extending it to all  
>encodings seems an obvious thing to do. Look at:
>
>a) 7-bit char strings work, irrespective of encoding:
>ruby -e 'p ("abc".force_encoding("ASCII-8BIT") + "abc".force_encoding("UTF-8")).encoding'
>=> #<Encoding:UTF-8>
>
>but:
>b) Legal 8-bit encoding string:
>ruby -e 'p ("ab\xE0".force_encoding("ASCII-8BIT") + "ab\xE0".force_encoding("ISO-8859-8")).encoding'
>=> -e:1:in `<main>': incompatible character encodings: ASCII-8BIT and ISO-8859-8 (Encoding::CompatibilityError)
>
>c) Legal multibyte encoding string:
>ruby -ve 'p ("ab\u0635".force_encoding("ASCII-8BIT") + "ab\u0635".force_encoding("UTF-8")).encoding'
>=> -e:1:in `<main>': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)

I think you have to come up with much more realistic examples
than these.


>Certainly I don't see the downside in the conversion to a single-byte  
>encoding (eg: example (b)) above. Even if it converted when it shouldn't  
>have, the indexing and "codepoint values" are the same as if the result  
>were ASCII-8BIT.

The bytes are of course the same. But what counts is whether we have
the right characters.
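
A short sketch of why identical bytes do not guarantee the right characters. It takes the byte string from example (b) and labels it with two different single-byte encodings; ISO-8859-1 is an arbitrary second choice made for this illustration.

```ruby
raw = "ab\xE0".b   # three bytes: 0x61 0x62 0xE0

# Same bytes, two different labels.
as_hebrew = raw.dup.force_encoding("ISO-8859-8")
as_latin1 = raw.dup.force_encoding("ISO-8859-1")
same_bytes = as_hebrew.bytes == as_latin1.bytes

# Transcoding to UTF-8 exposes which characters each label implies.
hebrew_text = as_hebrew.encode("UTF-8")   # 0xE0 is HEBREW LETTER ALEF here
latin1_text = as_latin1.encode("UTF-8")   # 0xE0 is LATIN SMALL LETTER A WITH GRAVE here
same_text = hebrew_text == latin1_text
```

Indexing and byte values agree in both cases, yet the text a user would see differs, so a silently guessed label can be wrong without any error being raised.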

>One other idea: maybe we should distinguish between 2 encodings "BINARY"  
>and "ASCII-8BIT", which are currently aliases. Essentially they are the  
>same, but "BINARY" would mean "bytestring" and will generate an error if  
>you try to combine it with any other encoding, while "ASCII-8BIT" would  
>mean "unknown encoding", which can be combined transparently with other  
>encodings.

See separate mail on this topic.


Regards,   Martin.

>Maybe there is a better solution - any ideas?
>
>Cheers
>Mike
>


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp      mailto:duerst@it.aoyama.ac.jp    

