Re: nil encoding as synonym for binary encoding

From: "Michal Suchanek" <hramrach@...>
Date: 2008-01-11 16:18:38 UTC
List: ruby-core #15001
On 11/01/2008, Dave Thomas <dave@pragprog.com> wrote:
>
> On Jan 11, 2008, at 12:06 AM, David Flanagan wrote:
>
> >> I'm certainly not an expert on encoding, so take this with a grain
> >> of salt, but it seems to me that there _is_ a difference between a
> >> string tagged ASCII-8BIT and a string with no encoding. A string
> >> with no encoding should remain unchanged under operations such as
> >> #upcase, whereas upcase on an ASCII-8BIT string could return a
> >> string with different content.
> >
> > Yes, but keep in mind that Encoding::BINARY is a synonym for
> > Encoding::ASCII_8BIT. I just think that nil should work like "no
> > encoding". The closest we've got right now is Encoding::BINARY.
>
> It is now. I'm suggesting that is a mistake: Encoding::BINARY should
> not be an alias, but a separate encoding, so that
>
>     "cat".force_encoding("ascii-8bit").upcase  # => "CAT"
>     "cat".force_encoding("binary").upcase     # => "cat"
>
> The reason for this is that if a string is truly binary, it contains
> bytes, not characters. There's no way of breaking it into characters,
> so there's no way you can perform character-related operations on it.
> (For example, it might be UTF-32, for all you know, so upcasing it
> byte by byte would be silly). Or it might be a TCP packet, so upcasing
> it is again meaningless. Only when you know the encoding of a sequence
> of bits can you then treat them as characters.
>
> >> Or, looking at it another way, a string with no encoding contains
> >> bytes but no characters.
> >
> > Exactly.  And when you want to deal with bytes the best you can do
> > is to use Encoding::BINARY
>
> Right, not ASCII-8BIT.

Well, ASCII-8BIT is how binary strings are mostly used in practice. If you
declared "nothing is known about this, no character operations allowed",
some operations would become very inconvenient.

For one, somebody suggested that
"aaa".force_encoding(Encoding::BINARY)[0] makes no sense, since you do
not know how to break the string into characters. Perhaps byte indexing
should be allowed then?
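As an aside, a sketch of what byte indexing looks like with today's ASCII-8BIT tagging (modern Ruby assumed; `String#b` arrived in 2.0, and in 1.9 you would call `force_encoding` instead). Because every byte counts as one character in ASCII-8BIT, `[]` already behaves as byte indexing, and the explicit byte accessors work regardless of encoding.

```ruby
# String#b returns a copy tagged ASCII-8BIT (a.k.a. BINARY).
data = "aaa".b

# One byte == one character in ASCII-8BIT, so [] is byte indexing here.
p data[0]          # => "a"
p data.getbyte(0)  # => 97
p data.bytesize    # => 3

# Contrast: a multibyte UTF-8 string splits differently as characters
# versus raw bytes.
utf8 = "\u00E9"    # "é": one character, two bytes in UTF-8
p utf8.length      # => 1
p utf8.bytesize    # => 2
p utf8.b[0]        # => "\xC3" (only the first byte)
```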

Then, if you want to find out whether there is "GIF" at the start, you
cannot, because "GIF" is an ASCII string but your string is
unknown-binary. Or should a binary string allow byte comparison
(searching, ...) with strings of any encoding (even EBCDIC or whatever
it is called)? If there is no searching, these byte buffers are pretty
useless, right?
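A sketch of the GIF check under the current ASCII-8BIT semantics (modern Ruby assumed; `gif?` is a hypothetical helper, not a core method). An ASCII-only literal is considered compatible with a binary string, so the prefix test works; a non-ASCII string in another encoding, however, never compares equal even when the bytes match, which is the cross-encoding problem raised above.

```ruby
# Hypothetical helper: does this byte buffer start with a GIF signature?
def gif?(buffer)
  bytes = buffer.b   # binary copy, whatever the caller's encoding tag was
  # "GIF87a"/"GIF89a" are ASCII-only, hence compatible with ASCII-8BIT,
  # so this does not raise Encoding::CompatibilityError.
  bytes.start_with?("GIF87a", "GIF89a")
end

header = "GIF89a\x01\x00\xFF".b
p gif?(header)            # => true
p gif?("\x89PNG\r\n".b)   # => false

# Equality against non-ASCII text in a different encoding is just false,
# even though the underlying bytes are identical.
e_acute = "\u00E9"           # "é" in UTF-8
p e_acute.b == e_acute       # => false (ASCII-8BIT vs UTF-8)
p e_acute.b == e_acute.b     # => true  (bytes on both sides)
```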

Since upcasing (and chomp, and splitting) can't work on binary strings
(and in this case that is probably even desirable), we have already
gotten rid of quite a few methods.
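For comparison, a sketch of what these methods currently do on an ASCII-8BIT string, i.e. what a separate "pure binary" encoding would take away (modern Ruby assumed). Each byte is treated as a character, newlines and letters are recognized only in the ASCII range, and high bytes pass through untouched.

```ruby
line = "hello, world\r\n".b   # ASCII-8BIT (BINARY) string

# These all work today because ASCII-8BIT is ASCII-compatible.
p line.chomp               # => "hello, world"
p line.chomp.split(", ")   # => ["hello", "world"]
p line.upcase              # => "HELLO, WORLD\r\n"

# Bytes above 0x7F have no case in ASCII-8BIT and are left alone.
p "caf\xC3\xA9".b.upcase   # => "CAF\xC3\xA9"
```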

What about regexps, then? They would be useful if we collected all the
variants of the GIF or JPEG signature, for example. They could possibly
be used, but regexps in multibyte encodings are troublesome. Should
they match anywhere, that is, even at places other than character
boundaries?
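A sketch of signature sniffing with regexps over binary data, assuming modern Ruby (`Regexp#match?` needs 2.4+). `SIGNATURES` and `sniff` are hypothetical names for illustration. The `/n` flag makes a pattern containing `\xFF`-style escapes an ASCII-8BIT regexp that matches raw bytes; an ASCII-only pattern like the GIF one needs no flag. It also shows the boundary question in practice: in a binary string every byte is a "character", so a match can start in the middle of what would be a UTF-8 sequence.

```ruby
# Hypothetical signature table. /n makes the \x.. patterns binary regexps.
SIGNATURES = {
  gif:  /\AGIF8[79]a/,             # ASCII-only, works on binary strings as is
  jpeg: /\A\xFF\xD8\xFF/n,
  png:  /\A\x89PNG\r\n\x1A\n/n
}.freeze

# Returns the first matching type, or nil if nothing matches.
def sniff(buffer)
  bytes = buffer.b
  SIGNATURES.each { |type, re| return type if re.match?(bytes) }
  nil
end

p sniff("GIF89a\x01\x00".b)     # => :gif
p sniff("\xFF\xD8\xFF\xE0".b)   # => :jpeg
p sniff("plain text")           # => nil

# "\u3042" is E3 81 82 in UTF-8; as binary, \x81 matches at byte 1,
# which is not a character boundary in the UTF-8 reading.
p("\u3042".b =~ /\x81/n)        # => 1
```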

Thanks

Michal
