Re: String.ord

From: "Nikolai Weibull" <now@...>
Date: 2007-02-07 16:57:52 UTC
List: ruby-core #10226

On 2/7/07, Sam Roberts <sroberts@uniserve.com> wrote:

> Is creating a temporary 1-byte String really that expensive? Some
> benchmarks showing an algorithm that uses a long binary string as a data
> structure performs much faster with String#ord(i) than String#[i].ord
> would probably convince everybody.

May I beg for an /important/ algorithm?

> Btw, isn't ruby 1.9 going to have character set information associated
> with strings? Would #ord(idx) return the value of the byte at a
> particular byte offset idx, or a codepoint at a character idx?

It's worse for other methods like #[], where one can wonder how
grapheme clusters are to be dealt with.  My idea was that you would
have encodings layered over other encodings for this kind of thing.

Say that you have a string s = {abc}, where a, b, and c are Unicode
characters and the {...} syntax means the string of these characters
in some encoding, and that it is encoded using UTF-8, and that a and b
constitute a grapheme cluster.  Under certain conditions you may want
to work with each codepoint separately, under other conditions each
grapheme cluster.  Normally, s.encoding would be "utf-8", but if I
want to work with grapheme clusters I may set s.encoding =
"utf-8.graphemes", where the dot introduces another "encoding axis".
In the first case, s[0] would give you the string {a}.  In the second
case, s[0] would give you the string {ab}.  Sometimes you may want to
work with the individual bytes of s.  You could then set s.encoding =
'ascii' or s.encoding = 'bytes' or something like that (ascii wouldn't
be great, as it is 7-bit and some implementations may depend on
this); then s[0] would give you the string {a_1}, where a_1 is the
first byte of the encoding of a in UTF-8.
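
For illustration only, the three views described above can be
approximated with methods that exist in later Ruby versions
(String#grapheme_clusters needs Ruby 2.5 or newer).  This is a sketch,
not the "utf-8.graphemes" assignment proposed here: that syntax is
hypothetical, and the sample string and variable names below are
assumptions chosen so that a and b form one grapheme cluster.

```ruby
# Assumed sample string: a = U+043E (Cyrillic o, two bytes in UTF-8),
# b = U+0301 (combining acute accent), c = "z".
# a and b together form a single grapheme cluster.
s = "\u043E\u0301z"

# Codepoint view: what indexing a UTF-8 string gives by default.
codepoint_view = s.chars

# Grapheme view: stands in for the proposed "utf-8.graphemes" axis.
grapheme_view = s.grapheme_clusters

# Byte view: same bytes, reinterpreted as binary on a copy.
byte_view = s.dup.force_encoding(Encoding::BINARY)

first_codepoint = codepoint_view[0]  # the string {a}
first_grapheme  = grapheme_view[0]   # the string {ab}
first_byte      = byte_view[0]       # the string {a_1}
```

The underlying bytes are identical in all three cases; only the rule
for cutting them into indexable units changes.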

Wow, that wasn't a very good explanation, but perhaps you'll
understand what I'm getting at.  It's about treating a String as a
sequence of bytes with some encoding layered over it to decide how to
retrieve characters from it, i.e., mostly how indexing into the String
works.

It all makes a lot of sense if you implement the encoding handling
using a virtual method table which one could then easily change for a
given String instance whenever needed.
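
A rough sketch of that idea, assuming a wrapper class rather than a
change to String itself.  LayeredString and INDEXERS are hypothetical
names invented for this example, and a Hash of lambdas stands in for
the method table; a real implementation would live in C and would not
re-split the bytes on every lookup.

```ruby
# Hypothetical wrapper: raw bytes plus a swappable indexing strategy.
class LayeredString
  # Each entry acts as one "method table": it decides how the raw bytes
  # are cut into the units that #[] returns.
  INDEXERS = {
    "utf-8"           => ->(raw) { raw.dup.force_encoding(Encoding::UTF_8).chars },
    "utf-8.graphemes" => ->(raw) { raw.dup.force_encoding(Encoding::UTF_8).grapheme_clusters },
    "bytes"           => ->(raw) { raw.bytes.map { |byte| byte.chr } },
  }.freeze

  attr_reader :encoding

  def initialize(string, encoding = "utf-8")
    @raw = string.b            # binary copy of the original bytes
    self.encoding = encoding
  end

  # Swap the table for this instance only; the bytes are left untouched.
  def encoding=(name)
    @indexer = INDEXERS.fetch(name)  # KeyError for an unknown layer
    @encoding = name
  end

  def [](index)
    @indexer.call(@raw)[index]
  end
end

s = LayeredString.new("\u043E\u0301z")  # o + combining acute + z
by_codepoint = s[0]
s.encoding = "utf-8.graphemes"
by_grapheme = s[0]
s.encoding = "bytes"
by_byte = s[0]
```

Here by_codepoint is the bare base character, by_grapheme is the base
character together with its combining accent, and by_byte is a
one-byte binary string.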

  nikolai
