Re: String.ord

From: David Flanagan <david@...>
Date: 2007-02-06 19:23:45 UTC
List: ruby-core #10213
Nikolai Weibull wrote:

>
> Also, how often is it actually necessary to convert strings to their
> ordinal value in their encoding table?  

It matters if you're working with binary data and want to read bytes
directly from the raw string instead of unpacking it into an array of
Fixnums.  I don't know how common this is in practice.  I was using a
string as a compact sequence of bytes to represent a Sudoku grid, which
is what made me bring this up.
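
A minimal sketch of the two approaches being contrasted, written against Ruby 1.9 and later rather than the 1.8 behavior under discussion. The nine-cell row and the variable names are made up for the example; `Array#pack`, `String#getbyte`, and `String#unpack` are the standard calls.

```ruby
# One Sudoku row stored as a compact byte string: each byte is a cell value
# (0 means empty). The data is hypothetical.
row = [5, 3, 0, 0, 7, 0, 0, 0, 0].pack("C*")

# Read a single cell straight from the string, without building an array.
cell = row.getbyte(4)

# Alternative: unpack the whole string into an array of integers first.
cells = row.unpack("C*")

puts cell           # 7
puts cells.inspect  # [5, 3, 0, 0, 7, 0, 0, 0, 0]
```

The first form keeps the grid in its compact representation; the second pays for an intermediate array but gives ordinary integer indexing.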

You say that characters-as-strings makes perfect sense:

> Perhaps, but this is a tradeoff of keeping "characters" and "strings"
> in the same class.  As already mentioned,  "characters" will currently
> be represented by one-character-long Strings in 1.9/2.0.  To me, this
> makes perfect sense, considering that one of the main design goals for
> Strings in 1.9/2.0 is that they should be able to handle most any
> encoding scheme (as I've understood it).
> 

But then you muse about a new type of Fixnum to represent characters!

> Anyway, while we're on the topic, what exactly should String#ord
> return?  I'd argue that a subclass of Fixnum would make sense, which
> would have methods like #alpha?, #digit?, and so on, according to what
> information is provided by the encoding scheme.  This can easily get a
> bit too Unicode-centric, but I prefer writing

I agree with the need for methods like this, but if that's going to 
happen, I'd say the class should just be called a Character, and there 
should be a way to get Character objects directly from strings without 
having to stick the ord method in the middle.  Personally, I'd suggest 
that String.[x] with one argument should return a Character object, and 
String.[x,1] should return a String of length one.

My own musings along these lines make characters a subclass of Symbol 
rather than of Fixnum.  So ?A would be an object much like :A, but would 
have additional character-specific methods, such as #encoding, #alpha?, etc.
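
To make the idea concrete, here is a rough sketch of what such a class might look like in Ruby 2.4 or later. Everything here is hypothetical: `Character` is not a Ruby class, and `char_at` is an invented helper standing in for the proposed one-argument `[]`. The sketch wraps a one-character String rather than subclassing Fixnum or Symbol, purely to keep it short.

```ruby
# Hypothetical Character class: not part of Ruby, only a sketch of the idea.
class Character
  def initialize(str)
    raise ArgumentError, "expected exactly one character" unless str.length == 1
    @str = str.dup.freeze
  end

  # Integer code of the character in its encoding (what String#ord returns).
  def ord
    @str.ord
  end

  def encoding
    @str.encoding
  end

  # Character-class predicates, delegated to POSIX bracket expressions.
  def alpha?
    @str.match?(/\A[[:alpha:]]\z/)
  end

  def digit?
    @str.match?(/\A[[:digit:]]\z/)
  end

  def to_s
    @str
  end
end

# Hypothetical helper standing in for the proposed one-argument String#[].
def char_at(string, index)
  Character.new(string[index])
end

c = char_at("a1", 0)
puts c.alpha?  # true
puts c.ord     # 97
```

With a class like this, the predicates live on the character object, so neither String nor Fixnum has to grow them.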

>  "a".ord.alpha?
> 
> to
> 
>  Codepoint.alpha?("a".ord)
> 
> or something similar.  I guess a good name for this subclass would be
> Codepoint, but then perhaps #ord isn't a very good name and #codepoint
> would make more sense.
> 
> Finally, perhaps the type of methods I've described above, i.e.,
> #alpha?, #digit?, ..., should be methods of String for strings of
> length one character, like #ord.
> 
> Let's try it out:
> 
>  "a".alpha?
> 
> yes, yes I like that.  Still, String may be getting a bit overloaded by 
> then.

I think it is asking too much to have the String class represent byte 
strings, multi-byte character strings, and individual characters.

>> I hope I'm not coming across as argumentative in this thread.
> 

> 
> http://redhanded.hobix.com/inspect/futurismUnicodeInRuby.html
> 

Thanks!

Let me also respond to a couple of things from other messages:

> Like the fact that #ordAt isn't a very Rubyish name. 

My bad.  That was a typo based on my background in Java and JavaScript.
I don't actually like the idea of a separate method, but if one were
needed, ord_at would obviously be a better name than ordAt.

David Black wrote:

> It's not going to be backward compatible in any case, since [] will
> have changed.  I think the reasoning is that people use [].chr more
> than they're likely to use [].ord, so offloading the less simple
> behavior onto the ord case will save method calls in the long run. 

I would have thought that people would use s[x,1] instead of s[x].chr,
avoiding the extra method call.
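
As a reference point, this is how the indexing forms compare in Ruby 1.9 and later, where `s[x]` returns a one-character String (in 1.8 it returned a Fixnum). A small sketch with a made-up string:

```ruby
s = "hello"

one_arg = s[1]      # one-character String in 1.9+
two_arg = s[1, 1]   # substring of length one
code    = s[1].ord  # integer code of that character

puts one_arg == two_arg  # true
puts code                # 101
puts code.chr            # e
```

The two-argument form returns a String in both 1.8 and 1.9, so code written with it does not change meaning across the transition.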

	David Flanagan

