[#10209] Market for XML Web stuff — Matt Sergeant <matt@...>

I'm trying to get a handle on what the size of the market for AxKit would be

15 messages 2001/02/01

[#10238] RFC: RubyVM (long) — Robert Feldt <feldt@...>

Hi,

20 messages 2001/02/01
[#10364] Re: RFC: RubyVM (long) — Mathieu Bouchard <matju@...> 2001/02/05

[#10708] Suggestion for threading model — Stephen White <spwhite@...>

I've been playing around with multi-threading. I notice that there are

11 messages 2001/02/11

[#10853] Re: RubyChangeRequest #U002: new proper name for Hash#indexes, Array#indexes — "Mike Wilson" <wmwilson01@...>

10 messages 2001/02/14

[#11037] to_s and << — "Brent Rowland" <tarod@...>

list = [1, 2.3, 'four', false]

15 messages 2001/02/18

[#11094] Re: Summary: RCR #U002 - proper new name fo r indexes — Aleksi Niemel<aleksi.niemela@...>

> On Mon, 19 Feb 2001, Yukihiro Matsumoto wrote:

12 messages 2001/02/19

[#11131] Re: Summary: RCR #U002 - proper new name fo r indexes — "Conrad Schneiker" <schneik@...>

Robert Feldt wrote:

10 messages 2001/02/19

[#11251] Programming Ruby is now online — Dave Thomas <Dave@...>

36 messages 2001/02/21

[#11469] XML-RPC and KDE — schuerig@... (Michael Schuerig)

23 messages 2001/02/24
[#11490] Re: XML-RPC and KDE — schuerig@... (Michael Schuerig) 2001/02/24

Michael Neumann <neumann@s-direktnet.de> wrote:

[#11491] Negative Reviews for Ruby and Programming Ruby — Jim Freeze <jim@...> 2001/02/24

Hi all:

[#11633] RCR: shortcut for instance variable initialization — Dave Thomas <Dave@...>

13 messages 2001/02/26

[#11652] RE: RCR: shortcut for instance variable initialization — Michael Davis <mdavis@...>

I like it!

14 messages 2001/02/27

[#11700] Starting Once Again — Ron Jeffries <ronjeffries@...>

OK, I'm starting again with Ruby. I'm just assuming that I've

31 messages 2001/02/27
[#11712] RE: Starting Once Again — "Aaron Hinni" <aaron@...> 2001/02/27

> 2. So far I think running under TextPad will be better than running

[#11726] Re: Starting Once Again — Aleksi Niemel<zak@...> 2001/02/28

On Wed, 28 Feb 2001, Aaron Hinni wrote:

[ruby-talk:11544] Re: Sorting

From: "Ben Tilly" <ben_tilly@...>
Date: 2001-02-25 18:31:50 UTC
List: ruby-talk #11544
Kevin Smith <sent@qualitycode.com> wrote:
>
>Guy N. Hurst wrote:
> >your code generates:
> >["Name 1", "Name 10", "Name 100", "Name 101",
> >"Name 11", "Name 2", "Name 200", "Name 22"]
> >
> >If I add "[1]to_i" after each split, then it does things better...
>
>Thanks for pointing this out. Of course you are
>correct.
>
>That's what I had intended to do, but I got so
>excited when the array <=> array just *worked*
>(before I looked it up in the book) that I forgot
>to continue coding until the task was done. (Now
>where did I put those unit tests again?)

It gets better.  Since Array includes Enumerable,
that means that sort will just work on arrays of
arrays.

The lack of your need to write a complex sort
function for that was one of the first double-takes
that I did with Ruby (coming from Perl). :-)

> >b = strings.sort do | a, b |
> >        a.split[1].to_i <=> b.split[1].to_i
> >end
>
> >Of course, this is assuming there is a second element in the array after 
>the split, and
> >that it is convertible to an integer.
>
>Indeed. The original requirement was pretty
>vague. Prehaps the need was for each string to be
>sorted entirely, treating each word as a number
>where appropriate and text otherwise. In that
>case, I would write my own compare(array1,
>array2) method that did a split, and iterated
>through each word treating it appropriately.

And you would be about halfway down the same path
that followed...

>It may not be as clever as the regex based
>solutions, but I prefer clear, simple brute force
>unless there's some requirement that forces me in
>a different direction.

Apparent cleverness is often a matter of background.

I faced this exact problem in Perl a while ago.
What I realized is that the right solution is to
break the string into string, number, string, number
and then compare each piece appropriately.  Once
you have that in mind, if you know about how to use
REs in /g mode, you can easily do this.  It is a
little tedious, and you have boundary cases to
worry about (eg this string breaks into more pieces
than that one does) but doable.

Now what I knew is that it is generally best to move
work into pre-processing, and then extraction, so
that comparison functions can be as simple as
possible.  So you extra data into a data structure,
sort, and then extract the original data out.  This
pattern was named a Schwartzian sort by Tom
Christiansen while explaining an answer that Randal
Schwartz.  (Randal did not originate the idea!)

The reason this is good is that the comparison
function gets called n*log(n) times, while pre/post
work only runs n times.  But in Perl this is a
fairly complex hard to do because the default sort
is by string comparison.  Therefore you need to
write the pre and post, and also a sort function
that knows your data structure.  Also people use
function calls, so the pattern reads right to left
while each piece must be read left to right.  Ick.

However in Ruby all parts of this are easier!

First of all Ruby offers the scan method of a string
that makes breaking it up into string/number pairs
easy.

Secondly Ruby already has the logic built in to
allow you to transparently sort data structures
like arrays as long as they know how to <=>
themselves.

Thirdly in Ruby map and sort are method calls, not
functions.  Therefore they read left to right, and
the whole thing reads left to right everywhere.

So put it all together and you get something like
this:

    # This is the preprocessing work
    def sort_split (s)
      pieces = s.scan(/(\D*)(\d*)/)
      pieces.each {|elem|
        # Case insensitive then by number
        elem[0].downcase!
        elem[1] = elem[1].sub(/^0*/, '').to_i
        if "" != elem[0] and "-" == elem[0].slice(-1, 1)
          elem[0].chop!
          elem[1] *= -1
        end
      }
      # Returning (pieces.push, s) works, but almost by
      # accident.  Creating a new array, [sortthing, orig]
      # is a more robust pattern.
      [pieces, s]
    end

    # Build up an unsorted example to try.
    things = (-5..20).to_a.map {|i| "thing #{i}"}
    things.push(
      "thing ", "hello", "world", "This", "That", "This -", "this"
    )
    # And here is where it comes together
    puts things.map {|e| sort_split(e)} .sort.map {|e| e[-1]}
    #^^^ ^^^^^^      ^^^^^^^^^^^^^^^^^   ^^^^      ^^^^^^^^^
    # |    |               |               |            |
    #show  |               |               |            |
    #   original list      |               |            |
    #                 pre-process          |            |
    #                              can sort itself!     |
    #                                              then extract

Clever?

If I thought this up from scratch, then I would have
to be darned clever.  But this is a classic pattern
and a problem I have seen before.  When I see sorting
problems O think in terms of, "pre-process, sort, then
extract" instead of comparison functions.  Usually
easier, and for a bonus the pattern is more efficient
and scales to more complex problems.

BTW if you want to sort some fields in reverse,
you can do this:

    class StringReverse < String
      def <=> (other)
        -1 * super(other)
      end
    end

and now you can turn strings into StringReverse with

    elem[0] = StringReverse.new(elem[0])

and they are just like strings except that they will
sort in reverse.

So no cleverness on my part, just memory of a useful
pattern.

But still I am being impressed at how cleanly these
things tend to work out.  If I can only figure out in
a nutshell what Matz does to make that work... :-)

Cheers,
Ben
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com

In This Thread

Prev Next