[ruby-talk:10393] RE: Structured text matching?

From: "Joseph McDonald" <joe@...>
Date: 2001-02-05 23:58:36 UTC
List: ruby-talk #10393

> I'm trying to match and extract pieces from structured text in a similar
> way to what sgrep (see <http://www.cs.helsinki.fi/~jjaakkol/sgrep.html>)
> does.
>
> The concrete purpose is to get titles from HTML files, that is the first
> occurrence of any text between <title> and </title>. Better still, I'd
> like to get the "X" from <html>..<head>..<title> X </title>..</head>.

I don't know of a Ruby option, but Perl has a really nice
HTML parser: http://search.cpan.org/search?dist=HTML-Tree

The author has an article about its use in TPJ
(password required): http://www.tpj.com/issues/currvol/tpj0503-0008.html

You can do stuff like this (I quote from the article):

Sometimes the only way to pin down what you're after is by position in the
tree. For example, headlines of interest may be in the third column of the
second row of the second table element in a page:

  my $table = ( $tree->look_down('_tag', 'table') )[1];
  my $row2  = ( $table->look_down('_tag', 'tr')   )[1];
  my $col3  = ( $row2->look_down('_tag', 'td')    )[2];
  # ...then do things with $col3...


Or they might be all the links in a <p> element with more than two <br>
elements as children:

  my $p = $tree->look_down(
    '_tag', 'p',
    sub {
      2 < grep { ref($_) and $_->tag eq 'br' }
              $_[0]->content_list
    }
  );
  @links = $p->look_down('_tag', 'a');

All in all, I think it is a very powerful parser, and it would
be great if Ruby had something similar (which it may already have...
I just haven't seen it).
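
For the simple case in the question (the first text between <title> and
</title>), a plain regex might be enough in the meantime. This is only a
rough sketch, not a parser: the method name first_title is made up for
illustration, and it assumes the title tag is not hidden inside comments
or scripts.

```ruby
# Return the text of the first <title> element, or nil if there is none.
# Assumption: a regex is good enough for this one tag; it is not an HTML parser.
def first_title(html)
  # /m lets . match newlines, /i ignores tag case,
  # and .*? stops at the first closing </title>.
  match = html.match(%r{<title\b[^>]*>(.*?)</title\s*>}mi)
  return nil unless match

  # Collapse runs of whitespace inside the title and trim the ends.
  match[1].gsub(/\s+/, ' ').strip
end

html = "<html><head>\n<TITLE> Structured\n text </TITLE></head>" \
       "<body><title>ignored</title></body></html>"
puts first_title(html)
```

It won't do the positional lookups that look_down does above, but it
covers the "get X from <title> X </title>" part.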

regards,
-joe
