[ruby-talk:10390] Re: Structured text matching?

From: schuerig@... (Michael Schuerig)
Date: 2001-02-05 23:40:01 UTC
List: ruby-talk #10390

Robert Gustavsson <0317025435@telia.com> wrote:

> "Michael Schuerig" <schuerig@acm.org> wrote in message
> news:1eodn9k.5gnwx1g5scq3N%schuerig@acm.org...
> >
> > The concrete purpose is to get titles from HTML files, that is the first
> > occurrence of any text between <title> and </title>. Better still, I'd
> > like to get the "X" from <html>..<head>..<title> X </title>..</head>.
> 
> # Sample line from a HTML file
> str = "<title>This is the title!</title><title>Another one!</title>"
> 
> # Make a regular expression match that finds a text expression that
> # 1. Starts with the text "<title>"
> # 2. Is followed by any (".") character(s), zero or more ("*"), do it
> #    non-greedy ("?")
> # 3. And then followed by the text "</title>" (note that the / is escaped by
> #    a backslash; if not, the Ruby interpreter would think that the forward
> #    slash indicated the end of the regular expression.)

[snip]
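
For reference, the pattern described in the three quoted steps would look
roughly like the sketch below. The snipped code is not reproduced here; the
variable names pattern and first_title are only illustrative.

```ruby
# Sample line from the quoted message.
str = "<title>This is the title!</title><title>Another one!</title>"

# "<title>", then any characters matched non-greedily and captured,
# then "</title>" with the forward slash escaped.
pattern = /<title>(.*?)<\/title>/

# String#[] with a regexp and a group number returns that capture of the
# first match, or nil when nothing matches.
first_title = str[pattern, 1]
puts first_title
```

Because the quantifier is non-greedy, the match ends at the first "</title>"
instead of running on to the last one in the string.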

> Please note that the samples provided assume that the start and end tags
> appear in the same string (that is, on the same line in an HTML file).

That's exactly the restriction I'd like to avoid...

I haven't looked into it, but I'm sure it's possible to redefine the
input record separator, slurp a complete file into a string and match a
regex against that string. This very much goes against my sense of
aesthetics. There's no need to read in the file beyond a successful
match, and there's no need to read further when an orphaned </title> or
a </head> tag is encountered.
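
A minimal pure-Ruby sketch of that idea, reading line by line and stopping as
soon as the outcome is known, might look like the following. It is only an
illustration under simplifying assumptions (no comments, CDATA, or attribute
values containing ">"), not a real parser; the names first_title, TITLE_RE and
GIVE_UP_RE are made up for this example.

```ruby
require "stringio"

# Hypothetical helper patterns: /m lets "." cross line breaks, /i ignores case.
TITLE_RE   = /<title[^>]*>(.*?)<\/title>/mi
GIVE_UP_RE = /<\/(?:title|head)>/i

# Returns the text of the first <title> element read from io, or nil.
# Reading stops at the first complete title, at an orphaned </title>,
# or at </head>, whichever comes first.
def first_title(io)
  buffer = String.new
  io.each_line do |line|
    buffer << line
    give_up_at = GIVE_UP_RE =~ buffer
    m = TITLE_RE.match(buffer)
    # A real title starts before its own closing tag, so it wins only if it
    # begins before the first closing tag seen so far.
    return m[1].strip if m && (give_up_at.nil? || m.begin(0) < give_up_at)
    return nil if give_up_at
  end
  nil
end

# Usage with an in-memory document whose tags are spread over several lines.
html = "<html>\n<head>\n<title>\n  X\n</title>\n</head>\n<body>ignored</body>\n</html>\n"
puts first_title(StringIO.new(html)).inspect
```

With a file, the same method can be called as
File.open(path) { |f| first_title(f) }, so nothing past the deciding line is
read. The buffer still grows until a decision is reached, which seems
acceptable for a document head but is a design trade-off rather than a general
solution.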

Dealing correctly with cases such as this requires parsing the input. In
the case of HTML there already is a suitable parser; for other purposes
one could use Racc to generate one (see the RAA for both). But that's
not really what I'm looking for. For lack of a better word, what I'd
like to do is "ad hoc"-parsing in a similar fashion to what sgrep
provides. Possibly my best bet is to extract a library from sgrep
and add Ruby bindings. But before I go there, I'd like to see what the
pure-Ruby options are.


Michael

-- 
Michael Schuerig
mailto:schuerig@acm.org
http://www.schuerig.de/michael/
