[ruby-talk:10394] Re: Structured text matching?

From: Dave Thomas <Dave@...>
Date: 2001-02-06 00:05:07 UTC
List: ruby-talk #10394

schuerig@acm.org (Michael Schuerig) writes:

> > Please note that the samples provided assumes that the start and end tags
> > appear in the same string (that is, on the same line in a html file).
> 
> That's exactly the restriction I'd like to avoid...
> 
> I haven't looked into it, but I'm sure it's possible to redefine the
> input record separator, slurp a complete file into a string and match a
> regex against that string.

str = File.open("x.html") {|f| f.read}
str =~ /.../m

> This very much goes against my sense of aesthetics. There's no need
> to read in the file beyond a successful match, and there's no need
> to read further when an orphaned </title> or a </head> tag are
> encountered.

All true, but at the same time, if you can do it in two lines rather
than writing a full parser, isn't there some compensating gain to be
had?

I've used a technique for a while now to convert structured files from 
one form to another.

1. Slurp the whole file in
2. Convert escaped characters into something distinct so they are no
   longer involved in processing.
3. Match delimiters (for example braces in LaTeX, and <>'s in
   HTML). This is where you take account of strings, commands and the
   like.
4. Perform a series of substitutions which match the command pattern
   and any arguments. The name of the command is then used either to
   look up a hash, or as the name of a method to call. The results of
   all this then get substituted back into the buffer.

It sounds messy, but the reality is that it works, and is a whole lot
simpler than doing the full parse (particularly for non-regular
languages such as LaTeX).
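
As a rough illustration of the four steps above, here is a minimal
sketch for a made-up LaTeX-like markup. Everything in it is an
assumption for the sake of the example: the command names in COMMANDS,
the convert method, and the use of the control characters \x01..\x03 as
placeholders (which assumes they never occur in the input). It handles
nesting by repeatedly replacing the innermost command, which is one
simple way to do the substitution step, not necessarily the only one.

```ruby
# Hypothetical command table: command name => handler for its argument.
COMMANDS = {
  "bold"  => ->(arg) { "<b>#{arg}</b>" },
  "emph"  => ->(arg) { "<i>#{arg}</i>" },
  "upper" => ->(arg) { arg.upcase },
}

# Step 2 tables: escaped characters and the placeholders that hide them.
# Assumption: \x01..\x03 never appear in real input.
ESCAPES = { "\\\\" => "\x01", "\\{" => "\x02", "\\}" => "\x03" }
RESTORE = { "\x01" => "\\", "\x02" => "{", "\x03" => "}" }

def convert(text)
  # Step 1: the whole input is already in one string (e.g. from File.read).
  buf = text.dup

  # Step 2: replace escaped characters so they take no part in matching.
  buf.gsub!(/\\[\\{}]/, ESCAPES)

  # Step 3: check that the remaining delimiters pair up.
  depth = 0
  buf.each_char do |c|
    depth += 1 if c == "{"
    depth -= 1 if c == "}"
    raise ArgumentError, "unmatched }" if depth < 0
  end
  raise ArgumentError, "unmatched {" unless depth.zero?

  # Step 4: substitute innermost commands (argument contains no braces)
  # until a pass makes no change; gsub! returns nil in that case.
  loop do
    changed = buf.gsub!(/\\(\w+)\{([^{}]*)\}/) do
      name, arg = $1, $2
      handler = COMMANDS.fetch(name) do
        raise ArgumentError, "unknown command: #{name}"
      end
      handler.call(arg)
    end
    break unless changed
  end

  # Put the escaped characters back as plain literals.
  buf.gsub(/[\x01-\x03]/, RESTORE)
end

puts convert('\bold{a \emph{b}} and \{literal\}')
# prints: <b>a <i>b</i></b> and {literal}
```

Looking the handler up in a hash keeps the substitution loop itself
free of any knowledge about individual commands; calling a method named
after the command would work the same way.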


For your particular example, if I was worried about the potential size
of reading in the whole file, I might just read in the first (say) 2k,
and quickly check for </head>. If I didn't find it, I'd read another
2k until I did.


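   def findTitle(file)
     str = ''
     loop do
       begin
         # Read the next 2k chunk; sysread raises EOFError at end of file.
         str << file.sysread(2048)
       rescue EOFError
         raise "</head> not found in file"
       end
       # Stop once the whole head section is buffered. The pattern below
       # needs the closing </head>, so that is the tag to wait for.
       break if str =~ %r{</head>}
     end

     return $1 if str =~ %r{<head.*?>.*?<title.*?>(.*?)</title>.*?</head>}m

     raise "Couldn't find title in file"
   end

   # The block form closes the file once the title has been found.
   title = File.open("test.html") {|f| findTitle(f)}
   puts title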

Can't say as I've tested this, but it _might_ work ;-)


Dave
