ruby-core

On Feb 10, 2008 2:16 PM, Vincent Isambart <vincent.isambart@gmail.com>
wrote:

> Hi,
>
> > I'd like to bring up the issue of how characters are represented in
> > ruby 1.9 from a performance standpoint.  In a recent ruby-quiz
> > (parsing JSON), the fastest pure-ruby solution was simply an LL(1)
> > parser that looked at one character at a time (it beat various
> > Regexp solutions).  With ruby 1.9, the runtime increased by 4X
> > making it a slow solution.  A simple benchmark is at the end of this
> > message that counts words in an LL(1) fashion.  With ruby 1.8.6, it
> > can could the words in Homer's Iliad in 1.46s on my machine and in
> > ruby 1.9 (from ubuntu gutsy) it takes 52.87s (36X increase in
> > runtime).
>
> I'm surprised that the fastest parsing done in Ruby was with a
> handwritten parser.

I was also surprised that the LL(1) JSON parser was so fast.  I was
expecting a StringScanner solution to beat it.  I also optimized the
StringScanner solution with some of the same techniques.

> When I tried to code a small XML parser in Ruby
> the fastest solution was using StringScanner. And for your small
> example, I rewrote a version using StringScanner that's faster in both
> 1.8 and 1.9 (and it's faster in 1.9). And it's shorter and (I think)
> more readable.
>
> require 'strscan'
>
> strscan = StringScanner.new(text)
> punctuation = spacing = words = 0
> while not strscan.eos?
>   if strscan.skip(/[a-zA-Z_]+/)
>     words += 1
>   elsif strscan.skip(/\s+/)
>     spacing += 1
>   else
>     strscan.skip(/./)
>     punctuation += 1
>   end
> end

I also wrote about the same thing for this benchmark.  To make it work for
an IO (reading one line at a time), you'll need a little more.  I'm not
arguing that StringScanner solution isn't faster in this case.  It is.  For
lexers and parsers I done by hand, I usually find LL(1) and StringScanner
solutions are in the same ball-park.

The benchmark I posted was meant to show the slowdown for an application
dealing directly with characters.

In 1.8.6, you could think about making a pure-ruby Regexp-like (pattern
matching) like replacement.  This is not feasible in 1.9 because of
performance.

Most popular parsers written in Ruby (ERB, json-pure, RedCloth) use
> Regexps (some with and some without StringScanner).
>

Anything using Regexp won't have an issue.

The stuff I'm doing generates parsers and lexers from the ground up (LL(1)
with LL(*) where necessary).  I don't use Regexp mainly because Regexp is
too limiting (hard to apply to an IO).  Since I found the performance in
1.8.6 to be reasonable without Regexp (and I already can do much more than
Regexp), I didn't see the  need to deal with the complexity of adding
Regexp.

>
> > Please consider this significant performance issue in ruby 1.9.
>
> I am not sure this particular case is really a significant issue.
>

For me it definitely is.  Based on the JSON parser, I expect any of my
generated lexers or character parsers to be around 4X slower.

Thread

Prev Next

In This Thread

Prev Next