very bad character performance on ruby1.9

From: "Eric Mahurin" <eric.mahurin@...>
Date: 2008-02-10 18:33:54 UTC
List: ruby-core #15481
I'd like to bring up the issue of how characters are represented in
ruby 1.9 from a performance standpoint.  In a recent ruby-quiz (parsing
JSON), the fastest pure-ruby solution was simply an LL(1) parser that
looked at one character at a time (it beat various Regexp solutions).
With ruby 1.9, its runtime increased by 4X, making it a slow solution.
A simple benchmark at the end of this message counts words in an LL(1)
fashion.  With ruby 1.8.6, it can count the words in Homer's Iliad in
1.46s on my machine; in ruby 1.9 (from ubuntu gutsy) it takes 52.87s
(a 36X increase in runtime).

I'm writing a ruby DSL parser/lexer generator (could also replace Regexp
functionality).  This performance issue in ruby 1.9 is a serious problem.

The problem, of course, is that every character in ruby 1.9 becomes a
normal ruby object (String), whereas in ruby 1.8 characters were
immediates (Fixnums).
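
The difference can be seen in a short sketch, assuming Ruby 1.9 or later.
The variable names are illustrative only and do not come from the benchmark
in this message.

```ruby
s = "hello"

# Ruby 1.9+: indexing and character literals yield one-character Strings,
# i.e. ordinary heap-allocated objects.
ch  = s[0]
lit = ?h

# Byte-level accessors still return Integers, which MRI stores as immediates.
byte = s.getbyte(0)
code = ch.ord

puts ch.class
puts lit.class
puts byte.class
puts code == byte

# In MRI, two reads of the same position are expected to produce equal but
# distinct String objects, which is where the allocation cost comes from.
puts s[0].equal?(s[0])
```

On ruby 1.8, by contrast, both `s[0]` and `?h` evaluated to the Fixnum 104.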

I'd like to propose that at least ASCII characters in ruby 1.9 be made into
immediates:

* at a minimum, characters should be read-only/frozen.  Allowing them to be
mutable will inhibit many future optimizations.
* give (small) characters a separate class with string-like (read-only)
functionality.
* possibly add a base class that both String and this new character class
would descend from.
* eventually make this small (i.e. ASCII or even unicode) character class
have immediate objects

If the above were done, one of these immediate characters would be to a
frozen String what a Fixnum is to a Bignum.  A possible base class for
the two would then play the same role as the Integer class.

Please consider this significant performance issue in ruby 1.9.

Eric


#!/usr/bin/env ruby

require 'benchmark'
require 'stringio'

def io_getc(io)
    io.rewind
    io0 = io.getc
    words = 0
    strings = 0
    spacing = 0
    punctuation = 0
    while (true)
         case io0
         when ?a..?z, ?A..?Z, ?_
             words += 1
             io0 = io.getc
             io0 = io.getc while (case io0;when ?a..?z,?A..?Z,?_,?0..?9;1;end)
         when ?\s,?\t,?\n,?\r
             spacing += 1
             io0 = io.getc
             io0 = io.getc while (case io0;when ?\s,?\t,?\n,?\r;1;end)
         when nil
             break
         else
             punctuation += 1
             io0 = io.getc
         end
    end
    return words, strings, spacing, punctuation
end

file_name = "Homer - Iliad.txt"
system("wget http://www.e-text.org/text/Homer%20-%20Iliad.txt") unless File.exist?(file_name)
text = IO.read(file_name)

io = StringIO.new(text)
#io = File.open(file_name)

Benchmark.bmbm { |b|
    b.report("IO#getc") { p io_getc(io) }
}
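
For comparison, here is a minimal sketch of a byte-oriented variant of the
same counter. It is not part of the original message or its proposal: the
method name `count_tokens_bytes` is hypothetical, and it assumes ASCII input,
since any multibyte character would be counted once per byte. It classifies
Integers from `String#each_byte`, so no one-character String is created per
step.

```ruby
# Hypothetical helper (not from the original post): same token rules as
# io_getc above, but driven by byte values instead of one-character Strings.
def count_tokens_bytes(text)
  words = spacing = punctuation = 0
  state = nil # :word while inside a word, :space while inside whitespace

  text.each_byte do |b|
    case b
    when 97..122, 65..90, 95 # a-z, A-Z, _
      words += 1 unless state == :word
      state = :word
    when 48..57 # 0-9: continues a word, otherwise counts as punctuation
      unless state == :word
        punctuation += 1
        state = nil
      end
    when 32, 9, 10, 13 # space, tab, newline, carriage return
      spacing += 1 unless state == :space
      state = :space
    else
      punctuation += 1
      state = nil
    end
  end

  [words, spacing, punctuation]
end

p count_tokens_bytes("foo bar_1, 42\n")
```

Whether this recovers the ruby 1.8 timings would have to be measured; the
sketch only shows how the per-character String can be avoided.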
