[#223105] ruby programming best practice — "Shannon Fang" <xrfang@...>

As a dynamic language, Ruby is much more flexible and easier than other

18 messages 2006/11/01

[#223126] variable pointer — "akbarhome" <akbarhome@...>

@c = "donal"

17 messages 2006/11/01

[#223211] file size revisit — python152@...

Hi, folks

17 messages 2006/11/02

[#223299] Just a question to throw out there... — "Skotty" <shyguyfrenzy@...>

Another noobrube question.

23 messages 2006/11/02

[#223398] Output not clear — "Learning Ruby" <learningruby@...>

I am a newbie to Ruby and the output of the following program is not clear

14 messages 2006/11/03

[#223425] Bytecode Compiler (#100) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

27 messages 2006/11/03

[#223458] REXML ... performance & memory usage ... — Jeff Wood <jeff@...>

Wow ... I am trying to use REXML to parse through an 8.8Mb xml file ...

14 messages 2006/11/03

[#223653] Book wanted: Metaprogramming in Ruby — Jay Levitt <jay+news@...>

Now that Hal, David B, Curt, and others have some spare time:

25 messages 2006/11/06

[#223736] REXML — "pdg" <pgattphoto@...>

Hi All,

22 messages 2006/11/06

[#223831] the name of Matz — Byung-Hee HWANG <bh@...>

Hello,

51 messages 2006/11/07
[#223839] Re: [OT] the name of Matz — Yukihiro Matsumoto <matz@...> 2006/11/07

Hi,

[#223975] Re: [OT] the name of Matz — Devin Mullins <twifkak@...> 2006/11/08

Yukihiro Matsumoto wrote:

[#224630] Re: the name of Matz — "Ryo" <furufuru@...> 2006/11/12

Yukihiro Matsumoto wrote:

[#224645] Re: the name of Matz — "Robert Dober" <robert.dober@...> 2006/11/12

On 11/12/06, Ryo <furufuru@ccsr.u-tokyo.ac.jp> wrote:

[#242731] Re: the name of Matz — Harry <ruby.hardware@...> 2007/03/09

> It might be fun though if you could give a pointer to the "correct"

[#224216] Re: [OT] the name of Matz — Byung-Hee HWANG <bh@...> 2006/11/09

Yukihiro Matsumoto wrote:

[#223846] How to make a cycling counter from commandline? — "darenbell@..." <darenbell@...>

Hi, I'm looking for a way to implement this idea:

12 messages 2006/11/07

[#223930] Two way communication with the command shell (IO.popen?) — James Smith <jmdjmsmith@...>

19 messages 2006/11/08
[#223943] Re: Two way communication with the command shell (IO.popen?) — ara.t.howard@... 2006/11/08

On Wed, 8 Nov 2006, James Smith wrote:

[#223997] Re: Two way communication with the command shell (IO.popen?) — James Smith <jmdjmsmith@...> 2006/11/08

unknown wrote:

[#224012] Re: Two way communication with the command shell (IO.popen?) — ara.t.howard@... 2006/11/08

On Wed, 8 Nov 2006, James Smith wrote:

[#224327] Re: Two way communication with the command shell (IO.popen?) — James Smith <jmdjmsmith@...> 2006/11/10

unknown wrote:

[#224690] Re: testing whether a process has completed.. — James Smith <jmdjmsmith@...> 2006/11/12

OK, keeping it simple I am basically using the following code:

[#224691] Re: testing whether a process has completed.. — "Patrick Hurley" <phurley@...> 2006/11/12

On 11/12/06, James Smith <jmdjmsmith@msn.com> wrote:

[#223953] Why create web servers? — "CatLady []" <totalharmonicdistortion@...>

Hi,

16 messages 2006/11/08

[#224002] FastRI 0.1.0: faster, smarter RI docs for Ruby, DRb-enabled — Mauricio Fernandez <mfp@...>

FastRI 0.1.0: faster, smarter RI docs for Ruby, DRb-enabled

27 messages 2006/11/08

[#224013] #returning and #tap — "Trans" <transfire@...>

Had use for this today: #returning is a convenience method you'll find

57 messages 2006/11/08
[#225210] Re: #returning and #tap — Eric Hodel <drbrain@...7.net> 2006/11/15

On Nov 8, 2006, at 6:40 AM, Trans wrote:

[#225233] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

Eric Hodel wrote:

[#225358] Re: #returning and #tap — Eric Hodel <drbrain@...7.net> 2006/11/16

On Nov 15, 2006, at 5:40 PM, Joel VanderWerf wrote:

[#225370] Re: #returning and #tap — ara.t.howard@... 2006/11/16

On Fri, 17 Nov 2006, Eric Hodel wrote:

[#225382] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

ara.t.howard@noaa.gov wrote:

[#225385] Re: #returning and #tap — dblack@... 2006/11/16

Hi --

[#225388] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

dblack@wobblini.net wrote:

[#225393] Re: #returning and #tap — dblack@... 2006/11/16

Hi --

[#225399] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

dblack@wobblini.net wrote:

[#225420] Re: #returning and #tap — dblack@... 2006/11/16

Hi --

[#225476] Re: #returning and #tap — "Martin DeMello" <martindemello@...> 2006/11/17

On 11/17/06, dblack@wobblini.net <dblack@wobblini.net> wrote:

[#225488] Re: #returning and #tap — dblack@... 2006/11/17

Hi --

[#225494] Re: #returning and #tap — spooq <spoooq@...> 2006/11/17

I definitely think of it as tapping a phone line.

[#225495] Re: #returning and #tap — spooq <spoooq@...> 2006/11/17

Actually, how about giving the proc a copy of the object, rather than

[#224039] Proc as Observer — "Tim Pease" <tim.pease@...>

Working with an Observable object, I wanted to be able to add a Proc

20 messages 2006/11/08
[#224061] Re: Proc as Observer — "Trans" <transfire@...> 2006/11/08

[#224040] Simple Math Problem — Thom Loring <tloring@...>

Can anyone shed some light on a simple math problem I have encountered?

14 messages 2006/11/08

[#224087] The Ruby Way review on Slashdot — Timothy Hunter <TimHunter@...>

Whoo-hoo! My review of Hal Fulton's _The_Ruby_Way,_Second_Edition_ is on

17 messages 2006/11/08

[#224157] thousand ways to rome — Chris Mueller <damngoodcoffee@...>

Hi,

17 messages 2006/11/09

[#224246] Overwriting the Integer class for method succ! (instead of just succ) — "paul" <pjvleeuwen@...>

Hi all,

11 messages 2006/11/09

[#224331] Rails vs. Asp.Net politics — "Leslie Viljoen" <leslieviljoen@...>

I have the deciding vote in a new (rather large) web app we need to

28 messages 2006/11/10

[#224352] VCR Program Manager (#101) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

13 messages 2006/11/10

[#224398] looking for some feedback about Certification — "pat eyler" <pat.eyler@...>

Aaah, nothing like a good controversial topic to stir up a holy war

38 messages 2006/11/10
[#224401] Re: looking for some feedback about Certification — Gustav Paul <gustav@...> 2006/11/10

pat eyler wrote:

[#224439] Re: looking for some feedback about Certification — dblack@... 2006/11/11

Hi --

[#224411] turn 0.1.0 Released — "Tim Pease" <tim.pease@...>

turn version 0.1.0 has been released!

18 messages 2006/11/10

[#224532] McGovern Likes JRuby... — Charles Oliver Nutter <charles.nutter@...>

I'm not sure how to feel about this one :)

26 messages 2006/11/11
[#224570] Re: McGovern Likes JRuby... — "M. Edward (Ed) Borasky" <znmeb@...> 2006/11/11

Charles Oliver Nutter wrote:

[#224574] Re: McGovern Likes JRuby... — David Vallner <david@...> 2006/11/11

M. Edward (Ed) Borasky wrote:

[#224539] Ruby GUI with IDE — "Josh Mr." <kamipride102@...>

Hello all,

33 messages 2006/11/11
[#224543] Re: Ruby GUI with IDE — "M. Edward (Ed) Borasky" <znmeb@...> 2006/11/11

Josh Mr. wrote:

[#224546] Re: Ruby GUI with IDE — AliasX Neo <kamipride102@...> 2006/11/11

M. Edward (Ed) Borasky wrote:

[#224554] Re: Ruby GUI with IDE — David Vallner <david@...> 2006/11/11

AliasX Neo wrote:

[#224569] Re: Ruby GUI with IDE — "M. Edward (Ed) Borasky" <znmeb@...> 2006/11/11

David Vallner wrote:

[#224577] Re: Ruby GUI with IDE — Caleb Tennis <caleb@...> 2006/11/11

>>

[#224578] Re: Ruby GUI with IDE — AliasX Neo <kamipride102@...> 2006/11/11

So I guess a better format for my original question should be:

[#224580] Re: Ruby GUI with IDE — David Vallner <david@...> 2006/11/11

AliasX Neo wrote:

[#224639] regular expression too big — Peter Schrammel <peter.schrammel@...>

Hi,

31 messages 2006/11/12

[#224665] Help convert a Perl user to the Ruby Way. — Sebastian Reid <seb@...>

Hi all.

13 messages 2006/11/12

[#224777] Nitro + Og 0.40.0 — "George Moschovitis" <george.moschovitis@...>

Hello everyone,

17 messages 2006/11/13

[#224817] directory_watcher 0.1.1 — "Tim Pease" <tim.pease@...>

A class for watching files within a directory and generating events

16 messages 2006/11/13
[#224838] Re: directory_watcher 0.1.1 — "Kenosis" <kenosis@...> 2006/11/13

[#224839] Re: directory_watcher 0.1.1 — "Tim Pease" <tim.pease@...> 2006/11/13

On 11/13/06, Kenosis <kenosis@gmail.com> wrote:

[#224933] ruby indentantion — Alfonso <euoar@...>

I have just started with ruby, and something that I have observed is

23 messages 2006/11/14

[#224949] Is 2.0 Integer or Float? — "S. Robert James" <srobertjames@...>

I'd like to be able to do:

18 messages 2006/11/14

[#224997] Assoc method on large array — "gregarican" <greg.kujawa@...>

I am trying to invoke the assoc method on a large array. It seems to

13 messages 2006/11/14

[#225069] Design problem with 'inject' — Gary Boone <dr@...>

20 messages 2006/11/15

[#225109] FastRI 0.2.0: full-text searching, smarter search strategies — Mauricio Fernandez <mfp@...>

FastRI is an alternative to the ri command-line tool. It is *much* faster, and

9 messages 2006/11/15

[#225179] *Fast* way to process large files line by line — Devesh Agrawal <dagrawal@...>

Hi Folks,

20 messages 2006/11/15

[#225288] Re: parse xml file, put results in mysql db — "seb@..." <seb@...>

--- Kathy Simmons <kathys39@hotmail.com> wrote:

15 messages 2006/11/16
[#225291] Re: parse xml file, put results in mysql db — Jon Egil Strand <jes@...> 2006/11/16

>

[#225296] Re: parse xml file, put results in mysql db — Mike Fletcher <lemurific+rforum@...> 2006/11/16

Jon Egil Strand wrote:

[#225330] Re: parse xml file, put results in mysql db — Kathy Simmons <kathys39@...> 2006/11/16

Here's the full code - I'm reading in nmap output in scanfile.xml and

[#225379] IHelp 0.4.0 - full text search — "Ilmari Heikkinen" <ilmari.heikkinen@...>

View and search object documentation from irb.

13 messages 2006/11/16
[#225383] Re: [ANN] IHelp 0.4.0 - full text search — Parragh Szabolcs <parragh@...> 2006/11/16

Ilmari Heikkinen 叝ta:

[#225398] Re: [ANN] IHelp 0.4.0 - full text search — "Ilmari Heikkinen" <ilmari.heikkinen@...> 2006/11/16

Hi,

[#225412] Re: [ANN] IHelp 0.4.0 - full text search — "Ilmari Heikkinen" <ilmari.heikkinen@...> 2006/11/16

> Thanks for noticing this, should be fixed in 0.4.1.

[#225470] Re: [ANN] IHelp 0.4.0 - full text search — Parragh Szabolcs <parragh@...> 2006/11/17

Ilmari Heikkinen 叝ta:

[#225512] Literate Ruby (#102) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

12 messages 2006/11/17

[#225547] ruby equivalent PHP function is_numeric? — Josselin <josselin@...>

After reading completely my Ruby book, I cannot find a function

15 messages 2006/11/17

[#225681] Ruby vs Java vs c++ — n/a <na@...>

hi, newbie so please be tolerant.... ;)

117 messages 2006/11/18

[#225754] Ruby screen scraping — Chris Gallagher <cgallagher@...>

Hi,

28 messages 2006/11/19

[#225909] Create array of hash values — David Lelong <drlelon@...>

Hi,

13 messages 2006/11/20

[#226023] Bug in ruby? — AliasX Neo <kamipride102@...>

Well, I've spent the last hour or so debugging one of the stupidest

31 messages 2006/11/21

[#226029] array question — Li Chen <chen_li3@...>

Hi all,

41 messages 2006/11/21
[#226031] Re: array question — "Wilson Bilkovich" <wilsonb@...> 2006/11/21

On 11/20/06, Li Chen <chen_li3@yahoo.com> wrote:

[#226120] Hpricot/Rubyful Soup comparison — Wes Gamble <weyus@...>

Has anyone done a head to head comparison of Hpricot and Rubyful Soup

19 messages 2006/11/21

[#226168] New RCRchive, including new process — dblack@...

Hi everyone --

35 messages 2006/11/22

[#226210] invoke system command from within a method — Moritz Reiter <mreiter@...>

-----BEGIN PGP SIGNED MESSAGE-----

11 messages 2006/11/22

[#226228] how do I contribute to Ruby? — "Giles Bowkett" <gilesb@...>

check this out, this is the whiniest change ever, but what I want is

15 messages 2006/11/22

[#226262] Rubyish inst.var initializations — "Victor \"Zverok\" Shepelev" <vshepelev@...>

Hi all.

12 messages 2006/11/23

[#226263] Compare Array Values? — "Daniel N" <has.sox@...>

I want to check to see if two arrays contain the same values.

30 messages 2006/11/23

[#226388] Anyone else getting weird flickr errors? — "Gregory Brown" <gregory.t.brown@...>

When I post to RubyTalk, I've been getting a 'your photo upload

14 messages 2006/11/24

[#226484] Is there a simply way to get every method log itself before running? — "Richard" <RichardDummyMailbox58407@...>

Hi,

11 messages 2006/11/24

[#226537] DictionaryMatcher (#103) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

18 messages 2006/11/24

[#226553] Ruby/Python/REXX as a MUCK scripting language — Tony Belding <zobeid@...>

I'm interested in using an off-the-shelf interpreted language as a

18 messages 2006/11/25

[#226608] coding practise — sempsteen <sempsteen@...>

Hi all,

23 messages 2006/11/25

[#226707] Ruby/Rails on Gumstix — "M. Edward (Ed) Borasky" <znmeb@...>

For the past couple of weeks, I've been playing around with Ruby on a

16 messages 2006/11/26
[#226751] Re: Ruby/Rails on Gumstix — "Giles Bowkett" <gilesb@...> 2006/11/26

On 11/25/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:

[#226709] Timestamp — Srinivas Sa <sr.sakhamuri@...>

How do i add two time stamps

23 messages 2006/11/26

[#226731] find index of first non zeo value in array — Josselin <josselin@...>

with :

24 messages 2006/11/26
[#226733] Re: find index of first non zeo value in array — Olivier <o.renaud@...> 2006/11/26

Le dimanche 26 novembre 2006 15:00, Josselin a 馗rit

[#226783] Two Advanced Ruby Performance Questions — Sunny Hirai <sunny@...>

First, I am a Ruby newbie but am an experienced developer of highly

27 messages 2006/11/26
[#226816] Re: Two Advanced Ruby Performance Questions — Edwin Fine <efine145-nospam01@...> 2006/11/26

This post may be stating the obvious, but here goes anyway... I hope I

[#226792] Extremely Noobish Documentation Question — Paco Paco <mepaco@...>

Hello all,

16 messages 2006/11/26

[#226806] Re: ruby and list comprehension — James Cunningham <jameshcunningham@...>

On 2006-11-25 18:47:26 -0500, Brad Tilley <rtilley@vt.edu> said:

12 messages 2006/11/26

[#227012] Is ruby a viable corporate alternative? — "Mr P" <MisterPerl@...>

Our team uses Perl for almost 100% of our projects, as we have for the

27 messages 2006/11/28

[#227041] FileUtils.touch doesn't work — Jeff Toth <jeff@...>

Why won't Ruby just install from the port? I don't know what Ruby is,

12 messages 2006/11/28

[#227108] Simple screen scraper using scrAPI — "doog" <doog@...>

I'm a Ruby novice. Does anyone have an example of a simple screen

14 messages 2006/11/28

[#227160] cidr.rb: port of Perl's Net::CIDR v0.11 available — Jos Backus <jos@...>

Module:

17 messages 2006/11/29

[#227198] Splitting a CSV file into 40,000 line chunks — Drew Olson <olsonas@...>

All -

40 messages 2006/11/29
[#227243] Re: Splitting a CSV file into 40,000 line chunks — James Edward Gray II <james@...> 2006/11/29

On Nov 29, 2006, at 9:32 AM, Drew Olson wrote:

[#227255] Re: Splitting a CSV file into 40,000 line chunks — Drew Olson <olsonas@...> 2006/11/29

Thanks for all the responses. As noted in a post above, I am trying to

[#227219] Need a range, but not getting it. . . . — Peter Bailey <pbailey@...>

Hello,

33 messages 2006/11/29

[#227282] creating directory "http://example.com" — Comfort Eagle <steve@...>

How do I create a directory 'http://example.com' without it getting

16 messages 2006/11/29

[#227302] Wrong results using named arguments — "Jason Vogel" <jasonvogel@...>

Source:

12 messages 2006/11/29

[#227336] Overwhelmed by emails — Daniel DeLorme <dan-ml@...42.com>

This list has way too many messages for the amount of free time I have. Does

12 messages 2006/11/30

[#227388] Timers, scheduling and Ruby — Damphyr <damphyr@...>

Ok, since the original post migh just appear in a month's time, lets

24 messages 2006/11/30
[#227404] Re: Timers, scheduling and Ruby — James Edward Gray II <james@...> 2006/11/30

On Nov 30, 2006, at 7:51 AM, Damphyr wrote:

[#227414] Re: Timers, scheduling and Ruby — ara.t.howard@... 2006/11/30

On Fri, 1 Dec 2006, James Edward Gray II wrote:

[#227416] Re: Timers, scheduling and Ruby — James Edward Gray II <james@...> 2006/11/30

On Nov 30, 2006, at 11:47 AM, ara.t.howard@noaa.gov wrote:

[#227461] Re: Timers, scheduling and Ruby — ara.t.howard@... 2006/11/30

On Fri, 1 Dec 2006, James Edward Gray II wrote:

[#227402] Segmentation fault, proc, eval, long string — Bob Hutchison <hutch@...>

Hi,

27 messages 2006/11/30
[#227415] Re: Segmentation fault, proc, eval, long string [Reproduced] — Bob Hutchison <hutch@...> 2006/11/30

A little more on this...

[#227569] Re: Segmentation fault, proc, eval, long string [Reproduced] — Pit Capitain <pit@...> 2006/12/01

Bob Hutchison schrieb:

[#227426] simple question, looping through each character in a string — "warhero" <beingthexemplarylists@...>

how can I accomplish something like this in ruby:

17 messages 2006/11/30
[#227438] Re: simple question, looping through each character in a string — dblack@... 2006/11/30

Hi --

[#227458] Wisdom of including Rakefile in releases — "Trans" <transfire@...>

I was poking around in the /usr/lib/ruby/gems directory today and

13 messages 2006/11/30

Re: [QUIZ] Bytecode Compiler (#100)

From: "Marcel Ward" <wardies@...>
Date: 2006-11-06 06:58:08 UTC
List: ruby-talk #223680
On 03/11/06, Ruby Quiz <james@grayproductions.net> wrote:
> Note: This quiz isn't really as much work as it might seem!

Ouch! :)  Oh well, I don't think my Ruby is up to some of the more
cunning techniques I see many have employed.

> Your compiler should support all basic arithmetic operations and explicit
> precedence (parenthesis). As standard, syntax/precedence/ associativity/etc.
> should follow Ruby itself.

By associativity does this also mean that "+--+1" - which could be
rewritten as "0+(0-(0-(0+1)))" - should be parsed into a bytecode
which, however well optimised, on execution results in the answer "1"?

I should warn that my solution below is rather long.  I went down a
possibly more traditional route of writing a generic tokenizer/lexer.
I don't know if these are still commonly used but I couldn't find an
existing implementation in the Ruby Standard Library.

I also tried to document the functions using rdoc so someone else
might make use of it. For those who haven't tried it yet, just type
'rdoc' at the command prompt and it makes a nice doc directory under
the current directory with an index.html to start browsing the
file/classes/methods. Nice!

Back to the task...

My wanting a solution that coped with all difficult expressions that
Ruby itself can deal with (using the lexicon allowed) meant having to
get things like the aforementioned negation with parentheses working:

-(---3)    # => 3

...and power precedence (as others have pointed out) combined with
negation and parentheses turned out to be tricky:

64**-(-(-3+5)**3**2)    #=> a big number

There's a big list of test cases such as these in my unit tests (included).

So having written the lexer class, I now set up the state transition
table and ask the lexer to tokenize the expression.  The tokens are
then parsed and the bytecode is generated using a simple mapping.

The code follows; there are two files in total.

Thanks for another fun challenge and congrats all round for reaching
the 100th Ruby Quiz!

Marcel

#!/usr/bin/env ruby
##################################################################
# = compiler_mw.rb - bytecode compiler
#
# Author::        Marcel Ward   <wardies ^a-t^ gmaildotcom>
# Documentation:: Marcel Ward
# Last Modified:: Monday, 06 November 2006

require 'interp'
require 'lexer_mw'

module Compiler
  # The lexer needs to know the character sets involved in deciding
  # which state transition will be fired...
  CHAR_SETS = {
        :plus => [?+], :minus => [?-],
        :digit => /\d/,
        :div_mod => [?/, ?%],  # matches '/' or '%'
        :asterisk => [?*],
        :open_paren => [?(], :close_paren => [?)]
      }

  # Tell the lexer how to parse a datastream: which tokens to
  # generate, what state to switch to, etc.
  # This table was designed according to my vague recollection of
  # the dragon book on compiler construction by Aho/Sethi/Ullman.
  STATE_TRANS_TABLE = {
    :s_start => {
        :plus =>        {:next_s_skip => :s_start},
        :minus =>       {:next_s => :s_negate},
        :digit =>       {:next_s => :s_numeric},
        :open_paren =>  {:next_s => :s_start,
                          :token => :tok_open_paren}
      },
    :s_negate => {
        :plus =>        {:next_s_skip => :s_negate},
        :minus =>       {:next_s => :s_start},
        :digit =>       {:next_s => :s_numeric},
        :open_paren =>  {:next_s_backtrack => :s_start,
                          :token => :tok_negate}
      },
    :s_numeric => {
        :plus =>        {:next_s_backtrack => :s_operator,
                          :token => :tok_int},
        :minus =>       {:next_s_backtrack => :s_operator,
                          :token => :tok_int},
        :digit =>       {:next_s => :s_numeric},
        :div_mod =>     {:next_s_backtrack => :s_operator,
                          :token => :tok_int},
        :asterisk =>    {:next_s_backtrack => :s_operator,
                          :token => :tok_int},
        :close_paren => {:next_s_backtrack => :s_operator,
                          :token => :tok_int},
        :eof =>         {:next_s_backtrack => :s_operator,
                          :token => :tok_int},
      },
    :s_operator => {
        :plus =>        {:next_s => :s_start,
                          :token => :tok_add},
        :minus =>       {:next_s => :s_start,
                          :token => :tok_subtract},
        :div_mod =>     {:next_s => :s_start,
                          :token => :tok_div_mod},
        :asterisk =>    {:next_s => :s_mult_or_power},
        :close_paren => {:next_s => :s_operator,
                          :token => :tok_close_paren},
        :eof =>         {} # when :next_s... is absent, finish
      },
    :s_mult_or_power => {
        :plus =>        {:next_s_backtrack => :s_start,
                          :token => :tok_multiply},
        :minus =>       {:next_s_backtrack => :s_start,
                          :token => :tok_multiply},
        :digit =>       {:next_s_backtrack => :s_start,
                          :token => :tok_multiply},
        :asterisk =>    {:next_s => :s_start,
                          :token => :tok_power},
        :open_paren =>  {:next_s_backtrack => :s_start,
                          :token => :tok_multiply}
      }
  }

  # Compiles a string expression _sum_ into bytecode and returns
  # the bytecode array (as per Ruby Quiz 100 requirements).
  def self.compile(sum)
    lexer = LexerMW.new()
    lexer.init_char_sets(CHAR_SETS)
    lexer.init_state_transitions(STATE_TRANS_TABLE)

    toks = lexer.tokenize(sum)

    puts toks.inspect + "\n\n" + toks.map {|a,b| b}.join(' ') \
      if $DEBUG == 1

    # Get the mnemonic stack by parsing the tokens.
    mnemonic_stack = parse(toks)
    puts "\nParsed toks => #{mnemonic_stack.inspect}" if $DEBUG == 1

    # Last stage now, we convert our internal mnemonics directly
    # to a byte stack in the required bytecode format.
    mnemonics_to_bytecode(mnemonic_stack)
  end

  MNEMONIC_TO_BYTECODE = {
      :tok_add => Interpreter::Ops::ADD,
      :tok_subtract => Interpreter::Ops::SUB,
      :tok_multiply => Interpreter::Ops::MUL,
      :tok_divide => Interpreter::Ops::DIV,
      :tok_modulo => Interpreter::Ops::MOD,
      :tok_power => Interpreter::Ops::POW
    }


  # This exception is raised by the mnemonic-to-bytecode method when
  # an integer constant cannot be pushed onto the interpreter
  # bytecode stack because it is too big to fit the
  # Interpreter::Ops::LCONST instruction.
  class OutOfRangeError < StandardError
  end

  # Convert our internal _mnemonics_ directly to a byte array and
  # return this in the required bytecode format, ready to execute.
  def self.mnemonics_to_bytecode(mnemonics)
    bc = []
    mnemonics.each do
      |mnem|
      if MNEMONIC_TO_BYTECODE.has_key? mnem
        bc << MNEMONIC_TO_BYTECODE[mnem]
      else
        # Try packing this value as a 2-or 4-byte signed string
        # and ensure we get back the same value on unpacking it.
        if [mnem] == [mnem].pack('s').unpack('s')
          # 2-bytes will be enough
          bc << Interpreter::Ops::CONST
          bc.concat([mnem].pack('n').unpack('C*'))
        elsif [mnem] == [mnem].pack('l').unpack('l')
          # 4-bytes will be enough
          bc << Interpreter::Ops::LCONST
          bc.concat([mnem].pack('N').unpack('C*'))
        else
          # It could be dangerous to silently fail when a
          # number will not fit in a 4-byte signed int.
          raise OutOfRangeError
        end
      end
    end
    bc
  end

  # If there is a mismatch in the number of parenthesis, this
  # exception is raised by the #parse routine.
  # E.g. "3+(4-2" and "(3-10))" are both considered invalid.
  class ParenthesisError < Exception
  end

  # The operator precedence hash helps the #parse method to
  # decide when to store up operators and when to flush a load
  # out.  The
  PAREN_PRECEDENCE = 0
  OP_PRECEDENCE = {
      :tok_end => -1,
      :tok_open_paren => PAREN_PRECEDENCE,
      :tok_close_paren => PAREN_PRECEDENCE,
      :tok_add => 1, :tok_subtract => 1,
      :tok_multiply => 2, :tok_div_mod => 2,
      :tok_power => 3,
      :tok_negate => 4
    }

  # Parse an array of [token,value] pairs as returned by
  # LexerMW::tokenize.  Returns our own internal quasi-bytecode
  # mnemonic array.
  def self.parse(tokens)
    operator_stack = []
    ops = []

    # Push the bottom-most element with precedence equivalent to that
    # of :tok_end so when we see :tok_end all pending operation
    # tokens on the stack get popped
    precedence_stack = [OP_PRECEDENCE[:tok_end]]

    tokens.each do
      |tok, val|
      if tok == :tok_int
        # "--3".to_i => 0 is bad, so use eval("--3") => 3 instead.
        ops << eval(val)
      else
        precedence = OP_PRECEDENCE[tok]
        if not tok == :tok_open_paren
          while precedence <= precedence_stack.last &&
                  precedence_stack.last > PAREN_PRECEDENCE
            # Workaround for the fact that the ** power operation
            # is calculated Right-to-left,
            # i.e. 2**3**4 == 2**(3**4) /= (2**3)**4
            break if tok == :tok_power &&
              precedence_stack.last == OP_PRECEDENCE[:tok_power]

            precedence_stack.pop
            ops << operator_stack.pop
          end
        end

        # Divide and modulo come out of the lexer as the same token
        # so override tok according to its corresponding value
        tok == :tok_div_mod && \
          tok = (val == '/') ? :tok_divide : :tok_modulo

        case tok
        when :tok_close_paren
          precedence_stack.pop == PAREN_PRECEDENCE \
            or raise ParenthesisError
        when :tok_negate
          # val contains just the minuses ('-', '--', '---', etc.)
          # Optimise out (x) === --(x) === ----(x), etc.
          if val.size % 2 == 1
            # No negate function for -(x) so simulate using 0 - (x)
            precedence_stack.push precedence
            operator_stack.push :tok_subtract
            ops << 0
          end
        when :tok_end
          raise ParenthesisError if precedence_stack.size != 1
        else
          precedence_stack.push precedence
          operator_stack.push tok unless tok == :tok_open_paren
        end
      end
    end
    ops
  end
end

if $0 == __FILE__
  eval DATA.read, nil, $0, __LINE__+4
end

__END__

require 'test/unit'

class TC_Compiler < Test::Unit::TestCase
  def test_simple
    @test_data = [
      '8', '124', '32767',                    # +ve CONST
      '-1', '-545', '-32768',                 # -ve CONST
      '32768', '294833', '13298833',          # +ve LCONST
      '-32769', '-429433', '-24892810',       # -ve LCONST
      '4+5', '7-3', '30+40+50', '14-52-125',  # ADD, SUB
      '512243+1877324', '40394-12388423',     # LCONST, ADD, SUB
      '3*6', '-42*-90', '94332*119939',       # MUL
      '8/3', '-35/-15', '593823/44549',       # DIV
      '8%3', '243%-59', '53%28%9',            # MOD
      '531%-81%14', '849923%59422',           #
      '-2147483648--2147483648',              # SUB -ve LCONST
      '2**14', '-4**13+2'                     # POW
    ]
    @test_data.each do
      |sum|
      assert_equal [eval(sum)],
        Interpreter.new(Compiler.compile(sum)).run,
        "whilst calculating '#{sum}'"
    end
  end

  def test_advanced
    @test_data = [
      '-(423)', '-(-523*32)', '---0',
      '-(-(-(16**--++2)))',
      '3**(9%5-1)/3+1235349%319883+24*-3',
      '+42', '((2*-4-15/3)%16)', '4**3**((2*-4-15/3)%16)',
      '64**-(-(-3+5)**3**2)', '4*165%41*341/7/2/15%15%13',
      '--(---(4**3**((2*-4-15/3)%16))+++-410--4)'
    ]
    @test_data.each do
      |sum|
      assert_equal [eval(sum)],
        Interpreter.new(Compiler.compile(sum)).run,
        "whilst calculating '#{sum}'"
    end
  end
end


#!/usr/bin/env ruby
##################################################################
# = lexer_mw.rb - generic lexical analyser
#
# Author::        Marcel Ward   <wardies ^a-t^ gmaildotcom>
# Documentation:: Marcel Ward
# Last Modified:: Monday, 06 November 2006
#
# Solution for Ruby Quiz number 100 - http://www.rubyquiz.com/

$DEBUG = 0

# If the lexer fails to find an appropriate entry in the state
# transition table for the current character and state, it
# raises this exception.
class LexerFailure < StandardError
end

# If the lexer encounters a character for which no matching charset
# has been supplied then it raises this exception.
#
# This exception will never be raised if #init_state_transitions
# has been called with an appropriate catch-all charset id.
class InvalidLexeme < StandardError
end

class LexerMW
  # Creates an instance of the lexer class.
  #
  # _lexer_eof_ascii_::
  #   defines the ASCII byte value that the lexer considers as
  #   end-of-file when it is encountered.  When #tokenize is called,
  #   the supplied datastream is automatically appended with this
  #   character.
  def initialize(lexer_eof_ascii = 0)
    @s_trans = {}
    @columns = {}
    @lex_eof = lexer_eof_ascii
  end

  # Initialize the character set columns to be used by the lexer.
  #
  # _cs_defs_::
  #   a hash containing entries of the form <tt>id => match</tt>,
  #   where _match_ defines the characters to be matched and _id_
  #   is the id that will be passed to the finite state machine
  #   to inidicate the character grouping encountered.
  # _eof_charset_id_::
  #   defines the character set identifier which the lexer will
  #   attempt to match in the state machine table when the
  #   end-of-file character defined in #new is encountered.
  #
  # The content of _match_ falls into one of two main categories:
  #
  # _regexp_:: e.g. <tt>/\d/</tt> will match any digit 0..9; or
  # _enum_::   an enumeration that describes the set of allowed
  #            character byte values, e.g.
  #            the array <tt>[?*, ?/, ?%]</tt> matches
  #            <b>*</b>, <b>/</b> or <b>%</b>, while the range
  #            <tt>(?a..?z)</tt> matches lowercase alphas.
  #
  # e.g.
  #
  #   init_char_sets({
  #       :alphanum => /[A-Z0-9]/,
  #       :underscore => [?_],
  #       :lower_vowel => [?a, ?e, ?i, ?o, ?u],
  #       :special => (0..31)
  #     },
  #     :end_line)
  #
  # It is the responsibility of the caller to ensure that the
  # match sets for each column are mutually exclusive.
  #
  # If a 'catch-all' set is needed then it is not necessary
  # to build the set of all characters not already matched.
  # Instead, see #init_state_transitions parameter list.
  #
  # Note, the contents of the hash is duplicated and stored
  # internally to avoid any inadvertent corruption from outside.
  def init_char_sets(cs_defs, eof_charset_id = :eof)
    @charsets = {}
    # Make a verbatim copy of the lexer charset columns
    cs_defs.each_pair do
      |charset_id, match|
      @charsets[charset_id] = match.dup   # works for array/regexp
    end
    # Add an end-of-file charset column for free
    @charsets[eof_charset_id] = [@lex_eof]
    puts "@charsets =\n#{@charsets.inspect}\n\n" if $DEBUG == 1
  end

  # Initialize the state transition table that will be used by the
  # finite state machine to convert incoming characters to tokens.
  #
  # _st_::
  #   a hash that defines the state transition table to be used
  #   (see below).
  # _start_state_::
  #   defines the starting state for the finite state machine.
  # _catch_all_charset_id_::
  #   defines an optional charset id to be tried if the character
  #   currently being analysed matches none of the charsets
  #   in the charset table.  The default +nil+ ensures that the
  #   InvalidLexeme exception is raised if no charsets match.
  #
  # The state transition table hash _st_ maps each valid original
  # state to a hash containing the _rules_ to match when in that
  # state.
  #
  # Each hash entry _rule_ maps one of the character set ids
  # (defined in the call to #init_char_sets) to the _actions_ to be
  # carried out if the current character being analysed by the lexer
  # matches.
  #
  # The _action_ is a hash of distinct actions to be carried out for
  # a match.  The following actions are supported:
  #
  # <tt>:next_s => _state_</tt>::
  #   sets the finite state machine next state to be _state_ and
  #   appends the current character to the lexeme string being
  #   prepared, absorbing the current character in the datastream.
  #
  # <tt>:next_s_skip => _state_</tt>::
  #   as above but the lexeme string being prepared remains static.
  #
  # <tt>:next_s_backtrack => _state_</tt>::
  #   as for _next_s_skip_ above but does not absorb the current
  #   character (it will be used for the next state test).
  #
  # <tt>:token => _tok_</tt>::
  #   appends a hash containing a single entry to the array of
  #   generated tokens, using _tok_ as the key and a copy of the
  #   prepared lexeme string as the value.
  #
  # When the end of the datastream is reached, the lexer looks for
  # a match against charset <tt>:eof</tt>.
  #
  # When the performed actions contain no +next_s+... action, the
  # lexer assumes that a final state has been reached and returns
  # the accumulated array of tokens up to that point.
  #
  # e.g.
  #
  #   init_state_transitions({
  #     :s1 => {:alpha => {next_s = :s2},
  #             :period => {:token => :tok_period}},
  #     :s2 => {:alphanum => {next_s = :s2},
  #             :underscore => {next_s_skip == :s2},
  #             :period => {next_s_backtrack = :s1}
  #             :eof => {}}, // final state, return tokens
  #     }, :s1, :other_chars)
  #
  # Note, the contents of the hash is duplicated and stored
  # internally to avoid any inadvertent corruption from outside.
  def init_state_transitions(st, start_state = :s_start,
                             catch_all_charset_id = nil)
    @start_state = start_state
    @others_key = catch_all_charset_id
    @s_trans = {}
    # Make a verbatim copy of the state transition table
    st.each_pair do
      |orig_state, lexer_rules|
      @s_trans[orig_state] = state_rules = {}
      lexer_rules.each_pair do
        |lexer_charset, lexer_actions|
        state_rules[lexer_charset] = cur_actions = {}
        lexer_actions.each_pair do
          |action, new_val|
          cur_actions[action] = new_val
        end
      end
    end
    puts "@s_trans =\n#{@s_trans.inspect}\n\n" if $DEBUG == 1
  end

  # Tokenize the datastream in _str_ according to the specific
  # character set and state transition table initialized through
  # #init_char_sets and #init_state_transitions.
  #
  # Returns an array of token elements where each element is
  # a pair of the form:
  #
  #   [:token_name, "extracted lexeme string"]
  #
  # The end token marker [:tok_end, nil] is appended to the end
  # of the result on success, e.g.
  #
  #   tokenize(str)
  #   # => [[:tok_a, "123"], [:tok_b, "abc"], [:tok_end, nil]]
  #
  # Raises the LexerFailure exception if no matching state
  # transition is found for the current state and character.
  def tokenize(str)
    state = @start_state
    lexeme = ''
    tokens = []
    # Append our end of file marker to the string to be tokenized
    str += "%c" % @lex_eof
    str.each_byte do
      |char|
      char_as_str = "%c" % char
      loop do
        match = @charsets.find {
          |id, match|
          (match.kind_of? Regexp) ? \
            (match =~ char_as_str) : (match.include? char)
          } || [@others_key, @charsets[@others_key]] or \
            raise InvalidLexeme

        # Look for the action matching our current state and the
        # character set id for our current char.
        action = @s_trans[state][match.first] or raise LexerFailure

        # If found, action contains our hash of actions, e.g.
        # {:next_s_backtrack => :s_operator, :token => :tok_int}
        puts "#{char==@lex_eof?'<eof>':char_as_str}: " \
          "#{state.inspect} - #{action.inspect}" if $DEBUG == 1

        # Build up the lexeme unless we're backtracking or skipping
        lexeme << char_as_str if action.has_key? :next_s

        tokens << [action[:token], lexeme.dup] && lexeme = '' if \
          action.has_key? :token

        # Set the next state, or - when there is no specified next
        # state - we've finished, so return the tokens.
        state = action[:next_s] || action[:next_s_skip] ||
          action[:next_s_backtrack] or
             return tokens << [:tok_end, nil]

        break unless action.has_key? :next_s_backtrack
      end
    end
    tokens
  end
end


if $0 == __FILE__
  eval DATA.read, nil, $0, __LINE__+4
end

__END__

require 'test/unit'

class TC_LexerMW < Test::Unit::TestCase
  def test_simple
    @lexer = LexerMW.new()

    @char_sets = {
        :letter => (?a..?z),
        :digit => (/\d/),
        :space => [?\s, ?_]
      }

    @lexer.init_char_sets(@char_sets)

    @st = {
        :extract_chars => {
          :letter =>  {:next_s => :extract_chars},
          :digit =>   {:next_s => :extract_chars},
          :space =>   {:next_s_skip => :extract_chars,
                       :token => :tok_text},
          :eof =>     {:token => :tok_text}
          },
        :extract_alpha => {
          :letter =>  {:next_s => :extract_alpha},
          :digit =>   {:next_s_backtrack => :extract_num,
                       :token => :tok_alpha},
          :space =>   {:next_s_skip => :extract_alpha,
                       :token => :tok_alpha},
          :other =>   {:next_s_skip => :extract_alpha},
          :eof_exit => {}
          },
        :extract_num => {
          :letter =>  {:next_s_backtrack => :extract_alpha,
                       :token => :tok_num},
          :digit =>   {:next_s => :extract_num},
          :space =>   {:next_s_skip => :extract_num},
          :others =>  {:next_s_skip => :extract_alpha,
                       :token => :tok_num}
          }
      }
    @lexer.init_state_transitions(@st, :extract_chars)
    assert_equal [
        [:tok_text, "123"], [:tok_text, "45"],
        [:tok_text, "6"], [:tok_text, "78"],
        [:tok_text, "abcd"], [:tok_text, "efghi"],
        [:tok_text, "jklmn"], [:tok_end, nil]
      ], @lexer.tokenize("123 45 6_78 abcd efghi_jklmn")

    @lexer = LexerMW.new(?$)
    @lexer.init_char_sets(@char_sets, :eof_exit)
    @lexer.init_state_transitions(@st, :extract_num, :others)
    assert_equal [
        [:tok_num, "12345678"], [:tok_alpha, "abcd"],
        [:tok_alpha, "efghi"], [:tok_num, "445"],
        [:tok_alpha, ""], [:tok_num, "1222"], [:tok_end, nil]
      ], @lexer.tokenize("123 45 6_78 abcd efghi445!12_22!ab$45")

  end
end

In This Thread