BNF-like grammar specified DIRECTLY in Ruby
From:
Eric Mahurin <eric_mahurin@...>
Date:
2005-04-14 05:33:47 UTC
List:
ruby-core #4709
Hello everybody,
I hope I'm not intruding, but I think I found
something you'll like. I think I finally found the
holy grail I was looking for in a language - the
ability to easily write a readable grammar (BNF-like)
directly in the target langauge. I attached a
syntax.rb that defines all the classes needed and
generic expression evaluator example.
The reasons I'm sending this message to the core
mailing list is that 1) I want feedback on the API, 2)
I think this should be considered for one of the
built-in classes (or at least in the standard library)
because of its simplicity and power, 3) I have some
suggestions for some of the builtin library classes
that would help, and 4) some other useful core ruby
language suggestions.
Here are some of the features of the stuff in
syntax.rb:
- allows for BNF-like syntax using operators (+:
sequence, |: alteration, *: loop, unary +/-:
positive/negative lookaheads, ===: compare to a
stream)
- defaults to inifinite lookahead (using a FLUSH
element can control it and where a syntax error is
reported)
- input stream could contain characters or any type of
generic tokens
- both a lexer and parser could be written using the
same classes
- defaults to outputting a parse tree as a multi-level
Array structure which can be converted back to the
original with simply the to_s method (wasn't
overridden).
- code can be associated with syntax objects to modify
the parse tree, qualify, or do the whole thing
- a variety of classes are there for matching atoms:
=== (character, range, string), includes?
(array/string set), [] (array/hash lookup)
- although there is no automatic backtracking like
regular expressions, you can get some of the same
effects with positive/negative lookaheads
- everything seems quite straightforward
- this a first cut. There are probably bugs. The API
probably isn't quite right. Features may be missing.
Here are the additions I made to the built-in classes:
- added/overrode operator methods to String and Range
to automagically create Syntax objects when it seems
right.
- made Range Comparable. This helped the repeat
syntax A*(i..j) be treated like A*i without having to
pay attention to whether the multiplier is an integer
or a Range. p.s. These are equivalent to A{i,j} and
A{i} in regular expressions.
- gave keys/values methods to Array so that it can
look more like a hash. When an element is false/nil
in the array it is consider to not be there for
keys/values.
Here would be a few more niceties for this syntax
stuff if Ruby had these:
- Ranges without a end (and/or begin?). If
syntactically this is a problem, nil could be used
instead. Of course you'd need to raise an exception
when you did a bad thing with them (i.e. to_a). For
example:
(b..)===x -> x>=b
(b..).each {} -> infinite loop starting with b
(..e)===x -> x<=e
(..) ===x -> true
- Allow an object to be used like a method. This
would require a new operator - I'll call it: (). An
object followed by {} or do/end would also be treated
the same (that's the part I really want). With this,
I could replace the "qualify" method with this thing.
Here is an example:
class Test;
def ()(*args,&code); ... end
end
a = Test.new
a { ... }
p.s. I just learned Ruby last week! I really love
this language! It's got the best of Perl, C++, and
Java. My main complaint is a syntactic one - the use
of "end" instead of {}. That's probably the influence
of Python.
__________________________________
Yahoo! Mail Mobile
Take Yahoo! Mail with you! Check email on your mobile phone.
http://mobile.yahoo.com/learn/mail
Attachments (2)
calc.rb
(1.11 KB, application/x-sh)
syntax.rb
(13.3 KB, text/x-ruby)
# classes in this module allow one to define BNF-like grammar directly in Ruby
# Author: Eric Mahurin
# license: free, but you are at your own risk if you use it
module Syntax
# base class where common operators are defined
class Base
def |(other)
Alteration.new(self,other)
end
def +(other)
Sequence.new(self,other)
end
def *(multiplier)
Repeat.new(self,multiplier)
end
def +@
Positive.new(self)
end
def -@
Negative.new(self)
end
def qualify(*args,&code)
Qualify.new(self,*args,&code)
end
end
# just passes the syntax through - needed for recursive syntax
class Pass < Base
def initialize(syntax=NULL)
@syntax =
if (syntax.kind_of?Base)
syntax
else
Verbatim.new(syntax)
end
end
def <<(syntax)
initialize(syntax)
end
def ===(stream)
@syntax===stream
end
end
# generic code matches to the stream (first arg to the code)
# [] operator allows additional arguments to be passed to the code
class Code < Base
def initialize(*args,&code)
@args = args
@code = code
end
def ===(stream,*args) # passing args here will bypass creating a new object
(match = @code[stream,*(@args+args)]) ||
stream.buffered || raise(Error.new(stream,"a semantic error"))
match
end
def [](*args)
self.class.new(*(@args+args),&@code)
end
end
# qualify the match with some code that takes the match
# [] operator allows additional arguments to be passed to the code
class Qualify < Base
def initialize(syntax=NULL,*args,&code)
@syntax =
if (syntax.kind_of?Base)
syntax
else
Verbatim.new(syntax)
end
@args = args
@code = code
end
def ===(stream,*args) # passing args here will bypass creating a new object
(match = (@syntax===stream)) || (return match)
(match = @code[match,*(@args+args)]) ||
stream.buffered || raise(Error.new(stream,"a semantic qualification error"))
match
end
def [](*args)
self.class.new(@syntax,*(@args+args),&@code)
end
end
# sequence of syntaxes
class Sequence < Base
def initialize(*syntaxes)
@syntax = syntaxes.collect do |syntax|
if (syntax.kind_of?Base)
syntax
else
Verbatim.new(syntax)
end
end
end
def +(other)
self.class.new(*(@syntax+[other])) # pull in multiple sequence items
end
def <<(other)
@syntax << ((other.kind_of?Base)?other:Verbatim.new(other))
end
def ===(stream)
matches = []
@syntax.each do |syntax|
match = (syntax===stream)
if (!match)
matches=NIL
break
end
matches << match if match!=TRUE
end
matches
end
end
# alternative syntaxes
class Alteration < Base
def initialize(*syntaxes)
@syntax = syntaxes.collect do |syntax|
if (syntax.kind_of?Base)
syntax
else
Verbatim.new(syntax)
end
end
end
def |(other)
self.class.new(*(@syntax+[other])) # pull in multiple alteration items
end
def <<(other)
@syntax << ((other.kind_of?Base)?other:Verbatim.new(other))
end
def ===(stream)
match = nil
@syntax.detect do |syntax|
match = stream.buffer { |stream| syntax===stream }
end
match || stream.buffered || raise(Error.new(stream,nil,"an alteration"))
match
end
alias =~ ===
end
# repeating syntax
class Repeat < Base
def initialize(syntax,multiplier)
@syntax =
if (syntax.kind_of?Base)
syntax
else
Verbatim.new(syntax)
end
@repeat =
if (multiplier.kind_of?Proc)
multiplier
else
lambda do |matches|
compare = (multiplier<=>(matches.length+1))
if (compare==0)
compare = (multiplier<=>matches.length)
end
compare
end
end
end
def ===(stream)
matches = []
while ((compare=@repeat[matches])>=0)
if (compare>0)
unless (match = (@syntax===stream))
return NIL
end
else
unless (match = stream.buffer { |stream| @syntax===stream })
break
end
end
matches << match
end
# filter out simple TRUE elements
matches = matches.find_all { |match| match!=TRUE }
matches
end
end
# positive syntactic predicate
class Positive < Base
def initialize(syntax)
@syntax =
if (syntax.kind_of?Base)
syntax
else
Verbatim.new(syntax)
end
end
def ===(stream)
stream.buffer { |stream| match = (@syntax===stream); FALSE }
if (match)
TRUE
else
stream.buffered || raise(Error.new(stream,nil,"a positive syntatic predicate"))
FALSE
end
end
end
# negative syntactic predicate
class Negative < Positive
def ===(stream)
stream.buffer { |stream| match = (@syntax===stream); FALSE }
if (!match)
TRUE
else
stream.buffered || raise(Error.new(stream,nil,"a negative syntatic predicate"))
FALSE
end
end
end
# all atoms can also use ~ to invert what's matches
# element match (uses === to match)
class Atom < Base
def initialize(pattern,length=NIL,invert=FALSE)
@pattern = pattern
@length = length
@invert = invert
end
def ~@
new(pattern,length,!invert)
end
def ===(stream)
element = stream.get(@length)
match = (@pattern===element)
match = !match if (@invert)
if (match==TRUE)
element || TRUE
else
match || begin
stream.buffered || raise(Error.new(stream,element.inspect,@pattern.inspect))
FALSE
end
end
end
end
# element set (uses include? to match)
class Set < Atom
def ===(stream)
element = stream.get(@length)
match = @pattern.include?(element)
match = !match if (@invert)
if (match==TRUE)
element || TRUE
else
match || begin
stream.buffered || raise(Error.new(stream,element.inspect,"one of these: #{@pattern.to_s}"))
FALSE
end
end
end
end
# element lookup array or hash (uses [] to match)
# translation will occur if the lookup returns anything but TRUE
class Lookup < Atom
def =~(stream)
element = stream.get(@length)
match = @pattern[element]
match = !match if (@invert)
if (match==TRUE)
element || TRUE
else
match || begin
stream.buffered || raise(Error.new(stream,element.inspect,"one of these: #{@pattern.keys.to_s}"))
FALSE
end
end
end
end
# element sequence that knows its length
class Verbatim < Atom
def initialize(pattern,invert=FALSE)
@pattern = pattern
@invert = invert
end
def ~@
new(pattern,!invert)
end
def ===(stream)
element = stream.get(@pattern.length)
if (element)
match = (@pattern===element)
match = !match if (@invert)
else
match = FALSE
end
if (match==TRUE)
element || TRUE
else
match || begin
stream.buffered || raise(Error.new(stream,element.inspect,@pattern.inspect))
FALSE
end
end
end
end
# any element
class Any < Atom
def initialize(length=NIL,invert=FALSE)
@length = length
@invert = invert
end
def ~@ # create a never matching Atom
new(length,!invert)
end
def ===(stream)
element = stream.get(@length)
!@invert && element
end
end
ANY = Any.new
# zero length constants
FLUSH = Code.new { |stream| stream.flush; TRUE }
FAIL = Code.new { FALSE }
NULL = Code.new { TRUE }
NULLS = Code.new { [] }
EOF = Code.new { !(element = stream.get) }
# exception class for handling syntax errors
class Error < RuntimeError
attr_accessor(:stream,:found,:expected)
def initialize(stream=nil,found=nil,expected=nil)
@stream = stream
@found = found
@expected = expected
end
def to_s
err = [super]
err << "found #{found.to_s}" if found
err << "expected #{expected.to_s}" if expected
err << stream.location.to_s if stream
err * ", "
end
end
end
# class acts like an iterator over a string/array/etc
# except that using buffer allows one go back to a certain point
# another class could be designed to work on an IO/File
class RandomAccessStream
def initialize(s,pos=0)
@s = s
@pos = pos
@buffered = NIL
self
end
def get(termination=NIL)
if (@pos>=@s.length)
# end of file/string/array
element = NIL
elsif (!termination)
# read one character/element
element = @s[@pos]
@pos += 1
else
# read a sub-string/sub-array
pos1 = (termination.kind_of?(Integer)) ? @pos+termination :
(t = @s.index(termination,@pos)) ? t+termination.length :
@s.length
element = @s[@pos...pos1]
@pos = pos1
end
element
end
def buffer(&code)
old_buffered = @buffered
@buffered = @pos if (!@buffered || @pos<@buffered)
pos = @pos
match = NIL
match = code[self]
if (@buffered && @buffered<=pos)
@buffered = old_buffered
elsif (!match)
raise(IndexError,"need to rewind buffer, but it was flushed")
end
@pos = pos if !match
match
end
def flush
@buffered = NIL
end
def buffered
@buffered ? TRUE : FALSE
end
def location
"index #{@pos} in #{@s.inspect}"
end
end
# put stuff in String to have Syntax objects magically appear
class String
def |(other)
Syntax::Verbatim.new(self)|other
end
def +@
+Syntax::Verbatim.new(self)
end
def -@
-Syntax::Verbatim.new(self)
end
alias _repeat *
def *(other)
if (other.kind_of?Numeric)
_repeat(other)
else
Syntax::Verbatim.new(self)*other
end
end
alias _concat +
def +(other)
if (other.kind_of?String)
_concat(other)
else
Syntax::Verbatim.new(self)+other
end
end
def ===(other)
if (other.kind_of?String)
self==other
else
Syntax::Verbatim.new(self)===other
end
end
def qualify(&code)
Syntax::Verbatim.new(self).qualify(&code)
end
end
# allow an Array to look more like a Hash with keys and values
class Array
def keys
(0...length).find_all { |i| self[i] }
end
def values
find_all { | element | element }
end
end
# make things fully comparable to Ranges
# also * makes a Syntax
class Range
include Comparable
def <=>(other)
if (other<self.begin)
+1
elsif (if exclude_end? then other>=self.end else other>self.end end)
-1
else
0
end
end
alias _old_equal ==
def ==(other)
if (other.kind_of?Range)
# undocumented previous functionality
_old_equal(other)
else
(self<=>other)==0
end
end
def *(other)
Syntax::Atom.new(self,1)*other
end
end