[#16113] Strange idea... exporting from a scope — "Hal E. Fulton" <hal9000@...>

Hello...

33 messages 2001/06/01

[#16364] Re: Garbage Collection? — Michael Davis <mdavis@...>

Windows 2000 and linux (RedHat 6.2). I have run these tests on both OSs.

12 messages 2001/06/09

[#16400] Symbolic Computation III — Mathieu Bouchard <matju@...>

14 messages 2001/06/11

[#16502] Playing with Ruby Syntax (was: Initial thoughts about Ruby From a Smalltalk Programmer) — jweirich@...

Michael> Hi Everyone, I have to say I'm utterly fascinated by Ruby

9 messages 2001/06/15

[#16661] Problem running irb with Ruby 1.6.4 under FreeBSD 4.0 — Bob Alexander <balexander@...>

I've installed Ruby 1.6.4 on a FreeBSD 4.0 machine, and get the

11 messages 2001/06/20

[#16686] opening db files made by apache dbmmanage — Fritz Heinrichmeyer <fritz.heinrichmeyer@...>

14 messages 2001/06/21

[#16801] rb_define_class() vs Class.new() — Kero van Gelder <kero@...4050.upc-d.chello.nl>

Hi,

18 messages 2001/06/23
[#16802] Re: rb_define_class() vs Class.new() — ts <decoux@...> 2001/06/23

>>>>> "K" == Kero van Gelder <kero@d4050.upc-d.chello.nl> writes:

[#16841] RE: national characters is strings — "Aleksei Guzev" <aleksei.guzev@...>

Next week I'll try to rebuild Ruby with Unicode strings. But it would be

15 messages 2001/06/25
[#16842] Re: national characters is strings — matz@... (Yukihiro Matsumoto) 2001/06/25

Hi,

[#16843] Re: national characters is strings — "Aleksei Guzev" <aleksei.guzev@...> 2001/06/25

That's good enough. But I'm afraid this could ( not would ) cause string

[#16868] Something strange with Ruby's inheritance mechanism — Eric Jacoboni <jaco@...>

As Ruby beginner, i try some "canonical" OO scripts. Doing so, I've

14 messages 2001/06/25
[#16873] RE: Something strange with Ruby's inheritance mechanism — "Aleksei Guzev" <aleksei.guzev@...> 2001/06/26

[#16879] Re: Something strange with Ruby's inheritance mechanism — Mathieu Bouchard <matju@...> 2001/06/26

On Tue, 26 Jun 2001, Aleksei Guzev wrote:

[#16869] Something strange with Ruby's inheritance mechanism — Eric Jacoboni <jaco@...>

As Ruby beginner, i try some "canonical" OO scripts. Doing so, I've

12 messages 2001/06/25

[#16881] — "Aleksei Guzev" <aleksei.guzev@...>

32 messages 2001/06/26
[#16916] Re: Method overloading (option) Was: Re: — "Wayne Blair" <wayne.blair@...> 2001/06/26

[#16920] Re: Method overloading (option) Was: Re: — matz@... (Yukihiro Matsumoto) 2001/06/26

Hi,

[#16888] finalizers, destructors and whatnot — "David Leal" <david@...>

Hi all,

16 messages 2001/06/26

[#17037] keeping an Exception object alive — David Alan Black <dblack@...>

Hello --

19 messages 2001/06/28
[#17055] Re: keeping an Exception object alive — matz@... (Yukihiro Matsumoto) 2001/06/29

Hi,

[#17066] RCR: Exception methods (was: Re: Re: keeping an Exception object alive) — David Alan Black <dblack@...> 2001/06/29

Hello --

[#17076] Re: RCR: Exception methods (was: Re: Re: keeping an Exception object alive) — matz@... (Yukihiro Matsumoto) 2001/06/29

Hi,

[#17079] Re: RCR: Exception methods (was: Re: Re: keeping an Exception object alive) — David Alan Black <dblack@...> 2001/06/29

Hello --

[#17138] Re: RCR: Exception methods (was: Re: Re: keeping an Exception object alive) — matz@... (Yukihiro Matsumoto) 2001/07/02

Hi,

[#17141] Re: RCR: Exception methods (was: Re: Re: keeping an Exception object alive) — David Alan Black <dblack@...> 2001/07/02

Hello --

[#17142] Re: RCR: Exception methods (was: Re: Re: keeping an Exception object alive) — ts <decoux@...> 2001/07/02

>>>>> "D" == David Alan Black <dblack@candle.superlink.net> writes:

[ruby-talk:16462] Opinion sought: parsing non-regular languages

From: Robert Feldt <feldt@...>
Date: 2001-06-14 07:46:43 UTC
List: ruby-talk #16462
(This is a bit long...)

Hi all,

I'm contemplating different ways for how to extend rockit to handle
non-regular features in the languages to be tokenized/lexed/scanned and
parsed. This is needed to tokenize things that cannot be expressed with
regular expressions. An example (referred to as E1 below) from the Ruby
grammar is parentheses-delimited nested strings such as

    %q(a (nested) string)

which cannot be expressed in a RE since there is no way to count and
balance the parentheses. Other examples of non-regular languages are

 E2, Hollerith strings in Fortran: nHa1,a2,...,an where leading number
     specifies the number of numbers to follow after the H.

I'd appreciate your input on possible solutions to the problem (or
alternative solutions you might have):

S1. Do nothing and let people handle this by writing their own lexer

  You can plug your own lexers into rockit today so this is the current
approach. However, I really think there is added value in having a
formalism and support for using it. The main reason for using hand-written
lexers are speed but there are actually lexer generators that generate
code that is better or very close to the performance of hand-coded
lexers. And if you really crave control why aren't you hand-coding your
regexps today, eh? And why are you interested in using a parser
generator? (Well maybe you aren't; sorry for this mail... ;-))

S2. "Hooks" for hand-coded tokenizers
  Allow an easy way for people to plug in their custom tokenizers into
  the existing regular expression (RE) based lexing framework.

  An example for E1 above would be

   Grammar Ruby
    Tokenizers
      def delimited_string(single)
        s, cp = @string, @current_position
	return nil unless s[cp,1] == "%" and s[cp+2,0] =~ /(\(|\[|{|<)/
        return nil unless s[cp+1,1] == (single ? 'q' : 'Q')
        lp = $1
	rp = {"(" => ")", "[" => "]", "{" => "}", "<" => ">"}[lp]
        # ... continue here to find the position of the balancing rp

        # Return lexeme and position of next unmatched char
        return s[(cp+3)...balancing_rp_pos], balancing_rp_pos+1
      end
    Tokens
      Blank = /\s+/  :skip:
      ...
      DelimQString = delimited_qstring(true)
      DelimIString = delimited_qstring(false)
    Productions
      Program -> ...
      ...
      Literal -> String
              |  ...
      String -> DelimQString
             |  DelimIString
             |  ...
                 
  Pro: Power of a full programming language.
       People already know Ruby so no need to learn new language.
       No need to implement more advanced lexer generator.
  Con: Language-dependent => cannot generate C version for speed etc

  I like this one though I think its a bit ugly. But its simple and
powerful.

S3. Do nothing and let people handle this post parsing

  For E2 this would mean using a regexp like /\d+H(\d+(,\d+)*)?/
  and then checking after the parse if the string is really valid.

  Pro: Simple
  Con: Why continue with parse-related activities after parsing finished?
       Cannot handle all non-regular features for example not E1 above?

S4. Extend regexp that can be used in rockit grammars so that non-regular
things can be expressed
  Add ways to count the number of chars from a char class that has been
matched so far.

  Something like, for E1,  

   /%q(.)(?while count(\1) != count(balancing(\1)))\1/

   where count(c) is the number of c chars seen during this match,
   balancing is
    def balancing(paren)
      {"(" => ")", "[" => "]", "{" => "}", "<" => ">"}[paren]
    end
   and the (?while condition) construct means "consume characters while
the condition is true".

  For E2 this would be

   /(\d+)H(\d+(,\d+)(?times $1.to_i-1))/

  where the re(?times cnt) construct means "match the re regexp cnt
times".

  Pro: Blends nicely into the existing regexp framework?
  Con: What if more constructs than while, times and count is needed? Are
they powerful enough?
       Unclear how to implement on top of existing framework. Probably
need to roll our own regexp engine with new stuff added => takes time.
 
S5. ANTLR-style lexing

  In ANTLR you use the same way to describe the lexing as you do the
parsing, ie. you have the power of CFG and lookahead etc

  Balancing strings can be lexed with something like

   Balanced : '(' (Balanced | /[^)]/)* ')'

  Pro: Only needs to learn one formalism. Generation procedures similar =>
less code to maintain.
  Con: New stuff to look into and learn => takes time
   
  Maybe its a valid question to ask why I'm developing rockit at all; 
  maybe I should have focused on RAntlr (!) in the spirit of RBison... 
  The obvious reason is that I don't know Antlr and LL-parsing but 
  know some things about regexps and LR-parsing. And the Antlr code is
large. But maybe its all about NIH... ;-)

S6. Combining lexing and parsing with merging AST building
  We can use something similar to S5 to solve E1 by combining lexing and
parsing (ie. specifying delimited strings in the productions instead of as
tokens). If we add a way to specify that lexemes of matched elements
should be merged we can build the proper AST. Something like

  ParenDelimQString -> '%q' Balanced   [QString: _,lexeme]
  Balanced -> '(' InnerBalanced* ')'   [^: +,+,+]
  InnerBalanced -> Balanced            [^]
                |  /[^)]+/             [^]

  where ^ means "lift this thing up one level in the AST" and + means
  "merge the lexeme with the lexemes of the other +-marked elements".

  Pro: Blends very nicely into the existing framework
  Con: Cannot handle E2
       Performance
       We need one Balanced production for each paren that can be a
         delimiter ('(', '[', '{' and '<') => much for specifying little 

  Maybe we can solve the last con by allowing back-references in
productions? But we'll also need parameterized productions, like

   Balanced(p, bp) -> p InnerBalanced(p, bp)* bp
   
   etc. Might be cool...

Any comments or ideas? Which solution would you prefer if you'd get to
choose?

Regards,

Robert


In This Thread

Prev Next