Re: questions of idiom

From: Robert Klemme <shortcutter@...>
Date: 2010-06-07 20:10:10 UTC
List: ruby-talk #363961
On 07.06.2010 21:28, Collins wrote:
> Hello List
>
> I am relatively new to ruby.  I have set myself the problem of writing
> a lexical analyzer in ruby to learn some of it's capabilites.  I have
> pasted the code for that class and for the calling test harness
> below.  I beg the lists indulgence in several ways
>
> 1) has this problem already been solved in a "gem"?  I'd love to see
> how a more sophisticated rubyist solves it

There are certainly parser and lexer generators for Ruby.  I cannot 
remember one off the top of my head but you'll likely find one in RAA:

http://raa.ruby-lang.org/search.rhtml?search=lexer
http://raa.ruby-lang.org/search.rhtml?search=parser

> 2) There is object manipulation with which I'm still not comfortable.
> In particular, in the buffer manipulation code in the method analyze
> makes me unhappy and I'd be happy to receive instructions in a better
> way to do it
> 3) Every lanuage has its idioms.  I'm not at all sure that I'm using
> the best or most "ruby-like" way of doing certain things.  Again I
> welcome suggestions.
>
>    Thanks in advance
>
>      Collins
>
> ##### code snippet 1 ######
> class Rule
>    attr_reader :tokID, :re
>    def initialize(_tokID, _re)
>      @tokID = _tokID
>      @re = _re
>    end
>
>    def to_s
>      self.class.name + ": " + @tokID.to_s + "::= " + @re.to_s
>    end
> end

Several things stand out in the code above.  We use tok_id rather than
tokID for method names and instance variables; only classes and modules
use CamelCase.  It is also highly uncommon to start identifiers with an
underscore.  An alternative way to create the String would be

def to_s
   "#{self.class.name}: #{@tok_id}::= #{@re}"
end

String interpolation implicitly applies #to_s.

Finally you can define that class in one line:

Rule = Struct.new :tok_id, :re
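
A minimal sketch of the Struct variant, assuming you still want the custom
to_s from the original class.  The block passed to Struct.new and the sample
values (:word, /\w+/) are illustrative and not part of the original code:

```ruby
# Struct generates the constructor, the accessors and == for us.
Rule = Struct.new(:tok_id, :re) do
  # Overrides Struct's default to_s; the format mirrors the original class.
  def to_s
    "#{self.class.name}: #{tok_id}::= #{re}"
  end
end

rule = Rule.new(:word, /\w+/)
p rule.tok_id                          # reader generated by Struct
puts rule                              # puts calls our to_s
p rule == Rule.new(:word, /\w+/)       # Struct compares member by member
```

Note that a Struct also generates writers (tok_id=, re=), which the original
attr_reader based class did not have.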

> class Match
>    attr_reader :rule, :lexeme
>    def initialize(_r, _s)
>      @rule = _r
>      @lexeme = _s
>    end
>
>    def to_s
>      self.class.name + ": " + @rule.to_s + "\nmatches: " + @lexeme.to_s
>    end
> end
>
> class Lexer
>    attr_reader :errString
>    # keep a collection of regular expressions and values to return as
> token
>    # types
>    # then match text to the longest substring yet seen
>    def initialize
>      @rules = Array.new

A shorter and more common way to write this is

@rules = []

>      @buff = String.new

We usually simply do

@buff = ""

This also creates a new empty String and is easier to spot.

>      @aFile = nil
>      @errString = nil
>    end
>
>    def addToken (tokID, re)
>      if re.class.name == "String"
>        @rules<<  Rule.new(tokID, Regexp.new(re))
>      elsif re.class.name == "Regexp"
>        @rules<<   Rule.new(tokID, re)
>      else
>        print "unsupported type in addToken: ", re.class.name, "\n"
>      end
>    end

def add_token(tok_id, re)
   @rules << Rule.new(tok_id,
     case re
     when String
       Regexp.new(re)
     when Regexp
       re
     else
       raise ArgumentError, "Neither String nor regexp"
     end)
end

"case" works by using the #=== operator, which happens to be defined for
classes as a kind_of? check.  It's also safer to work with the class
objects themselves than with class names.

In error cases we raise exceptions and let someone up the call chain
decide how to deal with the error.  In your case you just get output on
the console, which might not be appropriate in all cases.
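
To make the #=== point concrete, here is a small sketch.  The helper name
to_regexp is hypothetical and only isolates the conversion step from
add_token:

```ruby
# Hypothetical helper (not in the original code): normalise a pattern
# argument to a Regexp and raise for anything else.
def to_regexp(re)
  case re                      # tests String === re, then Regexp === re
  when String then Regexp.new(re)
  when Regexp then re
  else raise ArgumentError, "Neither String nor Regexp: #{re.inspect}"
  end
end

p String === "abc"             # true: Class#=== is an is-a test
p "abc" === String             # false: String#=== is plain equality
p to_regexp("\\d+")

begin
  to_regexp(42)
rescue ArgumentError => e
  # The caller, not the lexer, decides how to report the problem.
  warn "rejected: #{e.message}"
end
```

The order of the operands matters: the class has to be on the left of ===,
which is exactly how "case ... when String" arranges them.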

>    def findMatch
>      maxLexeme, maxMatch = String.new, nil
>      matchCount, rule2 = 0, nil
>      @rules.each { |rule|
>        # loop invariant:
>        #  maxLexeme contains the longest matching prefix of @buff found
> so far,
>        #  matchCount contains the number of rules that have matched
> maxLexeme,
>        #  maxMatch contains the proposed return value
>        #  rule2 contains a subsequent rule that matches maxLexeme
>        #
>        # if rule matches from beginning of @buff AND
>        #    does not match all of @buff AND
>        #    match is longer than previous longest match
>        # then update maxMatch and maxLexeme and matchCount and rule2
>        #
>        # but... we have to avoid matching and keep looking if we make
> it to the
>        # end of @buff with a match active (it could still collect more
>        # characters) OR if more than one match is still active.  If the
> end of
>        # the @buff is also the end of the file then it's ok to match to
> the end
>        #
>        # TODO: think about prepending an anchor to the regexp to
> eliminate the
>        #       false matches (those not to the beginning of the @buff)
>        #
>
>        md = rule.re.match(@buff)
>        if !md.nil?&&  md.pre_match.length == 0
>          if md[0].length == @buff.length&&  !@aFile.eof?
> 	  # @buff is potentially ambiguous and there is more file to parse
> 	  return nil
> 	elsif md[0].length>  maxLexeme.length
> 	  # either matching less than whole buffer or at eof AND
> 	  # match is longer than any prior match
> 	  matchCount, rule2 = 1, nil
> 	  maxLexeme, maxMatch = md[0], Match.new(rule,md[0])
> 	elsif  md[0].length == maxLexeme.length
> 	  # a subsequent match of equal length has been found
> 	  matchCount += 1
> 	  rule2 = rule
> 	else
> 	  # short match... skip
> 	end
>        else
>          # either rule did not match @buff OR
> 	#        rule did not match the start of @buff
>        end
>      }
>      if !maxMatch.nil?&&  matchCount == 1
>        #return an unambiguous match
>        return maxMatch
>      elsif !maxMatch.nil?&&  matchCount>  1
>        print "ambiguous: ", maxLexeme, " : ", maxMatch.rule.to_s, " :
> ",
>               rule2.to_s, "\n"
>        return nil
>      else
>        # no match was found
>        return nil
>      end
>    end

Somehow this method seems a bit lengthy.  I did not look too deep into 
the details but I'd probably pick a different strategy.  First, I'd 
anchor expressions (like you suggested in your comment).  Then I'd just do

matches = {}

@rules.each do |rule|
   m = rule.re.match(@buff) and matches[rule] = m
end

matches

Now you know that

case matches.size
when 0
   # nothing matches any more, take last match and strip
   # buffer
when 1
   # single match, remember as last match
else
   # many matches, continue
end

If you place that in a loop that adds one character at a time to the
buffer and then invokes find_match, you can do the evaluation as
indicated above.
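
A rough sketch of how that loop could look.  Everything here is an
assumption layered on the outline above: the class name TinyLexer, the
method names, anchoring each rule at both ends with \A and \z, and letting
the earliest rule win when several rules match the same buffer:

```ruby
# Hypothetical sketch of the character-at-a-time strategy.
class TinyLexer
  Rule = Struct.new(:tok_id, :re)

  def initialize
    @rules = []
  end

  def add_token(tok_id, re)
    # Anchor at both ends so a rule matches only the whole buffer.
    @rules << Rule.new(tok_id, /\A#{re}\z/)
    self
  end

  # Returns a Hash of rule => MatchData for every rule matching buff.
  def find_matches(buff)
    matches = {}
    @rules.each do |rule|
      m = rule.re.match(buff) and matches[rule] = m
    end
    matches
  end

  # Feeds one character at a time and yields [tok_id, lexeme] pairs.
  def each_token(text)
    return enum_for(:each_token, text) unless block_given?

    buff = ""
    last = nil   # [tok_id, lexeme] of the latest buffer that matched

    text.each_char do |c|
      buff << c
      matches = find_matches(buff)

      if matches.empty? && last
        # Nothing matches any more: emit the last match and restart
        # the buffer with the character that broke it.
        yield last
        last = nil
        buff = c.dup
        matches = find_matches(buff)
      end

      # Assumption: when several rules match, the earliest one wins.
      last = [matches.keys.first.tok_id, buff.dup] unless matches.empty?
    end

    yield last if last
    raise ArgumentError, "unmatched text: #{buff.inspect}" if last.nil? && !buff.empty?
  end
end

lexer = TinyLexer.new
lexer.add_token(nil, /\s+/).add_token(:word, /\w+/).add_token(:quoted, /"[^"]*"/)
lexer.each_token('ab "cd"') { |tok_id, lexeme| p [tok_id, lexeme] if tok_id }
```

This version emits a token as soon as one more character breaks the match,
so it cannot handle a token whose proper prefix matches while a slightly
longer prefix does not (a rule like \d+(\.\d+)? on "1.5", for example).
Treat it as an illustration of the control flow, not a finished lexer.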

>    def analyze
>      aMatch = findMatch
>      if !aMatch.nil?
>        #remove matched text from buff
>        oldBuff = String.new(@buff)
>        newBuff = @buff[aMatch.lexeme.length,@buff.length-1]
>        if oldBuff != aMatch.lexeme + newBuff
>          puts oldBuff
>          puts "compare failure!"
>          puts aMatch.lexeme + newBuff
>        end
>        @buff = newBuff
>      end
>      return aMatch
>    end
>
>    def parseFile(_name)
>      @fileName = _name
>      @aFile = File.new(@fileName, "r")
>      @aFile.each {|line|

Better use the block form of File.open() or use File.foreach.  That way 
you can be sure that the file handle is always properly closed.  See

http://blog.rubybestpractices.com/posts/rklemme/001-Using_blocks_for_Robustness.html

I'd also choose a different reading strategy: one character at a time 
or a fixed buffer width.  But I would not read lines.
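
For illustration, both forms in a self-contained sketch.  The helper names
count_chars and count_lines and the temporary file are made up for this
example:

```ruby
require "tempfile"

# Block form of File.open: the handle is closed when the block exits,
# even if an exception is raised inside it.
def count_chars(path)
  File.open(path, "r") do |io|
    io.each_char.count         # reads one character at a time
  end
end

# File.foreach opens, iterates and closes in a single call.
def count_lines(path)
  File.foreach(path).count
end

# Hypothetical input, created here only so the sketch can run.
sample = Tempfile.new("lexer_input")
sample.write("first line\nsecond line\n")
sample.close

p count_chars(sample.path)
p count_lines(sample.path)
sample.unlink
```

In the lexer, the body of the each_char block would append to the buffer
and try to match, instead of just counting.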

>        # add lines from file to @buff... after each addition yield as
> many
>        # tokens as possible
>        @buff += line

@buff << line

is more efficient.
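
The difference is easy to see with object identity.  A small sketch (the
variable names are illustrative):

```ruby
buff = String.new
original = buff                # second reference to the same object

buff << "abc"                  # appends in place
p buff.equal?(original)        # true: still the same String

buff += "def"                  # builds a new String and rebinds buff
p buff.equal?(original)        # false: the old object is left behind
p original                     # "abc"
```

With += every appended line copies the whole buffer, while << only copies
the new data.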

>        # comsume all the tokens from @buff that can be found... when no
> more
>        # can be found analyze will return nil... so we'll get another
> line
>        aMatch = analyze
>        while !aMatch.nil?
>          # deliver one<token, lexeme pair>  at a time to caller...
> 	# by convention a nil tokID is one about which the caller does not
> 	# care to hear...
>          yield aMatch.rule.tokID, aMatch.lexeme if !
> aMatch.rule.tokID.nil?
> 	aMatch = analyze
>        end
>      }
>      # @buff contains the earliest unmatched text... if @buff is not
> empty when
>      # we finish with the file, this is an error
>      if !@buff.empty?
>        @errString = "error: unmatched text:\n" + @buff[0,[80,
> @buff.length].min]
>        return false
>      else
>        @errStrng =  "no errors detected\n"
>        return true
>      end
>    end
> end
>
> ##### code snippet 2 ######
>
> WhiteSpaceToken = 0
> CommentToken = 1
> QuotedStringToken = 2
> WordToken = 3

I would rather use Symbols as token keys (names).  They are similarly 
efficient but make the constant definitions superfluous.
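
A sketch of what that could look like.  The table TOKEN_PATTERNS and the
helper classify are hypothetical and only show Symbols standing in for the
integer constants:

```ruby
# Symbols need no prior definition and print readably.
TOKEN_PATTERNS = {
  quoted_string: /"(?:\\.|[^"\\])*"/,
  word:          /\w+/
}

# Returns the Symbol of the first pattern matching the whole lexeme,
# or nil if none does.
def classify(lexeme)
  found = TOKEN_PATTERNS.find { |_tok_id, re| /\A#{re}\z/.match?(lexeme) }
  found && found.first
end

p classify("hello")
p classify('"hi there"')
p classify("???")
```

Output such as :quoted_string is also easier to read in a trace than a
bare 2.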

> require "lexer"
> l = Lexer.new
> l.addToken(nil, Regexp.new("\\s+", Regexp::MULTILINE))
> l.addToken(nil, Regexp.new("#.*[\\n\\r]+"))
> #l.addToken(QuotedStringToken, Regexp.new('["][^"]*["]',
> Regexp::MULTILINE))
> l.addToken(QuotedStringToken,'["]((\\\")|[^\\\"])*"')
> l.addToken(WordToken,Regexp.new("\\w+"))
> foo = l.parseFile("testFile1") { |token, lexeme|
>    print token.to_s + ":" + lexeme.to_s + "\n"
> }
> if foo
>    print "pass!\n"
> else
>    print "fail: " + l.errString + "\n"
> end

There are of course completely different strategies to tackle this. 
Lexers usually are built as a DFA or NFA (as Regexp implementations 
are internally).  You would then feed one character at a time to the 
FA and derive token types from states.
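
As a toy illustration of the automaton idea, here is a hand-written DFA
that only knows words and white space.  The transition table, the state
names and the helper methods are all invented for this sketch:

```ruby
# state => { character class => next state }
TRANSITIONS = {
  start:    { space: :in_space, word: :in_word },
  in_space: { space: :in_space },
  in_word:  { word: :in_word }
}

# Accepting states and the token type each one stands for.
ACCEPTING = { in_space: :whitespace, in_word: :word }

def char_class(ch)
  case ch
  when /\s/ then :space
  when /\w/ then :word
  else :other
  end
end

# Feeds one character at a time to the automaton and collects
# [token type, lexeme] pairs.
def dfa_tokens(text)
  tokens = []
  state = :start
  lexeme = String.new

  text.each_char do |ch|
    nxt = TRANSITIONS[state][char_class(ch)]
    if nxt.nil?
      # No transition: emit the token for the current state, then
      # restart from :start with the same character.
      raise ArgumentError, "unexpected #{ch.inspect}" unless ACCEPTING.key?(state)
      tokens << [ACCEPTING[state], lexeme]
      state = :start
      lexeme = String.new
      nxt = TRANSITIONS[state][char_class(ch)]
      raise ArgumentError, "unexpected #{ch.inspect}" if nxt.nil?
    end
    state = nxt
    lexeme << ch
  end

  tokens << [ACCEPTING[state], lexeme] if ACCEPTING.key?(state)
  tokens
end

p dfa_tokens("ab  cd")
```

A real lexer generator would derive such a table from the regular
expressions instead of having it written by hand.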

Another option would be to lump all tokens into a single regular 
expression with one capturing group per token type and then analyse 
the matching groups, e.g.

re = %r{
     (\s+) # white space
   | ("(?:\\.|[^"\\])*")  # quoted string
}x

File.read(file_name).scan re do |m|
   case
   when $1
     printf "whitespace %p\n", $1
   when $2
     printf "quote %p\n", $2
   else
     raise "Cannot be, scanner error"
   end
end

Kind regards

	robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
