[#285488] Zed Shaw - Ruby has dodged a bullet — Chuck Remes <cremes.devlist@...>

Much like watching a car accident in slow motion, I could scarcely

96 messages 2008/01/01

[#285678] Windows Compilation Madness — Gary Wright <gwtmp01@...>

No point in keeping this discussion in the Zed thread...

24 messages 2008/01/02

[#285742] change the symbol in an interpolated string value? — scooterm@...

PROBLEM:

11 messages 2008/01/03

[#285784] Ruby install on Kubuntu Linux - why so spread out — Tom Cloyd <tc@...>

I've migrated from WinXP in recent days, and I'm having trouble getting

19 messages 2008/01/03
[#285786] Re: Ruby install on Kubuntu Linux - why so spread out — Casimir <pikEISPAMMMseli@...> 2008/01/03

Tom Cloyd kirjoitti:

[#285792] Re: Ruby install on Kubuntu Linux - why so spread out — Peter Hickman <peter@...> 2008/01/03

Casimir I'm going to have to call you on this. As you may have just

[#285838] "wrong number of arguments" What? I must be thick or somethi — ole __ <oliver.saunders@...>

class PermutationIterator

10 messages 2008/01/03

[#285873] Time.gm(1969) chokes on Windows — Tim Ferrell <s0nspark@...>

18 messages 2008/01/03

[#285910] Get your hands dirty: Help bootstrap Ruby on Windows. — Luis Lavena <luislavena@...>

This was asked, and asked, and asked too many times, so here we go.

20 messages 2008/01/03

[#285951] Finding if an array contains a data type — Sam Phoneix <dominicjefferies@...>

Say I wanted to see if an array contained a number or not. Would I use

11 messages 2008/01/04

[#285981] How to put a Ruby website online without rails — Softmind Technology <softmindtechnology@...>

Hi,

28 messages 2008/01/04
[#286230] Re: How to put a Ruby website online without rails — James Britt <james.britt@...> 2008/01/05

Giles Bowkett wrote:

[#286252] Re: How to put a Ruby website online without rails — yudi <yudi.xue@...> 2008/01/05

Ramaze looks good, gotta take a look. Thanks :-)

[#286449] Re: How to put a Ruby website online without rails — James Dinkel <jdinkel@...> 2008/01/07

Here is a forum post that tells you how to embed ruby script into

[#285988] problem with sockets — ian@...

Hi

13 messages 2008/01/04

[#286012] Studying Blackjack (#151) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

42 messages 2008/01/04

[#286056] Re: zed shaw zed shaw zed shaw — "Rick DeNatale" <rick.denatale@...>

On Jan 4, 2008 11:52 AM, Jeremy McAnally <jeremymcanally@gmail.com> wrote:

16 messages 2008/01/04

[#286077] DBI doesn't seem to install correctly on CentOS — Xeno Campanoli <xcampanoli@...>

I tried installing with the old tarfile sequence using setup.rb, as I

14 messages 2008/01/04
[#286079] Re: DBI doesn't seem to install correctly on CentOS — Joshua Schairbaum <joshua.schairbaum@...> 2008/01/04

Try this to use DBI with MySQL:

[#286120] Best Ruby book for experienced programmer — Kamil Chmielewski <kamilski81@...>

Hello,

19 messages 2008/01/04

[#286221] Vintage 0.0.1 - The super slim, micro web framework based on the idea of the old Merb — "Jeremy McAnally" <jeremymcanally@...>

Vintage is a very small web framework based on the original idea of

23 messages 2008/01/05
[#286346] Re: [ANN] Vintage 0.0.1 - The super slim, micro web framework based on the idea of the old Merb — "ara.t.howard" <ara.t.howard@...> 2008/01/06

[#286273] Marshal Pipe — "Carlos J. Hernandez" <carlosjhr64@...>

I've just re-discovered pipes.

16 messages 2008/01/05
[#286515] Re: Marshal Pipe — Eric Hodel <drbrain@...7.net> 2008/01/07

On Jan 5, 2008, at 15:37 PM, Carlos J. Hernandez wrote:

[#286374] DateTime new_offset unexpected results — Greg Go <plant@...>

Hello, everybody:

13 messages 2008/01/06
[#286484] Re: DateTime new_offset unexpected results — "Todd Benson" <caduceass@...> 2008/01/07

On Jan 6, 2008 2:04 PM, Greg Go <plant@ultraspace.com> wrote:

[#286375] Ruby's Kernel::exec (and system and %x) — JJ <jjnoakes@...>

I was reading about Kernel::exec (and the related Kernel::system

29 messages 2008/01/06

[#286473] Where do I put the ".to_f" ? — Peter Bailey <pbailey@...>

Hello,

13 messages 2008/01/07

[#286493] Bacon 0.9, a small RSpec clone — Christian Neukirchen <chneukirchen@...>

Hello,

13 messages 2008/01/07

[#286508] Embedding 1.9 — Dave Thomas <dave@...>

Folks:

18 messages 2008/01/07

[#286558] Thin 0.5.1 LOLCAT released — Marc-andre Marc <macournoyer@...>

Hey all,

13 messages 2008/01/08

[#286656] JRuby 1.1 RC 1 Released — Thomas Enebo <Thomas.Enebo@...>

The JRuby community is pleased to announce the release of JRuby 1.1 RC 1

19 messages 2008/01/08
[#286710] Re: [ANN] JRuby 1.1 RC 1 Released — "Sander Land" <sander.land@...> 2008/01/08

With all the blog posts about JRuby being posted, I thought I would

[#286696] dow ruby's strftime not attempt POSIX-compliance? — Jochen Hayek <jochen+ruby-forum@...>

Why is ruby's core class Time acting like this:

21 messages 2008/01/08
[#286701] Re: dow ruby's strftime not attempt POSIX-compliance? — Suraj Kurapati <snk@...> 2008/01/08

Jochen Hayek wrote:

[#286702] Re: dow ruby's strftime not attempt POSIX-compliance? — Suraj Kurapati <snk@...> 2008/01/08

Suraj Kurapati wrote:

[#286717] Re: does ruby's strftime not attempt POSIX-compliance? — Jochen Hayek <jochen+ruby-forum@...> 2008/01/09

Suraj Kurapati wrote:

[#287161] Re: does ruby's strftime not attempt POSIX-compliance? — Suraj Kurapati <snk@...> 2008/01/12

Jochen Hayek wrote:

[#287217] Re: does ruby's strftime not attempt POSIX-compliance? — Jochen Hayek <jochen+ruby-forum@...> 2008/01/12

Pls let me assure you in the beginning of this note,

[#287219] Re: does ruby's strftime not attempt POSIX-compliance? — Suraj Kurapati <snk@...> 2008/01/12

Jochen Hayek wrote:

[#287344] Re: does ruby's strftime not attempt POSIX-compliance? — Yukihiro Matsumoto <matz@...> 2008/01/14

Hi,

[#286742] why does this code leak? — ara howard <ara.t.howard@...>

cfp2:~ > cat a.rb

35 messages 2008/01/09
[#286991] Re: why does this code leak? — "=?ISO-8859-2?Q?Rados=B3aw_Bu=B3at?=" <radek.bulat@...> 2008/01/10

T24gSmFuIDEwLCAyMDA4IDEwOjE1IFBNLCBSb2JlcnQgRG9iZXIgPHJvYmVydC5kb2JlckBnbWFp

[#287063] Here Document syntax is stringent - trailing blank — Todd Burch <promos@...>

text = <<EOD

12 messages 2008/01/11

[#287067] Need command line to run a file 4 times — jackster the jackle <contact@...>

Hi Ruby Forum...

13 messages 2008/01/11
[#287074] Re: Need command line to run a file 4 times — Xavier Noria <fxn@...> 2008/01/11

On Jan 11, 2008, at 2:47 PM, jackster the jackle wrote:

[#287080] Counting the files in a directory.... — "Kyle Schmitt" <kyleaschmitt@...>

SSdtIHdyaXRpbmcgc29tZSBzY3JpcHRzIHRvIGhlbHAgbWFuYWdlIGEgbWFpbCBzY2FubmVyIHVz

18 messages 2008/01/11

[#287095] IRB Keyboard Input Issues — Oliver Saunders <oliver.saunders@...>

When I press the up arrow in IRB to access history I see this:

13 messages 2008/01/11

[#287188] Stream Parsing with REXML — Marc Hoeppner <marc.hoeppner@...>

Hi (again, sort of) :)

13 messages 2008/01/12

[#287190] Callable class with block — blondinet <jblondinet@...>

Hi everyone,

12 messages 2008/01/12

[#287199] Re: gem build documentation for new gems? — Eric Hodel <drbrain@...7.net>

On Jan 12, 2008, at 07:57 AM, Giles Bowkett wrote:

11 messages 2008/01/12

[#287226] Ruby from source on Leopard excruciatingly slow internet talk — "Chris Shea" <chris@...>

Just a couple of days ago I "upgraded" from Tiger to Leopard, and the

19 messages 2008/01/13

[#287282] regualr expression (need help) — Heinrich Piard <linux@...>

Hi all,

15 messages 2008/01/13

[#287368] regular expression — Johnathan Smith <stu_09@...>

hey

15 messages 2008/01/14
[#287371] Re: regular expression — "Robert Klemme" <shortcutter@...> 2008/01/14

2008/1/14, Johnathan Smith <stu_09@hotmail.com>:

[#287414] any tricks to speed up ruby? — Roger Pack <rogerpack2005@...>

31 messages 2008/01/14
[#287423] Re: any tricks to speed up ruby? — James Britt <james.britt@...> 2008/01/14

Roger Pack wrote:

[#287583] Re: any tricks to speed up ruby? — "M. Edward (Ed) Borasky" <znmeb@...> 2008/01/16

James Britt wrote:

[#287525] Overridding A Method Via A Mixin — Andrew Stewart <boss@...>

Hi Everyone,

17 messages 2008/01/15
[#287526] Re: Overridding A Method Via A Mixin — Robert Klemme <shortcutter@...> 2008/01/15

On 15.01.2008 18:32, Andrew Stewart wrote:

[#287666] ruby annoyances ... mine is line continuation, what's yours? — "scooterm@..." <scooterm@...>

Just thinking how really smart Ruby is in most areas makes it all the

11 messages 2008/01/16

[#287720] FXRuby or Shoes? — Doug Livesey <biot023@...>

Sorry I've posted this in the generic Ruby thread, but I wanted as

14 messages 2008/01/17

[#287735] LDAP Server not connected error — Varun Goel <varun.rajeshkumar@...>

hi all i made authentication function like this

12 messages 2008/01/17
[#287752] Re: LDAP Server not connected error — "Matt Todd" <chiology@...> 2008/01/17

Pluck out the actual LDAP code into IRB and see if it works. I've not

[#287757] Ruby syntax in "respond_to do |format| line -- can clarify? — Joshua Beall <jbeall.ruby@...>

Hi All,

12 messages 2008/01/17

[#287819] passing method references in python & ruby — "rpardee@..." <rpardee@...>

Hey all,

21 messages 2008/01/18
[#287897] Re: passing method references in python & ruby — "rpardee@..." <rpardee@...> 2008/01/18

On Jan 17, 8:12 pm, Justin Collins <justincoll...@ucla.edu> wrote:

[#287914] Re: passing method references in python & ruby — "=?ISO-8859-2?Q?Rados=B3aw_Bu=B3at?=" <radek.bulat@...> 2008/01/18

SW4gcHl0aG9uIHdoZW4geW91IHVzZSBtZXRob2QgbmFtZSB3aXRob3V0IHBhcmVudGhlc2lzZXMg

[#287917] Re: passing method references in python & ruby — Justin Collins <justincollins@...> 2008/01/18

Rados梶w Buウat wrote:

[#287989] Re: passing method references in python & ruby — Robert Klemme <shortcutter@...> 2008/01/19

On 18.01.2008 20:11, Justin Collins wrote:

[#288034] Re: passing method references in python & ruby — Justin Collins <justincollins@...> 2008/01/19

Robert Klemme wrote:

[#287869] Longest Repeated Substring (#153) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

62 messages 2008/01/18
[#287931] Re: Longest Repeated Substring (#153) — yermej <yermej@...> 2008/01/18

On Jan 18, 2:38 pm, Ken Bloom <kbl...@gmail.com> wrote:

[#287932] Re: Longest Repeated Substring (#153) — "=?ISO-8859-2?Q?Rados=B3aw_Bu=B3at?=" <radek.bulat@...> 2008/01/18

T24gSmFuIDE4LCAyMDA4IDEwOjAwIFBNLCB5ZXJtZWogPHllcm1lakBnbWFpbC5jb20+IHdyb3Rl

[#287895] Thin 0.5.3 Purple Yogurt release — Marc-AndrCournoyer <macournoyer@...>

Hey all,

15 messages 2008/01/18
[#287962] Re: [ANN] Thin 0.5.3 Purple Yogurt release — "s.ross" <cwdinfo@...> 2008/01/19

This has been working so well on my Mac dev machine, I built it out on =20=

[#287973] Re: [ANN] Thin 0.5.3 Purple Yogurt release — Bob Hutchison <hutch@...> 2008/01/19

[#287984] Re: [ANN] Thin 0.5.3 Purple Yogurt release — "s.ross" <cwdinfo@...> 2008/01/19

Got it up and running ... for a while. It's running as root/root and

[#287999] Re: [ANN] Thin 0.5.3 Purple Yogurt release — Marc-AndrCournoyer <macournoyer@...> 2008/01/19

Thin runs in a single thread, no need for a mutex there.

[#287953] Looking for code to determine length of audio files — "Rick DeNatale" <rick.denatale@...>

I'm working on a project which has the requirement to automatically

11 messages 2008/01/19

[#287971] Ruby with Mac OS X — Tj Superfly <nonstickglue@...>

Hello everyone,

22 messages 2008/01/19
[#287976] Re: Ruby with Mac OS X — "Hassan Schroeder" <hassan.schroeder@...> 2008/01/19

On Jan 18, 2008 8:37 PM, Tj Superfly <nonstickglue@verizon.net> wrote:

[#287978] Re: Ruby with Mac OS X — Tj Superfly <nonstickglue@...> 2008/01/19

She can't find the run file.

[#287979] Re: Ruby with Mac OS X — "Hassan Schroeder" <hassan.schroeder@...> 2008/01/19

On Jan 18, 2008 9:25 PM, Tj Superfly <nonstickglue@verizon.net> wrote:

[#287980] Re: Ruby with Mac OS X — Tj Superfly <nonstickglue@...> 2008/01/19

Hassan Schroeder wrote:

[#287985] Re: Ruby with Mac OS X — "s.ross" <cwdinfo@...> 2008/01/19

[#288000] Re: Ruby with Mac OS X — Tj Superfly <nonstickglue@...> 2008/01/19

Okay, we've got programs running!

[#288008] Re: Ruby with Mac OS X — "Windham, Kristopher R." <kriswindham@...> 2008/01/19

not sure what you are trying to do there..

[#288009] Re: Ruby with Mac OS X — Tj Superfly <nonstickglue@...> 2008/01/19

Well, I'm using it with arrays, here's a clip of the code that I'm

[#288133] Announcing Revactor: an Actor model implementation for Ruby 1.9 — "Tony Arcieri" <tony@...>

I'm pleased to announce the initial public availability of Revactor, an

25 messages 2008/01/21

[#288258] why must I initialize this variable? — matt@... (matt neuburg)

Here's a simple variable initialization / scope question. In the

14 messages 2008/01/22

[#288290] Determine where a method is being called from? — Ben Johnson <bjohnson@...>

Is it possible to determine if a public instance method is being called

11 messages 2008/01/22

[#288303] How to improve this code? — Jair Rillo Junior <jrjuniorsp@...>

Hi guys,

16 messages 2008/01/23

[#288308] Learning to build a MUD — Sean Dolbec <helbuns@...>

13 messages 2008/01/23

[#288348] Parsing HTML / following links etc — Dan Cuddeford <dancudds@...>

Hello all,

11 messages 2008/01/23

[#288391] Scripts run using load in "for" loop run out of order — Fa Sidd <siddiqifh@...>

Hi

13 messages 2008/01/23

[#288626] Making Change (#154) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

106 messages 2008/01/25
[#288653] Re: [QUIZ] Making Change (#154) — Dominik Honnef <dominikho@...> 2008/01/25

On [Sat, 26.01.2008 00:50], Ruby Quiz wrote:

[#288656] Re: [QUIZ] Making Change (#154) — "Jes俍 Gabriel y Gal疣" <jgabrielygalan@...> 2008/01/25

On Jan 25, 2008 9:53 PM, Dominik Honnef <dominikho@gmx.net> wrote:

[#288658] Re: [QUIZ] Making Change (#154) — James Gray <james@...> 2008/01/25

On Jan 25, 2008, at 3:09 PM, Jes=FAs Gabriel y Gal=E1n wrote:

[#288692] regular expression negate a word (not character) — Summercool <Summercoolness@...>

24 messages 2008/01/26

[#288773] Extracting Data from a Webpage — Tj Superfly <nonstickglue@...>

Hello everyone.

17 messages 2008/01/27

[#288847] Treetop parser (or PEG in general?) questions — Phrogz <phrogz@...>

I've been looking for something like treetop for a while now. Very

19 messages 2008/01/28

[#288916] Final Two Quizzes — James Gray <james@...>

As most of you know, we have two Ruby Quiz problems left (http://groups.google.com/group/comp.lang.ruby/msg/6f46393932c22e49

32 messages 2008/01/28
[#288978] Re: Final Two Quizzes — fedzor <fedzor@...> 2008/01/28

[#288980] Re: Final Two Quizzes — James Gray <james@...> 2008/01/28

On Jan 28, 2008, at 5:38 PM, fedzor wrote:

[#289016] Re: Final Two Quizzes — fedzor <fedzor@...> 2008/01/29

[#289026] Re: Final Two Quizzes — James Gray <james@...> 2008/01/29

On Jan 29, 2008, at 6:33 AM, fedzor wrote:

[#289048] Re: Final Two Quizzes — "Thomas Wieczorek" <wieczo.yo@...> 2008/01/29

Thank you for the many quizzes. I came to Ruby through a blog entry

[#289051] Re: Final Two Quizzes — James Gray <james@...> 2008/01/29

On Jan 29, 2008, at 3:23 PM, Thomas Wieczorek wrote:

[#289057] Re: Final Two Quizzes — Dominik Honnef <dominikho@...> 2008/01/29

On [Wed, 30.01.2008 06:34], James Gray wrote:

[#288973] Treetop Email Parser — Phrogz <phrogz@...>

(I would post this to the treetop mailing list...except there doesn't

14 messages 2008/01/28

[#289018] Proposal/RFQ: Hash#values/keys with block — "Dirk Traulsen" <dirk.traulsen@...>

Hi!

10 messages 2008/01/29

[#289046] newbie file write problem — Lars Zeb <larzeb@...>

This is my first attempt at ruby. I've written a class (SicCode) which

15 messages 2008/01/29

[#289060] For Loop question — Mark Mr <pimea.mark@...>

ok basically i cant quite figure out how to do a for loop i want in

13 messages 2008/01/29

[#289196] counting the number of repititions in an array — Adam Akhtar <adamtemporary@...>

Hi if i want to count the number of times values are repeated in an

10 messages 2008/01/30

[#289206] Ruby Puzzle Challenge — Wyatt Greene <greenewm@...>

Write a Ruby program that prints out the numbers 1 to 100, one number

14 messages 2008/01/30

[#289210] Learn how to program — "Regnum@... Regnum@..." <regnum@...>

Hi!, i want to learn how to program.Is ruby a good option?. Thanks.

30 messages 2008/01/30
[#289351] Re: Learn how to program — longint <michael.mello@...> 2008/01/31

On Jan 30, 4:30=A0pm, "Reg...@argentina.com Reg...@argentina.com"

[#289401] Re: Learn how to program — "Michael Bevilacqua-Linn" <michael.bevilacqualinn@...> 2008/02/01

IMHO,

[#289286] can ik make this more beautifull? — Remco Hh <remco@...>

nowadays i try to improve my coding style to produce nicer and beter

17 messages 2008/01/31

[#289355] Gedankenexperiment on method duck type safety — Tim Connor <timocratic@...>

*braces for the flames to follow*

13 messages 2008/01/31

[#289364] GUI for a newbie — Matthew Borgeson <hibridmatthias@...>

Hello All-

28 messages 2008/01/31

[#289375] how do you install a local gem? — 7stud -- <bbxx789_05ss@...>

Hi,

20 messages 2008/01/31
[#289378] Re: how do you install a local gem? — Stefano Crocco <stefano.crocco@...> 2008/01/31

Alle Thursday 31 January 2008, 7stud -- ha scritto:

[#289386] Re: how do you install a local gem? — 7stud -- <bbxx789_05ss@...> 2008/01/31

Stefano Crocco wrote:

[#289388] Re: how do you install a local gem? — Stefano Crocco <stefano.crocco@...> 2008/01/31

Alle Thursday 31 January 2008, 7stud -- ha scritto:

[#289398] Re: how do you install a local gem? — 7stud -- <bbxx789_05ss@...> 2008/02/01

Stefano Crocco wrote:

[SUMMARY] Longest Repeated Substring (#153)

From: Ruby Quiz <james@...>
Date: 2008-01-24 16:11:46 UTC
List: ruby-talk #288534
I first saw this problem years ago.  It was one of the Perl's Quiz of the Week
problems.  I remember being surprised at how the solution was sort of
counter-intuitive to me.  I mean that my first thought was:  start at half the
string size, try to find a string of that length that occurs twice in the full
text, reduce the length by one character and recheck until you find a match. 
Yoan Blanc coded that strategy up for us:

	text = STDIN.read
	
	(text.length/2).downto 1 do |l|
		match = Regexp.new("(.{#{l}})\\1").match(text)
		if match
			puts text[match.offset(1)[0]..(match.offset(1)[1]-1)]
			break
		end
	end

Technically the regular expression engine will choke on this solution for a
significantly large text.  The reason is that quantifiers in {…} limits are
only allowed to be so large.  The theory is as I described though.

However, even if it could match any size, this solution turns out to be far too
slow to be practical.  The reason is that given a megabyte or more of text, the
longest repeated substring is typically less than 200 bytes.  That means you do
a whole lot of wasted checks before you are even in the range that a match can
occur.

Yoan realized this and submitted a second solution that flipped the search
around.  Starting with a match of zero characters and finding the next biggest
puts you in the right neighborhood from the get go.

The other key realized in Yoan's second solution is that if we know there is a
four character repeated substring at some index, we can also count on the fact
that there are one, two, and three character repeated substrings at the same
index.  Given that, you can keep checking at the same index until you don't find
a match for a given size.  The size found just before that was the biggest at
that index.

The Perl guys found further means to optimize this approach, by trimming the
search space.  However, this too breaks down in the face of significantly large
inputs.  That's why we have suffix trees.

The idea of a suffix tree is to build a list of every suffix that occurs in the
text.  Thus the quiz example "banana" breaks down to:

	banana
	 anana
	  nana
	   ana
	    na
	     a

That may not look like a lot of help yet, but watch what happens if we sort the
entries:

	a
	ana
	anana
	banana
	na
	nana

See how the common prefixes group together?  We can now compare adjacent entries
of our suffix tree for prefixes they have in common.  In this case, the second
and third element share an "an" and that's one of the possible answers.  Note
that the second and third share "ana" as well, but selecting that one causes
overlap:

	[ana]na
	  [ana]

There's a catch though.  Consider this example input from Eric I.:  ananana. 
The suffix tree is:

	ananana
	 nanana
	  anana
	   nana
	    ana
	     na
	      a

That sorts into:

	a
	ana
	anana
	ananana
	na
	nana
	nanana

In this case comparing just the second and third entries gives us "an", because
"ana" would overlap.  But, if you compare the second and fourth entries, "ana"
becomes a valid answer.  That tells us that it's sometimes necessary to look
ahead more than just one step.

Enough explaining, let's read some code.  I'm going to show Eric's solution
below, because it was very well commented and that helped me understand it. 
However, I'm going to remove those comment in the following listings in the
interests of space.  I also made a couple of tiny edits that I felt simplified
the code a touch.

First we have a simple helper method:

	def longest_common_prefix(s1, s2, max = nil)
	 min = [s1.size, s2.size, max].compact.min
	 min.times do |i|
	   return s1.slice(0, i) if s1[i] != s2[i]
	 end
	 return s1.slice(0, min)
	end
	
	# ...

Given two Strings, this code will find the longest prefix they share.  It begins
by finding the smallest length among the passed Strings and an optional provided
maximum.  It then walks those indexes looking for mismatched characters.

When it finds a mismatch, it returns what came before.  How it does that is a
little clever though, since it seems to use the same numbers.  Consider that we
are comparing "abc" and "abd" on the third character.  That means i = 2 and
s1[i] != s2[i].  That will cause the code to return s1.slice(0, i), but remember
that i is used as a length there, not an index.  That's why we get the first two
characters back ("ab").

If no mismatches arise during the search, they match all the way to our limit
and that is returned.

The next method is where all the action is and it's a doozy, so we will take it
in pieces.  The first chunk builds and sorts the suffix tree:

	# ...
	
	def longest_repeated_substring(string)
	 size = string.length
	
	 suffixes = Array.new(size)
	 size.times do |i|
	   suffixes[i] = string.slice(i, size)
	 end
	
	 suffixes.sort!
	 
	 # ...

This code might seem wildly inefficient (memory-wise) at first glance, but Ruby
will surprise you here.  When you slice() a substring like this, Ruby cheats and
doesn't actually make a copy of the text.  Instead, the new object is a pointer
into the old object.  Now, if you change either String down the road, Ruby must
and will finish the job, turning it into a full copy.  We call this behavior
"Copy on Write."  Since we won't be changing these Strings though, just
examining them, we're safe with the tiny pointers for the duration.

Note that the size variable is always used as the substring length here, though
it's too long in all but the first case.  Ruby will stop at the end of the
String even if you provide a longer length, so it does the same thing and avoids
unneeded math or Range object construction.

OK, let's begin the search through the tree:

	 # ...
	 
	 best = ""
	 at_least_size = 1    # the size to meet or exceed to be the new best
	 distance = nil
	 neighbors_to_check = 1
	
	 (1...size).each do |i|
	   s1 = suffixes[i]
	
	   neighbors_to_check.downto(1) do |neighbor|
	     s2 = suffixes[i - neighbor]
	
	     distance = (s1.size - s2.size).abs
	     if distance < at_least_size
	       if s1.size >= at_least_size &&
	           s2.size >= at_least_size &&
	           s1.slice(0, at_least_size) == s2.slice(0, at_least_size)
	         neighbors_to_check = [neighbors_to_check, neighbor + 1].max
	       else
	         neighbors_to_check = neighbor
	       end
	       next
	     end
	     
	     # ...

First, some variables are set to hold the best match so far, the size a match
would need to be to beat it, the distance between the substrings, and how far
down the tree we need to search.  After, that we enter a simple loop trying each
suffix.

The neighbors_to_check iteration is our look ahead for similar suffixes that
share our prefix.  We usually only need to look one step ahead, but these checks
watch for overlap cases and extends our search when needed.

Note that we figure a distance between substrings here, even though we didn't
remember where they started.  That's possible because all suffixes are anchored
to the far edge of our original text.  Knowing that, comparing lengths is the
same as comparing starting indexes.

A distance below our needed match size warns us that there is overlap and we may
need to look further ahead in this case.  The if conditions just ensure the
Strings are of the length we should even consider and that they match at least
that much.  If all of that turns out to be true, our search ahead is extended.

Now we are ready to do the actual comparisons:

	     # ...
	     
	     unless s1.slice(0, at_least_size) == s2.slice(0, at_least_size)
	       neighbors_to_check = neighbor
	       next
	     end
	
	     best = longest_common_prefix(s1, s2, distance)
	     at_least_size = best.size + 1
	     if best.size == distance
	       neighbors_to_check = [neighbors_to_check, neighbor + 1].max
	     else
	       neighbors_to_check = neighbor
	     end
	   end
	 end
	
	 best
	end
	
	# ...

The unless check we see here is just a short circuit optimization.  There's no
need to check for common prefixes if the Strings aren't a match at least as long
as our current best.

If we've made it this far, we know the current two Strings share a prefix that
meets or exceeds our current best.  A hand-off is made to
longest_common_prefix() to grab our new best and the rest of the iterator resets
the search parameters.

The best we found in the entire search is then returned as the solution.

Here's the interface code that wraps this method:

	# ...
	
	if $0 == __FILE__
	 string = nil
	 if ARGV[0] == "-f"
	   string = File.read(ARGV[1])
	 elsif ARGV.size == 0
	   string = STDIN.read
	 elsif ARGV[0] =~ /^-/ || ARGV.size > 1
	   STDERR.puts "usage:"
	   STDERR.puts "    #{$0} (note: input comes from standard input)"
	   STDERR.puts "    #{$0} string"
	   STDERR.puts "    #{$0} -f filename"
	   exit
	 else
	   string = ARGV[0]
	 end
	
	 result = longest_repeated_substring(string)
	 puts %Q{"#{result}" (#{result.length} characters)}
	end

Most of the above code just sorts out the command-line arguments.  The checks
determine whether to read the text from a file, STDIN, or the arguments
themselves.

The last two lines call the workhorse method we just finished examining and
print details about the best match found.

My thanks to all who stretched my brain with all of the excellent discussion
about this week's problem.

Tomorrow we will try Ruby's hand at coin counting...

In This Thread

Prev Next