[#223105] ruby programming best practice — "Shannon Fang" <xrfang@...>

As a dynamic language, Ruby is much more flexible and easier than other

18 messages 2006/11/01

[#223126] variable pointer — "akbarhome" <akbarhome@...>

@c = "donal"

17 messages 2006/11/01

[#223211] file size revisit — python152@...

Hi, folks

17 messages 2006/11/02

[#223299] Just a question to throw out there... — "Skotty" <shyguyfrenzy@...>

Another noobrube question.

23 messages 2006/11/02

[#223398] Output not clear — "Learning Ruby" <learningruby@...>

I am a newbie to Ruby and the output of the following program is not clear

14 messages 2006/11/03

[#223425] Bytecode Compiler (#100) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

27 messages 2006/11/03

[#223458] REXML ... performance & memory usage ... — Jeff Wood <jeff@...>

Wow ... I am trying to use REXML to parse through an 8.8Mb xml file ...

14 messages 2006/11/03

[#223653] Book wanted: Metaprogramming in Ruby — Jay Levitt <jay+news@...>

Now that Hal, David B, Curt, and others have some spare time:

25 messages 2006/11/06

[#223736] REXML — "pdg" <pgattphoto@...>

Hi All,

22 messages 2006/11/06

[#223831] the name of Matz — Byung-Hee HWANG <bh@...>

Hello,

51 messages 2006/11/07
[#223839] Re: [OT] the name of Matz — Yukihiro Matsumoto <matz@...> 2006/11/07

Hi,

[#224216] Re: [OT] the name of Matz — Byung-Hee HWANG <bh@...> 2006/11/09

Yukihiro Matsumoto wrote:

[#223975] Re: [OT] the name of Matz — Devin Mullins <twifkak@...> 2006/11/08

Yukihiro Matsumoto wrote:

[#224630] Re: the name of Matz — "Ryo" <furufuru@...> 2006/11/12

Yukihiro Matsumoto wrote:

[#224645] Re: the name of Matz — "Robert Dober" <robert.dober@...> 2006/11/12

On 11/12/06, Ryo <furufuru@ccsr.u-tokyo.ac.jp> wrote:

[#242731] Re: the name of Matz — Harry <ruby.hardware@...> 2007/03/09

> It might be fun though if you could give a pointer to the "correct"

[#223846] How to make a cycling counter from commandline? — "darenbell@..." <darenbell@...>

Hi, I'm looking for a way to implement this idea:

12 messages 2006/11/07

[#223930] Two way communication with the command shell (IO.popen?) — James Smith <jmdjmsmith@...>

19 messages 2006/11/08
[#223943] Re: Two way communication with the command shell (IO.popen?) — ara.t.howard@... 2006/11/08

On Wed, 8 Nov 2006, James Smith wrote:

[#223997] Re: Two way communication with the command shell (IO.popen?) — James Smith <jmdjmsmith@...> 2006/11/08

unknown wrote:

[#224012] Re: Two way communication with the command shell (IO.popen?) — ara.t.howard@... 2006/11/08

On Wed, 8 Nov 2006, James Smith wrote:

[#224327] Re: Two way communication with the command shell (IO.popen?) — James Smith <jmdjmsmith@...> 2006/11/10

unknown wrote:

[#224690] Re: testing whether a process has completed.. — James Smith <jmdjmsmith@...> 2006/11/12

OK, keeping it simple I am basically using the following code:

[#224691] Re: testing whether a process has completed.. — "Patrick Hurley" <phurley@...> 2006/11/12

On 11/12/06, James Smith <jmdjmsmith@msn.com> wrote:

[#223953] Why create web servers? — "CatLady []" <totalharmonicdistortion@...>

Hi,

16 messages 2006/11/08

[#224002] FastRI 0.1.0: faster, smarter RI docs for Ruby, DRb-enabled — Mauricio Fernandez <mfp@...>

FastRI 0.1.0: faster, smarter RI docs for Ruby, DRb-enabled

27 messages 2006/11/08

[#224013] #returning and #tap — "Trans" <transfire@...>

Had use for this today: #returning is a convenience method you'll find

57 messages 2006/11/08
[#225210] Re: #returning and #tap — Eric Hodel <drbrain@...7.net> 2006/11/15

On Nov 8, 2006, at 6:40 AM, Trans wrote:

[#225233] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

Eric Hodel wrote:

[#225358] Re: #returning and #tap — Eric Hodel <drbrain@...7.net> 2006/11/16

On Nov 15, 2006, at 5:40 PM, Joel VanderWerf wrote:

[#225370] Re: #returning and #tap — ara.t.howard@... 2006/11/16

On Fri, 17 Nov 2006, Eric Hodel wrote:

[#225382] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

ara.t.howard@noaa.gov wrote:

[#225385] Re: #returning and #tap — dblack@... 2006/11/16

Hi --

[#225388] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

dblack@wobblini.net wrote:

[#225393] Re: #returning and #tap — dblack@... 2006/11/16

Hi --

[#225399] Re: #returning and #tap — Joel VanderWerf <vjoel@...> 2006/11/16

dblack@wobblini.net wrote:

[#225420] Re: #returning and #tap — dblack@... 2006/11/16

Hi --

[#225476] Re: #returning and #tap — "Martin DeMello" <martindemello@...> 2006/11/17

On 11/17/06, dblack@wobblini.net <dblack@wobblini.net> wrote:

[#225488] Re: #returning and #tap — dblack@... 2006/11/17

Hi --

[#225494] Re: #returning and #tap — spooq <spoooq@...> 2006/11/17

I definitely think of it as tapping a phone line.

[#225495] Re: #returning and #tap — spooq <spoooq@...> 2006/11/17

Actually, how about giving the proc a copy of the object, rather than

[#224039] Proc as Observer — "Tim Pease" <tim.pease@...>

Working with an Observable object, I wanted to be able to add a Proc

20 messages 2006/11/08
[#224061] Re: Proc as Observer — "Trans" <transfire@...> 2006/11/08

[#224040] Simple Math Problem — Thom Loring <tloring@...>

Can anyone shed some light on a simple math problem I have encountered?

14 messages 2006/11/08

[#224087] The Ruby Way review on Slashdot — Timothy Hunter <TimHunter@...>

Whoo-hoo! My review of Hal Fulton's _The_Ruby_Way,_Second_Edition_ is on

17 messages 2006/11/08

[#224157] thousand ways to rome — Chris Mueller <damngoodcoffee@...>

Hi,

17 messages 2006/11/09

[#224246] Overwriting the Integer class for method succ! (instead of just succ) — "paul" <pjvleeuwen@...>

Hi all,

11 messages 2006/11/09

[#224331] Rails vs. Asp.Net politics — "Leslie Viljoen" <leslieviljoen@...>

I have the deciding vote in a new (rather large) web app we need to

28 messages 2006/11/10

[#224352] VCR Program Manager (#101) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

13 messages 2006/11/10

[#224398] looking for some feedback about Certification — "pat eyler" <pat.eyler@...>

Aaah, nothing like a good controversial topic to stir up a holy war

38 messages 2006/11/10
[#224401] Re: looking for some feedback about Certification — Gustav Paul <gustav@...> 2006/11/10

pat eyler wrote:

[#224439] Re: looking for some feedback about Certification — dblack@... 2006/11/11

Hi --

[#224411] turn 0.1.0 Released — "Tim Pease" <tim.pease@...>

turn version 0.1.0 has been released!

18 messages 2006/11/10

[#224532] McGovern Likes JRuby... — Charles Oliver Nutter <charles.nutter@...>

I'm not sure how to feel about this one :)

26 messages 2006/11/11
[#224570] Re: McGovern Likes JRuby... — "M. Edward (Ed) Borasky" <znmeb@...> 2006/11/11

Charles Oliver Nutter wrote:

[#224574] Re: McGovern Likes JRuby... — David Vallner <david@...> 2006/11/11

M. Edward (Ed) Borasky wrote:

[#224539] Ruby GUI with IDE — "Josh Mr." <kamipride102@...>

Hello all,

33 messages 2006/11/11
[#224543] Re: Ruby GUI with IDE — "M. Edward (Ed) Borasky" <znmeb@...> 2006/11/11

Josh Mr. wrote:

[#224546] Re: Ruby GUI with IDE — AliasX Neo <kamipride102@...> 2006/11/11

M. Edward (Ed) Borasky wrote:

[#224554] Re: Ruby GUI with IDE — David Vallner <david@...> 2006/11/11

AliasX Neo wrote:

[#224569] Re: Ruby GUI with IDE — "M. Edward (Ed) Borasky" <znmeb@...> 2006/11/11

David Vallner wrote:

[#224577] Re: Ruby GUI with IDE — Caleb Tennis <caleb@...> 2006/11/11

>>

[#224578] Re: Ruby GUI with IDE — AliasX Neo <kamipride102@...> 2006/11/11

So I guess a better format for my original question should be:

[#224580] Re: Ruby GUI with IDE — David Vallner <david@...> 2006/11/11

AliasX Neo wrote:

[#224639] regular expression too big — Peter Schrammel <peter.schrammel@...>

Hi,

31 messages 2006/11/12

[#224665] Help convert a Perl user to the Ruby Way. — Sebastian Reid <seb@...>

Hi all.

13 messages 2006/11/12

[#224777] Nitro + Og 0.40.0 — "George Moschovitis" <george.moschovitis@...>

Hello everyone,

17 messages 2006/11/13

[#224817] directory_watcher 0.1.1 — "Tim Pease" <tim.pease@...>

A class for watching files within a directory and generating events

16 messages 2006/11/13
[#224838] Re: directory_watcher 0.1.1 — "Kenosis" <kenosis@...> 2006/11/13

[#224839] Re: directory_watcher 0.1.1 — "Tim Pease" <tim.pease@...> 2006/11/13

On 11/13/06, Kenosis <kenosis@gmail.com> wrote:

[#224933] ruby indentantion — Alfonso <euoar@...>

I have just started with ruby, and something that I have observed is

23 messages 2006/11/14

[#224949] Is 2.0 Integer or Float? — "S. Robert James" <srobertjames@...>

I'd like to be able to do:

18 messages 2006/11/14

[#224997] Assoc method on large array — "gregarican" <greg.kujawa@...>

I am trying to invoke the assoc method on a large array. It seems to

13 messages 2006/11/14

[#225069] Design problem with 'inject' — Gary Boone <dr@...>

20 messages 2006/11/15

[#225109] FastRI 0.2.0: full-text searching, smarter search strategies — Mauricio Fernandez <mfp@...>

FastRI is an alternative to the ri command-line tool. It is *much* faster, and

9 messages 2006/11/15

[#225179] *Fast* way to process large files line by line — Devesh Agrawal <dagrawal@...>

Hi Folks,

20 messages 2006/11/15

[#225288] Re: parse xml file, put results in mysql db — "seb@..." <seb@...>

--- Kathy Simmons <kathys39@hotmail.com> wrote:

15 messages 2006/11/16
[#225291] Re: parse xml file, put results in mysql db — Jon Egil Strand <jes@...> 2006/11/16

>

[#225296] Re: parse xml file, put results in mysql db — Mike Fletcher <lemurific+rforum@...> 2006/11/16

Jon Egil Strand wrote:

[#225330] Re: parse xml file, put results in mysql db — Kathy Simmons <kathys39@...> 2006/11/16

Here's the full code - I'm reading in nmap output in scanfile.xml and

[#225379] IHelp 0.4.0 - full text search — "Ilmari Heikkinen" <ilmari.heikkinen@...>

View and search object documentation from irb.

13 messages 2006/11/16
[#225383] Re: [ANN] IHelp 0.4.0 - full text search — Parragh Szabolcs <parragh@...> 2006/11/16

Ilmari Heikkinen 叝ta:

[#225398] Re: [ANN] IHelp 0.4.0 - full text search — "Ilmari Heikkinen" <ilmari.heikkinen@...> 2006/11/16

Hi,

[#225412] Re: [ANN] IHelp 0.4.0 - full text search — "Ilmari Heikkinen" <ilmari.heikkinen@...> 2006/11/16

> Thanks for noticing this, should be fixed in 0.4.1.

[#225470] Re: [ANN] IHelp 0.4.0 - full text search — Parragh Szabolcs <parragh@...> 2006/11/17

Ilmari Heikkinen 叝ta:

[#225512] Literate Ruby (#102) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

12 messages 2006/11/17

[#225547] ruby equivalent PHP function is_numeric? — Josselin <josselin@...>

After reading completely my Ruby book, I cannot find a function

15 messages 2006/11/17

[#225681] Ruby vs Java vs c++ — n/a <na@...>

hi, newbie so please be tolerant.... ;)

117 messages 2006/11/18

[#225754] Ruby screen scraping — Chris Gallagher <cgallagher@...>

Hi,

28 messages 2006/11/19

[#225909] Create array of hash values — David Lelong <drlelon@...>

Hi,

13 messages 2006/11/20

[#226023] Bug in ruby? — AliasX Neo <kamipride102@...>

Well, I've spent the last hour or so debugging one of the stupidest

31 messages 2006/11/21

[#226029] array question — Li Chen <chen_li3@...>

Hi all,

41 messages 2006/11/21
[#226031] Re: array question — "Wilson Bilkovich" <wilsonb@...> 2006/11/21

On 11/20/06, Li Chen <chen_li3@yahoo.com> wrote:

[#226120] Hpricot/Rubyful Soup comparison — Wes Gamble <weyus@...>

Has anyone done a head to head comparison of Hpricot and Rubyful Soup

19 messages 2006/11/21

[#226168] New RCRchive, including new process — dblack@...

Hi everyone --

35 messages 2006/11/22

[#226210] invoke system command from within a method — Moritz Reiter <mreiter@...>

-----BEGIN PGP SIGNED MESSAGE-----

11 messages 2006/11/22

[#226228] how do I contribute to Ruby? — "Giles Bowkett" <gilesb@...>

check this out, this is the whiniest change ever, but what I want is

15 messages 2006/11/22

[#226262] Rubyish inst.var initializations — "Victor \"Zverok\" Shepelev" <vshepelev@...>

Hi all.

12 messages 2006/11/23

[#226263] Compare Array Values? — "Daniel N" <has.sox@...>

I want to check to see if two arrays contain the same values.

30 messages 2006/11/23

[#226388] Anyone else getting weird flickr errors? — "Gregory Brown" <gregory.t.brown@...>

When I post to RubyTalk, I've been getting a 'your photo upload

14 messages 2006/11/24

[#226484] Is there a simply way to get every method log itself before running? — "Richard" <RichardDummyMailbox58407@...>

Hi,

11 messages 2006/11/24

[#226537] DictionaryMatcher (#103) — Ruby Quiz <james@...>

The three rules of Ruby Quiz:

18 messages 2006/11/24

[#226553] Ruby/Python/REXX as a MUCK scripting language — Tony Belding <zobeid@...>

I'm interested in using an off-the-shelf interpreted language as a

18 messages 2006/11/25

[#226608] coding practise — sempsteen <sempsteen@...>

Hi all,

23 messages 2006/11/25

[#226707] Ruby/Rails on Gumstix — "M. Edward (Ed) Borasky" <znmeb@...>

For the past couple of weeks, I've been playing around with Ruby on a

16 messages 2006/11/26
[#226751] Re: Ruby/Rails on Gumstix — "Giles Bowkett" <gilesb@...> 2006/11/26

On 11/25/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:

[#226709] Timestamp — Srinivas Sa <sr.sakhamuri@...>

How do i add two time stamps

23 messages 2006/11/26

[#226731] find index of first non zeo value in array — Josselin <josselin@...>

with :

24 messages 2006/11/26
[#226733] Re: find index of first non zeo value in array — Olivier <o.renaud@...> 2006/11/26

Le dimanche 26 novembre 2006 15:00, Josselin a 馗rit

[#226783] Two Advanced Ruby Performance Questions — Sunny Hirai <sunny@...>

First, I am a Ruby newbie but am an experienced developer of highly

27 messages 2006/11/26
[#226816] Re: Two Advanced Ruby Performance Questions — Edwin Fine <efine145-nospam01@...> 2006/11/26

This post may be stating the obvious, but here goes anyway... I hope I

[#226792] Extremely Noobish Documentation Question — Paco Paco <mepaco@...>

Hello all,

16 messages 2006/11/26

[#226806] Re: ruby and list comprehension — James Cunningham <jameshcunningham@...>

On 2006-11-25 18:47:26 -0500, Brad Tilley <rtilley@vt.edu> said:

12 messages 2006/11/26

[#227012] Is ruby a viable corporate alternative? — "Mr P" <MisterPerl@...>

Our team uses Perl for almost 100% of our projects, as we have for the

27 messages 2006/11/28

[#227041] FileUtils.touch doesn't work — Jeff Toth <jeff@...>

Why won't Ruby just install from the port? I don't know what Ruby is,

12 messages 2006/11/28

[#227108] Simple screen scraper using scrAPI — "doog" <doog@...>

I'm a Ruby novice. Does anyone have an example of a simple screen

14 messages 2006/11/28

[#227160] cidr.rb: port of Perl's Net::CIDR v0.11 available — Jos Backus <jos@...>

Module:

17 messages 2006/11/29

[#227198] Splitting a CSV file into 40,000 line chunks — Drew Olson <olsonas@...>

All -

40 messages 2006/11/29
[#227243] Re: Splitting a CSV file into 40,000 line chunks — James Edward Gray II <james@...> 2006/11/29

On Nov 29, 2006, at 9:32 AM, Drew Olson wrote:

[#227255] Re: Splitting a CSV file into 40,000 line chunks — Drew Olson <olsonas@...> 2006/11/29

Thanks for all the responses. As noted in a post above, I am trying to

[#227219] Need a range, but not getting it. . . . — Peter Bailey <pbailey@...>

Hello,

33 messages 2006/11/29

[#227282] creating directory "http://example.com" — Comfort Eagle <steve@...>

How do I create a directory 'http://example.com' without it getting

16 messages 2006/11/29

[#227302] Wrong results using named arguments — "Jason Vogel" <jasonvogel@...>

Source:

12 messages 2006/11/29

[#227336] Overwhelmed by emails — Daniel DeLorme <dan-ml@...42.com>

This list has way too many messages for the amount of free time I have. Does

12 messages 2006/11/30

[#227388] Timers, scheduling and Ruby — Damphyr <damphyr@...>

Ok, since the original post migh just appear in a month's time, lets

24 messages 2006/11/30
[#227404] Re: Timers, scheduling and Ruby — James Edward Gray II <james@...> 2006/11/30

On Nov 30, 2006, at 7:51 AM, Damphyr wrote:

[#227414] Re: Timers, scheduling and Ruby — ara.t.howard@... 2006/11/30

On Fri, 1 Dec 2006, James Edward Gray II wrote:

[#227416] Re: Timers, scheduling and Ruby — James Edward Gray II <james@...> 2006/11/30

On Nov 30, 2006, at 11:47 AM, ara.t.howard@noaa.gov wrote:

[#227461] Re: Timers, scheduling and Ruby — ara.t.howard@... 2006/11/30

On Fri, 1 Dec 2006, James Edward Gray II wrote:

[#227402] Segmentation fault, proc, eval, long string — Bob Hutchison <hutch@...>

Hi,

27 messages 2006/11/30
[#227415] Re: Segmentation fault, proc, eval, long string [Reproduced] — Bob Hutchison <hutch@...> 2006/11/30

A little more on this...

[#227569] Re: Segmentation fault, proc, eval, long string [Reproduced] — Pit Capitain <pit@...> 2006/12/01

Bob Hutchison schrieb:

[#227426] simple question, looping through each character in a string — "warhero" <beingthexemplarylists@...>

how can I accomplish something like this in ruby:

17 messages 2006/11/30
[#227438] Re: simple question, looping through each character in a string — dblack@... 2006/11/30

Hi --

[#227458] Wisdom of including Rakefile in releases — "Trans" <transfire@...>

I was poking around in the /usr/lib/ruby/gems directory today and

13 messages 2006/11/30

Re: Ruby screen scraping

From: Paul Lutus <nospam@...>
Date: 2006-11-20 08:45:09 UTC
List: ruby-talk #225848
James Edward Gray II wrote:

> On Nov 19, 2006, at 8:50 PM, Paul Lutus wrote:
> 
>> Chris Gallagher wrote:
>>
>>> OK that code all works great but i have one last question :)

/ ...

>>>> My question is, how would i modify the
>>> code in order to get it to capture say a block of text such as:
>>>
>>>  <p>this is text that i want to scrape</p>
>>>
>>> any ideas?
>>
>> Really simple:
>>
>> array = page_content.scan(%r{<p>(.*?)</p>}m).flatten
>>
>> Returns an array, each cell of which is a paragraph from the
>> original page.

/ ...

>> In Ruby, writing normal code is so easy that the traditional cautions
>> against adopting miraculous libraries should be amplified tenfold.
> 
> I hope you're not arguing that HTML should be parsed with simple
> regular expression instead of a real parser.  I think most would
> agree with me when I say that strategy seldom holds up for long.

That depends on the complexity of the problem to be solved, and the
reliability of the source page's HTML formatting.

For a page that can pass validation of one kind or another or that is XHTML,
the simplest kinds of parsers provide terrific results. For legacy pages
and those that can be expected to have "relaxed" syntax, more robust
parsers are required.

But I must say I regularly see requests here for parsers that can be
expected to do anything, but often as not and IMHO, such a library
represents too much complexity for the majority of routine HTML/XML parsing
tasks with Web pages and documents that are often generated, not
hand-written.

This thread is an example. Beginning with the generic equivalent of "Is
there a library that can ..." followed almost immediately by "Great! But
how do I make it do this ...", requesting a really trivial extraction step
that can be accomplished in a single line of Ruby.

I find this rather ironic, since Ruby is meant to provide an easy way to
create solutions to everyday problems. One then sees a blizzard of
libraries whose purpose is to shield the user from the complexities of the
language, in a way that the remedy is often more complex than the problem
it is meant to solve.

In this thread, the OP started out by examining the alternatives among
specialized libraries meant to address the general problem, but apparently
never considered writing code to solve the problem directly. After choosing
a library, the OP realized he didn't see an obvious way to solve the
original problem -- extracting specific content from the source pages.

As to modern XHTML Web pages that can pass a validator, I know from direct
recent experience that they yield to the simplest parser design, and can be
relied on to produce a tree of organized content, stripped of tags and
XHTML-specific formatting, in a handful of lines of Ruby code. It is hard
to justify bringing out the big guns for a task like this, when one could
instead use a small self-documenting routine such as I suggested.

In the bad old days of assembly and comparatively heavy, inflexible
languages like C, C++ and the like, it is easy to see why people would be
motivated to create specialized libraries to solve generic problems just
once for all time. In fact, the argument can be made that Ruby is just such
a library of generics, broadly speaking an extension/amplification of the
STL project.

Now we see people writing easy-to-use application libraries, each composed
using the easy-to-use Ruby library, but that are sometimes harder to sort
out, or make practical use of, than a short bit of code would have been.

Lest my readers think I am going overboard here on a topic dear to my heart,
let me quote the OP once again:

>>> OK that code all works great but i have one last question :)
>>>
>>> This is allowing me to scrape the values of the class values on
>>> tags and
>>> any other attribues such as that. My question is, how would i
>>> modify the
>>> code in order to get it to capture say a block of text such as:
>>>
>>>  <p>this is text that i want to scrape</p>
>>>
>>> any ideas?

In other words, after choosing a library and playing with it for a while, he
found himself back in square one, unable to solve the original problem.

To quote one of my favorite authors (William Burroughs), it seems people are
busy inventing cures for which there are no diseases.

-- 
Paul Lutus
http://www.arachnoid.com

In This Thread