Populate an array based on object

From: M R Lemon <matthew.lemon@...>
Date: 2011-09-26 17:47:47 UTC
List: ruby-talk #388287
Hi,

I have been trying for some time to crack a problem which I am sure is very
simple! I am learning Ruby and programming in general and having lots of
fun.  But help would be appreciated...

I am writing a program to scrape body text from a series of web pages so
that they can be presented in a text file.

The format for the URLs of the series of pages I am interested in is:

www.targeturl.com/episode_x?=page1
...
...
www.targeturl.com/episode_x?=page[y]


Basically, "episode_x" has y number of pages, starting at 1.
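
As a small illustration of that URL pattern, here is a sketch that builds the page URLs for one episode. The value of page_count is made up for the example; in the real problem y is not known in advance, which is why a loop with a stopping test is needed further down.

```ruby
# Build the URLs for pages 1..y of one episode.
# base_url follows the pattern above; page_count is an illustrative
# assumption, since the real number of pages is not known up front.
base_url = "www.targeturl.com/episode_x?=page"
page_count = 3

page_urls = (1..page_count).map { |n| "#{base_url}#{n}" }
puts page_urls
```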

I am using Nokogiri to grab the text from each page and can quite easily get
the text from page1. I want to loop through page2, grab its text, then page3,
grab its text, and so on, until I reach page[y], which is where the text
ends. To Nokogiri, this means there is no more text on that page (i.e.
body_text == nil).

Before attempting to grab the body text and append to a text file, my
strategy is to populate an array of 'valid' urls, based on a test which
involves Nokogiri finding text in the body tag, starting at page1.  I want
the loop to finish when the test finds body_text == nil, leaving me with a
collection of URLs which I know to definitely contain body text.

After a lot of playing around, I have got this far, but there is no looping
going on.  I am getting the page okay and am testing for a certain condition
which results in "END!" being appended to the array (essentially, when
body_text == nil).  But I can't work out how to loop.

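require 'nokogiri'
require 'open-uri'

# Fetch one page and return the stripped text of its first div.body_recap.
def get_text(base_url, page_number)
  @target_url = base_url + page_number.to_s
  @noko_doc = Nokogiri::HTML(open(@target_url))

  @text = ''
  @noko_doc.css('div.body_recap').each do |text|
    @text << text.content
    @text.strip!   # strip! returns nil if nothing changed, so don't assign its result
    return @text   # returns after the first matching div
  end
end

# Tests a single page only; nothing here loops over page numbers yet.
def collect_urls(base_url, page_number)
  @valid_urls = []
  text = get_text(base_url, page_number)
  if text =~ /\A\s*Previous/
    @valid_urls << "END!"
  else
    @valid_urls << @target_url
  end
  @valid_urls
end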
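
One possible shape for the missing loop, as a minimal sketch rather than a definitive answer. The names collect_valid_urls, fetch_text and max_pages are hypothetical. fetch_text stands in for whatever returns a page's body text (for instance a wrapper around the Nokogiri lookup), and it is assumed to return nil or an empty string once the text runs out. A hash of fake pages is used here so the example runs without touching the network.

```ruby
# Hypothetical helper: walk page1, page2, ... and keep each URL whose page
# has body text; stop at the first page that has none.
# fetch_text is any callable taking a URL and returning its text or nil.
# max_pages is an assumed safety cap so the loop always terminates.
def collect_valid_urls(base_url, fetch_text, max_pages = 1000)
  valid_urls = []
  page_number = 1
  while page_number <= max_pages
    url = "#{base_url}#{page_number}"
    text = fetch_text.call(url)
    break if text.nil? || text.strip.empty?   # no body text: past the last page
    valid_urls << url
    page_number += 1
  end
  valid_urls
end

# Stand-in for the real site: two pages with text, then nothing.
fake_pages = {
  "www.targeturl.com/episode_x?=page1" => "First page text",
  "www.targeturl.com/episode_x?=page2" => "Second page text"
}
fetcher = lambda { |url| fake_pages[url] }

urls = collect_valid_urls("www.targeturl.com/episode_x?=page", fetcher)
puts urls
```

Passing the fetcher in as an argument keeps the loop separate from the scraping, so the stopping test can be changed (for example to the /\A\s*Previous/ check) without rewriting the loop itself.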
-- 

Any help or comments very welcome!

Thanks all.

Matt
