Re: print - and strip text between tags using Nokogiri

From: Robert Klemme <shortcutter@...>
Date: 2012-12-15 23:28:30 UTC
List: ruby-talk #402228
On Sun, Dec 16, 2012 at 12:10 AM, Paul Mena <lists@ruby-forum.com> wrote:
> I'm a Ruby Newbie trying to write a program to process thousands of HTML
> files, extracting pertinent text and inserting it into a MySQL database.
> Ruby seems ideally suited to the task in general, and I've already used
> Nokogiri to extract comment text.  What I need to do next is to print -
> and then ultimately delete or strip - the text between "pre" tags.
>
> Picture some html like this:
>
> <html>
> <head>
> <title>My Title</title>
> </head>
> <body>
> <h1>My Heading</h1>
> <strong>From:</strong>Me<br>
> <strong>Date:</strong> Wed Dec 05 2012 - 18:17:49 EST
> <!-- body="start" -->
> <p>
> text line 1
> <br>
> text line 2
> <br>
> text line 3
> <br>
> <p><pre>
> very important text
> more important text
> would you believe even more important text?
> </pre>
> <p><!-- body="end" -->
> </body>
> </html>
>
> I basically need to do 2 things: 1) to print only the text between the 2
> "pre" tags, and then 2) to print all of the non-tagged text between the
> "body" comments - minus the text between the "pre" tags.  I've been
> messing with this for a couple of hours - unsuccessfully - but I'm still
> convinced that this is the right tool for the job.

If you need to do more HTML and XML manipulation, learning XPath is a
good investment!  You can look here for a start:
http://www.w3schools.com/Xpath/default.asp

_One_ way to achieve what you want:

require 'nokogiri'

text = <<HTML
<html>
<head>
<title>My Title</title>
</head>
<body>
<h1>My Heading</h1>
<strong>From:</strong>Me<br>
<strong>Date:</strong> Wed Dec 05 2012 - 18:17:49 EST
<!-- body="start" -->
<p>
text line 1
<br>
text line 2
<br>
text line 3
<br>
<p><pre>
very important text
more important text
would you believe even more important text?
</pre>
<p><!-- body="end" -->
</body>
</html>
HTML

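# Parse the HTML string into a document
dom = Nokogiri.HTML(text)

# 1) Print only the text inside <pre> elements
puts dom.xpath('/html/body//pre/text()').map(&:to_s)

puts '---'

# 2) Print every text node in the body that is not inside a <pre>
puts dom.xpath('/html/body//text()[not(ancestor::pre)]').map(&:to_s)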

You can also process nodes individually if you replace ".map..." with
".each" and a block which receives the node and does something with
it.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
