[#401849] If statement — Masoud Ahmadi <lists@...>

Will anyone be able to point out what I am doing wrong.

15 messages 2012/12/02

[#401987] Trying to get "translator" to work — JD KF <lists@...>

So, basically, I'm trying to get the below code to work properly for

12 messages 2012/12/06

[#402012] Need help to select some listbox item in different listbox together — Jonathan Masato <lists@...>

Hello,

10 messages 2012/12/07

[#402045] if n belongs to set a and m belongs to set b repeat some steps, How? — "zubair a." <lists@...>

We can do so in java and similar languages like:

11 messages 2012/12/08

[#402078] Time.new(2001, 12, 3).to_i returns wrong value — Robert Buck <lists@...>

I am doing something that not many do, I am writing a database driver

9 messages 2012/12/09

[#402145] How I can create/extract a variable/hash into the current binding in Ruby? — Ramon de C Valle <rcvalle@...>

Hi,

12 messages 2012/12/12

[#402205] Wondering About Flatiron School — "Kevin Y." <lists@...>

Hi everyone!,

35 messages 2012/12/15
[#402207] Re: Wondering About Flatiron School — Chad Perrin <code@...> 2012/12/15

On Sat, Dec 15, 2012 at 11:51:08AM +0900, Kevin Y. wrote:

[#402214] Ruby quick reference arranged in ASCII sequence? — Old Grantonian <lists@...>

As a ruby beginner, I would be grateful for any links to a ruby

17 messages 2012/12/15

[#402226] print - and strip text between tags using Nokogiri — Paul Mena <lists@...>

I'm a Ruby Newbie trying to write a program to process thousands of HTML

13 messages 2012/12/15

[#402332] Perl to Ruby: regex captures to assignment. — "Derrick B." <lists@...>

Hello all,

37 messages 2012/12/19
[#402342] Re: Perl to Ruby: regex captures to assignment. — "Derrick B." <lists@...> 2012/12/20

First of all, thanks for the fast responses!

[#402352] Re: Perl to Ruby: regex captures to assignment. — Robert Klemme <shortcutter@...> 2012/12/20

On Thu, Dec 20, 2012 at 1:38 AM, Derrick B. <lists@ruby-forum.com> wrote:

[#402357] Re: Perl to Ruby: regex captures to assignment. — "Derrick B." <lists@...> 2012/12/20

Robert Klemme wrote in post #1089733:

[#402359] trying to strip characters from a line — Paul Mena <lists@...>

I'm reading a table from a MySQL database and then processing it row by

18 messages 2012/12/20

[#402394] simple division: -9 / 5 = -2 what? — "Derrick B." <lists@...>

$ irb

13 messages 2012/12/22

[#402412] POLS and string-handling — Paul Magnussen <lists@...>

Hi,

14 messages 2012/12/22

[#402460] "Open" dialog of Windows — "Damián M. González" <lists@...>

Hi guys, been researching about pop up the "open" file dialog of

11 messages 2012/12/24

[#402466] How do I install Ruby on my Ubuntu 12.10 partition. — Kaye Ng <lists@...>

I already have Ruby installed on my Windows 7 partition.

23 messages 2012/12/25

[#402510] Ruby Association Certified Ruby Programmer — Sean Westfall <lists@...>

How well respected is this certification in the industry: Ruby

27 messages 2012/12/27
[#402528] Re: Ruby Association Certified Ruby Programmer — Peter Hickman <peterhickman386@...> 2012/12/27

On 27 December 2012 01:28, Sean Westfall <lists@ruby-forum.com> wrote:

[#402555] numeric? — Brandon Weaver <keystonelemur@...>

I've found a bit of an annoyance trying to find out if a number is numeric

20 messages 2012/12/27

[#402580] Ruby Koans regarding Hashes. — "Derrick B." <lists@...>

I am trying to understand this, so let me know how I do. :) I know

18 messages 2012/12/28

[#402609] can't open new ruby program under "new" context menu — "Lee V." <lists@...>

I'm stuck on the new version at trying to do something very simple.

10 messages 2012/12/28

[#402642] require "test/unit" — "Mattias A." <lists@...>

Hi,

17 messages 2012/12/29
[#402667] Re: require "test/unit" — "Mattias A." <lists@...> 2012/12/31

Hi Dami=C3=A1n M. Gonz=C3=A1lez!

[#402747] Re: require "test/unit" — "Derrick B." <lists@...> 2013/01/04

Mattias A. wrote in post #1090700:

[#402749] Re: require "test/unit" — sto.mar@... 2013/01/04

Am 04.01.2013 19:48, schrieb Derrick B.:

Re: print - and strip text between tags using Nokogiri

From: Robert Klemme <shortcutter@...>
Date: 2012-12-15 23:28:30 UTC
List: ruby-talk #402228
On Sun, Dec 16, 2012 at 12:10 AM, Paul Mena <lists@ruby-forum.com> wrote:
> I'm a Ruby Newbie trying to write a program to process thousands of HTML
> files, extracting pertinent text and inserting it into a MySQL database.
> Ruby seems ideally suited to the task in general, and I've already used
> Nokogiri to extract comment text.  What I need to do next is to print -
> and then ultimately delete or strip - the text between "pre" tags.
>
> Picture some html like this:
>
> <html>
> <head>
> <title>My Title</title>
> </head>
> <body>
> <h1>My Heading</h1>
> <strong>From:</strong>Me<br>
> <strong>Date:</strong> Wed Dec 05 2012 - 18:17:49 EST
> <!-- body="start" -->
> <p>
> text line 1
> <br>
> text line 2
> <br>
> text line 3
> <br>
> <p><pre>
> very important text
> more important text
> would you believe even more important text?
> </pre>
> <p><!-- body="end" -->
> </body>
> </html>
>
> I basically need to do 2 things: 1) to print only the text between the 2
> "pre" tags, and then 2) to print all of the non-tagged text between the
> "body" comments - minus the text between the "pre" tags.  I've been
> messing with this for a couple of hours - unsuccessfully - but I'm still
> convinced that this is the right tool for the job.

If you need to do more HTML and XML manipulation, learning XPath is a
good investment!  You can look here for a start:
http://www.w3schools.com/Xpath/default.asp

_One_ way to achieve what you want:

require 'nokogiri'

text = <<HTML
<html>
<head>
<title>My Title</title>
</head>
<body>
<h1>My Heading</h1>
<strong>From:</strong>Me<br>
<strong>Date:</strong> Wed Dec 05 2012 - 18:17:49 EST
<!-- body="start" -->
<p>
text line 1
<br>
text line 2
<br>
text line 3
<br>
<p><pre>
very important text
more important text
would you believe even more important text?
</pre>
<p><!-- body="end" -->
</body>
</html>
HTML

dom = Nokogiri.HTML(text)

puts dom.xpath('/html/body//pre/text()').map(&:to_s)

puts '---'

puts dom.xpath('/html/body//text()[not(ancestor::pre)]').map(&:to_s)

You can also process nodes individually if you replace ".map..." with
".each" and a block which receives the node and does something with
it.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

In This Thread