
*Fast* way to process large files line by line

From: Devesh Agrawal <dagrawal@...>
Date: 2006-11-15 19:21:03 UTC
List: ruby-talk #225179
Hi Folks,

	I am using Ruby to analyse a huge amount (around 60 GB) of data from my 
networking experiments. Let me briefly describe my technique: I have to read 
around 40 files (of around 1.5 GB each) named f1, f2, ... Each file fi 
contains traceroutes to lots of destinations at different times, i.e. a 
file is basically a list of traceroutes launched from a given src (src = 
filename) at different times. I want to get a structure like the 
following: (list of all traceroutes from *all* srcs at time 1), (list 
of all traceroutes from *all* srcs at time 2), and so on.

	For this I am using the following pseudocode:

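	# outfile, n, k and parse() are placeholders for the real settings and parser
	output = File.open(outfile, "wb")
	files  = (1..n).map { |i| File.open("f#{i}") }   # open all files f1..fn
	h = Hash.new { |hash, t| hash[t] = [] }          # time => list of P

	until files.all? { |f| f.eof? }
		files.each do |f|
			next if f.eof?                # this file is finished
			line = f.readline
			rec  = parse(line)            # the structure P for this line
			h[rec.time] << rec

			if h.size > k                 # H has become very large: flush it
				h.keys.sort.each do |t|
					output << Marshal.dump(h[t])
					h.delete(t)
				end
			end
		end
	end
	files.each { |f| f.close }
	output.close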

Btw, I can't use an array instead of the hash table H, as the P.time values 
read across all files needn't be the same.
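If each file happens to be written in time order (an assumption; the post does not say so), the hash-and-flush step could be swapped for a k-way merge: keep one pending record per source, always take the earliest time, and emit each time group exactly once. A minimal sketch, where each source is a hypothetical Enumerator of `[time, data]` pairs rather than a real parsed file:

```ruby
# Merge per-source streams that are each already sorted by time
# (an assumption about the data) and yield every timestamp exactly
# once together with all records for it.
# Each source is an Enumerator of [time, data] pairs.
def merge_by_time(sources)
  heads = sources.map { |s| next_or_nil(s) }
  loop do
    live = heads.each_index.select { |i| heads[i] }
    break if live.empty?
    t = live.map { |i| heads[i][0] }.min   # earliest pending time
    group = []
    live.each do |i|
      # drain every record at time t from source i
      while heads[i] && heads[i][0] == t
        group << heads[i]
        heads[i] = next_or_nil(sources[i])
      end
    end
    yield t, group
  end
end

# Return the next element of an external enumerator, or nil at the end.
def next_or_nil(enum)
  enum.next
rescue StopIteration
  nil
end
```

With about 40 sources a linear scan of the heads is cheap; with many more, a priority queue would be the usual choice. Memory then holds one pending record per file plus the current group, so no k threshold is needed. Enumerator#next has some overhead of its own, so a real version might read and parse lines directly from the open files instead.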

This is miserably SLOW. I have the following questions:

	i. How fast is f.readline? I want to use the largest buffering 
possible for the biggest speed gain. How do I set the buffer size in Ruby? 
I looked through io.c, and it seems that readline essentially uses getc 
(stopping when it gets a newline). How can I set the buffer size for the 
underlying libc FILE*? Oh, btw, each line is approx. 200-400 bytes.

	ii. Marshal.dump is also very slow. Is there an alternative? YAML is 
even worse.

	iii. Is it bad to have around 40-50 files open at the same time?

	iv. The program does use a lot of memory, but not too much: around 30-40% 
of the 1 GB of RAM on the machine. So I think paging in/out is not a 
problem.

	v. Would coding the readline part in C using RubyInline give me a speed 
advantage?

	vi. I am thinking of trying the following to reduce the running time, 
and would very much welcome your comments:

		a. Remove Marshal.dump (I don't strictly need to serialize objects, 
only to dump the data and read it back) and replace it with some more 
compact string form. Actually, is it possible to have fixed-length 
structures like in C? For example, I would want P to be like this: 
struct P { char foo[100]; int a[100]; }. That way I think the IO would be 
faster, as I could just dump a fixed number of bytes to a file.

		b. Reduce the memory consumption by reducing k further, so that the 
program doesn't page in/out.

		c. Can someone point me to good sample code for reading a file line 
by line in C and then putting it into a Ruby hash table?

		d. How much of the slowness is due to the fact that it is Ruby and not 
C?
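On the buffering question (i), Ruby's IO already does its own buffering, so rather than trying to resize the libc buffer, one option is to issue fewer, larger reads and split lines yourself. This is a sketch under the assumption that bigger reads help on this workload; `CHUNK_SIZE` is an arbitrary starting value and the result should be benchmarked against plain `File.foreach`:

```ruby
# Read +path+ in large chunks and yield one line at a time (without the
# trailing "\n"). CHUNK_SIZE is an assumed starting point; benchmark it.
CHUNK_SIZE = 4 * 1024 * 1024

def each_line_chunked(path, chunk_size = CHUNK_SIZE)
  File.open(path, "rb") do |f|
    leftover = ""
    while (data = f.read(chunk_size))
      pieces = (leftover + data).split("\n", -1)
      leftover = pieces.pop          # possibly incomplete last line
      pieces.each { |line| yield line }
    end
    yield leftover unless leftover.empty?
  end
end
```

Because the file is opened in binary mode, the yielded lines are binary (ASCII-8BIT) strings, which is usually fine for parsing ASCII traceroute output.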
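For ii and vi.a, Ruby can write C-like fixed-length records without Marshal by using `Array#pack` and `String#unpack`. The sketch below models the `struct P { char foo[100]; int a[100]; }` from the question; the exact layout (space-padded string, 32-bit little-endian ints) is an assumption chosen for illustration:

```ruby
# Fixed-width binary records with Array#pack / String#unpack, modelled
# on struct P { char foo[100]; int a[100]; }.
# Layout (assumed): "A100" = 100-byte space-padded string,
# "l<100" = 100 signed 32-bit little-endian integers.
N_INTS        = 100
RECORD_FORMAT = "A100l<#{N_INTS}"
RECORD_SIZE   = 100 + 4 * N_INTS   # 500 bytes per record

def pack_record(foo, ints)
  raise ArgumentError, "need #{N_INTS} ints" unless ints.size == N_INTS
  [foo, *ints].pack(RECORD_FORMAT)
end

def unpack_record(bytes)
  fields = bytes.unpack(RECORD_FORMAT)
  [fields.first, fields.drop(1)]
end

def write_records(path, records)
  File.open(path, "wb") do |f|
    records.each { |foo, ints| f.write(pack_record(foo, ints)) }
  end
end

# Read the file back one fixed-size record at a time.
def each_record(path)
  File.open(path, "rb") do |f|
    while (buf = f.read(RECORD_SIZE))
      yield unpack_record(buf)
    end
  end
end
```

Note that `"A100"` silently truncates longer strings, and unpacking strips trailing spaces and NULs, so a `foo` that ends in spaces will not round-trip exactly.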

To give you an idea of how slow this actually is: just reading all the 
files line by line takes around 8-9 hours, whereas the above easily takes 
5-6 days! And I am quite unable to profile my code, as it is just too slow.
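When a full run is too slow to profile, timing each stage on a small sample can still show where the time goes. A minimal sketch using the standard `benchmark` library; `line.split` is only a hypothetical stand-in for the real traceroute parser, and the sample size is arbitrary:

```ruby
require "benchmark"

# Time reading, parsing and dumping separately on the first +limit+
# lines of one input file, so a sample run finishes in seconds.
# line.split is only a stand-in for the real traceroute parser.
def time_stages(path, limit = 100_000)
  lines = []
  times = {}
  times[:read] = Benchmark.realtime do
    File.foreach(path) do |line|
      lines << line
      break if lines.size >= limit
    end
  end
  records = nil
  times[:parse] = Benchmark.realtime { records = lines.map(&:split) }
  times[:dump]  = Benchmark.realtime { records.each { |r| Marshal.dump(r) } }
  [lines.size, times]
end
```

Scaling the per-stage numbers up to the full line count gives a rough estimate of which stage dominates, and a profiler can then be run on the same small sample instead of the whole 60 GB.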

I would be very grateful for your comments, and particularly if you have 
any suggestions/experience on doing this in a fast way.

--Devesh Agrawal



-- 
Posted via http://www.ruby-forum.com/.
