Re: Fast alternatives to "File" and "IO" for large numbers of files ?

From: Robert Klemme <shortcutter@...>
Date: 2011-02-24 08:52:34 UTC
List: ruby-talk #378925
On Thu, Feb 24, 2011 at 4:09 AM, Philip Rhoades <phil@pricom.com.au> wrote:
> I have a script that does:
>
> - statistical processing from data in 50x32x20 (32,000) large input files
>
> - writes a small text file (22 lines with one or more columns of numbers)
> for each input file
>
> - read all small files back in again for final processing.
>
> Profiling shows that IO is taking up more than 60% of the time - short of
> making fewer, larger files for the data (which is inconvenient for random
> viewing/processing of individual results) are there other alternatives to
> using the "File" and "IO" classes that would be faster?

I think whatever you do, as long as you do not get rid of the IO or
improve IO access patterns, your performance gains will only be
marginal.  Even a C extension would not help you if you stick with
the same IO patterns.

We should probably learn more about the nature of your processing, but
considering that you only write 32,000 * 22 * 80 (estimated line
length) = 56,320,000 bytes (~54MB), NOT writing those small files is
probably an option.  Burning 54MB of memory on a structure suitable
for later processing (so you do not need to parse all those small
files) is a small price compared to the large amount of IO you need to
do to read that data back again (plus the CPU cycles for parsing).
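A minimal sketch of the in-memory alternative, assuming each input file's result (the 22 rows that would have gone into one small file) can simply be stored in a Hash keyed by input file name. `compute_stats`, `process_all` and `final_processing` are hypothetical names, and the stub statistics (rows derived from the mean) only stand in for whatever the real script computes. Requires Ruby 2.4+ for `Array#sum`.

```ruby
# Hypothetical stand-in for the per-file statistics step: returns the
# 22 rows that would otherwise be written to one small text file.
def compute_stats(values)
  mean = values.sum.to_f / values.size
  Array.new(22) { |i| [i, mean * i] }
end

# Keep every per-file result in memory, keyed by input file name,
# instead of writing 32,000 small files and reading them back.
def process_all(inputs)
  inputs.each_with_object({}) do |(name, values), results|
    results[name] = compute_stats(values)
  end
end

# The final pass works directly on the in-memory structure, so nothing
# has to be re-read or re-parsed from disk.
def final_processing(results)
  results.values.sum { |rows| rows.last[1] }
end

inputs  = { "a.dat" => [1, 2, 3], "b.dat" => [4, 5, 6] }  # hypothetical data
results = process_all(inputs)
puts results.size              # → 2
puts final_processing(results) # → 147.0
```

The point of the design is that the final step takes Ruby objects, not text: the parse that the original script pays for on every small file simply disappears.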

The second-best option would be to keep the data in memory as before
but still write those small files if you really need them (for example,
for later processing).  In this case you could do the writing in a
separate thread so your main processing can continue on the state in
memory.  That way you'll gain another improvement.
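A sketch of that second option, assuming a single background writer fed through a `Queue` (built into current Ruby; Ruby 1.9 needed `require "thread"`). The `:done` sentinel, the method names `start_writer` and `run_with_background_writer`, and the per-file rows are all hypothetical illustrations, not part of the original script.

```ruby
require "tmpdir"

# Writer thread: drains [path, rows] pairs from the queue and writes each
# small text file, so the main thread never blocks on this output.
def start_writer(queue)
  Thread.new do
    while (item = queue.pop) != :done
      path, rows = item
      File.write(path, rows.map { |row| row.join(" ") + "\n" }.join)
    end
  end
end

# Main processing keeps every result in memory and only hands the
# (optional) small-file writes off to the background thread.
def run_with_background_writer(dir, inputs)
  queue   = Queue.new
  writer  = start_writer(queue)
  results = {}
  inputs.each do |name, values|
    rows = Array.new(22) { |j| [j, values.sum * j] }  # hypothetical per-file stats
    results[name] = rows
    queue << [File.join(dir, "#{name}.txt"), rows]
  end
  queue << :done  # sentinel: nothing more to write
  writer.join     # wait for pending writes; re-raises any writer error
  results
end

Dir.mktmpdir do |dir|
  results = run_with_background_writer(dir, { "a" => [1, 2], "b" => [3] })
  puts results.size
  puts Dir.glob(File.join(dir, "*.txt")).map { |p| File.basename(p) }.sort.inspect
end
```

Final processing would then use `results` directly; the files on disk are only a by-product for viewing individual results.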

For reading the large files I would use at most two threads because
I assume they all reside on the same filesystem.  With two threads one
can do calculations (e.g. parsing, aggregating) while the other thread
is doing IO.  If you have more threads you'll likely see a slowdown
because you may introduce too many disk seeks, etc.
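A sketch of the two-thread layout, assuming one reader thread that does all the file IO and the calling thread that does the computation, connected by a bounded `SizedQueue`. The method name `pipeline`, the queue capacity of 4 and the "sum all numbers in the file" parse step are hypothetical placeholders. In MRI, blocking reads generally release the global VM lock, which is what lets reading overlap with computing here.

```ruby
require "tmpdir"

# Reader thread: loads one file at a time, so the disk only serves a
# single sequential stream. The SizedQueue bounds how far reading can
# run ahead of the computation (and therefore how much memory it uses).
def pipeline(paths, capacity = 4)
  queue  = SizedQueue.new(capacity)
  reader = Thread.new do
    begin
      paths.each { |path| queue << [path, File.read(path)] }
    ensure
      queue << :done  # always signal the end, even if a read fails
    end
  end

  # Computation runs in the calling thread while the reader keeps loading.
  totals = {}
  while (item = queue.pop) != :done
    path, data = item
    totals[File.basename(path)] = data.split.map(&:to_f).sum  # hypothetical parse + aggregate
  end
  reader.join  # re-raises any exception from the reader thread
  totals
end

Dir.mktmpdir do |dir|
  paths = %w[x y].each_with_index.map do |name, i|
    File.join(dir, "#{name}.dat").tap { |p| File.write(p, "#{i} #{i + 1}\n") }
  end
  p pipeline(paths)
end
```

Adding more reader threads to this layout is exactly what the paragraph above warns against: on a single filesystem they would compete for the disk head rather than speed things up.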

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
