Re: Fast alternatives to "File" and "IO" for large numbers of files ?

From: Philip Rhoades <phil@...>
Date: 2011-02-26 14:33:51 UTC
List: ruby-talk #379043

People,

Thanks to all who responded - I have concatenated the replies for ease
of response:


On 2011-02-24 19:15, pp wrote:
>
>> Date: Thu, 24 Feb 2011 12:09:48 +0900
>> From: phil@pricom.com.au
>> Subject: Fast alternatives to "File" and "IO" for large numbers of files ?
>> To: ruby-talk@ruby-lang.org
>>
>> People,
>>
>> I have a script that does:
>>
>> - statistical processing of data in 50x32x20 (32,000) large input
>> files
>>
>> - writes a small text file (22 lines with one or more columns of
>> numbers) for each input file
>>
>> - reads all small files back in again for final processing.
>>
>> Profiling shows that IO is taking up more than 60% of the time.
>> Short of making fewer, larger files for the data (which is
>> inconvenient for random viewing/processing of individual results),
>> are there other alternatives to using the "File" and "IO" classes
>> that would be faster?
>>
>> Thanks,
>>
>> Phil.
>>
> Hi, could you be more specific about what you do with the small
> files: do you read/write them per line or as a whole file? Rapid file
> operations may be slow anyway because of file system heaps (or
> sorting), so maybe you can try fewer file operations; for example,
> writing a file as a single string may serve the IO cache well. Or
> maybe do the many file writes/reads in a new thread, so that IO does
> not interfere with your non-IO calculations, if you have any.


Each individual small file is written in one go, i.e. the file is
opened, written to and closed; there is no reopening and further
writing.  See below for my current approach.
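Writing each small file in one go, with its whole contents built as one string first, can be sketched like this. It is a minimal illustration, assuming each result is an array of rows of numbers; `write_result` and the directory names are hypothetical, not part of the original script.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: build the whole file in memory, then hand it to
# the OS with a single open/write/close instead of one write per line.
def write_result(path, rows)
  body = rows.map { |cols| cols.join(" ") }.join("\n") + "\n"
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, body)
end

Dir.mktmpdir do |dir|
  rows = Array.new(22) { |i| [i, i * 0.5] }   # 22 lines, two columns each
  path = File.join(dir, "s01", "t02", "result.txt")
  write_result(path, rows)
  puts File.readlines(path).size              # => 22
end
```

This reduces per-line write calls, but it still creates one file per result, so the per-file open/close and directory overhead remains.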


On 2011-02-24 19:19, Peter Zotov wrote:
>
> I can think of two approaches here.
>
> First, you can write one large file (perhaps creating it in memory
> first) and then split it afterwards.
>
> Second, if you're on *nix, you can write your output files to a
> tmpfs.
>
> Both should reduce the number of seeks and improve performance.
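Peter's first approach (write one large file, and split it into small files afterwards only if they are really needed) might look like the following. The `### name` separator line and the method names are assumptions made for illustration; any marker that cannot occur in the data would do.

```ruby
require "fileutils"
require "tmpdir"

SEPARATOR = "### "   # hypothetical marker; must never appear in the data

# Write every result into ONE file in a single pass.
def write_combined(path, results)
  File.open(path, "w") do |f|
    results.each do |name, lines|
      f.puts "#{SEPARATOR}#{name}"
      lines.each { |line| f.puts line }
    end
  end
end

# Later, and only if the small files are really needed, split it back up.
def split_combined(path, out_dir)
  buffers = Hash.new { |h, k| h[k] = [] }
  current = nil
  File.foreach(path) do |line|
    if line.start_with?(SEPARATOR)
      current = line.chomp.delete_prefix(SEPARATOR)
      buffers[current]   # create the entry even if the record is empty
    else
      buffers[current] << line
    end
  end
  buffers.each do |name, lines|
    target = File.join(out_dir, name)
    FileUtils.mkdir_p(File.dirname(target))
    File.write(target, lines.join)
  end
  buffers.keys
end

Dir.mktmpdir do |dir|
  combined = File.join(dir, "all.txt")
  write_combined(combined, { "run1/a.txt" => ["1 2", "3 4"], "run2/b.txt" => ["5"] })
  p split_combined(combined, File.join(dir, "out"))   # => ["run1/a.txt", "run2/b.txt"]
end
```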


After staying up all night, I eventually settled on a hash table
written out via YAML to ONE very large file.  I need a human-friendly
form for spot-checking the statistical calculations, so I have used a
hash table whose keys let me find a particular calculation in the big
file in the same way I would have found it in the similarly named
subdirectories.  I haven't actually implemented this on the full system
yet, so it will be interesting to see if Vim can handle opening a
32,000 x 23 line file (and actually bigger, if an individual small file
is bigger than a 23x1 array).
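The one-hash/one-YAML-file approach can be sketched roughly as below. The key format (a path-like string mirroring the old subdirectories) and the helper names are assumptions; the YAML output is plain text, so a particular result can be found by searching for its key.

```ruby
require "yaml"
require "tmpdir"

# One hash for the whole run. Each key mirrors the subdirectory path the
# small file used to live in (this key format is an assumption).
def save_results(path, results)
  File.write(path, results.to_yaml)
end

def load_results(path)
  YAML.load_file(path)
end

Dir.mktmpdir do |dir|
  results = {
    "s01/t02/r03" => [[1.5, 2.0], [3.25, 4.0]],
    "s01/t02/r04" => [[0.5, 1.0]]
  }
  path = File.join(dir, "results.yml")
  save_results(path, results)
  puts File.read(path)                  # plain text, searchable by key
  p load_results(path)["s01/t02/r03"]   # => [[1.5, 2.0], [3.25, 4.0]]
end
```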


On 2011-02-24 19:52, Robert Klemme wrote:
>
> I think whatever you do, as long as you do not get rid of the IO or
> improve IO access patterns, your performance gains will only be
> marginal.  Even a C extension would not help you if you stick with
> the same IO patterns.


Right.


> We should probably learn more about the nature of your processing
> but considering that you only write 32,000 * 22 * 80 (estimated line
> length) = 56,320,000 bytes (~ 54MB) NOT writing those small files is
> probably an option.  Burning 54MB of memory in a structure suitable
> for later processing (i.e. you do not need to parse all those small
> files) is a small price compared to the large amount of IO you need
> to do to read that data back again (plus the CPU cycles for
> parsing).


Yep - I came to that conclusion too and went for one big hash table and 
one file.
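Robert's size estimate is easy to re-check; the 80-byte average line length is his assumption, not a measured value.

```ruby
files          = 50 * 32 * 20   # input files, one small result file each
lines_per_file = 22
bytes_per_line = 80             # Robert's estimated line length

total_bytes = files * lines_per_file * bytes_per_line
mib = total_bytes / (1024.0 * 1024)

puts files         # => 32000
puts total_bytes   # => 56320000
puts mib.round(1)  # => 53.7
```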


> The second best option would be to keep the data in memory as before
> but still write those small files if you really need them (for
> example for later processing).  In this case you could put this in a
> separate thread so your main processing can continue on the state in
> memory. That way you'll gain another improvement.


Interesting idea, but I'm not sure how to actually implement it; I
will see how the hash table/one file approach goes first.
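Robert's second option (keep the data in memory, but let a separate thread write the small files) could be sketched as a simple producer/consumer using Ruby's `SizedQueue`. This is a minimal sketch with a hypothetical `start_writer` helper and stand-in data; in MRI, blocking file IO generally releases the interpreter lock, so writing can overlap with computation, but the actual gain on Phil's workload is untested.

```ruby
require "fileutils"
require "tmpdir"

# Background writer: pops [path, text] jobs and writes them until it sees
# :done, then returns the number of files written (via Thread#value).
def start_writer(queue)
  Thread.new do
    written = 0
    while (job = queue.pop) != :done
      path, text = job
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, text)
      written += 1
    end
    written
  end
end

Dir.mktmpdir do |dir|
  queue   = SizedQueue.new(100)   # caps memory if the writer falls behind
  writer  = start_writer(queue)
  results = {}                    # in-memory copy used by later stages

  10.times do |i|
    rows = Array.new(22) { |j| (i * j).to_s }   # stand-in for real statistics
    results["run#{i}"] = rows
    queue << [File.join(dir, "run#{i}", "out.txt"), rows.join("\n") + "\n"]
  end

  queue << :done
  puts writer.value               # => 10
  puts results.size               # => 10
end
```

The main loop never reads the small files back; later stages work from `results`, and the files exist only for viewing.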


> For reading of the large files I would use at most two threads
> because I assume they all reside on the same filesystem.  With two
> threads one can do calculations (e.g. parsing, aggregating) while the
> other thread is doing IO.  If you have more threads you'll likely see
> a slowdown because you may introduce too many seeks etc.


OK, this idea might help for the next stage.
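The two-thread reading pattern Robert describes (one thread doing IO, the other parsing and aggregating) might look like this. It is a sketch, assuming each input file fits in memory and that "processing" is just summing whitespace-separated numbers; `sum_all` is a hypothetical name.

```ruby
require "tmpdir"

# Reader thread only does IO; the main thread parses and aggregates.
# Assumes each input file fits in memory (very large ones would need
# to be read in chunks instead).
def sum_all(paths)
  queue  = SizedQueue.new(4)      # a few files of read-ahead
  reader = Thread.new do
    begin
      paths.each { |path| queue << File.read(path) }
    ensure
      queue << nil                # end-of-input marker, even on error
    end
  end

  total = 0.0
  while (text = queue.pop)
    text.each_line { |line| total += line.split.sum(&:to_f) }
  end
  reader.join                     # re-raises any error from the reader
  total
end

Dir.mktmpdir do |dir|
  paths = Array.new(3) do |i|
    path = File.join(dir, "in#{i}.txt")
    File.write(path, "1 2\n3\n")
    path
  end
  puts sum_all(paths)             # => 18.0
end
```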


On 2011-02-24 20:02, Brian Candler wrote:
> If you read in all the data files and build a single Ruby data
> structure which contains all the data you're interested in, you can
> dump it out like this:
>
> File.open("foo.msh","wb") {|f|  Marshal.dump(myobj, f) }


I did read up about this, but I need human-readable files.


> And you can reload it in another program like this:
>
> myobj = File.open("foo.msh","rb") {|f|  Marshal.load(f) }
>
> This is *very* fast.


I might check this out as an exercise!
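Since the files must stay human-readable but Marshal reloads much faster, one option is to write both from the same hash: YAML for spot-checking and a Marshal dump for the next processing stage. A sketch with hypothetical file names and helpers; Marshal is a Ruby-specific binary format, and `Marshal.load` should only be used on files you wrote yourself.

```ruby
require "yaml"
require "tmpdir"

# Write the same in-memory hash twice: a Marshal dump for fast reloading
# by the next stage, and a YAML copy for humans (e.g. viewing in Vim).
def save_both(base, results)
  File.open("#{base}.msh", "wb") { |f| Marshal.dump(results, f) }
  File.write("#{base}.yml", results.to_yaml)
end

def load_fast(base)
  File.open("#{base}.msh", "rb") { |f| Marshal.load(f) }
end

Dir.mktmpdir do |dir|
  base = File.join(dir, "results")
  results = { "s01/t02/r03" => [[1.5, 2.0], [3.25, 4.0]] }
  save_both(base, results)
  p(load_fast(base) == results)   # => true
end
```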

Thanks to all again!

Phil.
-- 
Philip Rhoades

GPO Box 3411
Sydney NSW	2001
Australia
E-mail:  phil@pricom.com.au
