Looking for suggestions processing and comparing 2 very large files

From: Ruby Student <ruby.student@...>
Date: 2012-10-22 18:21:23 UTC
List: ruby-talk #400507
Team,

Every week I get a large file: over 50 million records, each more than 150
characters long. These files can exceed 15 GB.
I need to take the new file and compare it against the one from the
previous week.
Reading the files into two arrays would make the process a bit easier, but
the files are too large, and when I try using arrays the process crashes
with out-of-storage messages.

I am looking for suggestions on how to efficiently perform the following
process:


   1. Compare each record in this week's file against last week's file.
   2. If all records are the same, do nothing or just indicate so: *SAM*
   3. If there are any duplicate records in the new file, output each one
   to a file of dups.
   4. If there are any new records (found in the new file but not in last
   week's file), output: *INS* followed by the record
   5. If a record is found in last week's file (the old file) but not in
   this week's file, output: *DEL* followed by the record
   6. If a record with the same key (the first 13 chars) is in both
   files but the rest of the record differs, output: *UPD* followed by
   the record

Hey, I can do all of the above by reading each record from both files
and doing different types of comparisons/matches, but I was wondering if
there is a more efficient way to do this. I am looking for suggestions.
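One possible direction, as a minimal sketch and not a definitive answer: if
both files are first sorted bytewise (an assumption; nothing above says they
are, and the sorting would have to happen outside Ruby or in a separate
step), the comparison can be done as a single merge pass that holds only one
record from each file in memory. The names compare_sorted and next_unique
are hypothetical, and the sketch also assumes that a key appears only once
per file apart from exact duplicate records.

```ruby
require "stringio"

KEY_LENGTH = 13 # the key is the first 13 characters of each record

# Returns the next record from +io+ that differs from +previous+, or nil
# at end of input. Exact repeats of +previous+ are written to +dups+ when
# it is given. Relies on sorted input, where identical records are adjacent.
def next_unique(io, previous, dups = nil)
  while (line = io.gets)
    record = line.chomp
    return record unless record == previous
    dups.puts(record) if dups
  end
  nil
end

# Merge-compares two sorted record streams (assumption: both are sorted
# bytewise), keeping one record from each stream in memory at a time.
def compare_sorted(old_io, new_io, out, dups)
  counts = Hash.new(0)
  old_rec = next_unique(old_io, nil)
  new_rec = next_unique(new_io, nil, dups)

  while old_rec || new_rec
    old_key = old_rec && old_rec[0, KEY_LENGTH]
    new_key = new_rec && new_rec[0, KEY_LENGTH]

    if new_rec.nil? || (old_rec && old_key < new_key)
      # Key exists only in last week's file.
      out.puts("DEL #{old_rec}")
      counts[:del] += 1
      old_rec = next_unique(old_io, old_rec)
    elsif old_rec.nil? || new_key < old_key
      # Key exists only in this week's file.
      out.puts("INS #{new_rec}")
      counts[:ins] += 1
      new_rec = next_unique(new_io, new_rec, dups)
    else
      # Same key in both files: report only if the record changed.
      if old_rec != new_rec
        out.puts("UPD #{new_rec}")
        counts[:upd] += 1
      end
      old_rec = next_unique(old_io, old_rec)
      new_rec = next_unique(new_io, new_rec, dups)
    end
  end

  out.puts("SAM") if counts.empty? # no INS, DEL, or UPD was reported
  counts
end
```

With real files, the four arguments would be File objects opened for
reading (the two sorted inputs) and writing (the report and the dups file).
The sort order has to match Ruby's String comparison, which is bytewise, so
a locale-aware sort could put records in an order this loop does not expect.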

Thank you

-- 
Ruby Student
