Re: Looking for suggestions processing and comparing 2 very large files

From: Ruby Student <ruby.student@...>
Date: 2012-10-22 19:20:51 UTC
List: ruby-talk #400510
I forgot to mention that the files are sorted!

On Mon, Oct 22, 2012 at 2:57 PM, Dave Aronson <rubytalk2dave@davearonson.com> wrote:

> On Mon, Oct 22, 2012 at 2:21 PM, Ruby Student <ruby.student@gmail.com> wrote:
>
> > Every week I get a large file, over 50 million records
>
> The big question is... are these files SORTED, preferably on some
> UNIQUE key, or at least some order that will remain the same from week
> to week?  If yes, then you can use the same sort of techniques as in
> the "diff" utility found on every Unix-derived system and many others.
> (Windows has something similar but the name escapes me at the moment.
> IIRC, in an ironic twist, this is one of those cases where the
> Windows command has a *more* cryptic name than its Unix cognate.)  How
> to make a "diff" type program has been covered in gazillions of
> blog/magazine articles, textbooks, etc., so I won't go into detail.
> If you're lucky, you might even be able to just use the ones existing
> on your system, with some shell scripting for glue.
>
> On the other claw, if the records are in random order, then you've got
> a much more serious problem.  In that case, ASSUMING that the keys,
> and number of updated/duplicated records, are both quite small, off
> the top of my head I think I'd:
>
> - Extract the keys from last week's file
> - Ditto for this week's
> - Sort those, assuming the keys are sufficiently smaller that this is
> reasonable
> - Diff them.
> - Extract the actual records from both weeks for any matching keys.
> - Sort and diff, under the same assumption.
>
> Or, if the above data sets are not small enough to make sorting
> reasonable, but the potential dups might at least fit in RAM:
>
> - Extract last week's keys into a Set
> - Initialize a "Needs Further Inspection" (NFI) Set
> - Iterate over this week's records:
>   = Try to find the key in last week's Set of keys
>   = If seen, remove from last week's Set and add to NFI Set
>   = Else process as an Insertion
> - Anything left in last week's Set is a Removal
> - (You can now get rid of last week's Set of keys)
> - Extract last week's full records matching NFI keys,
>   putting them in a hash keyed by the key
> - Extract this week's records matching NFI keys,
>   looking them up in the hash
> - Compare the entire records, processing as either
>   Duplicate or Update as needed
>
> -Dave
>
> --
> Dave Aronson, the T. Rex of Codosaurus LLC,
> secret-cleared freelance software developer
> taking contracts in or near NoVa or remote.
> See information at http://www.Codosaur.us/.
>
>

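Since the files are sorted, the "diff"-style single pass Dave mentions could look roughly like the sketch below. This is only an illustration, not a tested solution for the real data: the record layout (one record per line, key in the first comma-separated field, keys unique and sorted as plain strings) and the name compare_sorted are my assumptions, not anything stated in the thread.

```ruby
require "stringio"

# Walk two key-sorted record streams in lockstep and classify each key.
# Assumed layout (hypothetical): one record per line, key is the first
# comma-separated field, keys are unique and sorted as plain strings.
def compare_sorted(old_io, new_io)
  return enum_for(:compare_sorted, old_io, new_io) unless block_given?

  key_of = ->(line) { line && line.split(",", 2).first }
  old_line = old_io.gets
  new_line = new_io.gets

  while old_line || new_line
    old_key = key_of.call(old_line)
    new_key = key_of.call(new_line)

    if new_line.nil? || (old_line && old_key < new_key)
      # Key exists only in last week's file.
      yield :removal, old_key
      old_line = old_io.gets
    elsif old_line.nil? || new_key < old_key
      # Key exists only in this week's file.
      yield :insertion, new_key
      new_line = new_io.gets
    else
      # Same key in both files: compare the whole records.
      kind = old_line.chomp == new_line.chomp ? :duplicate : :update
      yield kind, new_key
      old_line = old_io.gets
      new_line = new_io.gets
    end
  end
end

# Tiny in-memory stand-ins for the two weekly files.
last_week = StringIO.new("a,1\nb,2\nc,3\n")
this_week = StringIO.new("a,1\nb,9\nd,4\n")
result = compare_sorted(last_week, this_week).to_a
result.each { |kind, key| puts "#{kind}: #{key}" }
```

Only one line from each file is held in memory at a time, so the file size should not matter much. With real files the StringIO objects would be replaced by File objects opened for reading.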

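For comparison, the Set-based steps Dave lists for unsorted files might be sketched as follows. Again the record layout (key in the first comma-separated field) and the names key_of and classify_unsorted are assumptions made for illustration, and the example uses small arrays of lines where the real thing would make several passes over the files.

```ruby
require "set"

# Hypothetical key extractor: first comma-separated field of a line.
def key_of(line)
  line.split(",", 2).first
end

# old_lines and new_lines can be any collections of lines that can be
# iterated more than once (arrays here, for the sake of the example).
def classify_unsorted(old_lines, new_lines)
  old_keys = Set.new(old_lines.map { |line| key_of(line) })
  nfi = Set.new # "Needs Further Inspection"
  insertions = []

  new_lines.each do |line|
    key = key_of(line)
    if old_keys.delete?(key) # returns nil when the key was not present
      nfi << key
    else
      insertions << key
    end
  end

  # Whatever is still in last week's Set never showed up this week.
  removals = old_keys.to_a

  # Pull last week's full records only for the keys that need inspection.
  old_records = {}
  old_lines.each do |line|
    key = key_of(line)
    old_records[key] = line.chomp if nfi.include?(key)
  end

  duplicates = []
  updates = []
  new_lines.each do |line|
    key = key_of(line)
    next unless nfi.include?(key)
    (old_records[key] == line.chomp ? duplicates : updates) << key
  end

  { insertions: insertions, removals: removals,
    duplicates: duplicates, updates: updates }
end

last_week = ["a,1\n", "b,2\n", "c,3\n"]
this_week = ["d,4\n", "b,9\n", "a,1\n"]
report = classify_unsorted(last_week, this_week)
report.each { |kind, keys| puts "#{kind}: #{keys.join(', ')}" }
```

This keeps every key from last week in memory at once, which is why it only makes sense under the "keys are small" assumption from Dave's message.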
-- 
Ruby Student
