Re: Question About OCR in Ruby vs. Rails

From: Kendall Gifford <zettabyte@...>
Date: 2013-06-03 20:50:32 UTC
List: ruby-talk #407930
On Mon, Jun 3, 2013 at 2:28 PM, Kirk Keeter <kirkkeeter@gmail.com> wrote:

> Team,
>
> I'm working on a project that will involve processing 15,000+ complex
> financial documents.  They are in PDF form.
>
> Unfortunately, the documents are not available in a non-PDF form -- so I
> have to electronically scan the documents and "break them down" into a
> database.
>
Unless the data you're trying to extract is actually stored inside an
embedded raster image, you don't need OCR. You need something that can
parse the PDF file format.


> I'm familiar enough with Rails, that I feel comfortable doing it with the
> Rails framework -- but I'm not sure this is a good use of Rails.
>

So, Rails only needs to be involved insofar as you need a web application
to wrap or expose this functionality. Otherwise Rails is irrelevant to
"processing 15,000+ complex financial documents".


> Ruby and Javascript are the only programming languages I know, so I'd
> either need to somehow do this as a Rails project (with Ruby and
> javascript), or as a Ruby project.
>
> If I do it as a Ruby project (not rails), can you make recommendations
> about the best way to go about it?
>

A quick search turned up the pdf-reader gem (https://github.com/yob/pdf-reader),
which looks like it'd give you plenty of power to parse and extract the
data from your PDF files. It also turned up an old (unmaintained?) gem
called pdf-toolkit (https://rubygems.org/gems/pdf-toolkit) that's a wrapper
around pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/). I'd just
play around in an IRB session with these tools, trying to parse out the data
from a few representative copies of the documents in question to see what
works. Then you could do some trial passes and benchmark them, etc.
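
As a rough sketch of that kind of experiment, assuming the pdf-reader gem is
installed and the figures you want appear as plain dollar amounts in the
page text: the helper names (page_texts, extract_amounts) and the AMOUNT
pattern are made up for illustration and would need adapting to the real
documents.

```ruby
# Sketch only: page_texts, extract_amounts and AMOUNT are hypothetical names,
# not part of pdf-reader. Real financial documents will need their own patterns.

# Returns an array holding the extracted text of each page of the PDF at +path+.
def page_texts(path)
  # gem install pdf-reader; required here so extract_amounts can be tried
  # on plain strings even when the gem is not installed.
  require "pdf/reader"
  PDF::Reader.new(path).pages.map(&:text)
end

# Assumed format: a dollar sign, digits with optional thousands separators,
# and an optional two-digit cents part.
AMOUNT = /\$\s?(\d[\d,]*(?:\.\d{2})?)/

# Returns every dollar amount found in +text+ as a Float.
def extract_amounts(text)
  text.scan(AMOUNT).flatten.map { |s| s.delete(",").to_f }
end

# When run as a script with a PDF path, print the amounts found on each page.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  page_texts(ARGV[0]).each_with_index do |text, i|
    puts "page #{i + 1}: #{extract_amounts(text).inspect}"
  end
end
```

Running it against a handful of representative files should show quickly
whether the text comes out in a usable order or whether the layout needs
more careful handling.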



-- 
Kendall Gifford
zettabyte@gmail.com
