Why is Ruby so slow?
WHY IS RUBY SO SLOW?
I implemented a _DataReader_ class in Ruby and Python. The reader:
- reads in a CSV file, in this case tab-separated,
- gets variable names from the header line,
- splits up each row into single items,
- checks for and counts missing values,
- determines the type of the item - using regular expressions -
(integer, float, or else classified as string), and
- counts the number of unique items in each column
finally outputting a short report on what it found. This report is
quite useful on its own, even if you later run data mining tasks on
the data with other tools.
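To show the type check and the counting in isolation, here is a minimal sketch of those steps. It is not the DataReader code itself: the names classify and summarize, the symbols used for the types, and the sample column are made up for illustration, and it assumes a Ruby recent enough to provide max_by.

```ruby
# Hypothetical helpers for illustration, not part of the DataReader class.
INTEGER_RE = /\A\s*[+-]?\d+\s*\z/
FLOAT_RE   = /\A\s*[+-]?(?:\d+\.\d*|\d*\.\d+)\s*\z/

# Classify a single field as :missing, :integer, :float or :string.
def classify(item, missing = '?')
  return :missing if item == missing
  return :integer if item =~ INTEGER_RE
  return :float   if item =~ FLOAT_RE
  :string
end

# Summarize one column: widest type seen, missing count, unique count.
def summarize(column, missing = '?')
  rank    = { :missing => 0, :integer => 1, :float => 2, :string => 3 }
  types   = column.map { |item| classify(item, missing) }
  present = column.reject { |item| item == missing }
  {
    :type    => types.max_by { |t| rank[t] },
    :missing => column.size - present.size,
    :unique  => present.uniq.size
  }
end

# Three integers (one repeated), one missing value, one float.
p summarize(%w[1 2 2 ? 3.5])
```

The column type is the widest type seen in any field, so a single float among integers makes the whole column a float column.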
The implementation is straightforward, with no attempt at optimization
in the first run. I tested it on a fairly large data file of 4.3 MB
with 1.6 million data items, most of them integers.
Here are the running times for some available Ruby implementations
under Windows:
  data items                   1,600,000    320,000
  -------------------------------------------------
  Ruby 1.6.5-2                 17:10 min    46 sec
  Ruby 1.6.6-0                 18:43 min    58 sec
  Ruby 1.7.2 (i586-mswin32)    18:05 min    54 sec
As a comparison, I implemented the method in Python too, with the
following results:
  Python 2.1.1 (Zope)          58 sec       10 sec
  Python 2.2                   49 sec        9 sec
  ActiveState Python 2.2a      49 sec       11 sec
And I also tested the data with the _read.table_ function of the
freely available statistical package *R*, which has almost the same
functionality (in a way I tried to model it):
  R::read.table                30 sec        2 sec
One can see that the Python implementation compares reasonably with
such a well-known package. Unfortunately, the Ruby implementation of
the same method is *unacceptably* slow.
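To put numbers on that, the ratios can be worked out from the timings above. The sketch below uses only the Ruby 1.6.5-2 and Python 2.2 rows, converted to seconds; the variable names are made up for illustration, and Float#round with a digits argument assumes a reasonably recent Ruby.

```ruby
# Times taken from the tables above, in seconds (Ruby 1.6.5-2, Python 2.2).
ruby   = { 320_000 => 46, 1_600_000 => 17 * 60 + 10 }
python = { 320_000 => 9,  1_600_000 => 49 }

# How many times slower Ruby is than Python at each data size.
slowdown = {}
ruby.each { |items, secs| slowdown[items] = secs.to_f / python[items] }

# How much the running time grows when the data grows by a factor of 5.
ruby_growth   = ruby[1_600_000].to_f / ruby[320_000]
python_growth = python[1_600_000].to_f / python[320_000]

slowdown.each { |items, x| puts "#{items} items: Ruby is #{x.round(1)}x slower" }
puts "5x the data: Ruby takes #{ruby_growth.round(1)}x as long, " \
     "Python #{python_growth.round(1)}x"
```

Python's time grows roughly in step with the data (about 5.4x for 5x the items), while Ruby's grows about 22x. That suggests some step in the Ruby run scales worse than linearly with the number of rows.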
I had earlier experience with a text analysis task in which I split
some 5,000 news messages into words, then counted and stored these
words for retrieval and for determining the similarity between the
news articles.
Ruby was 20-30% slower than Python in that task, which I could readily
accept because Ruby is such a nice language. But the time differences
above will kill my project, I'm afraid.
The tests were done on a 1.1 GHz Pentium III PC under Windows 2000 and
with 512 MB main memory. I didn't try Linux for that because the final
application has to run under MS Windows anyway.
So for me the question remains: why is Ruby so unbelievably slow
(roughly 5 to 20 times slower than Python) in this task, especially
for larger data sets?
Many thanks, Hans Werner.
______________________________________________________________________
Loading data set test.dat...
10001 rows loaded, of required length 32.
2.824 secs needed.
0 Id: TYPE Integer (10000 items, 0 missing).
1 V1: TYPE Set (2 items, 0 missing).
2 V2: TYPE Integer (75 items, 0 missing).
3 V3: TYPE Set (2 items, 0 missing).
4 V4: TYPE Set (6 items, 0 missing).
5 V5: TYPE Integer (885 items, 0 missing).
6 V6: TYPE Integer (467 items, 0 missing).
7 V7: TYPE Integer (402 items, 0 missing).
8 V8: TYPE Set (9 items, 0 missing).
9 V9: TYPE Integer (19 items, 0 missing).
10 V10: TYPE Integer (70 items, 0 missing).
11 V11: TYPE Integer (1653 items, 0 missing).
12 V12: TYPE Integer (1316 items, 0 missing).
13 V13: TYPE Integer (52 items, 0 missing).
14 V14: TYPE Set (6 items, 0 missing).
15 V15: TYPE Set (2 items, 0 missing).
16 V16: TYPE Integer (29 items, 0 missing).
17 V17: TYPE Integer (49 items, 0 missing).
18 V18: TYPE Integer (69 items, 0 missing).
19 V19: TYPE Integer (13 items, 0 missing).
20 V20: TYPE Set (11 items, 0 missing).
21 V21: TYPE Set (9 items, 0 missing).
22 V22: TYPE Integer (15 items, 0 missing).
23 V23: TYPE Integer (19 items, 0 missing).
24 V24: TYPE Set (10 items, 0 missing).
25 V25: TYPE Integer (15 items, 0 missing).
26 V26: TYPE Set (12 items, 0 missing).
27 V27: TYPE Integer (17 items, 0 missing).
28 V28: TYPE Integer (15 items, 0 missing).
29 V29: TYPE Integer (25 items, 0 missing).
30 V30: TYPE Set (2 items, 0 missing).
31 Target: TYPE Set (2 items, 0 missing).
48.655 secs needed.
______________________________________________________________________
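module CSV
  # Split one line into fields. Returns [number_of_fields, fields];
  # blank lines and comment lines yield [0, []].
  # (The missing parameter is accepted but not used here.)
  def parse_line(line, sep="\t", missing='?', comment='#')
    line.chomp!
    # line[0, 1] is a one-character String in every Ruby version;
    # line[0] is a Fixnum in Ruby 1.6 and would never equal comment.
    if line == '' or line[0, 1] == comment
      fields = []
      nfields = 0
    else
      fields = line.split(sep)
      nfields = fields.length
    end
    return nfields, fields
  end
end #module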
### -- c l a s s DataReader ---------------------------------------
class DataReader
  include CSV
  def initialize(fname, header=true, sep="\t", missing="?", comment="#")
    ### ------------------------------------------------
    @fname   = fname
    @header  = header;  @hfields = []
    @dtypes  = [];      @dfields = []
    @nrows   = 0;       @ncols   = 0
    @sep     = sep;     @missing = missing
    @comment = comment
    ### ------------------------------------------------
  end
  def load(logging=false)
    t1 = Time.now
    if logging
      puts
      puts "---------------------------------------------- LOADING DATA ----"
      puts "Loading data set #{@fname}..."
    end
    csvFile = File.open(@fname, 'r')
    if @header
      # arguments are positional: line, sep, missing, comment
      @ncols, @hfields = parse_line(csvFile.gets, @sep, @missing, @comment)
    else
      raise "Not Implemented Error."
    end
    @row = []; @col = []
    @row[0] = @hfields
    (0...@ncols).each { |j| @col << [] }
    no_short = 0; no_long = 0
    ln_short = []; ln_long = []
    n = 0
    while line = csvFile.gets
      n += 1
      m, fields = parse_line(line, @sep, @missing, @comment)
      if m == 0 then next end
      # fill row up with NA character or cut if too long
      if m < @ncols
        no_short += 1; ln_short << n+1
        (@ncols - m).times { fields << @missing }
      elsif m > @ncols
        no_long += 1; ln_long << n+1
        fields = fields[0...@ncols]
      end
      @row[n] = fields
      (0...@ncols).each { |j| @col[j] << fields[j] }
    end
    csvFile.close
    @nrows = @row.size
    t2 = Time.now
    if logging
      puts "#{@nrows} rows loaded, of required length #{@ncols}."
      if no_short > 0
        puts "#{no_short} rows too short: #{ln_short[0]}, ..."
      end
      if no_long > 0
        puts "#{no_long} rows too long: #{ln_long[0]}, ..."
      end
      puts "#{t2 - t1} secs needed."
      puts
    end
  end
  def prelyze(logging=false, missing=@missing)
    t1 = Time.now
    dtypes = {0 => 'NA', 1 => 'Integer', 2 => 'Continuous',
              3 => 'String', 4 => 'Set'}
    @dtypes = []
    for j in (0...@ncols) do
      ctype = 0; mitms = 0
      @col[j].each { |item|
        if item == missing
          ctype = [ctype, 0].max
          mitms += 1
        elsif item =~ /^\s*[+\-]?\d+\s*$/
          ctype = [ctype, 1].max
        elsif item =~ /^\s*[+\-]?(?:\d+\.\d*|\d*\.\d+)\s*$/
          ctype = [ctype, 2].max
        else
          ctype = [ctype, 3].max
        end
      }
      nitms = (@col[j]-['']).nitems
      # at most 12 items, and at most 10% of the non-missing rows: a Set
      if 0 < nitms and nitms <= 12 and nitms <= 0.1*(@nrows-mitms) then ctype = 4 end
      ctype = dtypes[ctype]
      @dtypes << ctype
      if logging
        puts "#{j.to_s.rjust(3)} #{(@row[0][j]).rjust(15)}:\tTYPE #{ctype} (#{nitms} items, #{mitms} missing)."
      end
    end
    t2 = Time.now
    if logging
      puts
      puts "#{t2 - t1} secs needed."
      puts "----------------------------------------------------------------"
      puts " Copyright (C) 2001, Data Mining Center."
      puts
    end
  end
  ### -- accessor functions --
  attr_reader :nrows, :ncols
  attr_reader :dtypes
  def nrow(); @nrows; end
  def ncol(); @ncols; end
  def hfields(); @row[0]; end
  def [](i, j); @row[i][j]; end
  def col(j); @col[j]; end
  def row(i); @row[i]; end
end #class
### -- m a i n ( ) ------------------------------------------------#
# arguments are positional: fname, header, sep, missing, comment
tData = DataReader.new("test2.dat", true, "\t", "", "%")
tData.load(true)      # logging on
tData.prelyze(true)   # logging on
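One way to narrow down where the time goes would be to time each phase separately. The sketch below does this with the Benchmark module from the standard library on synthetic in-memory data. The data shape (2,000 rows of 32 integers) and the split into three phases are assumptions chosen for illustration; it does not measure the program above.

```ruby
require 'benchmark'

# Synthetic stand-in for the data file: tab-separated rows of integers.
lines = Array.new(2_000) { |i| Array.new(32) { |j| (i * j) % 1000 }.join("\t") }

# Phase 1: split every line into fields.
rows = nil
t_split = Benchmark.realtime { rows = lines.map { |l| l.split("\t") } }

# Phase 2: turn the rows into columns.
cols = nil
t_cols = Benchmark.realtime { cols = rows.transpose }

# Phase 3: count the unique items in each column.
uniq_counts = nil
t_uniq = Benchmark.realtime { uniq_counts = cols.map { |c| c.uniq.size } }

puts format("split %.4f s, transpose %.4f s, uniq %.4f s", t_split, t_cols, t_uniq)
```

Running the same three measurements at two data sizes would show which phase, if any, grows faster than the data does.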