Re: csv.rb a start on refactoring.
wow - your mailer sent everything as attachments, including your message....
strange. maybe it's on my end... so sorry for not quoting context here.
anyhow - thanks for doing this. fyi, i've used the following approach many
times for loading huge csv files in an attempt to squeeze out speed - it
works. the approach is simple:
  - parse the first line __only__ using the built-in csv class, note the
    number of columns as n_columns.
  - for each subsequent line
      fields = row.strip.split(%r/\s*,\s*/)
      if fields.size == n_columns
        yield fields
      else
        fields = CSV::parse_line row
        yield fields
      end
this approach would need to be expanded slightly to deal with awful csv lines
like
  foo, "bar
  and more
  and more and more bar"
but it can still be done. essentially the parser must remain optimistic at
all times - assuming a simple split and stripping of all fields is sufficient.
__only__ upon finding it not so should it degrade to the slow, but very
complete, built-in csv parser. tweaking my approach to be buffer-of-lines
based rather than lines based should do the trick.
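A minimal sketch of the optimistic approach described above, under stated assumptions: `each_row_optimistic` is a hypothetical name, the input is any enumerable of single-line records, and `CSV.parse_line` stands in for "the built-in csv parser". One check goes beyond the message: any line containing a double quote skips the fast path, since a plain split would leave the quote characters inside the fields.

```ruby
require "csv"

# Yields every row of +lines+ as an array of fields, trying a cheap split
# first and falling back to the full parser only when needed.
# `each_row_optimistic` is a hypothetical helper, not part of the csv library.
def each_row_optimistic(lines)
  return enum_for(:each_row_optimistic, lines) unless block_given?

  n_columns = nil
  lines.each do |line|
    row = line.chomp

    if n_columns.nil?
      # First line: always use the complete parser and record the column count.
      fields = CSV.parse_line(row)
      n_columns = fields.size
      yield fields
      next
    end

    # Assumption beyond the original message: a double quote disables the
    # fast path, because a plain split would keep the quotes in the fields.
    fields = row.include?('"') ? nil : row.strip.split(/\s*,\s*/, -1)

    # Quotes present or wrong column count: degrade to the slow parser.
    fields = CSV.parse_line(row) unless fields && fields.size == n_columns
    yield fields
  end
end

lines = ["name,qty,note\n", "apple, 3 ,fresh\n", "pear,1,\"ripe, soft\"\n"]
each_row_optimistic(lines) { |fields| p fields }
```

The sketch works line by line, so it does not handle quoted fields that span several lines; that is the buffer-of-lines extension mentioned above. The fast path also strips whitespace around fields, which the full parser does not.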
food for thought.
regards.
-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================
For a database application I found using CSV to be rather slow.
I have made an attempt to speed it up, which, frankly, has not been
very successful. However I think I have improved the code a
little.
I've added an error for the case where the state machine gets into an
unknown state, rather than assuming that "it must be in the other
one" after several tests. This slows things down, but is probably
safer, given how hard state machines can be to debug.
I've changed if block to if block_given? -- more idiomatic,
possibly faster??
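To illustrate the difference, here is a small sketch with two hypothetical methods (neither is from csv.rb). With an explicit `&block` parameter Ruby may have to create a Proc object for the block, depending on the Ruby version; `block_given?` only asks whether a block was passed. Whether that makes a measurable difference is not something this sketch demonstrates.

```ruby
# Explicit block parameter: the block is available as the object `block`.
def each_cell_explicit(cells, &block)
  return cells.to_enum unless block
  cells.each(&block)
end

# Implicit block: no parameter, `block_given?` just checks for one.
def each_cell_implicit(cells)
  return cells.to_enum unless block_given?
  cells.each { |cell| yield cell }
end

each_cell_explicit(%w[a b]) { |cell| puts cell }
each_cell_implicit(%w[a b]) { |cell| puts cell }
```

If the method also has to pass the block on to another method, as `CSV.parse` does with `open_reader` in the patch, the `&block` parameter is still needed and only the test itself changes.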
I've tried to simplify
  if something.is_a?(Klass)
    something.method
  end
to
  if something.respond_to?(:method)
    something.method
  end
I think it's faster, and it's more duck-typing style.
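A hedged sketch of the duck-typed check applied to the separator arguments. `normalize_separator` is a hypothetical helper, not a method of csv.rb. Since `nil` does not respond to `chr`, the separate `nil?` guard from the original code is no longer needed. One caveat that postdates this message: from Ruby 1.9 on, String also responds to `chr` (returning its first character), so the sketch passes strings through unchanged.

```ruby
# Hypothetical helper mirroring the fs/rs handling in csv.rb.
def normalize_separator(sep, default = ",")
  sep = default if sep.nil?
  # On Ruby 1.9 and later String also responds to :chr, and calling it would
  # truncate a multi-character separator, so strings are returned as they are.
  return sep if sep.is_a?(String)
  # Anything else that can turn itself into a character (e.g. an Integer).
  sep.respond_to?(:chr) ? sep.chr : sep
end

p normalize_separator(nil)
p normalize_separator(59)
p normalize_separator("\r\n")
```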
I've tried to handle state transitions with a hash, as it is
probably clearer than if...elsif...else...end. Likely to be faster,
too.
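The hash-based transition can be sketched on its own as follows. The table mirrors `DATA_TRANSITION1` from the attached patch; the constant name `TRANSITION_ON_SEPARATOR` and the top-level layout are choices made for this example only. The default block returns the key itself, so any state not listed stays unchanged.

```ruby
class IllegalFormatError < RuntimeError; end

# States not listed map to themselves via the default block.
TRANSITION_ON_SEPARATOR = Hash.new { |_hash, state| state }.merge!(
  ST_START: :ST_DATA,
  ST_QUOTE: :ILLEGAL_FORMAT
).freeze

# Returns the next state, or raises when a separator follows a closing quote.
def to_data_unless_quote(state)
  next_state = TRANSITION_ON_SEPARATOR[state]
  raise IllegalFormatError if next_state == :ILLEGAL_FORMAT
  next_state
end

p to_data_unless_quote(:ST_START)
p to_data_unless_quote(:ST_DATA)
```

Because the default block never stores anything, lookups of unknown keys do not grow the table, which is why it can be frozen.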
I've changed
  if expr
    singlestatement
  end
to
  singlestatement if expr
to shorten the code. Don't know how that impacts speed.
I've changed if...elsif...else... to case statements where I can. That
ought to be faster, I think, and in any case it is clearer code.
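As an illustration of the case form combined with the unknown-state error, here is a small sketch modelled on the separator branch of `parse_body` in the attached patch. `cell_for` is a hypothetical name, and the error message text is an addition for this example.

```ruby
class UnknownState < RuntimeError; end

# Picks the value to return with a separator, depending on the parser state.
def cell_for(state, cell)
  case state
  when :ST_DATA, :ST_QUOTE
    cell
  when :ST_START
    nil # nothing has been read for this cell yet
  else
    # Fail loudly instead of assuming "it must be the other state".
    raise UnknownState, "unexpected state: #{state.inspect}"
  end
end

p cell_for(:ST_DATA, "abc")
p cell_for(:ST_START, "abc")
```

With the older `if ... elsif ... else` form, a misspelled or newly added state would silently fall into the final `else` branch; here it raises.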
There's a while statement with an if immediately inside it. Since
there is no else for that if, and the loop condition doesn't change
when the if-condition is false, the condition must always hold:
if it did not, the while loop would never exit. I have
commented out the condition and corresponding end.
idx_is_eos? is private, and thus not really part of the external
interface. It takes a parameter that it doesn't use. I have removed
that and made it parameterless.
I have tested the code and it still works. I could not find
exhaustive test cases for CSV (looked in Rubicon and on the web) so
I hope I haven't broken anything. It does seem to be a tiny bit
faster, for the given case. My tests are attached. The test
program takes the file to test (csv or nova_csv) as an argument, to
allow comparison. The name? nova_csv because new_csv was already
being used for doc patches.
HTH
Hugh
Attachments (2)
--- csv.rb 2004-05-27 15:39:10.000000000 +0100
+++ nova_csv.rb 2005-10-28 11:05:48.751478000 +0100
@@ -10,6 +10,7 @@
class CSV
class IllegalFormatError < RuntimeError; end
+ class UnknownState < RuntimeError; end
# deprecated
class Cell < String
@@ -118,7 +119,7 @@
" Use CSV.open(filename, 'r') instead.")
return open_reader(str_or_readable, 'r', fs, rs, &block)
end
- if block
+ if block_given?
CSV::Reader.parse(str_or_readable, fs, rs) do |row|
yield(row)
end
@@ -136,10 +137,10 @@
# not, use CSV.parse_row instead of this method.
def CSV.parse_line(src, fs = nil, rs = nil)
fs ||= ','
- if fs.is_a?(Fixnum)
+ if fs.respond_to? :chr
fs = fs.chr
end
- if !rs.nil? and rs.is_a?(Fixnum)
+ if rs.respond_to? :chr
rs = rs.chr
end
idx = 0
@@ -162,10 +163,10 @@
return ''
end
fs ||= ','
- if fs.is_a?(Fixnum)
+ if fs.respond_to? :chr
fs = fs.chr
end
- if !rs.nil? and rs.is_a?(Fixnum)
+ if rs.respond_to? :chr
rs = rs.chr
end
res_type = :DT_COLSEP
@@ -213,10 +214,10 @@
#
def CSV.parse_row(src, idx, out_dev, fs = nil, rs = nil)
fs ||= ','
- if fs.is_a?(Fixnum)
+ if fs.respond_to? :chr
fs = fs.chr
end
- if !rs.nil? and rs.is_a?(Fixnum)
+ if rs.respond_to? :chr
rs = rs.chr
end
idx_backup = idx
@@ -270,10 +271,10 @@
#
def CSV.generate_row(src, cells, out_dev, fs = nil, rs = nil)
fs ||= ','
- if fs.is_a?(Fixnum)
+ if fs.respond_to? :chr
fs = fs.chr
end
- if !rs.nil? and rs.is_a?(Fixnum)
+ if rs.respond_to? :chr
rs = rs.chr
end
src_size = src.size
@@ -303,6 +304,9 @@
# Private class methods.
class << self
private
+ DATA_TRANSITION1 = Hash.new{|h,k| k}.merge!(:ST_START => :ST_DATA,
+ :ST_QUOTE => :ILLEGAL_FORMAT)
+ # return the key as default
def open_reader(path, mode, fs, rs, &block)
file = File.open(path, mode)
@@ -340,6 +344,13 @@
end
end
+
+ def to_data_unless_quote(state)
+ state = DATA_TRANSITION1[state]
+ raise IllegalFormatError if state == :ILLEGAL_FORMAT
+ return state
+ end
+
def parse_body(src, idx, fs, rs)
fs_str = fs
fs_size = fs_str.size
@@ -359,54 +370,37 @@
if !fschar and c == fs_str[0]
fs_idx = 0
fschar = true
- if state == :ST_START
- state = :ST_DATA
- elsif state == :ST_QUOTE
- raise IllegalFormatError
- end
+ state = to_data_unless_quote(state)
end
if !rschar and c == rs_str[0]
rs_idx = 0
rschar = true
- if state == :ST_START
- state = :ST_DATA
- elsif state == :ST_QUOTE
- raise IllegalFormatError
- end
+ state = to_data_unless_quote(state)
end
end
if c == ?"
fs_idx = rs_idx = 0
- if cr
- raise IllegalFormatError
- end
+ raise IllegalFormatError if cr
cell << src[last_idx, (idx - last_idx)]
- last_idx = idx
- if state == :ST_DATA
- if quoted
- last_idx += 1
- quoted = false
- state = :ST_QUOTE
- else
- raise IllegalFormatError
- end
- elsif state == :ST_QUOTE
+ last_idx = idx + 1
+ case state
+ when :ST_DATA
+ raise IllegalFormatError unless quoted
+ quoted = false # Redundant, surely?
+ state = :ST_QUOTE
+ when :ST_QUOTE
cell << c.chr
- last_idx += 1
quoted = true
state = :ST_DATA
- else # :ST_START
+ when :ST_START
quoted = true
- last_idx += 1
state = :ST_DATA
+ else
+ raise UnknownState
end
elsif fschar or rschar
- if fschar
- fs_idx += 1
- end
- if rschar
- rs_idx += 1
- end
+ fs_idx += 1 if fschar
+ rs_idx += 1 if rschar
sep = nil
if fs_idx == fs_size
if state == :ST_START and rs_idx > 0 and fs_idx < rs_idx
@@ -415,9 +409,7 @@
cell << src[last_idx, (idx - last_idx - (fs_size - 1))]
last_idx = idx
fs_idx = rs_idx = 0
- if cr
- raise IllegalFormatError
- end
+ raise IllegalFormatError if cr
sep = :DT_COLSEP
elsif rs_idx == rs_size
if state == :ST_START and fs_idx > 0 and rs_idx < fs_idx
@@ -431,20 +423,19 @@
sep = :DT_ROWSEP
end
if sep
- if state == :ST_DATA
- return sep, idx + 1, cell;
- elsif state == :ST_QUOTE
- return sep, idx + 1, cell;
- else # :ST_START
- return sep, idx + 1, nil
+ return sep, idx + 1, case state
+ when :ST_DATA, :ST_QUOTE
+ cell
+ when :ST_START
+ nil
+ else
+ raise UnknownState
end
end
elsif rs.nil? and c == ?\r
# special \r treatment for backward compatibility
fs_idx = rs_idx = 0
- if cr
- raise IllegalFormatError
- end
+ raise IllegalFormatError if cr
cell << src[last_idx, (idx - last_idx)]
last_idx = idx
if quoted
@@ -454,13 +445,14 @@
end
else
fs_idx = rs_idx = 0
- if state == :ST_DATA or state == :ST_START
- if cr
- raise IllegalFormatError
- end
+ case state
+ when :ST_DATA, :ST_START
+ raise IllegalFormatError if cr
state = :ST_DATA
- else # :ST_QUOTE
+ when :ST_QUOTE
raise IllegalFormatError
+ else
+ raise UnknownState
end
end
idx += 1
@@ -471,9 +463,7 @@
else
return :DT_EOS, idx, nil
end
- elsif quoted
- raise IllegalFormatError
- elsif cr
+ elsif quoted or cr
raise IllegalFormatError
end
cell << src[last_idx, (idx - last_idx)]
@@ -804,7 +794,7 @@
if idx < 0
return nil
end
- if (idx_is_eos?(idx))
+ if (idx_is_eos?)
if n and (@offset + idx == buf_size(@cur_buf))
# Like a String, 'abc'[4, 1] returns nil and
# 'abc'[3, 1] returns '' not nil.
@@ -818,13 +808,12 @@
next_idx = idx
while (my_offset + next_idx >= buf_size(my_buf))
if (my_buf == @buf_tail_idx)
- unless add_buf
- break
- end
+ break unless add_buf
end
next_idx = my_offset + next_idx - buf_size(my_buf)
my_buf += 1
my_offset = 0
+ # i.e. loc = my_offset + next_idx because myoffset is 0
end
loc = my_offset + next_idx
if !n
@@ -855,9 +844,9 @@
if is_eos?
return 0
end
- size_dropped = 0
+ bsc = size_dropped = 0
while (n > 0)
- if !@is_eos or (@cur_buf != @buf_tail_idx)
+ # if !@is_eos or (@cur_buf != @buf_tail_idx) # redundant? If it fails loop never exits
if (@offset + n < buf_size(@cur_buf))
size_dropped += n
@offset += n
@@ -868,19 +857,17 @@
n -= size
@offset = 0
unless rel_buf
- unless add_buf
- break
- end
+ break unless add_buf
@cur_buf = @buf_tail_idx
end
end
- end
+ # end
end
size_dropped
end
def is_eos?
- return idx_is_eos?(0)
+ return idx_is_eos?
end
# WARN: Do not instantiate this class directly. Define your own class
@@ -929,13 +916,12 @@
if str_read.nil?
@is_eos = true
@buf_list.push('')
- @buf_tail_idx += 1
false
else
@buf_list.push(str_read)
- @buf_tail_idx += 1
true
end
+ @buf_tail_idx += 1
end
def rel_buf
@@ -952,8 +938,8 @@
end
end
- def idx_is_eos?(idx)
- (@is_eos and ((@cur_buf < 0) or (@cur_buf == @buf_tail_idx)))
+ def idx_is_eos?
+ (@is_eos and ((@cur_buf == @buf_tail_idx) or (@cur_buf < 0)))
end
BufSize = 1024 * 8