Re: csv.rb a start on refactoring.

From: "Ara.T.Howard" <Ara.T.Howard@...>
Date: 2005-10-29 17:11:24 UTC
List: ruby-core #6496
On Sun, 30 Oct 2005, James Edward Gray II wrote:

> On Oct 28, 2005, at 10:06 PM, James Edward Gray II wrote:
>
>> Here's the hyper-optimized version with an example of how he uses it 
>> (translated from Perl to Ruby by me):
>
> This certainly seems promising, at least as a starting point:
>
> Neo:~/Desktop$ cat bm_csv.rb
> #!/usr/local/bin/ruby -w
>
> require "csv"
> require "benchmark"
>
> def parse_csv( line )
>  results = Array.new
>  line.scan(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/) do
>    if $1.nil?
>      results << $2
>    else
>      results << $1.gsub('""', '"')
>    end
>  end
>  results
> end
>
> DATA  = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
> TESTS = 50000
>
> Benchmark.bm do |timings|
>  timings.report("CSV") { TESTS.times { CSV.parse_line(DATA) } }
>  timings.report("Regexp") { TESTS.times { parse_csv(DATA) } }
> end
> Neo:~/Desktop$ ruby bm_csv.rb
>      user     system      total        real
> CSV 18.570000   0.060000  18.630000 ( 18.675331)
> Regexp  2.700000   0.010000   2.710000 (  2.726666)


it __is__ promising!

it may or may not be tricky to get these failing cases working though:

harp:~ > ruby a.rb
==========================================
CSV2[7] => FAILED (RuntimeError)
==========================================
input:
"a,\"\"\"\nb\n\"\"\",\nc"
csv:
["a", "\"\nb\n\"", "\nc"]
expected:
["a", "\"\nb\n\"", nil]
==========================================

==========================================
CSV2[8] => FAILED (RuntimeError)
==========================================
input:
"a,,,"
csv:
["a", "", "", ""]
expected:
["a", nil, nil, nil]
==========================================

==========================================
CSV2[9] => FAILED (RuntimeError)
==========================================
input:
","
csv:
[""]
expected:
[nil, nil]
==========================================

==========================================
CSV2[13] => FAILED (RuntimeError)
==========================================
input:
",\"\""
csv:
[""]
expected:
[nil, ""]
==========================================

==========================================
CSV2[14] => FAILED (RuntimeError)
==========================================
input:
",\"\r\""
csv:
[""]
expected:
[nil, "\r"]
==========================================

==========================================
CSV2[16] => FAILED (RuntimeError)
==========================================
input:
"\"\r\n,\","
csv:
["\r\n,", ""]
expected:
["\r\n,", nil]
==========================================


most of the errors stem from problems dealing with leading/trailing commas.  i
think it could be fixed.
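
One possible fix, sketched below, is not code from this thread. The module name CSV3 and the FIELD constant are made up for illustration. Two things seem to go wrong in the failing cases above: an empty unquoted field comes back as "" where the tests expect nil, and a leading empty field produces a zero-width match at position 0, after which scan steps past the comma and \G can no longer anchor on it. The sketch prepends a comma so that every field, including the first, has to consume a separator, maps empty unquoted fields to nil, and stops an unquoted field at a line break, on the assumption that parse_line should only return the first record.

```ruby
# Hypothetical variant of the regexp parser (CSV3 and FIELD are illustrative names).
module CSV3
  # Every field must be preceded by a comma, so no match is ever zero-width.
  # Group 1: body of a quoted field; group 2: an unquoted field, which may not
  # contain quotes, commas, or line breaks.
  FIELD = /\G,(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",\r\n]*))/

  def self.parse_line(line)
    fields = []
    # Prepending "," lets the first field be matched like any other.
    ("," + line).scan(FIELD) do |quoted, bare|
      value =
        if quoted
          quoted.gsub('""', '"')   # unescape doubled quotes
        elsif bare.empty?
          nil                      # empty unquoted field
        else
          bare
        end
      fields << value
    end
    fields
  end
end
```

Swapping CSV3 in for CSV2 in the impls list of the script at the end of this message would show whether it covers the remaining cases. Whether it keeps the speed advantage of the original regexp has not been measured here.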

code inlined at the end of this message.  btw - don't get me wrong - i'd
__love__ to see csv be faster, i just happen to load tons of mega escaped
documents so it has to handle anything we can throw at it too.

cheers.

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================


require 'pp'
require 'csv'

module CSV2
   def self::parse_line line
     csv = Array.new
     line.scan(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/){ csv << ($1 ? $1.gsub('""', '"') : $2)}
     csv
   end
end

tests = [
   [
     %( a,b ),
     ["a", "b"]
   ],
   [
     %( a,"""b""" ),
     ["a", "\"b\""]
   ],
   [
     %( a,"""b" ),
     ["a", "\"b"]
   ],
   [
     %( a,"b""" ),
     ["a", "b\""]
   ],
   [
     %( a,"
b""" ),
     ["a", "\nb\""]
   ],
   [
     %( a,"""
b" ),
     ["a", "\"\nb"]
   ],
   [
     %( a,"""
b
""" ),
     ["a", "\"\nb\n\""]
   ],
   [
     %( a,"""
b
""",
c ),
     ["a", "\"\nb\n\"", nil]
   ],
   [
     %( a,,, ),
     ["a", nil, nil, nil]
   ],
   [
     %( , ),
     [nil, nil]
   ],
   [
     %( "","" ),
     ["", ""]
   ],
   [
     %( """" ),
     ["\""]
   ],
   [
     %( """","" ),
     ["\"",""]
   ],
   [
     %( ,"" ),
     [nil,""]
   ],
   [
     %( \r,"\r" ),
     [nil,"\r"]
   ],
   [
     %( "\r\n," ),
     ["\r\n,"]
   ],
   [
     %( "\r\n,", ),
     ["\r\n,", nil]
   ],
]

impls = CSV, CSV2

tests.each_with_index do |test, idx|
   input, expected = test
   csv = []
   impls.each do |impl|
     begin
       csv = impl::parse_line input.strip
       raise "FAILED" unless csv == expected
     rescue => e
       puts "=" * 42
       puts "#{ impl }[#{ idx }] => #{ e.message } (#{ e.class })"
       puts "=" * 42
       puts "input:\n#{ PP::pp input.strip, '' }"
       puts "csv:\n#{ PP::pp csv, '' }"
       puts "expected:\n#{ PP::pp expected, '' }"
       puts "=" * 42
       puts
     end
   end
end

__END__

http://www.ietf.org/rfc/rfc4180.txt
