From: "nobu (Nobuyoshi Nakada)"
Date: 2013-06-30T23:38:55+09:00
Subject: [ruby-core:55717] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows

Issue #8585 has been updated by nobu (Nobuyoshi Nakada).

File bug-8585.diff added

Eregon (Benoit Daloze) wrote:
> Adding "# encoding: US-ASCII" at the top of the script makes it identical to the previous behavior, therefore taking the same time.
> I would certainly not call this a solution though.

The file already has that line.
This sluggishness seems to be because `String#encode` in the `do_quote` lambda in `init_separators` is called for each field.

----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40211

Author: peter_v (Peter Vandenabeele)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.1.0dev and 2.0.0
Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Hi,

I want to generate a CSV string from millions of rows.
The time to create the string grows quadratically with the number of rows.
With this issue, I cannot use ruby 2.0.0 to create the CSV file.

I did not see this problem in ruby 1.9.3.
I see the problem is present in ruby 2.0.0 and ruby-head.

Using ruby-head
===============

Installed with `rvm reinstall ruby-head` (built from version 3a01b9e)

peter_v@peter64:~/p/dbd$ rvm use ruby-head
Using /home/peter_v/.rvm/gems/ruby-head
peter_v@peter64:~/p/dbd$ ruby -v
ruby 2.1.0dev (2013-06-30) [x86_64-linux]
peter_v@peter64:~/p/dbd$ uname -a
Linux peter64 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7 16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
peter_v@peter64:~/p/dbd$ rvm current
ruby-head

peter_v@peter64:~/p/dbd$ cat bin/test_4.rb
#!/usr/bin/env ruby

count = ARGV[0].to_i
unless count > 0
  puts "Give a 'count' as first argument."
  exit(1)
end

require 'csv'

row_data = [
  "59ffbb3b-1e48-4c1f-81d8-d93afc84c966",
  "2013-06-28 19:14:55.975000806 UTC",
  "a11f290e-c441-41bc-8b8c-4e6c27b1b6fc",
  "c73e6241-d46f-4952-8377-c11372346d15",
  "test",
  "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0"]

puts "starting CSV.generate"
start_time = Time.now
csv_string = CSV.generate(force_quotes: true) do |csv|
  count.times do
    csv << row_data
  end
end
puts "CSV.generate took #{Time.now - start_time} seconds"

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 10_000
starting CSV.generate
CSV.generate took 1.01238478 seconds

real    0m1.045s
user    0m1.044s
sys     0m0.004s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 20_000
starting CSV.generate
CSV.generate took 3.815373614 seconds

real    0m3.847s
user    0m3.844s
sys     0m0.000s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 40_000
starting CSV.generate
CSV.generate took 17.176208859 seconds

real    0m17.212s
user    0m17.177s
sys     0m0.020s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 80_000
starting CSV.generate
CSV.generate took 71.400916725 seconds

real    1m11.436s
user    1m11.320s
sys     0m0.036s
peter_v@peter64:~/p/dbd$

Using ruby-1.9.3-p448
=====================

This is, as expected, LINEAR growth of time with the number of rows.

peter_v@peter64:~/p/dbd$ rvm use ruby-1.9.3
Using /home/peter_v/.rvm/gems/ruby-1.9.3-p448
peter_v@peter64:~/p/dbd$ ruby -v
ruby 1.9.3p448 (2013-06-27 revision 41675) [x86_64-linux]
peter_v@peter64:~/p/dbd$ rvm current
ruby-1.9.3-p448

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 10_000
starting CSV.generate
CSV.generate took 0.125396387 seconds

real    0m0.150s
user    0m0.140s
sys     0m0.008s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 20_000
starting CSV.generate
CSV.generate took 0.249746069 seconds

real    0m0.274s
user    0m0.268s
sys     0m0.004s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 40_000
starting CSV.generate
CSV.generate took 0.498180989 seconds

real    0m0.522s
user    0m0.504s
sys     0m0.016s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 80_000
starting CSV.generate
CSV.generate took 0.991481147 seconds

real    0m1.015s
user    0m1.000s
sys     0m0.016s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 100_000
starting CSV.generate
CSV.generate took 1.243347153 seconds

real    0m1.265s
user    0m1.240s
sys     0m0.020s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 1_000_000
starting CSV.generate
CSV.generate took 12.461711974 seconds

real    0m12.492s
user    0m12.405s
sys     0m0.080s
peter_v@peter64:~/p/dbd$

--
http://bugs.ruby-lang.org/
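
The quadratic and linear claims in the report can be checked against its own timings. Each run doubles the row count, so the ratio between consecutive timings should be near 4 for quadratic growth and near 2 for linear growth. This is a minimal sketch and not part of the original report: the names `TIMINGS` and `doubling_ratios` are made up for illustration, and the numbers are copied from the "CSV.generate took ..." lines of the 10_000 to 80_000 runs.

```ruby
# Seconds reported by "CSV.generate took ..." in the report, keyed by row count.
# TIMINGS and doubling_ratios are illustrative names, not from the report.
TIMINGS = {
  "ruby 2.1.0dev" => {
    10_000 => 1.01238478,
    20_000 => 3.815373614,
    40_000 => 17.176208859,
    80_000 => 71.400916725
  },
  "ruby 1.9.3p448" => {
    10_000 => 0.125396387,
    20_000 => 0.249746069,
    40_000 => 0.498180989,
    80_000 => 0.991481147
  }
}

# Ratio of elapsed time between consecutive measurements, ordered by row count.
# Because each step doubles the rows, a ratio near 2 suggests linear growth
# and a ratio near 4 suggests quadratic growth.
def doubling_ratios(timings)
  seconds = timings.sort.map { |_rows, secs| secs }
  seconds.each_cons(2).map { |a, b| (b / a).round(2) }
end

TIMINGS.each do |version, timings|
  puts "#{version}: #{doubling_ratios(timings).inspect}"
end
```

With these figures the ratios for ruby 2.1.0dev should land around 4 and those for ruby 1.9.3p448 around 2, which matches the description in the report.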