From: "hsbt (Hiroshi SHIBATA)"
Date: 2021-11-24T05:12:04+00:00
Subject: [ruby-core:106241] [Ruby master Bug#17478] Ruby3.0 is slower than Ruby2.7.2 when parsing a large CSV file

Issue #17478 has been updated by hsbt (Hiroshi SHIBATA).

Assignee set to kou (Kouhei Sutou)
Status changed from Open to Assigned

----------------------------------------
Bug #17478: Ruby3.0 is slower than Ruby2.7.2 when parsing a large CSV file
https://bugs.ruby-lang.org/issues/17478#change-94861

* Author: okkez (okkez _)
* Status: Assigned
* Priority: Normal
* Assignee: kou (Kouhei Sutou)
* ruby -v: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Ruby3.0 is around 10%-20% slower than Ruby2.7.2 when parsing and aggregating a large CSV file.

The script is here:

```ruby
require "csv"

name_to_cost = Hash.new(0)
CSV.foreach(ARGV[0], headers: true) do |row|
  name_to_cost[row["name"]] += row["cost"].to_f
end

name_to_cost.sort_by {|k, _| k }.each do |name, cost|
  printf "%s\t%.3f\n", name, cost
end
```

The sample data looks like the following (3 million lines, about 235MiB):

```
id,name,description,cost
2365599605,ysgHDPA,Voluptatem sit perferendis accusantium consequatur aut.,25.115
2365599606,xFLXOtJ,Sit accusantium aut perferendis voluptatem consequatur.,60.228
2365599607,RlkxNQB,Accusantium sit aut consequatur perferendis voluptatem.,79.663
2365599608,YVMbuva,Sit perferendis voluptatem accusantium aut consequatur.,49.863
2365599609,rtxVcDW,Accusantium voluptatem sit perferendis aut consequatur.,50.765
2365599610,rtxVcDW,Aut sit accusantium consequatur perferendis voluptatem.,94.310
2365599611,muDwuke,Consequatur sit accusantium aut perferendis voluptatem.,16.991
2365599612,tkqFWyM,Perferendis sit voluptatem consequatur aut accusantium.,98.753
```

* Ruby2.7.2: 25.37 seconds
* Ruby3.0.0: 27.53 seconds

I use this program to generate the test CSV file:
https://gist.github.com/okkez/05ffa0df08cf49014f460eb2e8543698

With a different, private data set:

* Ruby2.7.2: 31.54 seconds
* Ruby3.0.0: 37.15 seconds

The private data looks like this:

* There are 18 columns
* There are 1144305 lines
* It is 334MiB
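
For anyone trying to reproduce the measurement, here is a minimal sketch of one way to time the aggregation and turn two timings into a relative slowdown. It is not the harness used for the numbers above. `aggregate_costs` and `slowdown_percent` are hypothetical helper names, and the three-row temporary file only stands in for the real 3-million-line input.

```ruby
require "benchmark"
require "csv"
require "tempfile"

# Hypothetical helper: the same per-name aggregation as the reported script,
# wrapped in a method so it can be timed.
def aggregate_costs(path)
  name_to_cost = Hash.new(0)
  CSV.foreach(path, headers: true) do |row|
    name_to_cost[row["name"]] += row["cost"].to_f
  end
  name_to_cost
end

# Hypothetical helper: how much slower `candidate` is than `baseline`, in percent.
def slowdown_percent(baseline, candidate)
  (candidate - baseline) / baseline * 100.0
end

# Tiny stand-in input (assumption: same column layout as the sample data).
sample = Tempfile.new(["sample", ".csv"])
sample.write(<<~ROWS)
  id,name,description,cost
  2365599609,rtxVcDW,Accusantium voluptatem sit perferendis aut consequatur.,50.765
  2365599610,rtxVcDW,Aut sit accusantium consequatur perferendis voluptatem.,94.310
  2365599611,muDwuke,Consequatur sit accusantium aut perferendis voluptatem.,16.991
ROWS
sample.close

# Benchmark.realtime returns the elapsed wall-clock time of the block in seconds.
totals = nil
elapsed = Benchmark.realtime { totals = aggregate_costs(sample.path) }

totals.sort_by { |name, _| name }.each do |name, cost|
  printf "%s\t%.3f\n", name, cost
end
printf "elapsed: %.6f seconds\n", elapsed

sample.unlink
```

Applied to the timings in this report, `slowdown_percent(25.37, 27.53)` comes to about 8.5 and `slowdown_percent(31.54, 37.15)` to about 17.8. Running the same script under each Ruby version against the same file, and repeating a few times, should help separate the regression from run-to-run noise.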