From: mame@... Date: 2019-10-04T21:03:01+00:00 Subject: [ruby-core:95232] [Ruby master Bug#16237] Set#union performance issue Issue #16237 has been updated by mame (Yusuke Endoh). Status changed from Open to Closed This is a spec. The reason is because `Set#union` duplicates self and then merges the argument, which makes your "set" implementation O( n^2 ). You may want to use `Set#merge` instead of `Set#union` in this case. ``` def set2 ret = Set.new REPEAT.times do ret.merge(Set.new(random_array)) end ret end ``` ``` user system total real hash 0.404887 0.011980 0.416867 ( 0.416911) set2 0.468687 0.004019 0.472706 ( 0.473242) ``` ---------------------------------------- Bug #16237: Set#union performance issue https://bugs.ruby-lang.org/issues/16237#change-81907 * Author: lionelperrin (Lionel Perrin) * Status: Closed * Priority: Normal * Assignee: * Target version: * ruby -v: ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux] * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN ---------------------------------------- I've discovered a very large difference in performances between Set#union and the use of a hash to compute this union. The issue can be shown with the code below. Here are the benchmark results: ``` user system total real hash 0.562500 0.078125 0.640625 ( 0.639528) uniq 54.375000 0.312500 54.687500 ( 54.772007) set 37.312500 0.125000 37.437500 ( 37.474751) mix1 43.718750 0.000000 43.718750 ( 43.890027) mix2 37.109375 0.000000 37.109375 ( 37.190839) ``` The code example: ``` require 'benchmark' require 'set' REPEAT = 1_000 $DISTINCT_COUNT = 100_000_000 def random_array(size=1_000) Array.new(size) { Random.rand(1..$DISTINCT_COUNT) } end def uniq ret = [] REPEAT.times do ret += random_array ret.uniq! end ret end def set ret = Set.new REPEAT.times do ret += Set.new(random_array) end ret end def mix1 ret = Set.new REPEAT.times do ret += random_array end ret end def mix2 ret = Set.new REPEAT.times do ret += random_array.uniq end ret end def hash ret = {} REPEAT.times do random_array.each { |v| ret[v] = nil } end ret.keys end Benchmark.bm do |x| %i(hash uniq set mix1 mix2).each do |i| x.report(i, &method(i)) end end ``` -- https://bugs.ruby-lang.org/ Unsubscribe: