[ruby-core:95232] [Ruby master Bug#16237] Set#union performance issue
From:
mame@...
Date:
2019-10-04 21:03:01 UTC
List:
ruby-core #95232
Issue #16237 has been updated by mame (Yusuke Endoh).
Status changed from Open to Closed
This is a spec. The reason is because `Set#union` duplicates self and then merges the argument, which makes your "set" implementation O( n^2 ). You may want to use `Set#merge` instead of `Set#union` in this case.
```
def set2
ret = Set.new
REPEAT.times do
ret.merge(Set.new(random_array))
end
ret
end
```
```
user system total real
hash 0.404887 0.011980 0.416867 ( 0.416911)
set2 0.468687 0.004019 0.472706 ( 0.473242)
```
----------------------------------------
Bug #16237: Set#union performance issue
https://bugs.ruby-lang.org/issues/16237#change-81907
* Author: lionelperrin (Lionel Perrin)
* Status: Closed
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
I've discovered a very large difference in performances between Set#union and the use of a hash to compute this union.
The issue can be shown with the code below.
Here are the benchmark results:
```
user system total real
hash 0.562500 0.078125 0.640625 ( 0.639528)
uniq 54.375000 0.312500 54.687500 ( 54.772007)
set 37.312500 0.125000 37.437500 ( 37.474751)
mix1 43.718750 0.000000 43.718750 ( 43.890027)
mix2 37.109375 0.000000 37.109375 ( 37.190839)
```
The code example:
```
require 'benchmark'
require 'set'
REPEAT = 1_000
$DISTINCT_COUNT = 100_000_000
def random_array(size=1_000)
Array.new(size) { Random.rand(1..$DISTINCT_COUNT) }
end
def uniq
ret = []
REPEAT.times do
ret += random_array
ret.uniq!
end
ret
end
def set
ret = Set.new
REPEAT.times do
ret += Set.new(random_array)
end
ret
end
def mix1
ret = Set.new
REPEAT.times do
ret += random_array
end
ret
end
def mix2
ret = Set.new
REPEAT.times do
ret += random_array.uniq
end
ret
end
def hash
ret = {}
REPEAT.times do
random_array.each { |v| ret[v] = nil }
end
ret.keys
end
Benchmark.bm do |x|
%i(hash uniq set mix1 mix2).each do |i|
x.report(i, &method(i))
end
end
```
--
https://bugs.ruby-lang.org/
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>