[ruby-core:95137] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
From: dylan.smith@...
Date: 2019-09-27 22:49:15 UTC
List: ruby-core #95137
Issue #16121 has been updated by dylants (Dylan Thacker-Smith).
ko1 (Koichi Sasada) wrote:
> I found that
>
> > ` rb_define_method(rb_cArray, "initialize_copy", rb_ary_replace, 1);`
>
> `Array#initialize_copy` == `Array#replace`. Can we use same technique?
I tried doing that, and it caused test failures because Hash#replace stopped rehashing keys in Ruby 2.6. I've opened a bug for that: https://bugs.ruby-lang.org/issues/16187
I've opened https://github.com/ruby/ruby/pull/2489, which replaces both of these methods with a single optimized implementation that always rehashes.
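As a small illustration of the rehash-on-copy behaviour the new implementation preserves (this example is my own, not taken from the pull request): a key whose hash value has changed after insertion is only found again once the table is rebuilt, which is exactly what copying-with-rehash gives you.
```ruby
# Illustration only: Hash#dup rehashes the copy, so stale key hashes are fixed up.
key = [1]
h = { key => :found }

key << 2    # mutating the key changes its hash value
h[key]      # => nil; the original table still holds the stale hash
h.dup[key]  # => :found; the duplicated table was rebuilt with fresh hashes
```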
I benchmarked it with the following `benchmark/hash_dup.yml` file:
```yaml
prelude: |
  small_hash = { a: 1 }
  larger_hash = 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
benchmark:
  dup_small: small_hash.dup
  dup_larger: larger_hash.dup
  replace_small: '{}.replace(small_hash)'
  replace_larger: '{}.replace(larger_hash)'
loop_count: 100000
```
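These `benchmark/*.yml` files are run with the benchmark_driver gem (in the ruby tree, `make benchmark` drives this); in its comparison output, `built-ruby` is presumably the freshly built ruby with this change applied and `compare-ruby` the baseline build.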
Here are the results against the latest ruby master commit:
```
Comparison:
  dup_small
         built-ruby:   4105090.4 i/s
       compare-ruby:   3089471.1 i/s - 1.33x slower

  dup_larger
         built-ruby:    682873.5 i/s
       compare-ruby:    518623.8 i/s - 1.32x slower

  replace_small
       compare-ruby:  10697475.5 i/s
         built-ruby:   9399379.9 i/s - 1.14x slower

  replace_larger
         built-ruby:    807767.5 i/s
       compare-ruby:    358383.1 i/s - 2.25x slower
```
The `replace_small` case is slower than master because of the added rehashing, but it is still much faster than Ruby 2.5.6, which did rehash the keys of small hashes:
```
  replace_small
         built-ruby:   8847991.6 i/s
       compare-ruby:   2178174.7 i/s - 4.06x slower
```
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81776
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that `Hash#dup` was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference: `Hash#dup` creates a copy of the hash and then rehashes that copy. However, rehashing itself works by making a copy of the hash, so the first copy made before rehashing is redundant.
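A minimal sketch of how this can be observed from Ruby, using `GC.stat(:total_allocated_objects)` (the helper below is just for illustration and is not the profiling setup used for this report):
```ruby
# Illustrative helper: count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

h = { a: 1, b: 2 }
allocations { h.dup }    # 2 objects without this patch (the copy plus the rehash copy)
allocations { Hash[h] }  # 1 object: just the new Hash
```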
## Solution
I changed the code to use rehashing alone to make the copy of the hash, which improves performance while preserving the existing behaviour.
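Conceptually (this is only a Ruby-level sketch of the idea; the actual change is in hash.c), the difference is building the copy's table once instead of twice:
```ruby
# Sketch of the old behaviour: copy the table, then rehash, which internally
# rebuilds the table a second time, so the first copy is wasted work.
def dup_old(hash)
  copy = hash.clone
  copy.rehash
  copy
end

# Sketch of the new behaviour: produce the copy by re-inserting each entry,
# hashing every key exactly once (default/default_proc handling omitted).
def dup_new(hash)
  copy = {}
  hash.each_pair { |k, v| copy[k] = v }
  copy
end
```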
## Benchmark
```ruby
require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

hashes = {
  small_hash: { a: 1 },
  larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}

Benchmark.bmbm do |x|
  hashes.each do |name, hash|
    report(x, "#{name}.dup") do
      hash.dup
    end
  end
end
```
Results on master:
```
                      user     system      total        real
small_hash.dup    0.401350   0.001638   0.402988 (  0.404608)
larger_hash.dup   7.218548   0.433616   7.652164 (  7.695990)
```
Results with the attached patch:
```
                      user     system      total        real
small_hash.dup    0.336733   0.002425   0.339158 (  0.341760)
larger_hash.dup   6.617343   0.398407   7.015750 (  7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)