[ruby-core:95137] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
From: dylan.smith@...
Date: 2019-09-27 22:49:15 UTC
List: ruby-core #95137
Issue #16121 has been updated by dylants (Dylan Thacker-Smith).
ko1 (Koichi Sasada) wrote:
> I found that
>
> > ` rb_define_method(rb_cArray, "initialize_copy", rb_ary_replace, 1);`
>
> `Array#initialize_copy` == `Array#replace`. Can we use same technique?
I tried doing that, and it caused test failures because Hash#replace stopped rehashing keys in Ruby 2.6. I've opened a bug for that: https://bugs.ruby-lang.org/issues/16187
I've opened https://github.com/ruby/ruby/pull/2489, which replaces both of these methods with a single optimized implementation that always rehashes.
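As a small illustration of the rehash-on-copy behaviour the new implementation preserves (this example is my own, not taken from the pull request): a key whose hash value has changed after insertion is only found again once the table is rebuilt, which is exactly what copying-with-rehash gives you.
```ruby
# Illustration only: Hash#dup rehashes the copy, so stale key hashes are fixed up.
key = [1]
h = { key => :found }

key << 2    # mutating the key changes its hash value
h[key]      # => nil; the original table still holds the stale hash
h.dup[key]  # => :found; the duplicated table was rebuilt with fresh hashes
```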
I benchmarked it with the following `benchmark/hash_dup.yml` file:
```yaml
prelude: |
  small_hash = { a: 1 }
  larger_hash = 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
benchmark:
  dup_small: small_hash.dup
  dup_larger: larger_hash.dup
  replace_small: '{}.replace(small_hash)'
  replace_larger: '{}.replace(larger_hash)'
loop_count: 100000
```
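These `benchmark/*.yml` files are run with the benchmark_driver gem (in the ruby tree, `make benchmark` drives this); in its comparison output, `built-ruby` is presumably the freshly built ruby with this change applied and `compare-ruby` the baseline build.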
Here are the results against the latest ruby master commit:
```
Comparison:
  dup_small
         built-ruby:   4105090.4 i/s
       compare-ruby:   3089471.1 i/s - 1.33x slower

  dup_larger
         built-ruby:    682873.5 i/s
       compare-ruby:    518623.8 i/s - 1.32x slower

  replace_small
       compare-ruby:  10697475.5 i/s
         built-ruby:   9399379.9 i/s - 1.14x slower

  replace_larger
         built-ruby:    807767.5 i/s
       compare-ruby:    358383.1 i/s - 2.25x slower
```
The `replace_small` case is slower than master because of the added rehashing, but it is still much faster than Ruby 2.5.6, which did rehash the keys of small hashes:
```
  replace_small
         built-ruby:   8847991.6 i/s
       compare-ruby:   2178174.7 i/s - 4.06x slower
```
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81776
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that `Hash#dup` was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference: `Hash#dup` creates a copy of the hash and then rehashes that copy. However, rehashing itself works by making a copy of the hash, so the first copy made before rehashing is redundant.
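A minimal sketch of how this can be observed from Ruby, using `GC.stat(:total_allocated_objects)` (the helper below is just for illustration and is not the profiling setup used for this report):
```ruby
# Illustrative helper: count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

h = { a: 1, b: 2 }
allocations { h.dup }    # 2 objects without this patch (the copy plus the rehash copy)
allocations { Hash[h] }  # 1 object: just the new Hash
```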
## Solution
I changed the code to use rehashing alone to make the copy of the hash, which improves performance while preserving the existing behaviour.
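Conceptually (this is only a Ruby-level sketch of the idea; the actual change is in hash.c), the difference is building the copy's table once instead of twice:
```ruby
# Sketch of the old behaviour: copy the table, then rehash, which internally
# rebuilds the table a second time, so the first copy is wasted work.
def dup_old(hash)
  copy = hash.clone
  copy.rehash
  copy
end

# Sketch of the new behaviour: produce the copy by re-inserting each entry,
# hashing every key exactly once (default/default_proc handling omitted).
def dup_new(hash)
  copy = {}
  hash.each_pair { |k, v| copy[k] = v }
  copy
end
```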
## Benchmark
```ruby
require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

hashes = {
  small_hash: { a: 1 },
  larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}

Benchmark.bmbm do |x|
  hashes.each do |name, hash|
    report(x, "#{name}.dup") do
      hash.dup
    end
  end
end
```
Results on master:
```
                      user     system      total        real
small_hash.dup    0.401350   0.001638   0.402988 (  0.404608)
larger_hash.dup   7.218548   0.433616   7.652164 (  7.695990)
```
Results with the attached patch:
```
                      user     system      total        real
small_hash.dup    0.336733   0.002425   0.339158 (  0.341760)
larger_hash.dup   6.617343   0.398407   7.015750 (  7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)