From: dylan.smith@...
Date: 2019-09-27T22:49:15+00:00
Subject: [ruby-core:95137] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup

Issue #16121 has been updated by dylants (Dylan Thacker-Smith).

ko1 (Koichi Sasada) wrote:
> I found that
>
> `rb_define_method(rb_cArray, "initialize_copy", rb_ary_replace, 1);`
>
> `Array#initialize_copy` == `Array#replace`. Can we use same technique?

I tried doing that, and it caused test failures because Hash#replace stopped rehashing keys in Ruby 2.6. I've opened a bug for that: https://bugs.ruby-lang.org/issues/16187

I've opened https://github.com/ruby/ruby/pull/2489, which replaces the implementation of both of these methods with a single optimized implementation that always rehashes.

I benchmarked it with the following benchmark/hash_dup.yml file

```yaml
prelude: |
  small_hash = { a: 1 }
  larger_hash = 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
benchmark:
  dup_small: small_hash.dup
  dup_larger: larger_hash.dup
  replace_small: '{}.replace(small_hash)'
  replace_larger: '{}.replace(larger_hash)'
loop_count: 100000
```

and here are the results against the latest ruby master commit

```
Comparison:
             dup_small
            built-ruby:   4105090.4 i/s
          compare-ruby:   3089471.1 i/s - 1.33x  slower

            dup_larger
            built-ruby:    682873.5 i/s
          compare-ruby:    518623.8 i/s - 1.32x  slower

         replace_small
          compare-ruby:  10697475.5 i/s
            built-ruby:   9399379.9 i/s - 1.14x  slower

        replace_larger
            built-ruby:    807767.5 i/s
          compare-ruby:    358383.1 i/s - 2.25x  slower
```

The `replace_small` case is slower due to rehashing, but it is still much faster than on Ruby 2.5.6, which did rehash the keys of small hashes:

```
         replace_small
            built-ruby:   8847991.6 i/s
          compare-ruby:   2178174.7 i/s - 4.06x  slower
```

----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81776

* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem

I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected.

I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast.

Reading the source code revealed the difference: Hash#dup creates a copy of the hash, then rehashes the copy. However, rehashing is itself done by making a copy of the hash, so the first copy made before rehashing was unnecessary.

## Solution

I changed the code to use rehashing alone to make the copy of the hash, improving performance while preserving the existing behaviour.

## Benchmark

```ruby
require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

hashes = {
  small_hash: { a: 1 },
  larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}

Benchmark.bmbm do |x|
  hashes.each do |name, hash|
    report(x, "#{name}.dup") do
      hash.dup
    end
  end
end
```

results on master

```
                      user     system      total        real
small_hash.dup    0.401350   0.001638   0.402988 (  0.404608)
larger_hash.dup   7.218548   0.433616   7.652164 (  7.695990)
```

results with the attached patch

```
                      user     system      total        real
small_hash.dup    0.336733   0.002425   0.339158 (  0.341760)
larger_hash.dup   6.617343   0.398407   7.015750 (  7.070282)
```

---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)

--
https://bugs.ruby-lang.org/
Unsubscribe: