[#98621] Re: Function getlogin_r()'s protoype] — Bertram Scharpf <lists@...>
FYI,
3 messages
2020/06/02
[#98947] [Ruby master Feature#16986] Anonymous Struct literal — ko1@...
Issue #16986 has been reported by ko1 (Koichi Sasada).
66 messages
2020/06/26
[#98962] [Ruby master Bug#16988] Kernel.load loads file from current directory without '.' in path — misharinn@...
Issue #16988 has been reported by TheSmartnik (Nikita Misharin).
5 messages
2020/06/26
[#98969] [Ruby master Feature#16994] Sets: shorthand for frozen sets of symbols / strings — marcandre-ruby-core@...
Issue #16994 has been reported by marcandre (Marc-Andre Lafortune).
7 messages
2020/06/26
[#100117] [Ruby master Feature#16994] Sets: shorthand for frozen sets of symbols / strings
— matz@...
2020/09/25
Issue #16994 has been updated by matz (Yukihiro Matsumoto).
[ruby-core:98978] [Ruby master Bug#16996] Hash should avoid doing unnecessary rehash
From:
marcandre-ruby-core@...
Date:
2020-06-27 08:20:32 UTC
List:
ruby-core #98978
Issue #16996 has been reported by marcandre (Marc-Andre Lafortune).
----------------------------------------
Bug #16996: Hash should avoid doing unnecessary rehash
https://bugs.ruby-lang.org/issues/16996
* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Pop quiz: Which is the fastest way to get a copy of a Hash `h`?
If, like me, you thought `h.dup` (of course, right?), you are actually wrong.
The fastest way is to call `h.merge`. Try it:
```
require 'benchmark/ips'
lengths = 1..50
h = lengths.to_h { |i| ['x' * i, nil] }
Benchmark.ips do |x|
x.report("dup") { h.dup }
x.report("merge") { h.merge }
end
```
I get
```
Calculating -------------------------------------
dup 259.233k (9.2%) i/s - 1.285M in 5.013445s
merge 944.095k (ア 8.2%) i/s - 4.693M in 5.005315s
```
Yup, it's *3.5x faster* with this example!!
Why? Because `Hash#dup` does a rehash, and `merge` does not.
Pop quiz 2: which methods of `Hash` that produce a new hash do a rehash?
Answer: it depends on the method and on the Ruby version
```
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| Does this rehash? | head | 2.7 | 2.6 | 2.5 | 2.4 | 2.3 | 2.2 | 2.1 | 2.0 |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.dup / clone | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.select{true} / reject{false} | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.select!{true} / reject!{false}| リ | リ | リ | リ | リ | リ | リ | リ | リ |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| sub_h.to_h | リ | リ | リ | リ | リ | リ | リ | リ | リ |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.merge({}) | リ | リ | リ | リ | Yes | Yes | Yes | Yes | Yes |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.merge | リ | リ | リ | n/a |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.transform_values(&:itself) | リ | リ | Yes | Yes | Yes | n/a |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
(where `sub_h = Class.new(Hash).replace(h)`, リ = no rehash)
```
So in Ruby head, doing `h.merge({})` or even `h.transform_values(&:itself)` will be much faster than `h.dup` (but slower in Ruby 2.4) (*)
Notice that `select` rehashes, but `select!` doesn't, so the fastest way to do a `select` in Ruby is... not to call select and instead to actually do a `merge.select!`! (*)
*: on hashes with non-negligible hash functions
```ruby
class Hash
def fast_select(&block)
merge.select!(&block) # don't call dup because it's slow
end
end
Benchmark.ips do |x|
x.report("select") { h.select{true} }
x.report("fast_select") { h.fast_select{true} }
end
```
On my test case above, `fast_select` is *2.5x faster* than `select`. `fast_select` will always return exactly the same result (unless the receiver needed a rehash).
Pop quiz 3: Is this a bug or a feature?
It should be clear that no feature of Ruby should be re-implementable in Ruby with a 3.5x / 2.5x speed gain, so many would think "of course it's a bug".
Well, https://bugs.ruby-lang.org/issues/16121 seems to think that `Hash#dup`'s rehash is a feature...
Why?
Because there is actually a test that `dup` does a rehash
Why?
Because a test of `Set` was failing otherwise!
Commit: https://github.com/ruby/ruby/commit/a34a3c2caae4c1fbd
Short discussion: http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/48040?47945-48527
Actual test: https://github.com/ruby/ruby/blob/master/test/test_set.rb#L621-L625
Why?
This test construct a `Set` that needs to be rehashed (by mutating an element of the set after it is added), and then checks that `rehash_me == rehash_me.clone`.
That test is bogus. It passes for obscure and undocumented reasons, and `rehash_me.clone == rehash_me` doesn't pass.
Today, it is official that sets with elements that are later mutated must be `Set#reset`, so it is official that this should not be relied upon.
Probably more clear is the case of `select/reject` (but I didn't check for failing test), and even more clear that `merge` changed in Ruby 2.5 and `transform_values` in 2.7, but not a single `NEWS` file mentions the word "rehash".
My conclusion is that Hash should avoid doing an unnecessary rehash: `dup`/`clone`/`select`/`reject`. We probably should add a reminder in the `NEWS` that if anyone mutates a key of a Hash, or an element of a Set and does not call `rehash`/`reset`, improper behavior should be expected.
Let's make `Hash#dup/clone/select/reject` fast please.
Any objection?
--
https://bugs.ruby-lang.org/
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>