[ruby-core:102861] [Ruby master Bug#16996] Hash should avoid doing unnecessary rehash

From: knu@...
Date: 2021-03-15 05:25:32 UTC
List: ruby-core #102861
Issue #16996 has been updated by knu (Akinori MUSHA).


I think I can just drop the spec in test_set.rb, if it is blocking this.

----------------------------------------
Bug #16996: Hash should avoid doing unnecessary rehash
https://bugs.ruby-lang.org/issues/16996#change-90919

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Pop quiz: Which is the fastest way to get a copy of a Hash `h`?

If, like me, you thought `h.dup` (of course, right?), you are actually wrong.

The fastest way is to call `h.merge`. Try it:

```
require 'benchmark/ips'

lengths = 1..50

h = lengths.to_h { |i| ['x' * i, nil] }

Benchmark.ips do |x|
  x.report("dup")        { h.dup }
  x.report("merge")      { h.merge }
end
```
I get
```
Calculating -------------------------------------
                 dup    259.233k (± 9.2%) i/s -      1.285M in   5.013445s
               merge    944.095k (± 8.2%) i/s -      4.693M in   5.005315s
```

Yup, it's *3.5x faster* with this example!!

Why? Because `Hash#dup` does a rehash, and `merge` does not.
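One way to observe the rehash directly, rather than inferring it from timings, is to count how often Ruby asks the keys for their hash value. The following is only a sketch: `CountingKey` and `hash_calls_during` are hypothetical names invented for this illustration, and the counts it prints depend on the Ruby version you run it on.

```ruby
# Hypothetical key class (not from the issue) that counts calls to #hash.
class CountingKey
  @hash_calls = 0
  class << self
    attr_accessor :hash_calls
  end

  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Every time a Hash needs this key's hash value, bump the counter.
  def hash
    self.class.hash_calls += 1
    @name.hash
  end

  def eql?(other)
    other.is_a?(CountingKey) && name == other.name
  end
end

# Hypothetical helper: reset the counter, run the block, return the count.
def hash_calls_during
  CountingKey.hash_calls = 0
  yield
  CountingKey.hash_calls
end

h = (1..20).to_h { |i| [CountingKey.new('x' * i), nil] }

dup_calls   = hash_calls_during { h.dup }
merge_calls = hash_calls_during { h.merge }

puts "dup:   #{dup_calls} calls to #hash"
puts "merge: #{merge_calls} calls to #hash"
```

A copy that rehashes should report roughly one call per key, while a copy that does not rehash should report none.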

Pop quiz 2: which methods of `Hash` that produce a new hash do a rehash?

Answer: it depends on the method and on the Ruby version

```

+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| Does this rehash?               | head | 2.7 | 2.6 | 2.5 | 2.4 | 2.3 | 2.2 | 2.1 | 2.0 |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.dup / clone                   |  Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.select{true} / reject{false}  |  Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.select!{true} / reject!{false}|   Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| sub_h.to_h                      |   Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |  Ø  |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.merge({})                     |   Ø  |  Ø  |  Ø  |  Ø  | Yes | Yes | Yes | Yes | Yes |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.merge                         |   Ø  |  Ø  |  Ø  |             n/a                   |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| h.transform_values(&:itself)    |   Ø  |  Ø  | Yes | Yes | Yes |          n/a          |
+---------------------------------+------+-----+-----+-----+-----+-----+-----+-----+-----+
(where `sub_h = Class.new(Hash).replace(h)`, Ø = no rehash)
```

So in Ruby head, doing `h.merge({})` or even `h.transform_values(&:itself)` will be much faster than `h.dup` (but slower in Ruby 2.4) (*)

Notice that `select` rehashes, but `select!` doesn't, so the fastest way to do a `select` in Ruby is... not to call select and instead to actually do a `merge.select!`! (*)

*: on hashes with non-negligible hash functions

```ruby
class Hash
  def fast_select(&block)
    copy = merge          # don't call dup because it's slow
    copy.select!(&block)  # select! returns nil if nothing was removed
    copy
  end
end

Benchmark.ips do |x|
  x.report("select")           { h.select{true} }
  x.report("fast_select")      { h.fast_select{true} }
end
```

On my test case above, `fast_select` is *2.5x faster* than `select`. `fast_select` will always return exactly the same result (unless the receiver needed a rehash).
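The equivalence can be checked with a small script. This is a sketch only: `copy_then_select` is a hypothetical standalone helper written for this illustration, and it returns the copy explicitly because `Hash#select!` returns `nil` when no entry is removed.

```ruby
# Hypothetical helper, not part of Ruby: copy with merge, then filter in place.
def copy_then_select(hash, &block)
  copy = hash.merge      # argument-less merge (Ruby 2.6+) returns a copy
  copy.select!(&block)   # may return nil, so the return value is ignored
  copy
end

h = (1..50).to_h { |i| ['x' * i, i] }

evens_a = h.select { |_key, value| value.even? }
evens_b = copy_then_select(h) { |_key, value| value.even? }
all_b   = copy_then_select(h) { true }

puts evens_a == evens_b                # true
puts all_b == h                        # true
puts h.merge.select! { true }.inspect  # nil, because nothing was removed
```

The last line shows why a bare `merge.select!` is not a drop-in replacement for `select` when the block keeps every entry.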

Pop quiz 3: Is this a bug or a feature?

It should be clear that no feature of Ruby should be re-implementable in Ruby with a 3.5x / 2.5x speed gain, so many would think "of course it's a bug".

Well, https://bugs.ruby-lang.org/issues/16121 seems to think that `Hash#dup`'s rehash is a feature...
Why?
Because there is actually a test asserting that `dup` does a rehash.
Why?
Because a test of `Set` was failing otherwise!
Commit: https://github.com/ruby/ruby/commit/a34a3c2caae4c1fbd
Short discussion: http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/48040?47945-48527
Actual test: https://github.com/ruby/ruby/blob/master/test/test_set.rb#L621-L625
Why?
This test constructs a `Set` that needs to be rehashed (by mutating an element of the set after it is added), and then checks that `rehash_me == rehash_me.clone`.
That test is bogus. It passes for obscure and undocumented reasons, and `rehash_me.clone == rehash_me` doesn't pass.
Today, it is official that `Set#reset` must be called on sets whose elements are later mutated, so it is official that this should not be relied upon.

The case of `select`/`reject` is probably clearer still (though I didn't check for failing tests), and clearest of all is that `merge` changed in Ruby 2.5 and `transform_values` in 2.7, yet not a single `NEWS` file mentions the word "rehash".

My conclusion is that Hash should avoid doing an unnecessary rehash: `dup`/`clone`/`select`/`reject`. We probably should add a reminder in the `NEWS` that if anyone mutates a key of a Hash, or an element of a Set and does not call `rehash`/`reset`, improper behavior should be expected.
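For readers unfamiliar with the caveat, here is a minimal sketch of what mutating a key or element does and how `Hash#rehash` and `Set#reset` repair it. The variable names are illustrative only, and the stale lookup is described as "usually" failing because the exact outcome depends on internal hash values.

```ruby
require 'set'

# Hash: mutate a key after insertion, then repair the index with rehash.
key = [1, 2]
h = { key => :value }
key << 3            # the key's hash value changes, but h still files it under the old one

stale = h[key]      # usually nil: the lookup goes to the wrong place
h.rehash            # recompute the index from the keys' current hash values
fresh = h[key]      # :value

# Set: same problem, repaired with Set#reset (available since Ruby 2.5).
element = [1, 2]
set = Set[element]
element << 3        # mutate the element after it was added
set.reset           # re-index the set from its current elements

puts "hash lookup before rehash: #{stale.inspect}"
puts "hash lookup after rehash:  #{fresh.inspect}"
puts "set includes element after reset: #{set.include?(element)}"
```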

Let's make `Hash#dup/clone/select/reject` fast please.

Any objection?


