From: "Eregon (Benoit Daloze)" <noreply@...>
Date: 2022-12-15T13:38:03+00:00
Subject: [ruby-core:111308] [Ruby master Bug#19237] Hash default_proc is not thread-safe to  lazy-initialize value for a given key

Issue #19237 has been reported by Eregon (Benoit Daloze).

----------------------------------------
Bug #19237: Hash default_proc is not thread-safe to  lazy-initialize value for a given key
https://bugs.ruby-lang.org/issues/19237

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
```ruby
1000.times do
  h = Hash.new do |hash, key|
    # Note, with [] instead of Array.new below is seems to work
    # but then e.g. anything else in this block like Thread.pass also breaks it
    hash[key] = Array.new
  end

  go = false
  threads = 100.times.map do
    Thread.new do
      Thread.pass until go
      h[:na] << true
    end
  end
  go = true
  threads.each(&:join)
  raise "Expected 100 elements but was #{h[:na].size}" if h[:na].size != 100
end
```

gives (in 3 runs):
```
concurrent_hash_default_proc.rb:17:in `block in <main>': Expected 100 elements but was 1 (RuntimeError)
concurrent_hash_default_proc.rb:17:in `block in <main>': Expected 100 elements but was 3 (RuntimeError)
concurrent_hash_default_proc.rb:17:in `block in <main>': Expected 100 elements but was 2 (RuntimeError)
```

So what happens is the same Hash entry gets assigned multiple times, which feels quite unexpected and cause fairly surprising bugs (e.g. elements "disappearing" etc).

Any thoughts on how to solve that in CRuby?

In my PhD thesis one way I found is to actually pass a different object than the Hash itself as the first argument to the block.
See https://eregon.me/blog/assets/research/thesis-thread-safe-data-representations-in-dynamic-languages.pdf page 83 `Idiomatic Concurrent Hash Operations`. In short, it replaces `[]=` calls in the initializer block with `put_if_absent` by passing a different object than the Concurrent::Hash itself, which overrides `[]=` and delegates the rest.

#19069 could be another way but there are compatibility issues (e.g. storing when it previously would not for `Hash.new { |h,k| k * 3 }`.

There is also the question whether the block should be allowed to execute more than once for a given key.
I think that is difficult to solve (probably the only way is via some lock, but then that can lead to deadlocks), and less important than ensuring the value is only assigned once for a given key.

---

Note that concurrent-ruby has the same issue because it uses ::Array/::Hash on CRuby: https://github.com/ruby-concurrency/concurrent-ruby/issues/970#issuecomment-1346338557 There is also more discussions about trade-offs there which apply to `Hash` too.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/