From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@...>
Date: 2023-06-12T07:42:20+00:00
Subject: [ruby-core:113885] [Ruby master Feature#19717] `ConditionVariable#signal` is not fair when the wakeup is consistently spurious.

Issue #19717 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


> I like the proposed implementation. Is there any chance it still has similar problems? If we always requeue at the front, could multiple waiters requeue in front of each other? i.e. it still depends on the order.

I think it can't have similar problems as long as it's woken up by `#signal` - but if it's woken up by `#broadcast` you probably don't want this prepending behaviour.

I think it's possible, with a bit of shuffling around in `thread_sync.c`, for the implementation to leave the thread on the ConditionVariable's waitq until it has actually acquired the resource; that way, if e.g. `#signal` is called twice, the two threads that are woken up will maintain their relative order in the waitq.
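To make the ordering concern concrete, here is a toy model, not the `thread_sync.c` implementation. Plain arrays stand in for the waitq, no real threads are involved, and the method names `requeue_at_front` and `keep_in_place` are made up for illustration. It assumes `#signal` was called twice, waking `:a` and `:b`, and that both wakeups turned out to be spurious.

```ruby
# Toy model only: arrays stand in for the waitq; no real threads are involved.

# Each spuriously woken waiter puts itself back at the head of the queue,
# in whatever order the scheduler happens to run them.
def requeue_at_front(waitq, woken_in_run_order)
  rest = waitq - woken_in_run_order
  woken_in_run_order.each { |waiter| rest.unshift(waiter) }
  rest
end

# Waiters are never removed until they acquire the resource, so a spurious
# wakeup leaves the queue untouched regardless of run order.
def keep_in_place(waitq, _woken_in_run_order)
  waitq.dup
end

waitq = %i[a b c]

front_a_first = requeue_at_front(waitq, %i[a b]) # :a runs first, then :b
front_b_first = requeue_at_front(waitq, %i[b a]) # :b runs first, then :a
kept_a_first  = keep_in_place(waitq, %i[a b])
kept_b_first  = keep_in_place(waitq, %i[b a])
```

In this model the front-requeue result depends on which woken thread happens to run first, while the keep-in-place result does not.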

I also noticed last night that `MonitorMixin::ConditionVariable` already has `#wait_until` and `#wait_for` methods that take a block. We could adjust the behaviour of these without any new API at all, which might be preferable? They're implemented by calling into `::ConditionVariable#wait` via `rb_funcall`, but we could hack it to call some internal-only APIs to achieve different waitq behaviour if we wanted.
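For reference, a minimal sketch of how `#wait_until` is used today. The `SlotPool` class and its method names are hypothetical and exist only for illustration; `Monitor#new_cond`, `#wait_until` and `#signal` are the existing `monitor` library APIs.

```ruby
require 'monitor'

# Hypothetical resource pool, for illustration only.
class SlotPool
  def initialize(items)
    @items = items.dup
    @monitor = Monitor.new
    @available = @monitor.new_cond
  end

  def acquire
    @monitor.synchronize do
      # wait_until re-checks the block after every wakeup, so a spurious
      # wakeup simply sends the thread back to waiting.
      @available.wait_until { !@items.empty? }
      @items.shift
    end
  end

  def release(item)
    @monitor.synchronize do
      @items.push(item)
      @available.signal
    end
  end
end

pool = SlotPool.new([:conn])
first = pool.acquire
waiter = Thread.new { pool.acquire } # blocks until the slot is released
pool.release(first)
second = waiter.value
```

Because the predicate lives in the block, the library knows when a wakeup was spurious, which is what would let it treat that case differently without any new public API.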

(Sidebar - when does one use `Monitor` in ruby instead of `Mutex`? Is it just that `Monitor` is re-entrant and `Mutex` is not?)
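A small sketch of the re-entrancy difference, using only the stock `Mutex` and `Monitor` classes; the variable names are just for illustration. It does not claim re-entrancy is the only difference.

```ruby
require 'monitor'

mutex = Mutex.new
mutex_result =
  begin
    # Locking a Mutex the current thread already holds raises ThreadError.
    mutex.synchronize { mutex.synchronize { :ok } }
  rescue ThreadError => e
    e.class
  end

monitor = Monitor.new
# A Monitor may be re-entered by the thread that already holds it.
monitor_result = monitor.synchronize { monitor.synchronize { :ok } }
```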

Finally - I did some reading, and another magic phrase I found about this problem is "handoff semantics". We could make it so that when calling `ConditionVariable#signal` whilst holding the associated mutex, the signaled thread is guaranteed to acquire the mutex next after the current thread releases it; essentially, the mutex is "handed off". Because of the GVL, we would also need to "hand off" the GVL and give it to the signaled thread. Essentially, when the signalling thread unlocks the mutex, it would need to stop and let the signaled thread run instead.
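A user-level sketch of the handoff idea, assuming a hypothetical `HandoffLock` class built on the stock `Mutex` and `ConditionVariable`. It only models handing off lock ownership in FIFO order; it does not touch the GVL and is not how `thread_sync.c` works.

```ruby
# Hypothetical lock with handoff semantics, for illustration only.
class HandoffLock
  def initialize
    @guard = Mutex.new
    @cond = ConditionVariable.new
    @owner = nil
    @queue = []
  end

  def lock
    @guard.synchronize do
      me = Thread.current
      if @owner.nil? && @queue.empty?
        @owner = me
      else
        @queue << me
        # Loop guards against spurious wakeups: only the thread that was
        # explicitly handed the lock may proceed.
        @cond.wait(@guard) until @owner.equal?(me)
      end
    end
  end

  def unlock
    @guard.synchronize do
      raise ThreadError, "not owner" unless @owner.equal?(Thread.current)
      # Ownership passes directly to the longest waiter (nil if none), so the
      # releasing thread cannot barge back in ahead of it.
      @owner = @queue.shift
      @cond.broadcast
    end
  end

  # Helper for the demo below: number of threads currently queued.
  def waiting
    @guard.synchronize { @queue.size }
  end
end

lock = HandoffLock.new
order = []
lock.lock
t1 = Thread.new { lock.lock; order << 1; lock.unlock }
sleep 0.01 until lock.waiting == 1
t2 = Thread.new { lock.lock; order << 2; lock.unlock }
sleep 0.01 until lock.waiting == 2
lock.unlock # ownership goes straight to t1
lock.lock   # the main thread can only get the lock after t1 and t2
order << :main
lock.unlock
[t1, t2].each(&:join)
```

Even though the main thread tries to re-lock immediately after unlocking, it ends up behind both waiters.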

The textbook downside of this approach is that it leads to increased context switching; in a real multi-CPU language, there might be another thread already running which is about to try and get the mutex, and it's best for _overall_ throughput if it's allowed to do so rather than stopping it and switching threads. In Ruby that applies too (if the signalling thread releases & immediately re-acquires the mutex, we can avoid a context switch). I also think there is a _worse_ problem with this approach in Ruby - the thread that did the signaling has to sleep and do _nothing_ after unlocking the mutex, whereas had it been allowed to run for a while, it might have started a blocking IO operation or called into a C extension or some such and yielded the GVL _whilst also doing something productive_. So, tl;dr, I don't think we should do this "handoff" approach.

----------------------------------------
Feature #19717: `ConditionVariable#signal` is not fair when the wakeup is consistently spurious.
https://bugs.ruby-lang.org/issues/19717#change-103535

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
----------------------------------------
For background, see this issue <https://github.com/socketry/async/issues/99>.

It looks like `ConditionVariable#signal` is not fair if the calling thread immediately reacquires the resource.

I've given a detailed reproduction here as it's non-trivial: <https://github.com/ioquatix/ruby-condition-variable-timeout>.

When the spurious wakeup occurs, the woken thread is pushed to the back of the waitq, which means every other waiting thread will acquire the resource first, and the woken thread will perpetually be at the back of the queue.

I believe the solution is that `ConditionVariable#signal` should only remove the thread from the waitq if it's possible to acquire the lock. Otherwise the thread should be left in place so that the order is retained; this should result in fair scheduling.
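The difference between the two behaviours can be sketched with a toy model. Plain arrays stand in for the waitq and no real threads are involved. The method `acquisition_order`, the policy names and the `wakeups` sequence are made up for illustration; each entry of `wakeups` says whether the resource was actually free when a signal arrived.

```ruby
# Toy model only: returns the order in which waiters acquire the resource.
def acquisition_order(waiters, availability, policy)
  waitq = waiters.dup
  acquired = []
  availability.each do |resource_free|
    break if waitq.empty?
    case policy
    when :requeue_at_back
      # Current behaviour as described above: the head waiter is always
      # removed, and goes to the back if the wakeup was spurious.
      woken = waitq.shift
      if resource_free
        acquired << woken
      else
        waitq.push(woken)
      end
    when :keep_until_acquired
      # Proposed behaviour: the head waiter leaves the queue only when it
      # can actually take the resource.
      acquired << waitq.shift if resource_free
    else
      raise ArgumentError, "unknown policy: #{policy}"
    end
  end
  acquired
end

wakeups = [false, true, true, true] # the first wakeup is spurious
back_order = acquisition_order(%i[a b c], wakeups, :requeue_at_back)
kept_order = acquisition_order(%i[a b c], wakeups, :keep_until_acquired)
```

In this model a single spurious wakeup is enough for `:a` to be overtaken by every other waiter under the requeue-at-back policy, while the keep-in-place policy preserves arrival order.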


-- 
https://bugs.ruby-lang.org/