From: Eric Wong Date: 2018-08-19T23:06:34+00:00 Subject: [ruby-core:88558] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed eregontp@gmail.com wrote: > normalperson (Eric Wong) wrote: > > "git bisect" points to r63498 ("thread_pthread.c: enable > > thread cache by default"), which is HIGHLY unexpected. > > So it seems it doesn't happen in Ruby <= 2.5 due to thread > caching being disabled then? Right; but that might just be "luck"... > > I'm not seeing how thread caching can be the cause of the > > bug, but rather it makes an existing bug more apparent by > > being faster and hitting the race sooner. > > No idea either, but I guess a bug could lurk in there. I did > experience a few weird bugs when doing thread caching in > TruffleRuby (e.g., inline caches based on pthread_self() are > incorrect as multiple Ruby threads could use the same pthread > thread) At least for the core VM and standard library; I don't think pthread_self is a factor; and we already clobber our TSD pointer since r63836 > > What thread-caching does do is make the MSPECOPT "-R" > > option noticeably faster > > Right, because creating threads is so much faster. Maybe the > fact pthread startup is done each time has some effect on > races (e.g., the pthread thread has a perfect view of the > caller thread when starting, but not true with caching, but > maybe the GVL still ensure that) Yes, it might affect accounting done in the kernel; particularly when bound to a single CPU. And it still takes -R1000 to reliably reproduce the problem with a single CPU; -R500 was sometimes successful, even. > > I think the only way to work reliably is to disable > > interrupt checking when attempting to lock a mutex in > > ensure: > > At least if the interrupt causes to escape the function (e.g., > Thread#raise/Thread#kill), then that escape must be delayed > after re-acquiring the Mutex. Yes, so I've committed r64476 > In normal Mutex#lock and Mutex#synchronize, it's fine to > raise/kill before taking the Mutex, because it's before > entering the critical section, but unfortunately here we are > inside the critical section. So this is why pthread_mutex_lock is forbidden from returning EINTR by POSIX. Mutex#lock/synchronize should have been uninterruptible, too; but it's too late to change that as it would break compatibility with existing code (bootstraptest/test_thread.rb already breaks). > > it could still affect user-experience if Ctrl-C takes a while > > When would that happen? > If some other Thread holds the Mutex too long? Yes, but I suppose it's not a real problem. > It's probably never a good idea to hold a Mutex for seconds or > a long time, especially when using a ConditionVariable. > It seems fair enough to delay the interrupt in such a case, as > the main Thread is waiting for the Mutex. Agreed(*). > Maybe we could warn if re-acquiring takes too long and it > becomes a problem in practice (I think it won't). We already have deadlock checking to alert users to real problems, so it's probably not a problem in practice. (*) Along those lines, I think the "lock" idiom for GVL is not ideal for performance (kosaki rewrote the GVL for 1.9.3 to optimize for contention as a result). I might try replacing the GVL with a platform-independent scheduler which limits parallelism to the data structures the GVL currently protects. Note: This will NOT solve the parallelism problem which exists in C Ruby; that is a much bigger task (but still doable with a scheduler, and perhaps even more so). Unsubscribe: