From: eregontp@...
Date: 2021-03-02T21:41:04+00:00
Subject: [ruby-core:102726] [Ruby master Bug#17669] An exception still breaks monitor state and causes deadlock in 2.6.7

Issue #17669 has been updated by Eregon (Benoit Daloze).

This is the same bug that @headius reported in https://github.com/ruby/monitor/issues/2.
I'd like to ask that this repository be made public (currently it's private).
If the concern is that it might be confusing because the recent monitor stdlib does not use that source, how about renaming the repository, e.g. to `monitor-rb`?

----------------------------------------
Bug #17669: An exception still breaks monitor state and causes deadlock in 2.6.7
https://bugs.ruby-lang.org/issues/17669#change-90722

* Author: yoheimuta (Yohei Yoshimuta)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.6.7p153 (2021-01-31 revision 67892) [x86_64-darwin19]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
`lib/monitor.rb` provides Monitor. However, its state handling is still fragile against asynchronous interrupts, such as those caused by Thread.kill or by timeout libraries, even after some uses of Thread.handle_interrupt were introduced in https://bugs.ruby-lang.org/issues/15992.

A timeout exception can be raised at any point. If it is raised while the thread is executing right before the `begin` block,

```rb
def mon_synchronize
  # Prevent interrupt on handling interrupts; for example timeout errors
  # it may break locking state.
  Thread.handle_interrupt(Exception => :never){ mon_enter } # <- an interrupt landing right after this line is the problem
  begin
    yield
  ensure
    Thread.handle_interrupt(EXCEPTION_NEVER){ mon_exit }
  end
end
```

it breaks the state of the monitor and causes a deadlock: `mon_enter` has already succeeded, but the `ensure` clause that calls `mon_exit` is never entered.

I can confirm that this happens in both the 2.6.7 head and the 2.6.6 release.

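
To make the window concrete, here is a minimal sketch. It is neither `lib/monitor.rb` nor the attached `reproducible.rb`: `SketchMonitor`, `SimulatedInterrupt`, `try_enter` and the `interrupt_in_gap:` flag are hypothetical names, and the asynchronous interrupt is simulated with a plain `raise` placed in the gap so that the outcome is deterministic.

```ruby
# Hypothetical stand-in for an asynchronous timeout error.
SimulatedInterrupt = Class.new(StandardError)

# Hypothetical, simplified lock with the same shape as the
# mon_synchronize quoted above. Not the real Monitor.
class SketchMonitor
  def initialize
    @mutex = Mutex.new
  end

  def locked?
    @mutex.locked?
  end

  # Non-blocking attempt to take the lock; returns true or false.
  def try_enter
    @mutex.try_lock
  end

  def synchronize(interrupt_in_gap: false)
    Thread.handle_interrupt(Exception => :never) { @mutex.lock }
    # Assumption: this raise models an interrupt that is delivered after
    # the lock is taken but before `begin`, so `ensure` never runs.
    raise SimulatedInterrupt if interrupt_in_gap
    begin
      yield
    ensure
      Thread.handle_interrupt(Exception => :never) { @mutex.unlock }
    end
  end
end

mon = SketchMonitor.new
begin
  mon.synchronize(interrupt_in_gap: true) { :never_reached }
rescue SimulatedInterrupt
  # The calling thread rescues the error and keeps running.
end

puts "lock leaked: #{mon.locked?}"                                   # prints "lock leaked: true"
puts "another thread can enter: #{Thread.new { mon.try_enter }.value}" # prints "another thread can enter: false"
```

Because the calling thread survives the error, as a long-lived server worker that rescues a timeout would, the lock is never released, and any other thread that calls a blocking `lock` on it waits forever. This only models the failure; it does not describe how the linked commit addresses it.
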
```
/bin/bash -c \
"date; ruby -v; ruby reproducible.rb; tail -n 10 /tmp/tmp.txt; date;" | tee ruby:2.6.7-macosx.log
```

```
docker run -it --rm -v `pwd`:`pwd` -w `pwd` ruby:2.6.6-alpine3.13 /bin/ash -c \
"date; ruby -v; ruby reproducible.rb; tail -n 10 /tmp/tmp.txt; date;" | tee ruby:2.6.6-alpine3.13.log
```

Technically, this is also reproducible in 2.5.8 because it shares the same related code.
Incidentally, this doesn't happen in either 2.7.2 or 3.0.0 because [the monitor was reimplemented in C](https://bugs.ruby-lang.org/issues/16255).

Our busy production puma servers have suffered from this weakness around timeouts, which frequently leaves worker threads in a process completely hung.

The commit https://github.com/ruby/ruby/pull/4204/commits/e99c823f16918677b823255c44142910e02922c1 should fix this issue.

---Files--------------------------------
reproducible.rb (1.71 KB)
ruby_2.6.6-alpine3.13.log (12.8 KB)
ruby_2.6.7-macosx.log (3.73 KB)

--
https://bugs.ruby-lang.org/