From: "Eregon (Benoit Daloze)"
Date: 2021-09-28T13:36:40+00:00
Subject: [ruby-core:105470] [Ruby master Bug#17669] An exception still breaks monitor state and causes deadlock in 2.6.7

Issue #17669 has been updated by Eregon (Benoit Daloze).

Ah, 2.6.8 is in security maintenance, so maybe this is not considered then?

----------------------------------------
Bug #17669: An exception still breaks monitor state and causes deadlock in 2.6.7
https://bugs.ruby-lang.org/issues/17669#change-93919

* Author: yoheimuta (Yohei Yoshimuta)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 2.6.7p153 (2021-01-31 revision 67892) [x86_64-darwin19]
* Backport: 2.5: UNKNOWN, 2.6: REQUIRED, 2.7: DONTNEED, 3.0: DONTNEED
----------------------------------------
`lib/monitor.rb` provides Monitor. However, its state handling is still vulnerable to interrupts, such as Thread.kill or the exceptions raised by timeout libraries, even after some uses of Thread.handle_interrupt were introduced in https://bugs.ruby-lang.org/issues/15992.

A timeout exception can be raised at any point. If it is raised while the thread is executing right before the `begin` block,

```rb
def mon_synchronize
  # Prevent interrupt on handling interrupts; for example timeout errors
  # it may break locking state.
  Thread.handle_interrupt(Exception => :never){ mon_enter } # <- an interrupt arriving right after this line, before `begin`, skips the ensure
  begin
    yield
  ensure
    Thread.handle_interrupt(EXCEPTION_NEVER){ mon_exit }
  end
end
```

it breaks the state of the monitor, which causes a deadlock.

I can confirm that this happens in both the 2.6.7 head and the 2.6.6 release.

```
/bin/bash -c \
"date; ruby -v; ruby reproducible.rb; tail -n 10 /tmp/tmp.txt; date;" | tee ruby:2.6.7-macosx.log
```

```
docker run -it --rm -v `pwd`:`pwd` -w `pwd` ruby:2.6.6-alpine3.13 /bin/ash -c \
"date; ruby -v; ruby reproducible.rb; tail -n 10 /tmp/tmp.txt; date;" | tee ruby:2.6.6-alpine3.13.log
```

Technically, this is also reproducible on 2.5.8 because it shares the same related code.

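One way to picture closing that window is to keep interrupts deferred across both the lock acquisition and the `begin`/`ensure`, and re-enable them only while the caller's block runs. The following is a minimal sketch of that idea under stated assumptions: `GuardedLock` is a hypothetical class built on a plain `Mutex`, not the `lib/monitor.rb` implementation and not any patch proposed for it.

```ruby
# Hypothetical illustration only: GuardedLock is not part of lib/monitor.rb.
class GuardedLock
  def initialize
    @mutex = Mutex.new
  end

  def locked?
    @mutex.locked?
  end

  def synchronize
    # Defer asynchronous interrupts while the lock state is being changed,
    # so none can land between acquiring the lock and entering `begin`.
    Thread.handle_interrupt(Exception => :never) do
      @mutex.lock
      begin
        # Let interrupts through again only while the caller's block runs.
        Thread.handle_interrupt(Exception => :immediate) { yield }
      ensure
        @mutex.unlock
      end
    end
  end
end

lock = GuardedLock.new
worker = Thread.new do
  begin
    lock.synchronize { sleep } # block until interrupted
  rescue RuntimeError
    # swallow the simulated timeout, as a request handler might
  end
  lock.locked? # becomes the thread's value
end
sleep 0.01 until worker.status == "sleep"

# Simulate what a timeout library does: raise into the other thread.
worker.raise(RuntimeError, "simulated timeout")
puts "lock still held after timeout? #{worker.value}"
```

With this structure the `ensure` clause is always reached once the lock is taken, so the worker should report the lock as released. The trade-off is that a thread waiting on a contended lock cannot be interrupted until it acquires it.
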
Incidentally, this doesn't happen in either 2.7.2 or 3.0.0 because [the monitor was reimplemented in C](https://bugs.ruby-lang.org/issues/16255).

Our busy production Puma servers have suffered from this susceptibility to timeouts, which frequently leaves worker threads in a process completely hung.

The commit https://github.com/ruby/ruby/pull/4204/commits/e99c823f16918677b823255c44142910e02922c1 should fix this issue.

---Files--------------------------------
reproducible.rb (1.71 KB)
ruby_2.6.6-alpine3.13.log (12.8 KB)
ruby_2.6.7-macosx.log (3.73 KB)

-- 
https://bugs.ruby-lang.org/