From: stanhu@... Date: 2021-03-05T17:27:58+00:00 Subject: [ruby-core:102746] [Ruby master Bug#17671] Possible deadlock with condition variables in Net::IMAP Issue #17671 has been updated by stanhu (Stan Hu). Filed under https://github.com/ruby/net-imap/issues/14. Feel free to close this. ---------------------------------------- Bug #17671: Possible deadlock with condition variables in Net::IMAP https://bugs.ruby-lang.org/issues/17671#change-90757 * Author: stanhu (Stan Hu) * Status: Assigned * Priority: Normal * Assignee: shugo (Shugo Maeda) * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN ---------------------------------------- We're using Net::IMAP via the [MailRoom gem](https://github.com/tpitale/mail_room), and quite frequently we are seeing issues with the process not terminating even though we attempt to run `#Thread#join` with a 60-second timeout. A GDB backtrace shows that this is stuck waiting for a conditional variable: ``` Thread 2 (Thread 0x7f2bf81ff700 (LWP 11)): #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f2bfa3df810) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f2bfa3df840, cond=0x7f2bfa3df7e8) at pthread_cond_wait.c:502 #2 __pthread_cond_wait (cond=cond@entry=0x7f2bfa3df7e8, mutex=mutex@entry=0x7f2bfa3df840) at pthread_cond_wait.c:655 #3 0x00007f2bffe343fd in rb_native_cond_wait (mutex=0x7f2bfa3df840, cond=0x7f2bfa3df7e8) at thread_pthread.c:1197 #4 native_cond_sleep (th=th@entry=0x7f2bfa3df780, rel=rel@entry=0x0) at thread_pthread.c:1197 #5 0x00007f2bffe346f4 in native_sleep (th=th@entry=0x7f2bfa3df780, rel=0x0) at thread_pthread.c:2113 #6 0x00007f2bffe376de in sleep_forever (fl=1, th=0x7f2bfa3df780) at thread.c:1252 #7 rb_thread_sleep_deadly_allow_spurious_wakeup () at thread.c:1343 #8 rb_mutex_sleep_forever (time=time@entry=8) at thread_sync.c:451 #9 0x00007f2bffcce134 in rb_ensure (b_proc=b_proc@entry=0x7f2bffe37660 , data1=data1@entry=8, e_proc=e_proc@entry=0x7f2bffe36160 , data2=data2@entry=139826840321280) at eval.c:1129 #10 0x00007f2bffe361bd in rb_mutex_sleep (self=139826840321280, timeout=) at thread_sync.c:477 #11 0x00007f2bffe796e1 in vm_call0_cfunc_with_frame (argv=0x7f2bf80fc858, cd=, calling=0x7f2bf80fc700, ec=0x7f2bfa209bd0) at vm_eval.c:91 #12 vm_call0_cfunc (argv=0x7f2bf80fc858, cd=, calling=0x7f2bf80fc700, ec=0x7f2bfa209bd0) at vm_eval.c:105 #13 vm_call0_body (ec=0x7f2bfa209bd0, calling=0x7f2bf80fc700, cd=, argv=0x7f2bf80fc858) at vm_eval.c:140 #14 0x00007f2bffe7bfd8 in rb_funcallv_with_cc (cd=0x7f2bfffa75c0 , recv=139826840321280, mid=24193, argc=1, argv=) at vm_core.h:1805 #15 0x00007f2bffcce134 in rb_ensure (b_proc=b_proc@entry=0x7f2bffe2f360 , data1=data1@entry=139826822105168, e_proc=e_proc@entry=0x7f2bffe2df60 , data2=data2@entry=139826822105184) at eval.c:1129 #16 0x00007f2bffe2f472 in rb_condvar_wait (argc=, argv=0x7f2bf80fcad0, self=139826858411400) at thread_sync.c:1407 #17 0x00007f2bffe796e1 in vm_call0_cfunc_with_frame (argv=0x7f2bf80fcad0, cd=, calling=0x7f2bf80fc980, ec=0x7f2bfa209bd0) at vm_eval.c:91 #18 vm_call0_cfunc (argv=0x7f2bf80fcad0, cd=, calling=0x7f2bf80fc980, ec=0x7f2bfa209bd0) at vm_eval.c:105 #19 vm_call0_body (ec=0x7f2bfa209bd0, calling=0x7f2bf80fc980, cd=, argv=0x7f2bf80fcad0) at vm_eval.c:140 #20 0x00007f2bffe79d5f in rb_vm_call0 (ec=ec@entry=0x7f2bfa209bd0, recv=recv@entry=139826858411400, id=id@entry=24353, argc=, argv=, me=me@entry=0x7f2bfef8b9d0, kw_splat=0) at vm_eval.c:52 #21 0x00007f2bffe79ff8 in rb_vm_call_kw (ec=0x7f2bfa209bd0, recv=139826858411400, id=24353, argc=, argc@entry=2, argv=, argv@entry=0x7f2bf80fcab0, me=0x7f2bfef8b9d0, kw_splat=) at vm_eval.c:268 #22 0x00007f2bffe7a453 in rb_call0 (ec=, recv=, mid=, argc=argc@entry=2, argv=argv@entry=0x7f2bf80fcab0, call_scope=call_scope@entry=CALL_FCALL, self=) at vm_eval.c:392 #23 0x00007f2bffe7ac7e in rb_call (scope=CALL_FCALL, argv=0x7f2bf80fcab0, argc=2, mid=, recv=) at vm_eval.c:718 #24 rb_funcallv (recv=, mid=, argc=argc@entry=2, argv=argv@entry=0x7f2bf80fcad0) at vm_eval.c:958 #25 0x00007f2bff2f62e1 in monitor_wait_for_cond_body (v=v@entry=139826822106112) at monitor.c:152 #26 0x00007f2bffcce134 in rb_ensure (b_proc=b_proc@entry=0x7f2bff2f62a0 , data1=data1@entry=139826822106112, e_proc=e_proc@entry=0x7f2bff2f64a0 , data2=data2@entry=139826822106112) at eval.c:1129 #27 0x00007f2bff2f641e in monitor_wait_for_cond (monitor=139826840321320, cond=, timeout=8) at monitor.c:180 #28 0x00007f2bffe66c59 in vm_call_cfunc_with_frame (empty_kw_splat=, cd=0x7f2bfe9c8b70, calling=, reg_cfp=0x7f2bf81fd268, ec=0x7f2bfa209bd0) at vm_insnhelper.c:2514 #29 vm_call_cfunc (ec=0x7f2bfa209bd0, reg_cfp=0x7f2bf81fd268, calling=, cd=0x7f2bfe9c8b70) at vm_insnhelper.c:2539 #30 0x00007f2bffe72092 in vm_sendish (block_handler=, method_explorer=, cd=, reg_cfp=, ec=) at vm_insnhelper.c:4023 #31 vm_exec_core (ec=0x7f2bfa3df810, initial=128) at insns.def:801 #32 0x00007f2bffe78365 in rb_vm_exec (ec=0x7f2bfa209bd0, mjit_enable_p=1) at vm.c:1929 #33 0x00007f2bffe82130 in invoke_block (captured=0x7f2bf81fd3d0, opt_pc=, type=, cref=, self=139826840321400, iseq=0x7f2bfef91600, ec=0x7f2bfa209bd0) at vm.c:1044 #34 invoke_iseq_block_from_c (me=0x0, is_lambda=0, cref=, passed_block_handler=, kw_splat=, argv=, argc=0, self=139826840321400, captured=0x7f2bf81fd3d0, ec=0x7f2bfa209bd0) at vm.c:1116 #35 invoke_block_from_c_bh (argc=, argv=, kw_splat=, passed_block_handler=, cref=, is_lambda=, force_blockarg=, block_handler=, ec=) at vm.c:1134 #36 vm_yield (argc=, argv=, kw_splat=, ec=) at vm.c:1179 #37 rb_yield_0 (argv=, argc=) at vm_eval.c:1227 #38 rb_yield_values (n=) at vm_eval.c:1252 #39 0x00007f2bffe82eef in vm_yield (kw_splat=0, argv=, argc=, ec=) at vm.c:1165 #40 rb_yield_0 (argv=, argc=) at vm_eval.c:1227 #41 rb_yield_values2 (argc=, argv=) at vm_eval.c:1273 #42 0x00007f2bffcce134 in rb_ensure (b_proc=0x7f2bff2f6380 , data1=139826840321320, e_proc=, data2=) at eval.c:1129 #43 0x00007f2bffe66c59 in vm_call_cfunc_with_frame (empty_kw_splat=, cd=0x7f2bfa4f19e0, calling=, reg_cfp=0x7f2bf81fd380, ec=0x7f2bfa209bd0) at vm_insnhelper.c:2514 #44 vm_call_cfunc (ec=0x7f2bfa209bd0, reg_cfp=0x7f2bf81fd380, calling=, cd=0x7f2bfa4f19e0) at vm_insnhelper.c:2539 #45 0x00007f2bffe72150 in vm_sendish (method_explorer=, block_handler=, cd=, reg_cfp=, ec=) at vm_insnhelper.c:4023 #46 vm_exec_core (ec=0x7f2bfa3df810, initial=128) at insns.def:782 #47 0x00007f2bffe78365 in rb_vm_exec (ec=0x7f2bfa209bd0, mjit_enable_p=1) at vm.c:1929 #48 0x00007f2bffe792bc in invoke_iseq_block_from_c (me=0x0, is_lambda=, cref=0x0, passed_block_handler=0, kw_splat=, argv=, argc=0, self=139826857723800, captured=, ec=0x7f2bfa209bd0) at vm.c:1116 #49 invoke_block_from_c_proc (me=0x0, is_lambda=, passed_block_handler=0, kw_splat=, argv=, argc=0, self=139826857723800, proc=0x7f2bfa209bd0, ec=0x7f2bfa209bd0) at vm.c:1216 #50 vm_invoke_proc (passed_block_handler=0, kw_splat=, argv=, argc=0, self=139826857723800, proc=0x7f2bfa209bd0, ec=0x7f2bfa209bd0) at vm.c:1238 #51 rb_vm_invoke_proc (ec=0x7f2bfa209bd0, proc=proc@entry=0x7f2bf98a4510, argc=0, argv=, kw_splat=, passed_block_handler=passed_block_handler@entry=0) at vm.c:1259 #52 0x00007f2bffe3646d in thread_do_start (th=0x7f2bfa3df780) at thread.c:697 #53 0x00007f2bffe384ef in thread_start_func_2 (th=0x7f2bfa3df780, stack_start=) at thread.c:745 #54 0x00007f2bffe38a34 in thread_start_func_1 (th_ptr=) at thread_pthread.c:969 #55 0x00007f2bff9c5fa3 in start_thread (arg=) at pthread_create.c:486 #56 0x00007f2bff4574cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 1 (Thread 0x7f2bff33ff80 (LWP 7)): #0 0x00007f2bff44c916 in __GI_ppoll (fds=fds@entry=0x7fff52ea2f78, nfds=nfds@entry=1, timeout=, timeout@entry=0x7fff52ea2f80, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x00007f2bffe33880 in rb_sigwait_sleep (th=th@entry=0x7f2bfee2d000, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x7fff52ea3088) at hrtime.h:148 #2 0x00007f2bffe34589 in native_sleep (th=0x7f2bfee2d000, rel=0x7fff52ea3088) at thread_pthread.c:2099 #3 0x00007f2bffe37b64 in rb_thread_terminate_all () at thread.c:587 #4 0x00007f2bffccfbaa in rb_ec_cleanup (ec=ec@entry=0x7f2bfee09050, ex=) at eval.c:218 #5 0x00007f2bffccfed3 in ruby_run_node (n=0x7f2bfa503a98) at eval.c:336 #6 0x000055dabf0fe10b in main (argc=, argv=) at ./main.c:50 ``` `Net::IMAP` uses several conditional variables: 1. `idle` : https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L965 1. `get_tagged_response`: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L1215 1. `send_literal`: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L1368 The last two do NOT have a timeout, so it's possible we're getting stuck in one of those cases. I noticed that `send_literal` only checks the state of `@exception` *after* the `wait` returns. Do we need to do this? ```diff diff --git a/lib/net/imap.rb b/lib/net/imap.rb index 720acbc86d..2e1fd6e7b9 100644 --- a/lib/net/imap.rb +++ b/lib/net/imap.rb @@ -1365,6 +1365,7 @@ def send_literal(str, tag = nil) @continued_command_tag = tag @continuation_request_exception = nil begin + raise @exception if @exception @continuation_request_arrival.wait e = @continuation_request_exception || @exception raise e if e ``` -- https://bugs.ruby-lang.org/ Unsubscribe: