From: zach.anker@... Date: 2017-09-08T20:17:14+00:00 Subject: [ruby-core:82720] [Ruby trunk Bug#13882] Exception in `ensure` stops threads from exiting Issue #13882 has been reported by zanker (Zachary Anker). ---------------------------------------- Bug #13882: Exception in `ensure` stops threads from exiting https://bugs.ruby-lang.org/issues/13882 * Author: zanker (Zachary Anker) * Status: Open * Priority: Normal * Assignee: * Target version: * ruby -v: 2.4.1 * Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN ---------------------------------------- When the Ruby process is gracefully exiting, if a thread has an exception during an `ensure` block it appears the Ruby process forgets it's attempting to exit and will keep running forever. Since there's still an alive thread, `rb_thread_terminate_all` doesn't finish (since `vm_living_thread_num(vm) > 1` is still true), and the Ruby process never exits until you `kill -9` it. I was able to cause this going back as far as MRI 2.0.0, but didn't have a 1.9.3 install to double check with. Repo case: ```ruby Thread.new do loop do puts "Loop start" begin begin sleep ensure raise end rescue => e p e end end end sleep 1 exit ``` Will result in a two `Loop start` messages. When running GDB on the process, we see Ruby is waiting on `sleep_forever` which is expected, but you can see the main thread is stuck on `rb_thread_terminate_all`: ``` (gdb) t a a bt Thread 3 (Thread 0x7f956f36e700 (LWP 401088)): #0 0x00000033d90df113 in poll () from /lib64/libc.so.6 #1 0x00007f9575863775 in timer_thread_sleep (p=0x7f957524c008) at thread_pthread.c:1460 #2 thread_timer (p=0x7f957524c008) at thread_pthread.c:1568 #3 0x00000033d9407aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x00000033d90e893d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f956f363700 (LWP 401117)): #0 0x00000033d940b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f957585d959 in native_cond_wait (cond=<value optimized out>, mutex=<value optimized out>) at thread_pthread.c:343 #2 0x00007f95758675b8 in native_sleep (th=0x7f956c491800, timeout_tv=0x0) at thread_pthread.c:1147 #3 0x00007f95758682f2 in sleep_forever () at thread.c:1083 #4 rb_thread_sleep_forever () at thread.c:1157 #5 0x00007f95757e1ca5 in rb_f_sleep (argc=0, argv=0x7f956bce6a78) at process.c:4393 #6 0x00007f9575890bda in vm_call_cfunc_with_frame (th=0x7f956c491800, reg_cfp=0x7f956bde6940, calling=<value optimized out>, ci=0x7f956e98c910, cc=<value optimized out>) at vm_insnhelper.c:1752 #7 vm_call_cfunc (th=0x7f956c491800, reg_cfp=0x7f956bde6940, calling=<value optimized out>, ci=0x7f956e98c910, cc=<value optimized out>) at vm_insnhelper.c:1847 #8 0x00007f957589f12b in vm_exec_core (th=<value optimized out>, initial=<value optimized out>) at insns.def:1066 #9 0x00007f95758a470b in vm_exec (th=0x7f956c491800) at vm.c:1727 #10 0x00007f95758abc04 in invoke_block () at vm.c:969 #11 invoke_iseq_block_from_c () at vm.c:1014 #12 invoke_block_from_c_splattable () at vm.c:1032 #13 vm_yield () at vm.c:1074 #14 rb_yield_0 () at vm_eval.c:1010 #15 loop_i () at vm_eval.c:1088 #16 0x00007f9575755954 in rb_rescue2 (b_proc=0x7f95758ab8b0 <loop_i>, data1=0, r_proc=0x7f957588b2a0 <loop_stop>, data2=0) at eval.c:838 #17 0x00007f9575890bda in vm_call_cfunc_with_frame (th=0x7f956c491800, reg_cfp=0x7f956bde69a0, calling=<value optimized out>, ci=0x7f956c5090f0, cc=<value optimized out>) at vm_insnhelper.c:1752 #18 vm_call_cfunc (th=0x7f956c491800, reg_cfp=0x7f956bde69a0, calling=<value optimized out>, ci=0x7f956c5090f0, cc=<value optimized out>) at vm_insnhelper.c:1847 #19 0x00007f95758a726b in vm_call_method (th=0x7f956c491800, cfp=0x7f956bde69a0, calling=<value optimized out>, ci=<value optimized out>, cc=<value optimized out>) at vm_insnhelper.c:2295 #20 0x00007f957589f97c in vm_exec_core (th=<value optimized out>, initial=<value optimized out>) at insns.def:967 #21 0x00007f95758a470b in vm_exec (th=0x7f956c491800) at vm.c:1727 #22 0x00007f95758a5771 in invoke_block (th=0x7f956c491800, captured=<value optimized out>, self=140279789266160, argc=<value optimized out>, argv=<value optimized out>, passed_block_handler=<value optimized out>, cref=0x0, splattable=0, is_lambda=0) at vm.c:969 #23 invoke_iseq_block_from_c (th=0x7f956c491800, captured=<value optimized out>, self=140279789266160, argc=<value optimized out>, argv=<value optimized out>, passed_block_handler=<value optimized out>, cref=0x0, splattable=0, is_lambda=0) at vm.c:1014 #24 0x00007f95758a581f in invoke_block_from_c_unsplattable (th=<value optimized out>, block=<value optimized out>, self=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, passed_block_handler=<value optimized out>, is_lambda=<value optimized out>) at vm.c:1101 #25 0x00007f95758a595a in vm_invoke_proc (th=0x7f956c491800, proc=0x7f956e04da50, self=140279789266160, argc=0, argv=0x7f956c547c18, passed_block_handler=0) at vm.c:1126 #26 0x00007f9575864589 in thread_do_start (th=0x7f956c491800, stack_start=0x7f956f364000) at thread.c:577 #27 thread_start_func_2 (th=0x7f956c491800, stack_start=0x7f956f364000) at thread.c:619 #28 0x00007f95758649b6 in thread_start_func_1 (th_ptr=0x7f956c491800) at thread_pthread.c:887 #29 0x00000033d9407aa1 in start_thread () from /lib64/libpthread.so.0 #30 0x00000033d90e893d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f95756ab760 (LWP 401080)): #0 0x00000033d940b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f957585d959 in native_cond_wait (cond=<value optimized out>, mutex=<value optimized out>) at thread_pthread.c:343 #2 0x00007f95758675b8 in native_sleep (th=0x7f9575229400, timeout_tv=0x0) at thread_pthread.c:1147 #3 0x00007f9575867e34 in rb_thread_terminate_all () at thread.c:494 #4 0x00007f9575758466 in ruby_cleanup (ex=6) at eval.c:186 #5 0x00007f9575758725 in ruby_run_node (n=0x7f956c54c6b8) at eval.c:300 #6 0x00000000004008eb in main (argc=2, argv=0x7fffccd8e7c8) at main.c:36 (gdb) ``` Looking at the thread state shows that the `rb_threadptr_to_kill` executed properly, because `to_kill` is properly set, but the `errinfo` was nilled out: ``` (gdb) p ruby_current_thread $1 = (rb_thread_t *) 0x7f8cd06eb800 (gdb) p ruby_current_thread->to_kill $2 = 1 (gdb) p ruby_current_thread->status $3 = THREAD_STOPPED (gdb) p ruby_current_thread->errinfo $4 = 8 ``` I'm happy to contribute a patch, but not quite sure what the appropriate fix for this would be. It looks like the issue is an exception in `ensure` is causing the `TAG_FATAL` on `errinfo` to be overwritten so the thread doesn't think it should exit, and the right fix is to also check if `to_kill` is set. -- https://bugs.ruby-lang.org/ Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe> <http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>