From: merch-redmine@... Date: 2021-03-10T22:52:36+00:00 Subject: [ruby-core:102817] [Ruby master Bug#17189] test_thread.rb in testsuite deadlocks with musl libc 1.2.1 Issue #17189 has been updated by jeremyevans0 (Jeremy Evans). Status changed from Open to Feedback Can you narrow down which particular test method is causing this issue? That would help narrow down the problem? Even if the problem is not fixable, we could skip that test on musl libc so the test suite doesn't deadlock. It is expected to be able to fork without execve and have child processes perform useful work, and this is commonly used, with some Ruby webservers such as Unicorn working this way by default. fork is supposed to work even if the parent has other threads at the time of forking (other threads will be dead in the child process, though). ---------------------------------------- Bug #17189: test_thread.rb in testsuite deadlocks with musl libc 1.2.1 https://bugs.ruby-lang.org/issues/17189#change-90871 * Author: ncopa (Natanael Copa) * Status: Feedback * Priority: Normal * ruby -v: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux-musl] * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN ---------------------------------------- When we build ruby-2.7.1 in alpine linux the test suite deadlocks in `test_thread.rb`. There are multiple processes: ``` $ ps xa | grep ruby 12401 ncopa 0:01 /home/ncopa/aports/main/ruby/src/ruby-2.7.1/ruby --disable-gems ./bootstraptest/runner.rb --ruby=ruby -I./lib --disable-gems -q 24150 ncopa 11:00 /home/ncopa/aports/main/ruby/src/ruby-2.7.1/ruby -I/home/ncopa/aports/main/ruby/src/ruby-2.7.1/lib --disable-gems -W0 bootstraptest.tmp.rb 24152 ncopa 0:00 /home/ncopa/aports/main/ruby/src/ruby-2.7.1/ruby -I/home/ncopa/aports/main/ruby/src/ruby-2.7.1/lib --disable-gems -W0 bootstraptest.tmp.rb ``` I'm gonna focus on the two last ones, 24150 and 24152. ### pid 24150 strace shows that it waits on ppoll: ``` $ doas strace -p 24150 doas (ncopa@ncopa-edge-x86_64) password: strace: Process 24150 attached ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8 ``` gdb shows two threads: ``` Attaching to process 24150 [New LWP 24151] __cp_end () at src/thread/x86_64/syscall_cp.s:29 29 src/thread/x86_64/syscall_cp.s: No such file or directory. (gdb) info threads Id Target Id Frame * 1 LWP 24150 "ruby" __cp_end () at src/thread/x86_64/syscall_cp.s:29 2 LWP 24151 "bootstraptest.*" VM_EP_LEP (ep=ep@entry=0x7ff1e63e7988) at vm.c:51 (gdb) bt #0 __cp_end () at src/thread/x86_64/syscall_cp.s:29 #1 0x00007ff1e93fbfeb in __syscall_cp_c (nr=271, u=, v=, w=, x=, y=, z=0) at src/thread/pthread_cancel.c:33 #2 0x00007ff1e93ca14b in ppoll (fds=fds@entry=0x7ffcdb9d0520, n=n@entry=1, to=to@entry=0x0, mask=mask@entry=0x0) at src/linux/ppoll.c:24 #3 0x00007ff1e9263160 in rb_sigwait_sleep (th=th@entry=0x7ff1e9440ae0, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x0) at hrtime.h:148 #4 0x00007ff1e92638cb in native_sleep (th=th@entry=0x7ff1e9440ae0, rel=rel@entry=0x0) at thread_pthread.c:2145 #5 0x00007ff1e92658c4 in rb_thread_sleep_interruptible () at thread.c:1334 #6 0x00007ff1e920a27f in waitpid_sleep (x=x@entry=140723992987392) at process.c:1123 #7 0x00007ff1e91836d1 in rb_ensure (b_proc=b_proc@entry=0x7ff1e920a26b , data1=data1@entry=140723992987392, e_proc=e_proc@entry=0x7ff1e920d463 , data2=data2@entry=140723992987392) at eval.c:1129 #8 0x00007ff1e920dde1 in waitpid_wait (w=0x7ffcdb9d0700) at process.c:1182 #9 rb_waitpid (pid=pid@entry=24152, st=st@entry=0x7ffcdb9d0774, flags=) at process.c:1221 #10 0x00007ff1e920dfa3 in proc_wait (argc=1, argv=0x7ff1e8f79090) at process.c:1260 #11 0x00007ff1e9285503 in vm_call_cfunc_with_frame (empty_kw_splat=, cd=0x7ff1e64e82f0, calling=, reg_cfp=0x7ff1e9078fd0, ec=0x7ff1e9440140) at vm_insnhelper.c:2514 #12 vm_call_cfunc (ec=0x7ff1e9440140, reg_cfp=0x7ff1e9078fd0, calling=, cd=0x7ff1e64e82f0) at vm_insnhelper.c:2539 #13 0x00007ff1e9281d23 in vm_sendish (ec=ec@entry=0x7ff1e9440140, reg_cfp=reg_cfp@entry=0x7ff1e9078fd0, cd=0x7ff1e64e82f0, block_handler=block_handler@entry=0, method_explorer=method_explorer@entry=0x7ff1e9287476 ) at vm_insnhelper.c:4023 #14 0x00007ff1e928dce7 in vm_exec_core (ec=ec@entry=0x7ff1e9440140, initial=initial@entry=0) at insns.def:801 #15 0x00007ff1e929065d in rb_vm_exec (ec=, mjit_enable_p=mjit_enable_p@entry=1) at vm.c:1929 #16 0x00007ff1e9291114 in rb_iseq_eval_main (iseq=iseq@entry=0x7ff1e8f56fe0) at vm.c:2179 #17 0x00007ff1e9181424 in rb_ec_exec_node (ec=ec@entry=0x7ff1e9440140, n=n@entry=0x7ff1e8f56fe0) at eval.c:278 #18 0x00007ff1e9184da2 in ruby_run_node (n=0x7ff1e8f56fe0) at eval.c:336 #19 0x00005597798fe11b in main (argc=, argv=) at ./main.c:50 (gdb) thread 2 [Switching to thread 2 (LWP 24151)] #0 VM_EP_LEP (ep=ep@entry=0x7ff1e63e7988) at vm.c:51 51 return ep; (gdb) bt #0 VM_EP_LEP (ep=ep@entry=0x7ff1e63e7988) at vm.c:51 #1 0x00007ff1e9283799 in VM_CF_LEP (cfp=, cfp=) at vm.c:97 #2 VM_CF_BLOCK_HANDLER (cfp=) at vm.c:97 #3 0x00007ff1e928af18 in check_block_handler (ec=, ec=) at vm.c:1159 #4 0x00007ff1e929237e in vm_yield (kw_splat=0, argv=0x0, argc=0, ec=0x7ff1e6522a60) at vm.c:1179 #5 rb_yield_0 (argc=argc@entry=0, argv=argv@entry=0x0) at vm_eval.c:1227 #6 0x00007ff1e9292521 in loop_i (_=_@entry=0) at vm_eval.c:1330 #7 0x00007ff1e9183143 in rb_vrescue2 (b_proc=0x7ff1e9292514 , data1=0, r_proc=0x7ff1e92817d5 , data2=0, args=args@entry=0x7ff1e63e74e0) at eval.c:990 #8 0x00007ff1e918334e in rb_rescue2 (b_proc=, data1=, r_proc=, data2=) at eval.c:967 #9 0x00007ff1e9285503 in vm_call_cfunc_with_frame (empty_kw_splat=, cd=0x7ff1e6518840, calling=, reg_cfp=0x7ff1e64e78d0, ec=0x7ff1e6522a60) at vm_insnhelper.c:2514 #10 vm_call_cfunc (ec=0x7ff1e6522a60, reg_cfp=0x7ff1e64e78d0, calling=, cd=0x7ff1e6518840) at vm_insnhelper.c:2539 #11 0x00007ff1e9281d23 in vm_sendish (ec=ec@entry=0x7ff1e6522a60, reg_cfp=reg_cfp@entry=0x7ff1e64e78d0, cd=cd@entry=0x7ff1e6518840, block_handler=, method_explorer=method_explorer@entry=0x7ff1e9287476 ) at vm_insnhelper.c:4023 #12 0x00007ff1e928dc52 in vm_exec_core (ec=ec@entry=0x7ff1e6522a60, initial=initial@entry=0) at insns.def:782 #13 0x00007ff1e929065d in rb_vm_exec (ec=ec@entry=0x7ff1e6522a60, mjit_enable_p=mjit_enable_p@entry=1) at vm.c:1929 #14 0x00007ff1e9291189 in invoke_block (ec=ec@entry=0x7ff1e6522a60, iseq=iseq@entry=0x7ff1e8f56bf8, self=self@entry=140676971683160, cref=cref@entry=0x0, type=type@entry=572653569, opt_pc=0, captured=, captured=) at vm.c:1044 #15 0x00007ff1e9292e5b in invoke_iseq_block_from_c (me=0x0, is_lambda=0, cref=0x0, passed_block_handler=0, kw_splat=0, argv=0x7ff1e63e7890, argc=0, self=140676971683160, captured=0x7ff1e64ff030, ec=0x7ff1e6522a60) at vm.c:1116 #16 invoke_block_from_c_proc (me=0x0, is_lambda=0, passed_block_handler=0, kw_splat=0, argv=0x7ff1e63e7890, argc=0, self=140676971683160, proc=0x7ff1e6522a60, ec=0x7ff1e6522a60) at vm.c:1216 #17 vm_invoke_proc (passed_block_handler=0, kw_splat=0, argv=0x7ff1e63e7890, argc=0, self=140676971683160, proc=0x7ff1e6522a60, ec=0x7ff1e6522a60) at vm.c:1238 #18 rb_vm_invoke_proc (ec=0x7ff1e6522a60, proc=proc@entry=0x7ff1e64ff030, argc=0, argv=0x7ff1e63e7890, kw_splat=0, passed_block_handler=passed_block_handler@entry=0) at vm.c:1259 #19 0x00007ff1e92657c5 in thread_do_start (th=th@entry=0x7ff1e64e89c0) at thread.c:697 #20 0x00007ff1e92663fa in thread_start_func_2 (th=, stack_start=) at thread.c:745 #21 0x00007ff1e9266781 in thread_start_func_1 (th_ptr=) at thread_pthread.c:1015 #22 0x00007ff1e93fcaa9 in start (p=0x7ff1e64e7ae8) at src/thread/pthread_create.c:196 #23 0x00007ff1e93fec5a in __clone () at src/thread/x86_64/clone.s:22 Backtrace stopped: frame did not save the PC ``` ### pid 24152 The second process waits on a futex says strace: ``` $ doas strace -p 24152 strace: Process 24152 attached futex(0x7ff1e943ffd8, FUTEX_WAIT_PRIVATE, 2147483650, NULL ``` Backtrace: ``` Attaching to process 24152 Reading symbols from /home/ncopa/aports/main/ruby/src/ruby-2.7.1/ruby... Reading symbols from /home/ncopa/aports/main/ruby/src/ruby-2.7.1/libruby.so.2.7.1... Reading symbols from /lib/ld-musl-x86_64.so.1... Reading symbols from /usr/lib/debug//lib/ld-musl-x86_64.so.1.debug... Reading symbols from /lib/libz.so.1... Reading symbols from /usr/lib/debug//lib/libz.so.1.2.11.debug... Reading symbols from /usr/lib/libgmp.so.10... Reading symbols from /usr/lib/debug//usr/lib/libgmp.so.10.4.0.debug... Reading symbols from /home/ncopa/aports/main/ruby/src/ruby-2.7.1/.ext/x86_64-linux-musl/enc/encdb.so... Reading symbols from /home/ncopa/aports/main/ruby/src/ruby-2.7.1/.ext/x86_64-linux-musl/enc/trans/transdb.so... __futexwait (priv=128, val=-2147483646, addr=0x7ff1e943ffd8 <__malloc_lock>) at ./arch/x86_64/syscall_arch.h:40 40 ./arch/x86_64/syscall_arch.h: No such file or directory. (gdb) info threads Id Target Id Frame * 1 process 24152 "ruby" __futexwait (priv=128, val=-2147483646, addr=0x7ff1e943ffd8 <__malloc_lock>) at ./arch/x86_64/syscall_arch.h:40 (gdb) bt #0 __futexwait (priv=128, val=-2147483646, addr=0x7ff1e943ffd8 <__malloc_lock>) at ./arch/x86_64/syscall_arch.h:40 #1 __lock (l=l@entry=0x7ff1e943ffd8 <__malloc_lock>) at src/thread/__lock.c:44 #2 0x00007ff1e93ce6ee in rdlock () at src/malloc/mallocng/glue.h:63 #3 malloc (n=n@entry=56) at src/malloc/mallocng/malloc.c:337 #4 0x00007ff1e9197a22 in objspace_xmalloc0 (objspace=0x7ff1e907a570, size=56) at gc.c:9860 #5 0x00007ff1e9197ab6 in ruby_xmalloc0 (size=) at vm_core.h:1806 #6 ruby_xmalloc_body (size=) at gc.c:10089 #7 0x00007ff1e6dfc2c0 in ?? () #8 0x00007ff1e8f564f0 in ?? () #9 0x00007ff1e8f56518 in ?? () #10 0x0000000000000d41 in ?? () #11 0x00007ffcdb9cfe40 in ?? () #12 0x00007ff1e927c297 in iv_index_tbl_make (obj=) at variable.c:1138 Backtrace stopped: frame did not save the PC ``` Let me know if I can provide any more info. Thanks! -- https://bugs.ruby-lang.org/ Unsubscribe: