From: jbrown@... Date: 2021-06-07T21:54:59+00:00 Subject: [ruby-core:104200] [Ruby master Bug#17941] POSIX timers are not fork-safe on 2.7.x Issue #17941 has been updated by jbrownEP (James Brown). Here is a small C program which aborts on Ruby 2.7 but works on Ruby 2.5: ``` #include #include #include #include #include #include int main(int argc, char **argv) { ruby_init(); ruby_init_loadpath(); // send a no-op signal in order to trigger ruby signal handling in the parent kill(getpid(), SIGUSR1); int pid2 = fork(); if (pid2 == 0) { int state = 0; rb_eval_string_protect("puts 'Hello, world!'", &state); if (state) { fprintf(stderr, "error calling require_file\n"); return ruby_cleanup(1); } else { return ruby_cleanup(0); } } else { int status; waitpid(pid2, &status, 0); printf("child is done with exit status %d\n", status); return 0; } } ``` ---------------------------------------- Bug #17941: POSIX timers are not fork-safe on 2.7.x https://bugs.ruby-lang.org/issues/17941#change-92378 * Author: jbrownEP (James Brown) * Status: Open * Priority: Normal * Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN ---------------------------------------- I have an application which runs under a pre-fork-model web server (uwsgi). When a worker exits, SIGCHLD will be delivered to the parent process, which is handled by `sighandler` and `rb_thread_wakeup_timer_thread`. This calls `ubf_timer_arm` in the parent process, which creates a POSIX timer (by calling `timer_create`). After that, uWSGI will fork off a new child, but by this point, the `timer_posix` global will have been populated by a `timerid` owned by the parent process. At some point, that new child will call `ubf_timer_disarm`, which will then call `timer_settime` with the timer_id that was generated in the parent thread. On Linux, it is fatal to share timers between parent/child processes, so the `timer_settime` invocation will fail with EINVAL and ruby will abort with a stack trace rather like the following: ``` #0 0x00007fe0c60784f5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007fe0c6079cd5 in abort () at abort.c:92 #2 0x00007fe0c6644363 in die (fmt=0x7fe0c6817285 "%s: %s (%s)") at error.c:632 #3 rb_bug (fmt=0x7fe0c6817285 "%s: %s (%s)") at error.c:647 #4 0x00007fe0c66b39f9 in rb_bug_errno (mesg=0x7fe0c6842dde "timer_settime (disarm)", errno_arg=22) at error.c:676 #5 0x00007fe0c67b6ee0 in ubf_timer_disarm () at thread_pthread.c:1721 #6 0x00007fe0c67b94da in rb_sigwait_fd_get (th=) at thread_pthread.c:1919 #7 0x00007fe0c67bd74d in rb_threadptr_execute_interrupts (th=, blocking_timing=0) at thread.c:2215 #8 0x00007fe0c67bda2e in rb_vm_check_ints (th=0x1c29480, region=0x7ffe4b15da40, ubf=0x7fe0c67ba780 , arg=0x1c29480, fail_if_interrupted=0) at vm_core.h:1868 #9 unblock_function_set (th=0x1c29480, region=0x7ffe4b15da40, ubf=0x7fe0c67ba780 , arg=0x1c29480, fail_if_interrupted=0) at thread.c:461 #10 blocking_region_begin (th=0x1c29480, region=0x7ffe4b15da40, ubf=0x7fe0c67ba780 , arg=0x1c29480, fail_if_interrupted=0) at thread.c:1416 #11 0x00007fe0c67c1652 in rb_nogvl (func=0x7fe0c66bc140 , data1=0x7ffe4b15db20, ubf=, data2=, flags=) at thread.c:1470 #12 0x00007fe0c66c08fa in stat_without_gvl (basedir=8, path=, origenc=0x0, mode=RB_REALPATH_CHECK) at file.c:1120 #13 rb_check_realpath_internal (basedir=8, path=, origenc=0x0, mode=RB_REALPATH_CHECK) at file.c:4401 #14 0x00007fe0c66f913f in rb_construct_expanded_load_path (type=EXPAND_ALL, has_relative=0x7ffe4b15dc0c, has_non_cache=0x7ffe4b15dc08) at load.c:82 #15 0x00007fe0c66f9351 in rb_get_expanded_load_path () at load.c:100 #16 0x00007fe0c66f9ada in rb_feature_p (feature=0x26d0e30 "/home/vagrant/pg/easy_post/uwsgi_tasks.rb", ext=0x26d0e56 ".rb", rb=1, expanded=0, fn=0x7ffe4b15dd00) at load.c:466 #17 0x00007fe0c66f7a86 in search_required (fname=36037560, path=0x7ffe4b15de80, rb_feature_p=0x7fe0c66f93f0 ) at load.c:847 #18 0x00007fe0c66f85d4 in require_internal (ec=0x1c4ea00, fname=36037560, exception=1) at load.c:1002 #19 0x00007fe0c66f8d2b in rb_require_string (fname=36038000) at load.c:1104 #20 0x00007fe0c67ee565 in vm_call0_cfunc_with_frame (ec=0x1c4ea00, calling=, cd=, argv=0x7ffe4b15e140) at vm_eval.c:91 #21 vm_call0_cfunc (ec=0x1c4ea00, calling=, cd=, argv=0x7ffe4b15e140) at vm_eval.c:105 #22 vm_call0_body (ec=0x1c4ea00, calling=, cd=, argv=0x7ffe4b15e140) at vm_eval.c:140 #23 0x00007fe0c67ee9f3 in rb_vm_call0 (ec=, recv=, id=, argc=, argv=, me=0x225e430, kw_splat=0) at vm_eval.c:52 #24 0x00007fe0c67eeda6 in rb_vm_call_kw (ec=0x1c4ea00, recv=30364240, id=21185, argc=1, argv=0x7ffe4b15e140, me=, kw_splat=0) at vm_eval.c:268 #25 0x00007fe0c67f133d in rb_call0 (recv=30364240, mid=21185, argc=1, argv=0x7ffe4b15e140) at vm_eval.c:392 #26 rb_call (recv=30364240, mid=21185, argc=1, argv=0x7ffe4b15e140) at vm_eval.c:718 #27 rb_funcallv (recv=30364240, mid=21185, argc=1, argv=0x7ffe4b15e140) at vm_eval.c:958 #28 0x00000000004968a4 in uwsgi_require_file () #29 0x00007fe0c66b9355 in rb_protect (proc=0x496860 , data=36038000, pstate=0x7ffe4b15e2bc) at eval.c:1087 #30 0x00000000004984f0 in uwsgi_rack_init_apps () #31 0x000000000046e8d2 in uwsgi_init_all_apps () #32 0x00000000004747d8 in uwsgi_worker_run () #33 0x0000000000474cee in uwsgi_run () #34 0x00007fe0c6064d20 in __libc_start_main (main=0x474d00
, argc=3, ubp_av=0x7ffe4b15e468, init=, fini=, rtld_fini=, stack_end=0x7ffe4b15e458) at libc-start.c:226 #35 0x00000000004213b9 in _start () ``` There are a few things that seem likely as fixes for this: 1. In `ubf_timer_disarm`, compare `timer_posix.owner` with `getpid()`, as is done in `ubf_timer_arm`. If the timer is owned by a different process, it's probably a no-op to disarm it and should be reset back to NULL 1. Install an atfork function `ubf_timer_atfork` that resets `timer_posix` to NULL -- https://bugs.ruby-lang.org/ Unsubscribe: