From: s.wanabe@...
Date: 2018-03-18T13:42:34+00:00
Subject: [ruby-core:86174] [Ruby trunk Bug#14561] Consistent 2.5.0 seg fault in GC, related to accessing an enumerator in a thread

Issue #14561 has been updated by wanabe (_ wanabe).

This seems to be an issue specific to `FIBER_USE_NATIVE == 0` environments. It may also point to a latent problem in `Enumerator`'s behaviour.

First, `Fiber.new` and `Fiber#resume` must be called in the same thread, but `Enumerator.new` and `Enumerator#peek` don't have to be. That is because `Fiber.new` calls `fiber_t_alloc()` immediately, but `Enumerator.new` doesn't. He is lazy :)
So an `Enumerator` can end up holding machine stack values of a thread that has already been killed.

I think `Thread.new { enum.peek }` should raise FiberError. But that would be a big incompatibility, so it is not realistic.

There is no "marking dead fiber's stack" problem in `FIBER_USE_NATIVE != 0` environments, because `fiber_setcontext()` sets `oldfib->cont.saved_ec.machine.stack_end = NULL;` and `rb_execution_context_mark()` skips marking the machine stack when `ec->machine.stack_end == NULL`.

There are some ways to fix this:

1. `fiber_mark` checks not only `fib->status` but also `fib->cont->saved_ec.thread_ptr->status` in `FIBER_USE_NATIVE == 0` environments.
2. `thread_cleanup_func()` makes all fibers `FIBER_TERMINATED`.
3. etc.

----------------------------------------
Bug #14561: Consistent 2.5.0 seg fault in GC, related to accessing an enumerator in a thread
https://bugs.ruby-lang.org/issues/14561#change-71058

* Author: dazuma (Daniel Azuma)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
This seg fault happens consistently on OSX (specifically I'm reproing it on a late 2015 MacBook Pro running 10.13.3, but it seems to happen on similar machines as well). It happens only on Ruby 2.5.0.
Small repro case:

```ruby
enum = Enumerator.new { |y| y << 1 }
thread = Thread.new { enum.peek }  # enum.next also causes the segfault, but not enum.size
thread.join
GC.start  # <- seg fault here
```

The C-level backtrace identifies this as within the mark phase of GC:

```
-- C level backtrace information -------------------------------------------
0   ruby                                0x000000010f77ced7 rb_vm_bugreport + 135
1   ruby                                0x000000010f602628 rb_bug_context + 472
2   ruby                                0x000000010f6f1491 sigsegv + 81
3   libsystem_platform.dylib            0x00007fff6a779f5a _sigtramp + 26
4   ruby                                0x000000010f61bb93 rb_gc_mark_machine_stack + 99
5   ruby                                0x000000010f76bf39 rb_execution_context_mark + 137
6   ruby                                0x000000010f5ea32b cont_mark + 27
7   ruby                                0x000000010f626a02 gc_marks_rest + 146
8   ruby                                0x000000010f6253c0 gc_start + 2816
9   ruby                                0x000000010f61d628 garbage_collect + 184
10  ruby                                0x000000010f622215 gc_start_internal + 485
11  ruby                                0x000000010f7703be vm_call_cfunc + 286
12  ruby                                0x000000010f759af4 vm_exec_core + 12260
13  ruby                                0x000000010f76ac8e vm_exec + 142
14  ruby                                0x000000010f60c101 ruby_exec_internal + 177
15  ruby                                0x000000010f60bff8 ruby_run_node + 56
16  ruby                                0x000000010f592d1f main + 79
```

I also ran this against Ruby recompiled with -O0, and got a more detailed backtrace:

```
-- C level backtrace information -------------------------------------------
0   libruby.2.5.dylib                   0x000000010c416e19 rb_print_backtrace + 25
1   libruby.2.5.dylib                   0x000000010c416f28 rb_vm_bugreport + 136
2   libruby.2.5.dylib                   0x000000010c2096f2 rb_bug_context + 450
3   libruby.2.5.dylib                   0x000000010c35b4ee sigsegv + 94
4   libsystem_platform.dylib            0x00007fff6a779f5a _sigtramp + 26
5   libruby.2.5.dylib                   0x000000010c2395a1 mark_locations_array + 49
6   libruby.2.5.dylib                   0x000000010c22a5bb gc_mark_locations + 75
7   libruby.2.5.dylib                   0x000000010c22a7d9 mark_stack_locations + 41
8   libruby.2.5.dylib                   0x000000010c22a79f rb_gc_mark_machine_stack + 79
9   libruby.2.5.dylib                   0x000000010c3f8868 rb_execution_context_mark + 264
10  libruby.2.5.dylib                   0x000000010c1e263e cont_mark + 46
11  libruby.2.5.dylib                   0x000000010c1e2572 fiber_mark + 146
12  libruby.2.5.dylib                   0x000000010c22f4c6 gc_mark_children + 1094
13  libruby.2.5.dylib                   0x000000010c23734c gc_mark_stacked_objects + 108
14  libruby.2.5.dylib                   0x000000010c237a5b gc_mark_stacked_objects_all + 27
15  libruby.2.5.dylib                   0x000000010c236cb1 gc_marks_rest + 129
16  libruby.2.5.dylib                   0x000000010c238787 gc_marks + 103
17  libruby.2.5.dylib                   0x000000010c2352e2 gc_start + 802
18  libruby.2.5.dylib                   0x000000010c22ca18 garbage_collect + 56
19  libruby.2.5.dylib                   0x000000010c231f7d gc_start_internal + 493
20  libruby.2.5.dylib                   0x000000010c401f2a call_cfunc_m1 + 42
21  libruby.2.5.dylib                   0x000000010c400d1d vm_call_cfunc_with_frame + 605
22  libruby.2.5.dylib                   0x000000010c3fc41d vm_call_cfunc + 173
23  libruby.2.5.dylib                   0x000000010c3fb8fe vm_call_method_each_type + 190
24  libruby.2.5.dylib                   0x000000010c3fb690 vm_call_method + 160
25  libruby.2.5.dylib                   0x000000010c3fb5e5 vm_call_general + 53
26  libruby.2.5.dylib                   0x000000010c3e784e vm_exec_core + 8974
27  libruby.2.5.dylib                   0x000000010c3f6fe6 vm_exec + 182
28  libruby.2.5.dylib                   0x000000010c3f7d5b rb_iseq_eval_main + 43
29  libruby.2.5.dylib                   0x000000010c214208 ruby_exec_internal + 232
30  libruby.2.5.dylib                   0x000000010c214111 ruby_exec_node + 33
31  libruby.2.5.dylib                   0x000000010c2140d0 ruby_run_node + 64
32  ruby                                0x000000010c16ff2f main + 95
```

As far as I can tell, the C instruction triggering the segfault is here in gc.c (around line 4064):

```C
static void
mark_locations_array(rb_objspace_t *objspace, register const VALUE *x, register long n)
{
    VALUE v;
    while (n--) {
        v = *x; // <----- Seems to be crashing here?
        gc_mark_maybe(objspace, v);
        x++;
    }
}
```

Indicating a bad pointer in the machine stack.

I'm not sufficiently familiar with the VM internals to make much further progress, but I hope the repro case is helpful. It seems to require accessing an `Enumerator` element within a separate thread, and then waiting for the thread to end.
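The thread-affinity difference between `Fiber` and `Enumerator` discussed in this thread can be observed without calling `GC.start`. The following is only a minimal sketch: `in_other_thread` is a hypothetical helper written for this illustration, not part of Ruby or of the issue. On an affected build (2.5.0 with `FIBER_USE_NATIVE == 0`) the enumerator half may still lead to a crash at a later GC, so treat it as an illustration of the behaviour rather than a safe test.

```ruby
# Hypothetical helper (not from the issue): run a block in a new thread and
# return either its result or the exception it raised.
def in_other_thread
  Thread.new do
    begin
      yield
    rescue StandardError => e
      e
    end
  end.value
end

# A Fiber is bound to the thread that created it, so resuming it from
# another thread is rejected with FiberError.
fiber = Fiber.new { :done }
fiber_result = in_other_thread { fiber.resume }

# An Enumerator creates its internal fiber lazily, on the first
# external-iteration call (peek/next), so that fiber belongs to whichever
# thread makes the call, not to the thread that ran Enumerator.new.
enum = Enumerator.new { |y| y << 1 }
enum_result = in_other_thread { enum.peek }

puts fiber_result.class
puts enum_result
```

On an unaffected build this is expected to print `FiberError` followed by `1`: the cross-thread `Fiber#resume` is refused, while the cross-thread `Enumerator#peek` succeeds and leaves a suspended fiber behind after its thread has finished.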
---Files--------------------------------
ruby_2018-03-14-222035_Fukurou.crash (38.6 KB)
ruby_2018-03-14-205753_Fukurou.crash (38.6 KB)
dump.txt (51.4 KB)

--
https://bugs.ruby-lang.org/