[#101981] [Ruby master Bug#17519] set_visibility fails when a prepended module and a refinement both exist — dbfeldman@...

Issue #17519 has been reported by fledman (David Feldman).

12 messages 2021/01/08

[#102003] [Ruby master Bug#17527] rb_io_wait_readable/writable with scheduler don't check errno — julien@...

Issue #17527 has been reported by ysbaddaden (Julien Portalier).

13 messages 2021/01/11

[#102065] [Ruby master Bug#17536] Segfault in `CFUNC :define_method` — v.ondruch@...

Issue #17536 has been reported by vo.x (Vit Ondruch).

13 messages 2021/01/13

[#102083] [Ruby master Bug#17540] A segfault due to Clang/LLVM optimization on 32-bit ARM Linux — xtkoba+ruby@...

Issue #17540 has been reported by xtkoba (Tee KOBAYASHI).

12 messages 2021/01/14

[#102102] [Ruby master Bug#17543] Ractor isolation broken by `self` in shareable proc — marcandre-ruby-core@...

Issue #17543 has been reported by marcandre (Marc-Andre Lafortune).

14 messages 2021/01/15

[#102118] [Ruby master Feature#17548] Need simple way to include symlink directories in Dir.glob — keithrbennett@...

Issue #17548 has been reported by keithrbennett (Keith Bennett).

8 messages 2021/01/17

[#102158] [Ruby master Bug#17560] Does `Module#ruby2_keywords` return `nil` or `self`? — nobu@...

Issue #17560 has been reported by nobu (Nobuyoshi Nakada).

9 messages 2021/01/19

[#102163] [Ruby master Bug#17561] The timeout option for Addrinfo.getaddrinfo is not reliable on Ruby 2.7.2 — sean@...

Issue #17561 has been reported by smcgivern (Sean McGivern).

8 messages 2021/01/19

[#102249] [Ruby master Bug#17583] Segfault on large stack(RUBY_THREAD_VM_STACK_SIZE) — yoshiokatsuneo@...

Issue #17583 has been reported by yoshiokatsuneo (Tsuneo Yoshioka).

12 messages 2021/01/26

[#102256] [Ruby master Bug#17585] DWAR5 support? — v.ondruch@...

Issue #17585 has been reported by vo.x (Vit Ondruch).

19 messages 2021/01/26

[#102301] [Ruby master Bug#17591] Test frameworks and REPLs do not show deprecation warnings by default — eregontp@...

Issue #17591 has been reported by Eregon (Benoit Daloze).

14 messages 2021/01/29

[#102305] [Ruby master Feature#17592] Ractor should allowing reading shareable class instance variables — marcandre-ruby-core@...

Issue #17592 has been reported by marcandre (Marc-Andre Lafortune).

25 messages 2021/01/29

[ruby-core:102295] [Ruby master Bug#17573] Crashes in profiling tools when signals arrive in non-Ruby threads

From: ko1@...
Date: 2021-01-29 09:20:50 UTC
List: ruby-core #102295
Issue #17573 has been updated by ko1 (Koichi Sasada).

Assignee set to ko1 (Koichi Sasada)

----------------------------------------
Bug #17573: Crashes in profiling tools when signals arrive in non-Ruby threads
https://bugs.ruby-lang.org/issues/17573#change-90150

* Author: jhawthorn (John Hawthorn)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* ruby -v: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Stackprof (and likely similar tools) works by setting up a timer to sends it a unix signal on an interval. From that signal handler it does a small amount of internal bookkeeping and calls `rb_postponed_job_register_one`.

This is a problem because unix signals arrive on an arbitrary thread, and as of Ruby 3.0 the execution context (which `rb_postponed_job_register_one` relies on) is stored as a thread-local.

This reproduction crashes reliably for me on macos. It doesn't seem to on linux, maybe because the timer thread is different or the kernel has a different "arbitrary" choice. It feels like this is just one of the circumstances this crash could happen.

```ruby
require "stackprof"

StackProf.run(interval: 100) do
  1000.times do
    GC.start
  end
end
```

```
$ ruby crash_stackprof.rb
[BUG] Segmentation fault at 0x0000000000000038
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]

-- Crash Report log information --------------------------------------------
   See Crash Report log file under the one of following:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Machine register context ------------------------------------------------
 rax: 0x0000000000000000 rbx: 0x0000000107fbb780 rcx: 0x0000000000000000
 rdx: 0x0000000000000000 rdi: 0x0000000106982c28 rsi: 0x0000000107fbb780
 rbp: 0x000070000eb47a10 rsp: 0x000070000eb479f0  r8: 0x000070000eb47eb0
  r9: 0xd44931e7344c235f r10: 0x00007fff6ef49501 r11: 0x0000000000000202
 r12: 0xd44931e7344c235f r13: 0x00000000ffffffff r14: 0x0000000000000000
 r15: 0x0000000000000000 rip: 0x00000001068c85fd rfl: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(rb_vm_bugreport+0x6cf) [0x1068c2d5f]
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(rb_bug_for_fatal_signal+0x1d6) [0x1066dc556]
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(sigsegv+0x5b) [0x10681aa0b]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff6efff5fd]
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(rb_postponed_job_register_one+0x1d) [0x1068c85fd]
/usr/lib/system/libsystem_platform.dylib(0x7fff6efff5fd) [0x7fff6efff5fd]
```

`0x38` is the address of `((rb_execution_context_t *)0)->vm`. 

lldb shows that it comes from a second thread which was running `timer_pthread_fn`

```
$ lldb =ruby -- ./crash_stackprof.rb
(lldb) target create "/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby"                                                                                                                                                                     ruCurrent executable set to '/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby' (x86_64).
(lldb) settings set -- target.run-args  "./crash_stackprof.rb"                                                                                                                                                                          (lldb) run
Process 92893 launched: '/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby' (x86_64)                                                                                                                                                         Process 92893 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGALRM
    frame #0: 0x00000001000dfbcd ruby`rgengc_check_relation [inlined] RVALUE_OLD_P_RAW(obj=4303689480) at gc.c:1419:32
   1416 RVALUE_OLD_P_RAW(VALUE obj)
   1417 {
   1418     const VALUE promoted = FL_PROMOTED0 | FL_PROMOTED1;
-> 1419     return (RBASIC(obj)->flags & promoted) == promoted;
   1420 }
   1421
   1422 static inline int                                                                                                                                                                                                                 thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x38)
    frame #0: 0x000000010029a5fd ruby`rb_postponed_job_register_one(flags=3492904, func=(stackprof.bundle`stackprof_gc_job_handler at stackprof.c:598), data=0x0000000000000000) at vm_trace.c:1622:19                                     1619 rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data)
   1620 {                                                                                                                                                                                                                                  1621     rb_execution_context_t *ec = GET_EC();
-> 1622     rb_vm_t *vm = rb_ec_vm_ptr(ec);
   1623     rb_postponed_job_t *pjob;
   1624     rb_atomic_t i, index;
   1625
Target 0: (ruby) stopped.
(lldb) t 2
* thread #2
    frame #0: 0x000000010029a5fd ruby`rb_postponed_job_register_one(flags=3492904, func=(stackprof.bundle`stackprof_gc_job_handler at stackprof.c:598), data=0x0000000000000000) at vm_trace.c:1622:19
   1619 rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data)
   1620 {
   1621     rb_execution_context_t *ec = GET_EC();
-> 1622     rb_vm_t *vm = rb_ec_vm_ptr(ec);
   1623     rb_postponed_job_t *pjob;
   1624     rb_atomic_t i, index;
   1625
(lldb) bt
* thread #2
  * frame #0: 0x000000010029a5fd ruby`rb_postponed_job_register_one(flags=3492904, func=(stackprof.bundle`stackprof_gc_job_handler at stackprof.c:598), data=0x0000000000000000) at vm_trace.c:1622:19
    frame #1: 0x00007fff6efff5fd libsystem_platform.dylib`_sigtramp + 29
    frame #2: 0x00007fff6ef4e3d7 libsystem_kernel.dylib`poll + 11
    frame #3: 0x0000000100238e1e ruby`timer_pthread_fn(p=<unavailable>) at thread_pthread.c:2189:15
    frame #4: 0x00007fff6f00b109 libsystem_pthread.dylib`_pthread_start + 148
    frame #5: 0x00007fff6f006b8b libsystem_pthread.dylib`thread_start + 15
```

Attached is my attempted fix (also available at https://github.com/ruby/ruby/pull/4108) which uses the main-ractor's EC if there is none on the current thread. I *hope* this works (it seems to and fixes the crash) because before Ruby 3.0 there was a global EC, but I'm not entirely sure if this will cause other problems.

If accepted this should be backported to the 3.0 branch.

---Files--------------------------------
use_main_ractor_ec_on_threads_without_ec.patch (3.28 KB)
use_main_ractor_ec_on_threads_without_ec.patch (3.84 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread