[ruby-core:102729] [Ruby master Bug#17573] Crashes in profiling tools when signals arrive in non-Ruby threads

From: jean.boussier@...
Date: 2021-03-03 15:57:48 UTC
List: ruby-core #102729
Issue #17573 has been updated by byroot (Jean Boussier).


> I don't know whether it's the same issue or not.

So I tested this patch on top of the current `ruby_3_0` branch, and it does fix the stackprof issue I had.

----------------------------------------
Bug #17573: Crashes in profiling tools when signals arrive in non-Ruby threads
https://bugs.ruby-lang.org/issues/17573#change-90727

* Author: jhawthorn (John Hawthorn)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* ruby -v: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Stackprof (and likely similar tools) works by setting up a timer that sends the process a Unix signal at a fixed interval. From that signal handler, it does a small amount of internal bookkeeping and calls `rb_postponed_job_register_one`.

This is a problem because Unix signals arrive on an arbitrary thread, and as of Ruby 3.0 the execution context (which `rb_postponed_job_register_one` relies on) is stored as a thread-local.
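To make the failure mode concrete, here is a toy model in plain Ruby. It is only a sketch and not Ruby's internals: `ToyExecutionContext`, the `:ec` thread variable, and `toy_register_job` are hypothetical names standing in for the thread-local EC and for a function that dereferences it unconditionally.

```ruby
# Toy model only: these names are hypothetical and do not exist in Ruby's C source.
ToyExecutionContext = Struct.new(:vm)

# Look up the calling thread's context, loosely analogous to a thread-local GET_EC().
def toy_current_ec
  Thread.current.thread_variable_get(:ec)
end

# Stand-in for a function that uses the context without checking it first.
def toy_register_job
  toy_current_ec.vm
end

# A "Ruby thread" has a context installed, so the lookup succeeds.
Thread.current.thread_variable_set(:ec, ToyExecutionContext.new(:vm))
on_ruby_thread = toy_register_job

# A thread that never had a context installed sees nil. In this Ruby model that
# raises NoMethodError; the C equivalent is a NULL pointer dereference.
on_other_thread = Thread.new do
  begin
    toy_register_job
  rescue NoMethodError => e
    e.class
  end
end.value
```

Thread variables are not inherited by new threads, which is why the second lookup comes back empty, much as a signal handler running on a non-Ruby thread finds no EC.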

This reproduction crashes reliably for me on macOS. It doesn't seem to on Linux, maybe because the timer thread is different or the kernel makes a different "arbitrary" choice. It feels like this is just one of the circumstances in which this crash could happen.

```ruby
require "stackprof"

StackProf.run(interval: 100) do
  1000.times do
    GC.start
  end
end
```

```
$ ruby crash_stackprof.rb
[BUG] Segmentation fault at 0x0000000000000038
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]

-- Crash Report log information --------------------------------------------
   See Crash Report log file under the one of following:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Machine register context ------------------------------------------------
 rax: 0x0000000000000000 rbx: 0x0000000107fbb780 rcx: 0x0000000000000000
 rdx: 0x0000000000000000 rdi: 0x0000000106982c28 rsi: 0x0000000107fbb780
 rbp: 0x000070000eb47a10 rsp: 0x000070000eb479f0  r8: 0x000070000eb47eb0
  r9: 0xd44931e7344c235f r10: 0x00007fff6ef49501 r11: 0x0000000000000202
 r12: 0xd44931e7344c235f r13: 0x00000000ffffffff r14: 0x0000000000000000
 r15: 0x0000000000000000 rip: 0x00000001068c85fd rfl: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(rb_vm_bugreport+0x6cf) [0x1068c2d5f]
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(rb_bug_for_fatal_signal+0x1d6) [0x1066dc556]
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(sigsegv+0x5b) [0x10681aa0b]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff6efff5fd]
/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby(rb_postponed_job_register_one+0x1d) [0x1068c85fd]
/usr/lib/system/libsystem_platform.dylib(0x7fff6efff5fd) [0x7fff6efff5fd]
```

`0x38` is the address of `((rb_execution_context_t *)0)->vm`. 

lldb shows that the crash comes from a second thread, which was running `timer_pthread_fn`:

```
$ lldb =ruby -- ./crash_stackprof.rb
(lldb) target create "/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby"
Current executable set to '/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby' (x86_64).
(lldb) settings set -- target.run-args  "./crash_stackprof.rb"
(lldb) run
Process 92893 launched: '/Users/jhawthorn/.rubies/ruby-3.0.0/bin/ruby' (x86_64)
Process 92893 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGALRM
    frame #0: 0x00000001000dfbcd ruby`rgengc_check_relation [inlined] RVALUE_OLD_P_RAW(obj=4303689480) at gc.c:1419:32
   1416 RVALUE_OLD_P_RAW(VALUE obj)
   1417 {
   1418     const VALUE promoted = FL_PROMOTED0 | FL_PROMOTED1;
-> 1419     return (RBASIC(obj)->flags & promoted) == promoted;
   1420 }
   1421
   1422 static inline int
  thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x38)
    frame #0: 0x000000010029a5fd ruby`rb_postponed_job_register_one(flags=3492904, func=(stackprof.bundle`stackprof_gc_job_handler at stackprof.c:598), data=0x0000000000000000) at vm_trace.c:1622:19
   1619 rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data)
   1620 {
   1621     rb_execution_context_t *ec = GET_EC();
-> 1622     rb_vm_t *vm = rb_ec_vm_ptr(ec);
   1623     rb_postponed_job_t *pjob;
   1624     rb_atomic_t i, index;
   1625
Target 0: (ruby) stopped.
(lldb) t 2
* thread #2
    frame #0: 0x000000010029a5fd ruby`rb_postponed_job_register_one(flags=3492904, func=(stackprof.bundle`stackprof_gc_job_handler at stackprof.c:598), data=0x0000000000000000) at vm_trace.c:1622:19
   1619 rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data)
   1620 {
   1621     rb_execution_context_t *ec = GET_EC();
-> 1622     rb_vm_t *vm = rb_ec_vm_ptr(ec);
   1623     rb_postponed_job_t *pjob;
   1624     rb_atomic_t i, index;
   1625
(lldb) bt
* thread #2
  * frame #0: 0x000000010029a5fd ruby`rb_postponed_job_register_one(flags=3492904, func=(stackprof.bundle`stackprof_gc_job_handler at stackprof.c:598), data=0x0000000000000000) at vm_trace.c:1622:19
    frame #1: 0x00007fff6efff5fd libsystem_platform.dylib`_sigtramp + 29
    frame #2: 0x00007fff6ef4e3d7 libsystem_kernel.dylib`poll + 11
    frame #3: 0x0000000100238e1e ruby`timer_pthread_fn(p=<unavailable>) at thread_pthread.c:2189:15
    frame #4: 0x00007fff6f00b109 libsystem_pthread.dylib`_pthread_start + 148
    frame #5: 0x00007fff6f006b8b libsystem_pthread.dylib`thread_start + 15
```

Attached is my attempted fix (also available at https://github.com/ruby/ruby/pull/4108) which uses the main-ractor's EC if there is none on the current thread. I *hope* this works (it seems to and fixes the crash) because before Ruby 3.0 there was a global EC, but I'm not entirely sure if this will cause other problems.
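The idea behind the fix (fall back to the main context when the current thread has none) can be sketched in plain Ruby. This is an illustration under assumptions, not the contents of the attached patch: `ToyContext`, `TOY_MAIN_EC`, `toy_ec_or_main`, and `toy_register_job_safe` are hypothetical names.

```ruby
# Toy model only: hypothetical names, not the C code in the patch.
ToyContext = Struct.new(:vm)

# Stand-in for the main ractor's execution context, reachable from any thread.
TOY_MAIN_EC = ToyContext.new(:main_vm)

# Prefer the calling thread's own context; fall back to the main one if absent.
def toy_ec_or_main
  Thread.current.thread_variable_get(:ec) || TOY_MAIN_EC
end

# With the fallback in place, the lookup never yields nil.
def toy_register_job_safe
  toy_ec_or_main.vm
end

# A fresh thread has no :ec thread variable, so the fallback is used.
from_bare_thread = Thread.new { toy_register_job_safe }.value
```

The trade-off in this sketch is that work triggered from a context-less thread is attributed to the main context, which resembles the pre-3.0 behaviour of a single global EC.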

If accepted this should be backported to the 3.0 branch.

---Files--------------------------------
use_main_ractor_ec_on_threads_without_ec.patch (3.28 KB)
use_main_ractor_ec_on_threads_without_ec.patch (3.84 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
