[#123172] [Ruby Bug#21560] RUBY_MN_THREADS=1 causes large performance regression in Puma 7 — "schneems (Richard Schneeman) via ruby-core" <ruby-core@...>

Issue #21560 has been reported by schneems (Richard Schneeman).

13 messages 2025/09/03

[#123197] [Ruby Misc#21566] Transfer Shopify/yjit-bench and speed.yjit.org to ruby/ruby-bench and *.ruby-lang.org — "k0kubun (Takashi Kokubun) via ruby-core" <ruby-core@...>

Issue #21566 has been reported by k0kubun (Takashi Kokubun).

7 messages 2025/09/08

[#123207] [Ruby Bug#21568] Requiring core libraries when already requiring mutliple user defined libraries with the same name can error — "alexalexgriffith (Alex Griffith) via ruby-core" <ruby-core@...>

Issue #21568 has been reported by alexalexgriffith (Alex Griffith).

9 messages 2025/09/10

[#123209] [Ruby Bug#21569] [armv7, musl] SIGBUS in ibf_load_object_float due to unaligned VFP double load when reading IBF — "amacxz (Aleksey Maximov) via ruby-core" <ruby-core@...>

Issue #21569 has been reported by amacxz (Aleksey Maximov).

8 messages 2025/09/10

[#123257] [Ruby Misc#21606] DevMeeting-2025-10-23 — "mame (Yusuke Endoh) via ruby-core" <ruby-core@...>

Issue #21606 has been reported by mame (Yusuke Endoh).

9 messages 2025/09/16

[#123261] [Ruby Bug#21607] require 'concurrent-ruby' causes segfault with Ruby 3.4.6 on linux/i686 — "satadru (Satadru Pramanik) via ruby-core" <ruby-core@...>

Issue #21607 has been reported by satadru (Satadru Pramanik).

17 messages 2025/09/16

[#123279] [Ruby Misc#21609] Propose Stan Lo (@st0012) as a core committer — "tekknolagi (Maxwell Bernstein) via ruby-core" <ruby-core@...>

Issue #21609 has been reported by tekknolagi (Maxwell Bernstein).

12 messages 2025/09/17

[#123288] [Ruby Bug#21610] Use ec->interrupt_mask to prevent interrupts. — "ioquatix (Samuel Williams) via ruby-core" <ruby-core@...>

Issue #21610 has been reported by ioquatix (Samuel Williams).

7 messages 2025/09/18

[#123319] [Ruby Feature#21615] Introduce `Array#values` — "matheusrich (Matheus Richard) via ruby-core" <ruby-core@...>

Issue #21615 has been reported by matheusrich (Matheus Richard).

9 messages 2025/09/23

[#123350] [Ruby Bug#21618] Allow to use the build-in prism version to parse code — "Earlopain (Earlopain _) via ruby-core" <ruby-core@...>

Issue #21618 has been reported by Earlopain (Earlopain _).

15 messages 2025/09/30

[ruby-core:123307] [Ruby Bug#21612] Make sure we never context switch while holding the VM lock

From: "ko1 (Koichi Sasada) via ruby-core" <ruby-core@...>
Date: 2025-09-19 18:39:23 UTC
List: ruby-core #123307
Issue #21612 has been updated by ko1 (Koichi Sasada).


> We concluded that there was context switching going on while a thread held the VM lock. During the investigation into the issue, we added assertions in the code that we never yield to another thread with the VM lock held.

I agree that we should check this. Context switches are triggered by the `CHECK_INTS` macro. Where is that macro invoked while the VM lock is held?

----------------------------------------
Bug #21612: Make sure we never context switch while holding the VM lock
https://bugs.ruby-lang.org/issues/21612#change-114672

* Author: luke-gru (Luke Gruber)
* Status: Open
* Target version: 3.5
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
## The Problem

We're seeing errors in our application that uses ractors. The errors look like:

```
[BUG] unexpected situation - recordd:1 current:0
error.c:1097 rb_bug_without_die_internal
vm_sync.c:275 disallow_reentry
eval_intern.h:136 rb_ec_vm_lock_rec_check
eval_intern.h:147 rb_ec_tag_state
vm.c:2619 rb_vm_exec
vm.c:1702 rb_yield
eval.c:1173 rb_ensure
```

We concluded that there was context switching going on while a thread held the VM lock. During the investigation into the issue, we added assertions in the code that we never yield to another thread with the VM lock held. We enabled these VM lock assertions even in single ractor mode. These assertions were failing in a few places, but most notably in finalizers. Finalizers run with the VM lock held, and they were context switching, which caused this issue.

## Why Is This Bad?

There are a few reasons we shouldn't be able to context switch while holding the VM lock.

In single-ractor mode with threads A and B:

1) Anything inside a VM lock critical section should be thought of as a transaction on the memory it changes. If A holds the lock, starts manipulating some global memory, and yields to B with the lock still taken and the update unfinished, and B then takes the lock and starts writing to the same memory, the state of that global memory can be corrupted.

Currently we don't actually take the VM lock in single-ractor mode, but that doesn't mean these issues can't happen. Yielding to another thread in the middle of manipulating global memory *can* still happen and it causes similar issues.
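
A minimal sketch of the interleaving described above, written as a toy model rather than CRuby code. Two fibers stand in for threads A and B, and `Fiber.yield` stands in for a context switch in the middle of an update. The names `counter`, `make_worker` and `LOST_UPDATE_RESULT` are made up for illustration.

```ruby
# Toy model only: fibers make the interleaving deterministic.
# Fiber.yield plays the role of a context switch mid-update.
counter = { value: 0 }

# A read-modify-write that is interrupted between the read and the write.
make_worker = lambda do
  Fiber.new do
    snapshot = counter[:value]      # read shared state
    Fiber.yield                     # "context switch" before the write-back
    counter[:value] = snapshot + 1  # write based on a stale read
  end
end

a = make_worker.call
b = make_worker.call

a.resume  # A reads 0, then yields mid-update
b.resume  # B also reads 0
a.resume  # A writes 1
b.resume  # B writes 1 as well, so A's update is lost

LOST_UPDATE_RESULT = counter[:value]
puts LOST_UPDATE_RESULT  # two increments ran, but the value is 1
```

No lock appears in the model at all, which matches the point about single-ractor mode: the corruption comes from switching in the middle of the update, not from the lock itself.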

In multi-ractor mode with ractors A and B:

1) We get the same issues as in single-ractor mode.

2) We can also get deadlocks if A has the lock, yields to B and B is blocked waiting on the lock.
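
The deadlock case can be pictured with an ordinary Ruby `Mutex` standing in for the VM lock. This is only a model under that assumption: the main thread plays A, the spawned thread plays B, and the 0.2 second timeout on `join` exists purely so the sketch terminates. In the real scenario A would wait on B indefinitely.

```ruby
# Toy model: a plain Mutex stands in for the VM lock (an assumption).
model_lock = Mutex.new
b_acquired = false

model_lock.lock                 # A takes the lock

b = Thread.new do               # B needs the same lock to make progress
  model_lock.lock
  b_acquired = true
  model_lock.unlock
end

# A "yields" to B while still holding the lock. B cannot finish, so the
# timed join gives up and returns nil.
B_FINISHED_WHILE_A_HELD = !b.join(0.2).nil?

model_lock.unlock               # only once A lets go can B proceed
b.join
B_FINISHED_AFTER_UNLOCK = b_acquired

puts B_FINISHED_WHILE_A_HELD    # false
puts B_FINISHED_AFTER_UNLOCK    # true
```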

Unfortunately, many things can cause context switching in Ruby, so what is safe to call when the VM lock is taken?

## Guidelines

I've come up with some guidelines. With the VM lock held,

You should be able to:

* Create ruby objects, call `ruby_xmalloc`, etc.

* Jump using `EC_JUMP_TAG`. The lock will automatically be unlocked depending on how far up the call stack you locked it and where you're jumping to.

* Check ruby interrupts. Jumping can pop ruby frames, and popping a frame checks interrupts, so this has to be allowed. The interrupt check should never context switch with the VM lock held, even if the ruby thread's quantum is up.

You shouldn't be able to:

* Call any ruby method or enter Ruby's VM loop. For example, `rb_funcall` is not allowed, nor is `rb_warn` (it can call ruby code). `rb_sprintf` is not allowed because it can call `rb_inspect`.

* Call `rb_nogvl`

* Enter any blocking operation managed by Ruby.

* Call a ruby-level mechanism that can context switch, like `rb_mutex_lock`.
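
One way to picture the "shouldn't" list is a guard placed in front of every operation that may context switch. The sketch below is a Ruby-level model, not CRuby's implementation: `ModelVMLock`, `assert_unlocked!` and `checked_pass` are hypothetical names, and `Thread.pass` stands in for any call that can switch threads.

```ruby
# Hypothetical model; none of these names are CRuby APIs.
class ModelVMLock
  class HeldAcrossSwitch < StandardError; end

  def initialize
    @mutex = Mutex.new
  end

  def synchronize(&block)
    @mutex.synchronize(&block)
  end

  # Debug-style check to run before anything that may context switch.
  def assert_unlocked!(where)
    raise HeldAcrossSwitch, "#{where} reached with the lock held" if @mutex.owned?
  end
end

# A switch point that refuses to run while the caller holds the lock.
def checked_pass(lock)
  lock.assert_unlocked!("Thread.pass")
  Thread.pass
  :switched
end

MODEL_LOCK = ModelVMLock.new

GUARD_OUTSIDE_RESULT = checked_pass(MODEL_LOCK)  # lock not held, so allowed

GUARD_INSIDE_RESULT =
  begin
    MODEL_LOCK.synchronize { checked_pass(MODEL_LOCK) }
  rescue ModelVMLock::HeldAcrossSwitch => e
    e.message
  end

puts GUARD_OUTSIDE_RESULT
puts GUARD_INSIDE_RESULT
```

The guard only reports the violation. Deciding where such checks belong is a separate question.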

## The Fix

Of course, unlocking during finalizers is the main fix but there are other places that also need unlocking. I think adding assertions that the VM lock is not held will be important in finding these bugs and not creating regressions in the future. We don't have to add lots of these, just in a few places. These assertions, which only run in debug mode, should also run when in single-ractor mode.
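
As an illustration of the idea of unlocking around finalizers, here is a sketch in which pending callbacks are detached while the lock is held and run only after it is released. It is a toy model under stated assumptions, not the actual patch: `ModelHeap`, `add_finalizer` and `run_finalizers` are invented names, and a `Mutex` stands in for the VM lock.

```ruby
# Toy model; names are hypothetical and a Mutex stands in for the VM lock.
class ModelHeap
  def initialize
    @lock = Mutex.new
    @pending = []
  end

  def add_finalizer(&callback)
    @lock.synchronize { @pending << callback }
  end

  # Detach the pending list under the lock, then run the callbacks
  # after the lock has been released.
  def run_finalizers
    batch = @lock.synchronize do
      taken = @pending
      @pending = []
      taken
    end
    # Each callback is told whether the lock was held when it was invoked.
    batch.map { |callback| callback.call(@lock.owned?) }
  end
end

heap = ModelHeap.new
heap.add_finalizer { |lock_held| Thread.pass; lock_held }  # may context switch
heap.add_finalizer { |lock_held| lock_held }

FINALIZER_LOCK_STATES = heap.run_finalizers
p FINALIZER_LOCK_STATES  # [false, false]
```

Because the callbacks run outside the critical section, a context switch inside one of them (the `Thread.pass` above) no longer happens with the lock held.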

## Future Work

I think documentation on what is and isn't allowed while holding the VM lock and other locks in the CRuby source would be helpful. I am currently working on a `Concurrency Guide` for CRuby developers that includes this info. It will not go over every lock, just the VM lock and the "all other locks" category.



-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

