[#117746] [Ruby master Bug#20462] Native threads are no longer reused — "tenderlovemaking (Aaron Patterson) via ruby-core" <ruby-core@...>

Issue #20462 has been reported by tenderlovemaking (Aaron Patterson).

8 messages 2024/05/01

[#117763] [Ruby master Bug#20468] Segfault on safe navigation in for target — "kddnewton (Kevin Newton) via ruby-core" <ruby-core@...>

Issue #20468 has been reported by kddnewton (Kevin Newton).

11 messages 2024/05/03

[#117765] [Ruby master Feature#20470] Extract Ruby's Garbage Collector — "peterzhu2118 (Peter Zhu) via ruby-core" <ruby-core@...>

Issue #20470 has been reported by peterzhu2118 (Peter Zhu).

8 messages 2024/05/03

[#117812] [Ruby master Bug#20478] Circular parameter syntax error rules — "kddnewton (Kevin Newton) via ruby-core" <ruby-core@...>

Issue #20478 has been reported by kddnewton (Kevin Newton).

11 messages 2024/05/08

[#117838] [Ruby master Bug#20485] Simple use of Mutex and Fiber makes GC leak objects with singleton method — "skhrshin (Shintaro Sakahara) via ruby-core" <ruby-core@...>

Issue #20485 has been reported by skhrshin (Shintaro Sakahara).

14 messages 2024/05/12

[#117882] [Ruby master Bug#20490] Process.waitpid2(-1, Process::WNOHANG) misbehaves on Ruby 3.1 & 3.2 with detached process — "stanhu (Stan Hu) via ruby-core" <ruby-core@...>

Issue #20490 has been reported by stanhu (Stan Hu).

7 messages 2024/05/15

[#117905] [Ruby master Bug#20493] Segfault on rb_io_getline_fast — "josegomezr (Jose Gomez) via ruby-core" <ruby-core@...>

Issue #20493 has been reported by josegomezr (Jose Gomez).

14 messages 2024/05/17

[#117918] [Ruby master Bug#20494] Non-default directories are not searched when checking for a gmp header — "lish82 (Hiroki Katagiri) via ruby-core" <ruby-core@...>

Issue #20494 has been reported by lish82 (Hiroki Katagiri).

10 messages 2024/05/19

[#117921] [Ruby master Bug#20495] Running "make clean" deletes critical "coroutine/amd64/Context.S" file and causes "make" to fail — "fallwith (James Bunch) via ruby-core" <ruby-core@...>

Issue #20495 has been reported by fallwith (James Bunch).

7 messages 2024/05/19

[#117929] [Ruby master Feature#20498] Negated method calls — "MaxLap (Maxime Lapointe) via ruby-core" <ruby-core@...>

Issue #20498 has been reported by MaxLap (Maxime Lapointe).

10 messages 2024/05/19

[#117957] [Ruby master Bug#20500] Non-system directories are not searched when checking for jemalloc headers and libs, and building `enc` — "lish82 (Hiroki Katagiri) via ruby-core" <ruby-core@...>

Issue #20500 has been reported by lish82 (Hiroki Katagiri).

12 messages 2024/05/21

[#117968] [Ruby master Bug#20501] ruby SEGV — "akr (Akira Tanaka) via ruby-core" <ruby-core@...>

Issue #20501 has been reported by akr (Akira Tanaka).

15 messages 2024/05/22

[#117992] [Ruby master Bug#20505] Reassigning the block argument in method body keeps old block when calling super with implicit arguments — "Earlopain (A S) via ruby-core" <ruby-core@...>

Issue #20505 has been reported by Earlopain (A S).

7 messages 2024/05/24

[#118003] [Ruby master Bug#20506] Failure compiling Ruby 3.4.0-preview1 on aarch64 on a mac and linux (Ubuntu 24.04) — "schneems (Richard Schneeman) via ruby-core" <ruby-core@...>

Issue #20506 has been reported by schneems (Richard Schneeman).

12 messages 2024/05/24

[#118090] [Ruby master Bug#20513] the feature of kwargs in index methods has been removed without due consideration of utility and compatibility — "bughit (bug hit) via ruby-core" <ruby-core@...>

Issue #20513 has been reported by bughit (bug hit).

16 messages 2024/05/30

[#118110] [Ruby master Bug#20515] --with-gmp is not working - GMP support won't be built — "sorah (Sorah Fukumori) via ruby-core" <ruby-core@...>

Issue #20515 has been reported by sorah (Sorah Fukumori).

8 messages 2024/05/30

[#118128] [Ruby master Bug#20516] The version of rexml in ruby 3.3.2 has not been updated since 3.2.6. — "naitoh (Jun NAITOH) via ruby-core" <ruby-core@...>

Issue #20516 has been reported by naitoh (Jun NAITOH).

13 messages 2024/05/31

[ruby-core:117910] [Ruby master Bug#20493] Segfault on rb_io_getline_fast

From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@...>
Date: 2024-05-18 09:37:07 UTC
List: ruby-core #117910
Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


OK, I believe I've worked out what's wrong here.

The BLOCKING_REGION macro is used to release the GVL, run some code, and re-acquire the GVL. It looks like this:

```
#define BLOCKING_REGION(th, exec, ubf, ubfarg, fail_if_interrupted) do { \
    struct rb_blocking_region_buffer __region; \
    if (blocking_region_begin(th, &__region, (ubf), (ubfarg), fail_if_interrupted) || \
        /* always return true unless fail_if_interrupted */ \
        !only_if_constant(fail_if_interrupted, TRUE)) { \
        exec; \
        blocking_region_end(th, &__region); \
    }; \
} while(0)
```

`blocking_region_begin` calls `RB_VM_SAVE_MACHINE_CONTEXT`, which:

* Updates the "end of machine stack" pointer in `ec->machine` to be the current stack pointer,
* Abuses `setjmp` to save the register state in the `ec->machine` structure as well

The idea being that while the thread is off doing things in C-land, another thread performing GC can mark the portion of the machine stack from before the `BLOCKING_REGION` call.

The problem is that the disassembly for `blocking_region_begin` looks like this:

```
(rr) disassemble
Dump of assembler code for function blocking_region_begin:
   0x000074794fa36900 <+0>:	push   %r15
   0x000074794fa36902 <+2>:	push   %r14
   0x000074794fa36904 <+4>:	mov    %rdx,%r14
   0x000074794fa36907 <+7>:	push   %r13
   0x000074794fa36909 <+9>:	push   %r12
   0x000074794fa3690b <+11>:	mov    %rsi,%r12
   0x000074794fa3690e <+14>:	push   %rbp
   0x000074794fa3690f <+15>:	push   %rbx
```

i.e. it pushes several of the callee-saved registers to the stack, and uses them for its own purposes - including `%r15`, which had the value of `str` from the `rb_io_getline_fast` frame below. When it calls `setjmp` to capture the register state, the value of `%r15` that's saved is the value used in _this_ function, not the old value.

That would be OK, because it's spilled to the stack, and the "end of stack pointer" is also incremented, so it should be reachable by the GC that way. However, `blocking_region_begin` is a function, and it's not inlined, the machine stack pointer is restored at the function end. Then, the memory where `%r15` and hence `str` were spilled to gets re-used for the C code doing the without-the-GVL work.

I.e. the sequence of operations is:

* Call `blocking_region_begin`. Stack pointer is decremented (stack made bigger)
* Callee-saved registers including the value of `str` are spilled to the stack.
* %r15 is overwritten
* Current $sp is set as EC's "end of the stack" value.
* Registers are saved to EC's regs structure
* `blocking_region_begin` returns. Stack pointer is incremented (stack made smaller). The memory containing the value of `str` now lies beyond the end of the stack. The real value of `str` is returned to %r15, because it's callee-saved.
* More work happens. The stack is made bigger again by other functions, and junk is written to that stack slot
* GC marking then can't see the value of `str`

A "quick fix" for this is to inline the saving of machine stack state into the `BLOCKING_REGION` macro. That way, it's not in a separate function call, and so the stack frame containing the spilled registers is not invalidated until after the blocking completes.

Longer term, though, I'd like to rethink how this works a bit. I think `BLOCKING_REGION` being a kind of "macro accepting a block" is a bad idea, because it mixes up the careful stack management that it has to do with the stack state of the caller. I think it should be converted to use a function pointer callback like `rb_thread_call_without_gvl` etc, and we should also consider using hand-written assembly to perform the stack management I think (like what we use in the fiber switching code).

----------------------------------------
Bug #20493: Segfault on rb_io_getline_fast
https://bugs.ruby-lang.org/issues/20493#change-108326

* Author: josegomezr (Jose Gomez)
* Status: Open
* Assignee: kjtsanaktsidis (KJ Tsanaktsidis)
* ruby -v: 3.3.1
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
We've spotted a consistent segfault when running bundle install with `--jobs 4`

When running: `bundle install -j 4` we'd get a Segfault at:

```
/usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu]
```

Full [log is available here][0].

I could not find a shorter reproducer besides using bundler with `--jobs 4` or
`--jobs 8`.

Here's a sample command to trigger the behavior (it creates the Gemfile and
calls bundler) [1].

We installed all debug symbols and narrowed down the location of the segfault to
`rb_io_getline_fast` in io.c

At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in
[`io_enc_str`][4] raise a null pointer dereference.

With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not
segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6].

Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return

```diff
--- ruby3.3.orig/ruby-3.3.1/io.c
+++ ruby3.3/ruby-3.3.1/io.c
@@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc
     ENC_CODERANGE_SET(str, cr);
     fptr->lineno++;
 
+    RB_GC_GUARD(str);
     return str;
 }
```

Fixes the segfault in our tests. `bundle` finish the installation and the image is built.

I've set up a project in OBS to provide reproduceables.

- [ruby3.3.1 package][5].
- [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1].

And the corresponding container is exported in the `containers-patched`
repository.

Here I leave the docker images generated by OBS:

- 3.3.1 [without patches, segfaults.]
```
registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest
```


- 3.3.1 [with patch, does not fail]
```
registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest
```


[0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699
[1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322
[2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58
[3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001
[4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003
[5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3
[6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.3
[7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:containers/base-ruby33
[8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E.+RB_GC_GUARD+to+protect+from+premature+GC




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

In This Thread