[ruby-core:95847] [Ruby master Bug#16288] Segmentation fault with finalizers, threads
From: davidnwelton@...
Date: 2019-11-14 03:45:09 UTC
List: ruby-core #95847
Issue #16288 has been updated by davidw (David Welton).
This code pretty reliably produces a segmentation fault on my machine:
```
require 'timeout'
Thread.DEBUG = 1
class Foo
  def initialize
    ObjectSpace.define_finalizer(self, proc do
      Foo.foo_finalizer
    end)
  end
  def bar
    puts 'foo'
  end
  def Foo.foo_finalizer
    STDERR.puts "finalizing a Foo"
    Thread.new do
      sleep 5
      STDERR.puts "finalizing foo thread done"
    end
  end
end
class Bar
  def initialize
    ObjectSpace.define_finalizer(self, proc do
      Bar.bar_finalizer
    end)
  end
  def foo
    puts 'foo'
  end
  def Bar.bar_finalizer
    Timeout::timeout(2) do
      100.times do
        f = Foo.new
        f.bar
      end
    end
  end
end
b = Bar.new
b.foo
```
Written like that, it looks contrived, but a lot of people are hitting it because some combination of finalizers and threads is used by, I think, the mongo, mongoid and concurrent-ruby gems. That is why a number of people have reported problems in the GitHub issue I linked in the initial report. I would guess that many more are simply seeing the segmentation fault and not knowing where to file a report.
Something that I *think* is important here is the Timeout. That runs a `join` against the main thread, and I *think* (I am not familiar with this code!) that is tickling `thread_join_sleep`, which does `th->status = THREAD_STOPPED_FOREVER;`. As a result, `thread_s_new` doesn't raise the "can't alloc thread" exception, because the main thread's status is `THREAD_STOPPED_FOREVER` rather than killed.
Does that make sense?
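To make the status discussion a bit more concrete, here is a minimal sketch of how a thread blocked in `join` looks from Ruby code. It only shows the Ruby-level `Thread#status` values ("sleep" for a blocked thread, `false` for one that finished normally), which I am assuming correspond to the C-level states mentioned above. The names `worker`, `observer` and `observed` are made up for illustration, and the sleeps are just there to make the timing predictable.

```ruby
# A worker that keeps the main thread blocked in join for a while.
worker = Thread.new { sleep 0.5 }

observed = nil
observer = Thread.new do
  sleep 0.1                         # give the main thread time to enter join
  observed = Thread.main.status     # sample the main thread's status
end

worker.join                         # main thread blocks here
observer.join

puts "main while joining: #{observed.inspect}"
puts "worker after finishing: #{worker.status.inspect}"
```

A blocked main thread is therefore not "dead" as far as a status check is concerned, which is the distinction the guess above relies on.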
----------------------------------------
Bug #16288: Segmentation fault with finalizers, threads
https://bugs.ruby-lang.org/issues/16288#change-82680
* Author: davidw (David Welton)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.6.6p116 (2019-10-02 revision 67825) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,
This is a tricky one and I am still working on narrowing it down, but I will report what I have so far.
I compiled a version of 2_6_6 from github: ruby 2.6.6p116 (2019-10-02 revision 67825) [x86_64-linux]
I have a minimal Rails project that uses Mongoid. It crashes with a segmentation fault when rspec runs. The concurrent-ruby gem is in some way involved, and I have been posting there: https://github.com/ruby-concurrency/concurrent-ruby/issues/808
However, I think there is a deeper problem: I would not expect a user-level script to cause a segmentation fault.
I have been putting a lot of debugging statements in, and turned on Thread.DEBUG, and have noticed some things. I am not experienced with Ruby's internals, so some of these bits of data might be normal or irrelevant:
* The concurrent-ruby gem uses ObjectSpace.define_finalizer to set a finalizer
* That finalizer creates a new Thread
* However, it appears as if that thread is running after the main thread is already dead, so code that expects to reference the main thread crashes, because it's a NULL reference.
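The ordering described in the points above can be observed with a small sketch like the following. It runs a child interpreter (so the finalizer fires at that process's exit) and captures what it prints. The script contents and the variable names are only illustrative. The line that prints the main thread's status is there to see what the finalizer observes; I am not asserting what value it shows.

```ruby
require 'rbconfig'
require 'open3'

# Child script: the finalizer is still pending when the main code finishes,
# so it runs during interpreter shutdown.
script = <<~'RUBY'
  class Foo
    def initialize
      ObjectSpace.define_finalizer(self, proc do
        puts "finalizer ran"
        puts "main thread status in finalizer: #{Thread.main.status.inspect}"
      end)
    end
  end
  Foo.new
  puts "main done"
RUBY

# Run the script in a separate Ruby process and collect its stdout.
out, status = Open3.capture2(RbConfig.ruby, '-e', script)
puts out
puts "child exited successfully: #{status.success?}"
```

If "finalizer ran" shows up after "main done", the finalizer (and anything it starts) is executing after the main program has already finished.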
I tried the following test code:
```
class Foo
  def initialize
    ObjectSpace.define_finalizer(self, proc do
      Foo.foo_finalizer
    end)
  end
  def bar
    puts 'bar'
  end
  def Foo.foo_finalizer
    puts "foo_finalizer"
    t = Thread.new do
      puts "Thread reporting for duty"
    end
    puts "foo_finalizer thread launched"
    sleep 5
  end
end
f = Foo.new
f.bar
f = nil
```
This was an attempt to develop a simple test case to demonstrate the problem. It triggers `rb_raise(rb_eThreadError, "can't alloc thread");` in `thread_s_new`, because it looks like the main thread has already been marked as 'killed' in this case. When I check the main thread status in `thread_s_new` with the above code, it reports 'dead'.
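Since that failure surfaces as a `ThreadError`, a finalizer can at least handle it explicitly. The following is only a defensive sketch, not a fix for the segmentation fault: `start_thread_or_nil` and `FINALIZER` are hypothetical names of my own, not part of Ruby or of any gem, and I am assuming the "can't alloc thread" case is the only `ThreadError` of interest here.

```ruby
# Hypothetical helper: try to start a thread, and return nil instead of
# raising if the interpreter refuses to create one.
def start_thread_or_nil
  Thread.new { yield }
rescue ThreadError => e
  warn "could not start thread: #{e.message}"
  nil
end

class Foo
  # Defined at class level so the proc does not capture the instance.
  FINALIZER = proc do |_object_id|
    start_thread_or_nil { STDERR.puts "Thread reporting for duty" }
  end

  def initialize
    ObjectSpace.define_finalizer(self, FINALIZER)
  end
end

Foo.new

# During normal execution the helper simply returns a running thread.
t = start_thread_or_nil { :ok }
puts t.value.inspect if t
```

The design choice here is to treat thread creation inside a finalizer as something that may legitimately fail at shutdown, rather than assuming it always succeeds.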
When I run my rspec code in the sample Rails project, `thread_s_new` shows the main thread's status as 'run', even though it should be dead.
I have also seen some debugging output that shows exceptions, `thread_join` interrupts and so on.
Is it possible that something like this is happening?
The main thread starts doing its cleanup, gets an exception or something else that generates an interrupt, and its KILLED status gets reset to RUNNABLE.
Then the finalizer starts creating a Thread, but at this point the main thread actually does get killed, and when that finalizer thread tries to run, it hits a null reference?
I can provide the Rails sample project if need be.
Sorry if any of the above isn't clear; I've been staring at the C code for several hours and am a bit cross-eyed!
Thank you for any insights.
--
https://bugs.ruby-lang.org/