[#23457] [Bug #1471] "Mutual join" deadlock detection faulty in 1.8.6 and 1.8.7 — John Carter <redmine@...>

Bug #1471: "Mutual join" deadlock detection faulty in 1.8.6 and 1.8.7

17 messages 2009/05/15

[#23483] [Bug #1478] Ruby archive — Oleg Puchinin <redmine@...>

Bug #1478: Ruby archive

29 messages 2009/05/16
[#29225] [Feature #1478] Ruby archive — Luis Lavena <redmine@...> 2010/04/02

Issue #1478 has been updated by Luis Lavena.

[#30345] Re: [Feature #1478] Ruby archive — "NAKAMURA, Hiroshi" <nakahiro@...> 2010/05/21

On Fri, Apr 2, 2010 at 17:13, Luis Lavena <redmine@ruby-lang.org> wrote:

[#30346] Re: [Feature #1478] Ruby archive — Jonathan Nielsen <jonathan@...> 2010/05/21

> Thanks for your comment.

[#30347] Re: [Feature #1478] Ruby archive — Jonathan Nielsen <jonathan@...> 2010/05/21

OK Hiroshi, I read some of the comments earlier in the thread that I

[#30355] Re: [Feature #1478] Ruby archive — Caleb Clausen <vikkous@...> 2010/05/21

On 5/20/10, Jonathan Nielsen <jonathan@jmnet.us> wrote:

[#30364] Re: [Feature #1478] Ruby archive — Benoit Daloze <eregontp@...> 2010/05/22

Hi,

[#23505] [Bug #1494] tempfile#unlink may silently fail on windows — Nicholas Manning <redmine@...>

Bug #1494: tempfile#unlink may silently fail on windows

19 messages 2009/05/19

[#23572] [Bug #1525] Deadlock in Ruby 1.9's VM caused by ConditionVariable.wait and fork? — Hongli Lai <redmine@...>

Bug #1525: Deadlock in Ruby 1.9's VM caused by ConditionVariable.wait and fork?

27 messages 2009/05/27

[#23595] Meaning of RUBY_PLATFORM — Rick DeNatale <rick.denatale@...>

The RUBY_PLATFORM constant is documented in the latest Pickaxe as "The

17 messages 2009/05/28
[#23596] Re: Meaning of RUBY_PLATFORM — Luis Lavena <luislavena@...> 2009/05/28

On Thu, May 28, 2009 at 3:41 PM, Rick DeNatale <rick.denatale@gmail.com> wrote:

[#23602] Re: Meaning of RUBY_PLATFORM — Rick DeNatale <rick.denatale@...> 2009/05/28

On Thu, May 28, 2009 at 2:52 PM, Luis Lavena <luislavena@gmail.com> wrote:

[#23608] Re: Meaning of RUBY_PLATFORM — Luis Lavena <luislavena@...> 2009/05/28

On Thu, May 28, 2009 at 7:08 PM, Rick DeNatale <rick.denatale@gmail.com> wrote:

[#23609] Re: Meaning of RUBY_PLATFORM — Rick DeNatale <rick.denatale@...> 2009/05/29

On Thu, May 28, 2009 at 7:22 PM, Luis Lavena <luislavena@gmail.com> wrote:

[ruby-core:23457] [Bug #1471] "Mutual join" deadlock detection faulty in 1.8.6 and 1.8.7

From: John Carter <redmine@...>
Date: 2009-05-15 08:19:51 UTC
List: ruby-core #23457
Bug #1471: "Mutual join" deadlock detection faulty in 1.8.6 and 1.8.7
http://redmine.ruby-lang.org/issues/show/1471

Author: John Carter
Status: Open, Priority: Normal
Category: core
ruby -v: ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-linux]

If a third thread is involved, the code in eval.c in function
rb_thread_join0 to detect a mutual join dead lock is faulty.

This effects ruby 1.8.6 p368 and ruby 1.8.7 p 160 but not ruby 1.9.1 p
129

What seems to be happening is a sequence like this....

main thread grabs mutex

thread two blocks waiting for mutex

thread one blocks waiting for mutex and th->join is set to the main thread.

the main thread unlocks, which wakes thread two. Thread two now has mutex, thread one is still blocked.

the main thread attempts to join thread one, rb_thread_join0 notes
that thread one is in a WAIT_JOIN waitfor state AND th->join still
points at the main thread and (erroneously) reports deadlock.

ruby-1.8.7-p160/eval.c================================================
rb_thread_join0(th, limit)
    rb_thread_t th;
    double limit;
{
    enum rb_thread_status last_status = THREAD_RUNNABLE;

    if (rb_thread_critical) rb_thread_deadlock();
    if (!rb_thread_dead(th)) {
        if (th == curr_thread)
            rb_raise(rb_eThreadError, "thread 0x%lx tried to join itself",
                     th->thread);
        if ((th->wait_for & WAIT_JOIN) && th->join == curr_thread)
            rb_raise(rb_eThreadError, "Thread#join: deadlock 0x%lx - mutual join(0x%lx)",
                     curr_thread->thread, th->thread);
======================================================================
Affected versions.

/opt/ruby/ruby-1.8.6-p368/bin/ruby -v
ruby 1.8.6 (2009-03-31 patchlevel 368) [i686-linux]

/opt/ruby/ruby-1.8.7-p160/bin/ruby -v
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-linux]

Here is a test script to trigger it....
=mutual_join_bug.rb============================================================
require 'thread'

m = Mutex.new
m.lock
wt2 = Thread.new do
   m.lock
   sleep 2
   m.unlock
end

# Ensure wt2 is waiting on m
sleep 0.1

wt1 = Thread.new do
   m.lock
   m.unlock
end
# Ensure wt1 is waiting on m
sleep 0.1

# Give it to wt2
m.unlock

wt1.join
======================================================================

Output...
/opt/ruby/ruby-1.8.7-p160/bin/ruby -w mutual_join_bug.rb mutual_join_bug.rb:24:in `join': Thread#join: deadlock 0xb7daa1bc - mutual join(0xb7d9bc48) (ThreadError)
        from mutual_join_bug.rb:24




The following patch "fixes" it...
======================================================================
--- eval.c_orig	2009-03-23 22:28:31.000000000 +1300
+++ eval.c	2009-05-15 15:15:34.141270568 +1200
@@ -11426,9 +11426,13 @@
 	if (th == curr_thread)
 	    rb_raise(rb_eThreadError, "thread 0x%lx tried to join itself",
 		     th->thread);
+/* This test for mutual join deadlock is faulty in that
+   it doesn't allow for the possibility that a third thread
+   may now hold the mutex.
 	if ((th->wait_for & WAIT_JOIN) && th->join == curr_thread)
 	    rb_raise(rb_eThreadError, "Thread#join: deadlock 0x%lx - mutual join(0x%lx)",
 		     curr_thread->thread, th->thread);
+*/
 	if (curr_thread->status == THREAD_TO_KILL)
 	    last_status = THREAD_TO_KILL;
 	if (limit == 0) return Qfalse;
======================================================================

But I'm not sure it is the correct fix. I'm still experience some
strangeness which I haven't pinned down yet.

I would seem to me that the correct fix is when a mutex is unlocked,
and then locked by some other thread, the th->join pointers of all
threads waiting on that mutex should be updated to point at the thread
now holding the mutex.


----------------------------------------
http://redmine.ruby-lang.org

In This Thread

Prev Next