From: Petr Salinger
Date: 2009-08-31T20:43:05+09:00
Subject: [ruby-core:25217] [Bug #2025] problem with pthread handling on non NPTL platform

Bug #2025: problem with pthread handling on non NPTL platform
http://redmine.ruby-lang.org/issues/show/2025

Author: Petr Salinger
Status: Open, Priority: Normal
Target version: 1.9.x
ruby -v: 1.9.1.243

I tried to fix some test suite failures on GNU/kFreeBSD,
http://bugs.debian.org//cgi-bin/bugreport.cgi?bug=542927.
I observed some problems in the pthread-related code.
The hang in the first test in http://redmine.ruby-lang.org/issues/show/1525 also affects us.

IMO, Ruby should try to work under any POSIX-conforming pthread implementation, not only NPTL. An audit of the code in this area seems needed.

There are some problems with the handling of fork()/exec(). Locks really should be reinitialized in the child, and the timer thread should be started using pthread_once(); the current approach is fragile and might start more than one timer thread.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_once.html

In general, I do not understand how the following code in thread_pthread.c could survive fork() correctly:

    static pthread_t timer_thread_id;
    static pthread_cond_t timer_thread_cond = PTHREAD_COND_INITIALIZER;
    static pthread_mutex_t timer_thread_lock = PTHREAD_MUTEX_INITIALIZER;

    rb_thread_create_timer_thread()
    thread_timer()

See also http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

I really doubt that the following code in rb_f_fork(VALUE obj) in process.c is correct:

    switch (pid = rb_fork(0, 0, 0, Qnil)) {
      case 0:
    #ifdef linux
        after_exec();
    #endif
        rb_thread_atfork();
        if (rb_block_given_p()) {
            int status;

            rb_protect(rb_yield, Qundef, &status);
            ruby_stop(status);
        }

The conditional after_exec() shouldn't be there. There is already an after_fork() call at line 2331, which is executed in both the parent and the child. The only exception is when chfunc is not NULL; then it is not executed at all.

The bug is timing dependent, i.e.
there is a race condition. Sometimes the child process ends up with two timer threads, sometimes with the expected one. The probability of two is merely higher on linuxthreads than on NPTL; it can happen under any pthread implementation.

Ruby should not create a thread with PTHREAD_CREATE_DETACHED and then call pthread_join() on it.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_join.html:
"The behavior is undefined if the value specified by the thread argument to pthread_join() does not refer to a joinable thread."

Ruby should use pthread_sigmask() instead of sigprocmask() when it is available, and so on.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_sigmask.html:
"The use of the sigprocmask() function is unspecified in a multi-threaded process."

This would work correctly on both linuxthreads and NPTL, and should work on any POSIX-conforming pthread implementation. Ideally, Ruby would not require full conformance, but would also accept some known exceptions, like our getpid() difference.

----------------------------------------
http://redmine.ruby-lang.org