From: Eric Wong
Date: 2014-08-16T08:34:53+00:00
Subject: [ruby-core:64415] Re: [ruby-trunk - Bug #10009] IO operation is 10x slower in multi-thread environment

normalperson@yhbt.net wrote:
> I doubt I can noticeably improve performance with futexes vs mutex/condvar.

A totally non-speed-optimized futex-based lock/condvar implementation is available at:

  git://bogomips.org/ruby.git (futex branch)
  http://bogomips.org/ruby.git/patch?id=ae93c50c8de

I am not sure my implementation is correct, but "make check" passes with both 8 cores and 1 core active (8-core Vishera). I will probably write an independent (C-only) test for more parallelism, and maybe steal some tests from glibc (I also plan on using this futex-based lock implementation outside of Ruby).

Benchmarks don't seem to show much (if any) improvement yet. A speed improvement may be possible by reimplementing the GVL around the bare futex interface (without going through a separate condvar/mutex layer).

On amd64 GNU/Linux, pthread_mutex_t is 40 bytes, but these futex-based locks need only 4 bytes. Similarly, pthread_cond_t is 48 bytes, making rb_nativethread_cond_t 56 bytes with pthreads; this futex implementation currently requires only 16 bytes for a condvar.

The size improvement may be noticeable for some apps with many Mutexes: the lock/condvar reductions mean rb_mutex_struct is now 48 bytes instead of 128 bytes.
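For readers wondering how a lock can fit in a single 4-byte word: the following is NOT the code from the futex branch, just a minimal sketch assuming Linux and C11 atomics. It follows the well-known three-state mutex design from Ulrich Drepper's "Futexes Are Tricky": the lock word is one int where 0 = unlocked, 1 = locked with no waiters, 2 = locked with possible waiters. The names futex_mutex_t, futex_mutex_lock, futex_mutex_unlock and the futex() wrapper are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical 4-byte lock word:
 * 0 = unlocked, 1 = locked (no waiters), 2 = locked (waiters possible). */
typedef struct { atomic_int state; } futex_mutex_t;
#define FUTEX_MUTEX_INIT { 0 }

/* Thin wrapper around the raw syscall; glibc provides no futex() function. */
static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, (int *)uaddr, op, val, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex_t *m)
{
    int c = 0;

    /* Fast path: 0 -> 1 with a single CAS, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Contended: mark the lock as "waiters possible" (2) and sleep
     * in the kernel until the word is no longer 2. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void futex_mutex_unlock(futex_mutex_t *m)
{
    int old = atomic_fetch_sub(&m->state, 1);

    assert(old != 0 && "unlock of an unlocked mutex");
    /* old == 1: nobody was waiting, the word is now 0, done.
     * old == 2: someone may be asleep, so reset to 0 and wake one. */
    if (old != 1) {
        atomic_store(&m->state, 0);
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

The uncontended lock/unlock pair is one atomic instruction each and never enters the kernel; FUTEX_WAIT/FUTEX_WAKE are used only under contention. The futex itself is just the address of that int, which is why such a lock needs only sizeof(int) bytes, versus 40 bytes for pthread_mutex_t on amd64 GNU/Linux.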