From: Narihiro Nakamura
Date: 2012-01-05T20:02:53+09:00
Subject: [ruby-core:41916] Proposal: Bitmap Marking GC

Hi.

I created a Bitmap Marking GC for Ruby 2.0.

Source code: https://github.com/authorNari/ruby/tree/bitmap_marking
Patch: https://github.com/authorNari/patch_bag/blob/master/ruby/gc_bitmap_using_alignment_r33786.patch

In the following environment, this patch passes 'make check' and 'make TESTS="--gc-stress" test-all'.

  $ ruby -v
  ruby 2.0.0dev (2011-11-18 trunk 33786) [x86_64-linux]

= Performance evaluation

== make benchmark

The result of make benchmark OPTS="-r 5" is here.
https://gist.github.com/1542547

In general, it's a little bit slower. With Bitmap Marking GC, the GC has to look up the bitmap for an object during the mark phase, so marking is a little slower.

== skkzipcode

Bitmap Marking GC is copy-on-write friendly, like Ruby Enterprise Edition.
http://www.rubyenterpriseedition.com/faq.html

I measured this improvement with skkzipcode, a benchmark program. In skkzipcode, the parent process holds a large amount of data, and the child processes use data that is shared with the parent process.
https://github.com/authorNari/skkzipcode
(This program uses /proc/PID/smaps to profile memory usage.)

  origin
    PROCESS_CNT : 5
    SHARED_TOTAL: 59124 kb
    PRIV_TOTAL  : 224892 kb

  REE - GC.copy_on_write_friendly = true
    PROCESS_CNT : 5
    SHARED_TOTAL: 207720 kb
    PRIV_TOTAL  : 164572 kb

  bmap - Bitmap Marking GC for Ruby 2.0
    PROCESS_CNT : 5
    SHARED_TOTAL: 170744 kb
    PRIV_TOTAL  : 138336 kb

  * PROCESS_CNT: number of child processes
  * SHARED_TOTAL: total shared memory usage of the child processes (KB)
  * PRIV_TOTAL: total private memory usage of the child processes (KB)

bmap is copy-on-write friendly!!

= Implementation

Let me introduce some implementation topics.

  * Heap block addresses are aligned to 16KB so that the bitmap can be found quickly.
    * On Linux, it uses posix_memalign() or memalign().
    * On Windows, it uses _aligned_malloc().
  * To avoid unnecessary writes, the GC relinks the freelist less often.
    * The GC doesn't relink objects that were already linked on the freelist when GC started.
    * Each heap slot has its own freelist.
  * I embed a struct heaps_slot in each heap block.

This patch improves memory usage for programs that use fork() on Linux. We have to use fork() when we need real parallel performance in CRuby, and we already have many libraries that use fork() (e.g. Unicorn, Resque).

The GC is a little bit slower, but I think it's within an acceptable range.

I already posted this topic to ruby-dev.
http://bugs.ruby-lang.org/issues/5839

Matz agreed to commit this patch to trunk.

Thanks.

-- 
Narihiro Nakamura (nari)
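
P.S. For readers who want to see why the 16KB alignment makes the bitmap lookup cheap, here is a minimal sketch in C. It is not the code from the patch. The names (block_header, header_of, slot_index, mark, and so on), the 40-byte slot size, and the layout with the bitmap allocated outside the block are all assumptions made for illustration. It only covers the posix_memalign() case. The idea it shows: because every block starts on a 16KB boundary, masking the low bits of an object's address gives the block header, and the header leads to the mark bitmap, so marking never writes to the object itself.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_ALIGN      ((uintptr_t)16 * 1024)   /* 16KB, as in the post */
#define HEAP_ALIGN_MASK (~(HEAP_ALIGN - 1))
#define SLOT_SIZE       ((size_t)40)             /* assumed object size */
#define BITS_PER_WORD   (sizeof(uintptr_t) * 8)

/* Hypothetical header embedded at the start of each aligned heap block. */
typedef struct block_header {
    uintptr_t *bits;    /* mark bitmap, kept outside the object slots */
    size_t     nslots;  /* number of object slots after the header */
} block_header;

/* Allocate one 16KB block on a 16KB boundary, plus its bitmap. */
static block_header *block_new(void)
{
    void *mem = NULL;
    if (posix_memalign(&mem, HEAP_ALIGN, HEAP_ALIGN) != 0) return NULL;

    block_header *h = mem;
    h->nslots = (HEAP_ALIGN - sizeof(block_header)) / SLOT_SIZE;
    size_t nwords = (h->nslots + BITS_PER_WORD - 1) / BITS_PER_WORD;
    h->bits = calloc(nwords, sizeof(uintptr_t));
    if (h->bits == NULL) { free(mem); return NULL; }
    return h;
}

/* Address of the i-th object slot in a block. */
static void *block_slot(block_header *h, size_t i)
{
    assert(i < h->nslots);
    return (char *)(h + 1) + i * SLOT_SIZE;
}

/* Clear the low bits of the object address to reach the block header. */
static block_header *header_of(const void *obj)
{
    return (block_header *)((uintptr_t)obj & HEAP_ALIGN_MASK);
}

/* Index of the object's slot within its block. */
static size_t slot_index(const void *obj)
{
    const block_header *h = header_of(obj);
    return (size_t)((const char *)obj - (const char *)(h + 1)) / SLOT_SIZE;
}

/* Set the mark bit; only the bitmap is written, not the object. */
static void mark(const void *obj)
{
    size_t i = slot_index(obj);
    header_of(obj)->bits[i / BITS_PER_WORD] |= (uintptr_t)1 << (i % BITS_PER_WORD);
}

static int is_marked(const void *obj)
{
    size_t i = slot_index(obj);
    return (int)((header_of(obj)->bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1);
}
```

In this sketch, marking touches only the separately allocated bitmap, so the pages holding the objects stay clean and can remain shared with a parent process after fork(). The extra mask, subtraction, and division per object are also the kind of lookup cost that makes marking a little slower.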