From: ko1@... Date: 2014-06-09T11:37:47+00:00 Subject: [ruby-core:63018] [ruby-trunk - Bug #9607] Change the full GC timing Issue #9607 has been updated by Koichi Sasada. File discourse_benchmark.png added File young_objects.png added After introducing AGE2_PROMOTION patch, only a few old objects are added for each minor GC. The impact of this issue (too many slots) are not critical now. But some application can not protect increasing total slots. I rewrite algorithm more simple: ```ruby def gc if do_major_gc_at_next_gc major_gc = false minor_mark() else major_gc = true major_mark() last_major_gc = GC.count end sweep() # Actually it is lazy sweep. if total_slots * 0.7 < using_slots # not enough space if major_gc || GC.count - last_major_gc > 2 # (A) extend heap at least 2 minor GC run extend_heap else do_major_gc_at_next_gc = true end end end ``` This algorithm simply do: (1) If not enough slots after minor GC, do major GC at next GC (2) If not enough slots after major GC, extend heap (3) If not enough slots after minor GC *and* only a few minor GC until last major GC (it should be not enough slots), extend heap ((A) in the above algorithm) This strategy keeps lower total slots. ![discorse benchmark result](discourse_benchmark.png) This picture shows the (a) total_slots (b) old_objects (c) ru_minflt (used memory pages given by getrusage) for current trunk and modified (above algorithm introduced) trunk using gc_tracer. You can see: (1) ru_minflt (rough memory usage) is linear to total_slots. (2) heap_slots/modified is fewer than heap_slots. It has impact. (3) old_objects/modified is reduced periodically. Do major GC aggressively compare with current trunk. (4) Modified trunk invokes higher number of GC events (similar to GC count). This is because total_slots number is fewer than current trunk. It is intentional. You can increment consuming slot number using environment variables. So that I will introduce this patch. It means that default GC parameter is "low memory consuming/low performance" compare with current trunk/Ruby 2.1. The patch is also small: ```diff Index: gc.c =================================================================== --- gc.c (revision 46386) +++ gc.c (working copy) @@ -531,6 +531,9 @@ int parent_object_is_old; int need_major_gc; + + size_t last_major_gc; + size_t remembered_shady_object_count; size_t remembered_shady_object_limit; size_t old_object_count; @@ -3035,15 +3038,13 @@ (int)heap->total_slots, (int)heap_pages_swept_slots, (int)heap_pages_min_free_slots); if (heap_pages_swept_slots < heap_pages_min_free_slots) { - heap_set_increment(objspace, heap_extend_pages(objspace)); - heap_increment(objspace, heap); - -#if USE_RGENGC - if (objspace->rgengc.remembered_shady_object_count + objspace->rgengc.old_object_count > (heap_pages_length * HEAP_OBJ_LIMIT) / 2) { - /* if [old]+[remembered shady] > [all object count]/2, then do major GC */ - objspace->rgengc.need_major_gc = GPR_FLAG_MAJOR_BY_RESCAN; + if (objspace->rgengc.during_minor_gc && objspace->profile.count - objspace->rgengc.last_major_gc > 2 /* magic number */) { + objspace->rgengc.need_major_gc = GPR_FLAG_MAJOR_BY_NOFREE; } -#endif + else { + heap_set_increment(objspace, heap_extend_pages(objspace)); + heap_increment(objspace, heap); + } } gc_prof_set_heap_info(objspace); @@ -4256,6 +4257,7 @@ objspace->profile.major_gc_count++; objspace->rgengc.remembered_shady_object_count = 0; objspace->rgengc.old_object_count = 0; + objspace->rgengc.last_major_gc = objspace->profile.count; rgengc_mark_and_rememberset_clear(objspace, heap_eden); } #endif ``` PS. BTW, AGE2PROMOTION strategy introducing young objects, which is promoted from infant objects and will be promoted to old objects. I also counts them and I found that they are only a few objects in this program. ![young_objects](young_objects.png) ---------------------------------------- Bug #9607: Change the full GC timing https://bugs.ruby-lang.org/issues/9607#change-47104 * Author: Koichi Sasada * Status: Open * Priority: Normal * Assignee: Koichi Sasada * Category: core * Target version: current: 2.2.0 * ruby -v: 2.2 * Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN, 2.1: UNKNOWN ---------------------------------------- Abstract ======== Generational GC (called RGenGC) was introduced from Ruby 2.1.0. RGenGC reduces marking time dramatically (about x10 faster). However, RGenGC introduce huge memory consumption. This problem has impact especially for small memory machines. Ruby 2.1.1 introduced new environment variable RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to control full GC timing. However, this solution is not solve problem completely. To solve this issue, we modify **Full GC timing strategy**: (1) Always invoke full GC before extending the heap. (2) Increase the heap if not enough old-objects space. This modification introduces a bit slow down, but reduce memory consumption. Background and problem ====================== RGenGC algorithm ---------------- Ruby 2.0 and earlier versions uses simple mark and sweep. Long marking time had been an big issue. To solve this issue, Ruby 2.1.0 introduced new generational GC called RGenGC (restricted generational GC). RGenGC algorithm enables to introduce partial marking (called `minor GC'), which marks only newer created objects, and skips marking fof old objects (*1). Sometime, this marks all objects (called `major GC' or `full GC'). Many minor GC and small number of major GC makes GC faster. (*1) RGenGC doesn't skip sweeping for old-objects. This is another issue. Full GC timing -------------- There is a question: "When should we invoke invoke full GC?". Usually, generational GC uses the strategy that "when a space for old objects is full, then invoke full GC". Ruby 2.1.0 defines the size of old space for old objects with `old_object_limit' and old_object_limit is doubled by the old objects number (`old_object_count') at the last full GC. Before the GC, we determine minor or major by comparing `old_object_limit' and current old objects number (`old_object_count') if we compare current old object number and old_object_limit, and do full GC if old_object_count > old_object_limit. Here is a pseudo code of RGenGC: def gc if old_object_count > old_object_limit major_gc = false minor_mark() else major_gc = true major_mark() end sweep() # Actually it is lazy sweep. # double `old_object_count' here when it is major GC old_object_limit = old_object_count * 2 if major_gc end This strategy works fine for memory rich machines, because only a few full GCs are invoked. However, this strategy causes more and more memory consumption. Fig.1 is a result of (modified) discourse benchmark (Thanks Sam Saffron!!). X-axis is GC count and Y-axis represents a number of slots (objects). `total_slots' is avaialbe slots to use, `old_object' is old_object_count. ![Fig1. Usage of slots on Ruby 2.2 dev](ruby2_2.JPG) As you can see, old_object_limit is too high and total_slots are expanded (x1.8, specified by GC_HEAP_GROWTH_FACTOR) before full GC. Full GC timing tuning from Ruby 2.1.1 ------------------------------------- To solve this issue, Ruby 2.1.1 introduced an environment variable "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" (use `old_object_limit_factor' for short). This variable control how to extend `old_object_limit'. In pseudo code, we changed from # double `old_object_count' here when it is major GC old_object_limit = old_object_count * 2 if major_gc to # double `old_object_count' here when it is major GC old_object_limit = old_object_count * old_object_limit_factor if major_gc The default value of this environment variable is 2. So it is same behavior on default. With RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.3, the benchmark result is Fig.2. ![Fig2. Usage of slots on Ruby 2.2 dev w/ old_limit_factor=1.3](ruby_2_2_factor_1_3.JPG) We can observe that the total slots doesn't grow than the default behavior. Try this environment variable if you have trouble with memory usage. Note that if you want to disable generational garbage collection, you can specify 0.9 (any number lesser than 1.0) for RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR. With this technique, on every GC "old_object_count > old_object_limit" is true and do major GC. BTW, this variable should be noted on NEWS file. I missed to add it. More intelligent approach ------------------------- "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" with small number can solve this issue, but we need to specify correct value for each application. It is tough work for us. Proposal ======== With these graphes, we find two insights. (1) We need to invoke full GC becore expanding heaps. If we invoke full GC, it is possible to stop expanding heaps. (2) Increasing speed of old objects is completely slow. To invoke full GC before expanding, we set a upper bount for old_object_limit as "total_slots * 0.7". This value is same as the threshold to determin expanding heaps or not. After full GC, it is possible that "old_object_count > old_object_limit" is true, but only a few differences. This situation causes many of full GC. To avoid such situation, we add a few slots if "old_object_limit * 0.7 < old_object_count). In this case, "old_object_limit * 0.7" is a minimum space for old objects. In pseudo code: def gc if old_object_count > old_object_limit major_gc = false minor_mark() else major_gc = true major_mark() end sweep() # Actually it is lazy sweep. if major_gc if total_slots * 0.7 < using_slots # not enough space extend_heap(total_slots * (1.8 - 1)) # 1.8 is growth_factor elsif old_object_limit * 0.7 < old_object_count # not enough old object count extend_heap(old_object_count - object_limit * 0.7) end else do_major_gc_at_next_gc = true end if major_gc a = old_object_count * old_object_limit_factor b = total_slots * 0.7 old_object_limit = [a, b].min end end With this proposal, we can reduce total_slots consumption (Fig3, Fig4). ![Fig3. Usage of slots on proposal strategy](proposed.JPG) ![Fig4. Usage of slots on proposal strategy w/ old_limit_factor=1.3](proposed_factor_1_3.JPG) However, more and more GC invoking time. It is trade-off because reducing total_slots introduces more frequent GC. We can solve this issue by making condition parameter 0.7 as tunable. Future work =========== (1) Promotion strategy Current growing speed of old object number is too high. So we need to consider about promotion strategy. Current strategy is "promote young objects when young objects survive one garbage collection". We already implemented "RGENGC_THREEGEN" mode, which enable to filter unexpected promotion. NOTE: THREEGEN = 3gen is strange name because generation is only two. We will change this mode name to AGE2PROMOTION and so on. (2) Partial sweep We successed to use partial marking on minor GC. However, everytime sweep all available slots. Sweeping time is not so big, but there is a space to optimize it. (3) Incremental major GC With this proposal, we increase major GC count. To avoid long major GC pausing time, we need to implement incremental marking on full GC. ---Files-------------------------------- ruby2_2.JPG (64.4 KB) ruby_2_2_factor_1_3.JPG (72.4 KB) proposed.JPG (69.7 KB) proposed_factor_1_3.JPG (70.3 KB) gc.patch (7.48 KB) discourse_benchmark.png (18.4 KB) young_objects.png (8.17 KB) -- https://bugs.ruby-lang.org/