From: Eric Wong Date: 2022-01-17T23:14:38+00:00 Subject: [ruby-core:107170] Re: [Ruby master Feature#18494] [RFC] ENV["RUBY_GC_..."]= changes GC parameters dynamically --Ag28RZJR3STe0FEq Content-Type: text/plain; charset=utf-8 Content-Disposition: inline > https://bugs.ruby-lang.org/issues/18494 Thanks both for the comments. "ko1 (Koichi Sasada)" wrote: > Some `RUBY_GC_...` vars do not affect correctly because they are used only on setup. > We need to document which vars can be modified dynamically. Updated patch series attached, patch 3/3 adds partial guard to RUBY_GC_INIT_HEAP_SLOTS to prevent gc_set_initial_pages(). Patch 1/3 also fixes an existing bug in how RUBY_GC_MALLOC_LIMIT that is independent of this new feature. All the other parameters seem fine, however I'm not sure about interactions w.r.t. Ractors and double-precision floats. "byroot (Jean Boussier)" wrote: > Should we allow to change them at runtime through some `::GC` > API? I could see some automatic/dynamic GC tuning gems being > implemented using these APIs. e.g. monitor how often and how > long the GC runs, and tweak some values in response. I'm rather strongly against this. I envision use would be: 1. load a bunch of code, read configs, etc... 2. set GC params 3. run main loop until exit The main loop in reasonably-written applications should be highly predictable in terms of memory use and have minimal outliers and spikes. IOW, a web server should be serving reasonably-sized HTML pages/chunks that clients can render w/o falling over; and reading/sending large files in chunks rather than slurping hundreds of MB into RAM. For a rare memory spikes inside the main loop, I'd much rather trigger GC ASAP than be saddled with a larger heap for the remainder of process lifetime (because larger heaps increase potential for fragmentation). I see the presence of tuning knobs to be an admission of the shortcomings of the GC and something that could/should eventually be eliminated via GC improvements. (Fwiw, I've mostly given up on Ruby and it's GC; but I made this patch since it's too time-consuming to rewrite some Ruby in a scripting language with auto ref-counting and faster startup time). Anyways, I've updated the commit message in my original patch (now patch 2/3) to further explain my decision. Here's the relevant snippet: This has no extra API footprint, and will silently be a no-op for other Ruby implementations. I tried to make this change as non-intrusive as possible to minimize the growth in executable and icache size. It is not optimized for repeated changes and (IMHO) should not be. IMHO, tuning knobs should be last resorts and used sparingly to minimize the learning curve and cognitive overhead required to run applications. Thus there is no "GC.foo=" API to reduce documentation and support overhead, since GC parameters are implementation-dependent and may change over time. Developers can make ENV changes freely without worrying about forward, backwards, nor cross-engine incompatibilities. These parameters are already documented in the manpage and other sources, so there's less cognitive overhead required to learn them. Increasing the Ruby API footprint would also hurt startup time and increases memory usage. Available as a self-hosted pull request (generated via "git request-pull"): The following changes since commit eb98275c967d8938526966fe53e52f5a10249492: * 2022-01-18 [ci skip] (2022-01-18 05:39:51 +0900) are available in the Git repository at: https://yhbt.net/ruby.git gc-param-dyn-v2 for you to fetch changes up to ffb336a8ddb0e21d8f3bb4ce1801e90ea78af42d: ruby_gc_set_params: do not set initial pages on dynamic update (2022-01-17 22:59:58 +0000) ---------------------------------------------------------------- Eric Wong (3): ruby_gc_set_params: update malloc_limit when env is set ENV["RUBY_GC_..."]= changes GC parameters dynamically ruby_gc_set_params: do not set initial pages on dynamic update gc.c | 11 +++++++---- hash.c | 5 +++++ internal/gc.h | 2 +- ruby.c | 2 +- test/ruby/test_gc.rb | 4 ++++ 5 files changed, 18 insertions(+), 6 deletions(-) Thanks for reading this far. --Ag28RZJR3STe0FEq Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="0001-ruby_gc_set_params-update-malloc_limit-when-env-is-s.patch" From 33743e133a80063e5cdc605d91f03b1c01b030c6 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Mon, 17 Jan 2022 20:36:49 +0000 Subject: [PATCH 1/3] ruby_gc_set_params: update malloc_limit when env is set During VM startup, rb_objspace_alloc sets malloc_limit (objspace->malloc_params.limit) before ruby_gc_set_params is called, thus nullifying the effect of RUBY_GC_MALLOC_LIMIT before the initial GC run. The call sequence is as follows: main.c::main() ruby_init ruby_setup Init_BareVM rb_objspace_alloc // malloc_limit = gc_params.malloc_limit_min; ruby_options ruby_process_options process_options ruby_gc_set_params // RUBY_GC_MALLOC_LIMIT => gc_params.malloc_limit_min With ruby_gc_set_params setting malloc_limit, RUBY_GC_MALLOC_LIMIT affects the process sooner. --- gc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gc.c b/gc.c index aca15ad426..332581c279 100644 --- a/gc.c +++ b/gc.c @@ -11060,6 +11060,7 @@ gc_set_initial_pages(void) void ruby_gc_set_params(void) { + rb_objspace_t *objspace = &rb_objspace; /* RUBY_GC_HEAP_FREE_SLOTS */ if (get_envparam_size("RUBY_GC_HEAP_FREE_SLOTS", &gc_params.heap_free_slots, 0)) { /* ok */ @@ -11080,7 +11081,9 @@ ruby_gc_set_params(void) gc_params.heap_free_slots_min_ratio, gc_params.heap_free_slots_max_ratio, TRUE); get_envparam_double("RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR", &gc_params.oldobject_limit_factor, 0.0, 0.0, TRUE); - get_envparam_size ("RUBY_GC_MALLOC_LIMIT", &gc_params.malloc_limit_min, 0); + if (get_envparam_size("RUBY_GC_MALLOC_LIMIT", &gc_params.malloc_limit_min, 0)) { + malloc_limit = gc_params.malloc_limit_min; + } get_envparam_size ("RUBY_GC_MALLOC_LIMIT_MAX", &gc_params.malloc_limit_max, 0); if (!gc_params.malloc_limit_max) { /* ignore max-check if 0 */ gc_params.malloc_limit_max = SIZE_MAX; @@ -11089,7 +11092,6 @@ ruby_gc_set_params(void) #if RGENGC_ESTIMATE_OLDMALLOC if (get_envparam_size("RUBY_GC_OLDMALLOC_LIMIT", &gc_params.oldmalloc_limit_min, 0)) { - rb_objspace_t *objspace = &rb_objspace; objspace->rgengc.oldmalloc_increase_limit = gc_params.oldmalloc_limit_min; } get_envparam_size ("RUBY_GC_OLDMALLOC_LIMIT_MAX", &gc_params.oldmalloc_limit_max, 0); --Ag28RZJR3STe0FEq Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="0002-ENV-RUBY_GC_.-changes-GC-parameters-dynamically.patch" From 895b15e27edd34ffa2382aa2f6f7f1bb4e663e99 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Sun, 16 Jan 2022 07:48:18 +0000 Subject: [PATCH 2/3] ENV["RUBY_GC_..."]= changes GC parameters dynamically This is intended to give Ruby application developers a way to to improve the out-of-the-box experience for end users running tools written in Ruby. In most cases, end users are not and cannot be expected to know how to tune the GC better than the developers who wrote the Ruby code. This has no extra API footprint, and will silently be a no-op for other Ruby implementations. I tried to make this change as non-intrusive as possible to minimize the growth in executable and icache size. It is not optimized for repeated changes and (IMHO) should not be. IMHO, tuning knobs should be last resorts and used sparingly to minimize the learning curve and cognitive overhead required to run applications. Thus there is no "GC.foo=" API to reduce documentation and support overhead, since GC parameters are implementation-dependent and may change over time. Developers can make ENV changes freely without worrying about forward, backwards, nor cross-engine incompatibilities. These parameters are already documented in the manpage and other sources, so there's less cognitive overhead required to learn them. Increasing the Ruby API footprint would also hurt startup time and increases memory usage. One potential incompatibility is users doing something like: ENV["RUBY_GC_..."] = "1m" system(...) However, the different behavior would be largely innocuous aside from different performance characteristics in the parent process. Using: system({ "RUBY_GC_..." => "1m" }, ...) ...would restore the previous behavior (and is generally the preferred usage, anyways, to avoid thread-safety issues). I've yet to check Ractor interactions since haven't followed Ruby in several years. I made this change to reduce memory use in a single-threaded pipeline+process manager designed for audio playback; but it probably makes sense for many long-running daemons that want to clamp memory use after all code is loaded. --- hash.c | 5 +++++ test/ruby/test_gc.rb | 4 ++++ 2 files changed, 9 insertions(+) diff --git a/hash.c b/hash.c index f032ef642a..d7cc797ef5 100644 --- a/hash.c +++ b/hash.c @@ -4911,6 +4911,7 @@ static VALUE env_aset(VALUE nm, VALUE val); static void reset_by_modified_env(const char *nam) { + static char gc_var_pfx[] = "RUBY_GC_"; /* * ENV['TZ'] = nil has a special meaning. * TZ is no longer considered up-to-date and ruby call tzset() as needed. @@ -4919,6 +4920,10 @@ reset_by_modified_env(const char *nam) */ if (ENVMATCH(nam, TZ_ENV)) { ruby_reset_timezone(); + } else if (ENVNMATCH(nam, gc_var_pfx, sizeof(gc_var_pfx) - 1)) { + ENV_LOCK(); + ruby_gc_set_params(); + ENV_UNLOCK(); } } diff --git a/test/ruby/test_gc.rb b/test/ruby/test_gc.rb index 788f2974b5..5fd5924fb3 100644 --- a/test/ruby/test_gc.rb +++ b/test/ruby/test_gc.rb @@ -334,6 +334,10 @@ def test_gc_parameter assert_in_out_err([env, "-w", "-e", "exit"], "", [], /RUBY_GC_OLDMALLOC_LIMIT_MAX=16000000/, "") assert_in_out_err([env, "-w", "-e", "exit"], "", [], /RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR=2.0/, "") end + + assert_in_out_err(["-w", "-e", <<-'end'], "", [], /RUBY_GC_MALLOC_LIMIT=1024/, "") + ENV['RUBY_GC_MALLOC_LIMIT'] = '1k' + end end def test_profiler_enabled --Ag28RZJR3STe0FEq Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="0003-ruby_gc_set_params-do-not-set-initial-pages-on-dynam.patch" From b7d74570bfe1df47e69095b1eaf7e87a8c274026 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Mon, 17 Jan 2022 21:48:57 +0000 Subject: [PATCH 3/3] ruby_gc_set_params: do not set initial pages on dynamic update RUBY_GC_INIT_HEAP_SLOTS is only for the initial heap and changing it via ENV#[]= would be surprising behavior and cause unintended heap growth. Changing gc_params.heap_init_slots via RUBY_GC_INIT_HEAP_SLOTS still affects gc_marks_finish(), however, so assigning to ENV["RUBY_GC_INIT_HEAP_SLOTS"] is not entirely a no-op --- gc.c | 5 +++-- hash.c | 2 +- internal/gc.h | 2 +- ruby.c | 2 +- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/gc.c b/gc.c index 332581c279..14654c5ea8 100644 --- a/gc.c +++ b/gc.c @@ -11058,7 +11058,7 @@ gc_set_initial_pages(void) */ void -ruby_gc_set_params(void) +ruby_gc_set_params(bool initial) { rb_objspace_t *objspace = &rb_objspace; /* RUBY_GC_HEAP_FREE_SLOTS */ @@ -11068,7 +11068,8 @@ ruby_gc_set_params(void) /* RUBY_GC_HEAP_INIT_SLOTS */ if (get_envparam_size("RUBY_GC_HEAP_INIT_SLOTS", &gc_params.heap_init_slots, 0)) { - gc_set_initial_pages(); + if (initial) + gc_set_initial_pages(); } get_envparam_double("RUBY_GC_HEAP_GROWTH_FACTOR", &gc_params.growth_factor, 1.0, 0.0, FALSE); diff --git a/hash.c b/hash.c index d7cc797ef5..6a149247e3 100644 --- a/hash.c +++ b/hash.c @@ -4922,7 +4922,7 @@ reset_by_modified_env(const char *nam) ruby_reset_timezone(); } else if (ENVNMATCH(nam, gc_var_pfx, sizeof(gc_var_pfx) - 1)) { ENV_LOCK(); - ruby_gc_set_params(); + ruby_gc_set_params(false); ENV_UNLOCK(); } } diff --git a/internal/gc.h b/internal/gc.h index baf4f36a10..1fea5768cb 100644 --- a/internal/gc.h +++ b/internal/gc.h @@ -90,7 +90,7 @@ void ruby_mimfree(void *ptr); void rb_objspace_set_event_hook(const rb_event_flag_t event); VALUE rb_objspace_gc_enable(struct rb_objspace *); VALUE rb_objspace_gc_disable(struct rb_objspace *); -void ruby_gc_set_params(void); +void ruby_gc_set_params(bool); void rb_copy_wb_protected_attribute(VALUE dest, VALUE obj); #if __has_attribute(alloc_align) __attribute__((__alloc_align__(1))) diff --git a/ruby.c b/ruby.c index 747934e6d1..d85b7694f0 100644 --- a/ruby.c +++ b/ruby.c @@ -1886,7 +1886,7 @@ process_options(int argc, char **argv, ruby_cmdline_options_t *opt) translit_char(RSTRING_PTR(opt->script_name), '\\', '/'); #endif - ruby_gc_set_params(); + ruby_gc_set_params(true); ruby_init_loadpath(); #if USE_MJIT --Ag28RZJR3STe0FEq Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline ClVuc3Vic2NyaWJlOiA8bWFpbHRvOnJ1YnktY29yZS1yZXF1ZXN0QHJ1YnktbGFuZy5vcmc/c3Vi amVjdD11bnN1YnNjcmliZT4KPGh0dHA6Ly9saXN0cy5ydWJ5LWxhbmcub3JnL2NnaS1iaW4vbWFp bG1hbi9vcHRpb25zL3J1YnktY29yZT4K --Ag28RZJR3STe0FEq--