From: "pocke (Masataka Kuwabara) via ruby-core" <ruby-core@...>
Date: 2024-09-02T06:24:51+00:00
Subject: [ruby-core:119000] [Ruby master Bug#20710] Reducing Hash allocation introduces large performance degradation (probably related to VWA)

Issue #20710 has been reported by pocke (Masataka Kuwabara).

----------------------------------------
Bug #20710: Reducing Hash allocation introduces large performance degradation (probably related to VWA)
https://bugs.ruby-lang.org/issues/20710

* Author: pocke (Masataka Kuwabara)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin21]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I found a surprising performance degradation while developing RBS.
In short, I tried to remove unnecessary Hash allocations in RBS, and the change made execution about 2x slower.

VWA (Variable Width Allocation) for Hash is the probable cause of this degradation. I'd be happy if we could mitigate the impact by updating the memory management strategy.


## Reproduce

You can reproduce this problem on a PR in pocke/rbs repository.
https://github.com/pocke/rbs/pull/2
This PR dedups empty Hash objects.

1. `git clone` and checkout
1. `bundle install`
1. `bundle exec rake compile` for C-ext
1. `bundle exec ruby benchmark/benchmark_new_env.rb`

The "before" commit is https://github.com/pocke/rbs/commit/2c356c060286429cfdb034f88a74a6f94420fd21.
The "after" commit is https://github.com/pocke/rbs/commit/bfb2c367c7d3b7f93720392252d3a3980d7bf335.

The benchmark results are the following:

```
# Before
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
             new_env      6.426 (±15.6%) i/s -     64.000 in  10.125442s
       new_rails_env      0.968 (± 0.0%) i/s -     10.000 in  10.355738s

# After
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
             new_env      4.371 (±22.9%) i/s -     43.000 in  10.150192s
       new_rails_env      0.360 (± 0.0%) i/s -      4.000 in  11.313158s
```

IPS dropped by a factor of 1.47 for the `new_env` case (parsing a small RBS env) and by a factor of 2.69 for `new_rails_env` (parsing a large RBS env).
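For reference, the factors above are simply the "before" i/s divided by the "after" i/s. A minimal sketch of that arithmetic, using the figures copied from the benchmark output (the variable names are only illustrative):

```ruby
# i/s figures copied from the benchmark output above
results = {
  new_env:       { before: 6.426, after: 4.371 },
  new_rails_env: { before: 0.968, after: 0.360 },
}

# Slowdown factor = iterations per second before / iterations per second after
slowdowns = results.transform_values { |r| (r[:before] / r[:after]).round(2) }

slowdowns.each { |name, factor| puts "#{name}: #{factor}x slower" }
# prints:
# new_env: 1.47x slower
# new_rails_env: 2.69x slower
```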


## Investigation

### GC.stat

`GC.stat` shows that the number of minor GCs increases: `minor_gc_count` goes from 112 to 242, and the total GC `time` goes from 541 ms to 1551 ms.

```ruby
# In the RBS repository
require_relative './benchmark/utils'

tmpdir = prepare_collection!
new_rails_env(tmpdir)
pp GC.stat
```


```
# before
{:count=>126,
 :time=>541,
 :marking_time=>496,
 :sweeping_time=>45,
 :heap_allocated_pages=>702,
 :heap_sorted_length=>984,
 :heap_allocatable_pages=>282,
 :heap_available_slots=>793270,
 :heap_live_slots=>787407,
 :heap_free_slots=>5863,
 :heap_final_slots=>0,
 :heap_marked_slots=>757744,
 :heap_eden_pages=>702,
 :heap_tomb_pages=>0,
 :total_allocated_pages=>702,
 :total_freed_pages=>0,
 :total_allocated_objects=>2220605,
 :total_freed_objects=>1433198,
 :malloc_increase_bytes=>5872,
 :malloc_increase_bytes_limit=>16777216,
 :minor_gc_count=>112,
 :major_gc_count=>14,
 :compact_count=>0,
 :read_barrier_faults=>0,
 :total_moved_objects=>0,
 :remembered_wb_unprotected_objects=>0,
 :remembered_wb_unprotected_objects_limit=>4779,
 :old_objects=>615704,
 :old_objects_limit=>955872,
 :oldmalloc_increase_bytes=>210912,
 :oldmalloc_increase_bytes_limit=>16777216}

# after
{:count=>255,
 :time=>1551,
 :marking_time=>1496,
 :sweeping_time=>55,
 :heap_allocated_pages=>570,
 :heap_sorted_length=>1038,
 :heap_allocatable_pages=>468,
 :heap_available_slots=>735520,
 :heap_live_slots=>731712,
 :heap_free_slots=>3808,
 :heap_final_slots=>0,
 :heap_marked_slots=>728727,
 :heap_eden_pages=>570,
 :heap_tomb_pages=>0,
 :total_allocated_pages=>570,
 :total_freed_pages=>0,
 :total_allocated_objects=>2183278,
 :total_freed_objects=>1451566,
 :malloc_increase_bytes=>1200,
 :malloc_increase_bytes_limit=>16777216,
 :minor_gc_count=>242,
 :major_gc_count=>13,
 :compact_count=>0,
 :read_barrier_faults=>0,
 :total_moved_objects=>0,
 :remembered_wb_unprotected_objects=>0,
 :remembered_wb_unprotected_objects_limit=>5915,
 :old_objects=>600594,
 :old_objects_limit=>1183070,
 :oldmalloc_increase_bytes=>8128,
 :oldmalloc_increase_bytes_limit=>16777216}
```
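To compare the two dumps without reading every key, the counters of interest can be diffed around a block. The following is a rough sketch only: `gc_stat_delta` is a hypothetical helper (not part of RBS or Ruby), and the Hash-allocating loop is a stand-in for `new_rails_env(tmpdir)`.

```ruby
# Hypothetical helper: report how selected GC.stat counters change
# while the given block runs. Assumes a Ruby where GC.stat has :time.
def gc_stat_delta(keys = %i[count minor_gc_count major_gc_count time])
  before = GC.stat
  yield
  after = GC.stat
  keys.to_h { |key| [key, after.fetch(key) - before.fetch(key)] }
end

# Stand-in workload; in the RBS repository this would be new_rails_env(tmpdir).
delta = gc_stat_delta { 200_000.times { {} } }
pp delta
```

Running this once per commit gives the per-run increase in minor and major GCs directly, which is the difference the two dumps above show.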

### Warming up Hashes

The following patch, which creates unnecessary Hash objects before the benchmark, improves the execution time.


```diff
diff --git a/benchmark/benchmark_new_env.rb b/benchmark/benchmark_new_env.rb
index 6dd2b73f..a8da61c6 100644
--- a/benchmark/benchmark_new_env.rb
+++ b/benchmark/benchmark_new_env.rb
@@ -4,6 +4,8 @@ require 'benchmark/ips'
 
 tmpdir = prepare_collection!
 
+(0..30_000_000).map { {} }
+
 Benchmark.ips do |x|
   x.time = 10
```


The results are the following:

```
# Before
Calculating -------------------------------------
             new_env     10.354 (± 9.7%) i/s -    103.000 in  10.013834s
       new_rails_env      1.661 (± 0.0%) i/s -     17.000 in  10.282490s

# After
Calculating -------------------------------------
             new_env     10.771 (± 9.3%) i/s -    107.000 in  10.010446s
       new_rails_env      1.584 (± 0.0%) i/s -     16.000 in  10.178984s
```
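One way to observe what the warm-up does to the heap is to watch `heap_available_slots` around a batch of live empty Hashes. This is a small sketch under assumptions: `n` is far smaller than the 30 million used in the patch, and reading the warm-up as "it leaves the heap larger" is my interpretation, not something measured in this report.

```ruby
n = 100_000 # illustrative only; the patch above allocates about 30 million

slots_before = GC.stat(:heap_available_slots)
warmup = Array.new(n) { {} } # keep references so every Hash stays live
slots_after = GC.stat(:heap_available_slots)

puts "heap_available_slots: #{slots_before} -> #{slots_after}"

# Drop the references. Whether the grown pages are kept for reuse
# depends on the GC's heuristics (assumption, not verified here).
warmup = nil
```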


### `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO`

The `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO` env var also mitigates the performance impact.
In this example, I set `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6` (default: 0.20).

```console
# Before
Calculating -------------------------------------
             new_env     10.271 (± 9.7%) i/s -    102.000 in  10.087191s
       new_rails_env      1.529 (± 0.0%) i/s -     16.000 in  10.538043s

# After
$ env RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6 bundle exec ruby benchmark/benchmark_new_env.rb
Calculating -------------------------------------
             new_env     11.003 (± 9.1%) i/s -    110.000 in  10.068428s
       new_rails_env      1.347 (± 0.0%) i/s -     14.000 in  11.117665s
```
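Because the variable is read at interpreter startup, its effect has to be checked in a fresh process. The sketch below compares `minor_gc_count` in two child Rubies, with and without the setting. The `WORKLOAD` string and `minor_gc_count_with` helper are hypothetical stand-ins, not the RBS benchmark, so the numbers will not match the ones above.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in workload: allocate many short-lived empty Hashes,
# then print how many minor GCs ran in the child process.
WORKLOAD = 'h = nil; 200_000.times { h = {} }; print GC.stat(:minor_gc_count)'

# Run the workload in a child Ruby with extra environment variables.
def minor_gc_count_with(env)
  out, status = Open3.capture2(env, RbConfig.ruby, '-e', WORKLOAD)
  raise 'child ruby failed' unless status.success?
  Integer(out)
end

default_count = minor_gc_count_with({})
tuned_count = minor_gc_count_with('RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO' => '0.6')

puts "minor GCs (default): #{default_count}"
puts "minor GCs (ratio 0.6): #{tuned_count}"
```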


## Additional Information

* I applied the same change to Array, but it does not cause this problem.
  * I guess the cause is the difference in size pools. An empty Array uses 40 bytes, like an ordinary Ruby object, but an empty Hash uses 160 bytes.
  * The size pool for 160-byte objects holds fewer objects than the 40-byte one, so reducing allocations there affects performance more sensitively.
* I tried it on Ruby 3.2. There, this change does not degrade the execution time.
  * VWA for Hash was introduced in Ruby 3.3. https://github.com/ruby/ruby/blob/73c39a5f93d3ad4514a06158e2bb7622496372b9/doc/NEWS/NEWS-3.3.0.md#gc--memory-management
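The sizes mentioned above can be checked with `ObjectSpace.memsize_of`, and the per-size-pool slot counts with `GC.stat_heap`. This is a minimal sketch: the 40 and 160 byte figures are the ones observed in this report on Ruby 3.3 and may differ on other versions or platforms, and the `GC.stat_heap` keys printed here are assumed to be present on the Ruby in use.

```ruby
require 'objspace'

# Reported in this issue for Ruby 3.3: 40 bytes for [], 160 bytes for {}.
array_bytes = ObjectSpace.memsize_of([])
hash_bytes  = ObjectSpace.memsize_of({})
puts "empty Array: #{array_bytes} bytes"
puts "empty Hash:  #{hash_bytes} bytes"

# Per-size-pool statistics, keyed by pool index (guarded for older Rubies).
if GC.respond_to?(:stat_heap)
  GC.stat_heap.each do |index, stat|
    puts "pool #{index}: slot_size=#{stat[:slot_size]} eden_slots=#{stat[:heap_eden_slots]}"
  end
end
```

Comparing `eden_slots` between the pool that holds empty Hashes and the 40-byte pool shows how much smaller the former is for a given workload.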



## Acknowledgement

@mame, @ko1, and @soutaro helped with the investigation. I would like to thank them.


