[ruby-core:119016] [Ruby master Bug#20710] Reducing Hash allocation introduces large performance degradation (probably related to VWA)
From:
"byroot (Jean Boussier) via ruby-core" <ruby-core@...>
Date:
2024-09-02 16:53:41 UTC
List:
ruby-core #119016
Issue #20710 has been updated by byroot (Jean Boussier).
I still think free pages should be in a global pool rather than tied to a specific pool size. I believe that would solve this issue.
And yes, we don't see it on macro benchmarks, but it might still cause more frequent GC than necessary.
----------------------------------------
Bug #20710: Reducing Hash allocation introduces large performance degradation (probably related to VWA)
https://bugs.ruby-lang.org/issues/20710#change-109591
* Author: pocke (Masataka Kuwabara)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin21]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I found a surprising performance degradation while developing RBS.
In short, I tried to remove unnecessary Hash allocations in RBS, and that made the execution time 2x slower.
VWA for Hash probably causes this degradation. I'd be happy if we could mitigate the impact by updating the memory management strategy.
## Reproduce
You can reproduce this problem with a PR in the pocke/rbs repository.
https://github.com/pocke/rbs/pull/2
This PR dedups empty Hash objects.
1. `git clone` and checkout
1. `bundle install`
1. `bundle exec rake compile` for C-ext
1. `bundle exec ruby benchmark/benchmark_new_env.rb`
The "before" commit is https://github.com/pocke/rbs/commit/2c356c060286429cfdb034f88a74a6f94420fd21.
The "after" commit is https://github.com/pocke/rbs/commit/bfb2c367c7d3b7f93720392252d3a3980d7bf335.
The benchmark results are the following:
```
# Before
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
new_env 6.426 (±15.6%) i/s - 64.000 in 10.125442s
new_rails_env 0.968 (± 0.0%) i/s - 10.000 in 10.355738s
# After
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
new_env 4.371 (±22.9%) i/s - 43.000 in 10.150192s
new_rails_env 0.360 (± 0.0%) i/s - 4.000 in 11.313158s
```
The IPS decreased by 1.47x for the `new_env` case (parsing a small RBS env), and by 2.69x for `new_rails_env` (parsing a large RBS env).
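The two factors can be re-derived from the IPS figures in the output above. This is only a small arithmetic sketch; the `RESULTS` constant and the `slowdown` helper are names made up for illustration.

```ruby
# IPS figures copied from the benchmark output above.
RESULTS = {
  "new_env"       => { before: 6.426, after: 4.371 },
  "new_rails_env" => { before: 0.968, after: 0.360 },
}.freeze

# Slowdown factor: "before" iterations per second divided by "after".
# (Hypothetical helper, not part of RBS or benchmark-ips.)
def slowdown(before:, after:)
  (before / after).round(2)
end

RESULTS.each do |name, ips|
  puts format("%-14s %.2fx slower", name, slowdown(**ips))
end
```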
## Investigation
### GC.stat
`GC.stat` shows that the number of minor GCs increases (`minor_gc_count` goes from 112 to 242).
```ruby
# In the RBS repository
require_relative './benchmark/utils'
tmpdir = prepare_collection!
new_rails_env(tmpdir)
pp GC.stat
```
```
# before
{:count=>126,
:time=>541,
:marking_time=>496,
:sweeping_time=>45,
:heap_allocated_pages=>702,
:heap_sorted_length=>984,
:heap_allocatable_pages=>282,
:heap_available_slots=>793270,
:heap_live_slots=>787407,
:heap_free_slots=>5863,
:heap_final_slots=>0,
:heap_marked_slots=>757744,
:heap_eden_pages=>702,
:heap_tomb_pages=>0,
:total_allocated_pages=>702,
:total_freed_pages=>0,
:total_allocated_objects=>2220605,
:total_freed_objects=>1433198,
:malloc_increase_bytes=>5872,
:malloc_increase_bytes_limit=>16777216,
:minor_gc_count=>112,
:major_gc_count=>14,
:compact_count=>0,
:read_barrier_faults=>0,
:total_moved_objects=>0,
:remembered_wb_unprotected_objects=>0,
:remembered_wb_unprotected_objects_limit=>4779,
:old_objects=>615704,
:old_objects_limit=>955872,
:oldmalloc_increase_bytes=>210912,
:oldmalloc_increase_bytes_limit=>16777216}
# after
{:count=>255,
:time=>1551,
:marking_time=>1496,
:sweeping_time=>55,
:heap_allocated_pages=>570,
:heap_sorted_length=>1038,
:heap_allocatable_pages=>468,
:heap_available_slots=>735520,
:heap_live_slots=>731712,
:heap_free_slots=>3808,
:heap_final_slots=>0,
:heap_marked_slots=>728727,
:heap_eden_pages=>570,
:heap_tomb_pages=>0,
:total_allocated_pages=>570,
:total_freed_pages=>0,
:total_allocated_objects=>2183278,
:total_freed_objects=>1451566,
:malloc_increase_bytes=>1200,
:malloc_increase_bytes_limit=>16777216,
:minor_gc_count=>242,
:major_gc_count=>13,
:compact_count=>0,
:read_barrier_faults=>0,
:total_moved_objects=>0,
:remembered_wb_unprotected_objects=>0,
:remembered_wb_unprotected_objects_limit=>5915,
:old_objects=>600594,
:old_objects_limit=>1183070,
:oldmalloc_increase_bytes=>8128,
:oldmalloc_increase_bytes_limit=>16777216}
```
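Rather than comparing two full `GC.stat` dumps by eye, the counters of interest can be diffed around a single workload. The following is a minimal sketch: `gc_delta` and `GC_KEYS` are hypothetical names, and the Hash-allocating block is only a stand-in for `new_rails_env(tmpdir)`.

```ruby
# Counters picked from the GC.stat output above.
GC_KEYS = %i[count minor_gc_count major_gc_count].freeze

# Hypothetical helper: returns how much each counter grew while the block ran.
def gc_delta
  before = GC.stat.slice(*GC_KEYS)
  yield
  after = GC.stat.slice(*GC_KEYS)
  GC_KEYS.to_h { |key| [key, after[key] - before[key]] }
end

# Stand-in workload; in the RBS repository this would be new_rails_env(tmpdir).
pp gc_delta { 200_000.times.map { {} } }
```

Running this on the "before" and "after" commits should make a difference in `minor_gc_count` visible without reading the whole dump.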
### Warming up Hashes
The following patch, which creates unnecessary Hash objects before the benchmark, improves the execution time.
```diff
diff --git a/benchmark/benchmark_new_env.rb b/benchmark/benchmark_new_env.rb
index 6dd2b73f..a8da61c6 100644
--- a/benchmark/benchmark_new_env.rb
+++ b/benchmark/benchmark_new_env.rb
@@ -4,6 +4,8 @@ require 'benchmark/ips'
 
 tmpdir = prepare_collection!
 
+(0..30_000_000).map { {} }
+
 Benchmark.ips do |x|
   x.time = 10
```
The results are the following:
```
# Before
Calculating -------------------------------------
new_env 10.354 (± 9.7%) i/s - 103.000 in 10.013834s
new_rails_env 1.661 (± 0.0%) i/s - 17.000 in 10.282490s
# After
Calculating -------------------------------------
new_env 10.771 (± 9.3%) i/s - 107.000 in 10.010446s
new_rails_env 1.584 (± 0.0%) i/s - 16.000 in 10.178984s
```
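One possible reading of this result, which the report itself does not confirm, is that the warm-up grows the heap that holds Hash-sized slots before the benchmark starts. A sketch for observing per-pool page counts follows. It assumes `GC.stat_heap` is available (Ruby 3.1 and later) and reports `:slot_size` and `:heap_eden_pages`; `pages_by_slot_size` is a hypothetical helper.

```ruby
# Returns { slot_size => eden page count } for each size pool, or {} when
# GC.stat_heap is unavailable on this Ruby.
def pages_by_slot_size
  return {} unless GC.respond_to?(:stat_heap)

  GC.stat_heap.each_value.to_h { |pool| [pool[:slot_size], pool[:heap_eden_pages]] }
end

before = pages_by_slot_size
warmup = Array.new(100_000) { {} } # keep references so the Hashes stay alive
after = pages_by_slot_size

after.each do |slot_size, pages|
  puts "slot_size=#{slot_size}: #{before[slot_size]} -> #{pages} pages"
end
```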
### `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO`
The `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO` env var also mitigates the performance impact.
In this example, I set `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6` (default: 0.20).
```console
# Before
Calculating -------------------------------------
new_env 10.271 (± 9.7%) i/s - 102.000 in 10.087191s
new_rails_env 1.529 (± 0.0%) i/s - 16.000 in 10.538043s
# After
$ env RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6 bundle exec ruby benchmark/benchmark_new_env.rb
Calculating -------------------------------------
new_env 11.003 (± 9.1%) i/s - 110.000 in 10.068428s
new_rails_env 1.347 (± 0.0%) i/s - 14.000 in 11.117665s
```
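To get a feel for how far a process is from such a ratio, the share of free slots can be computed from `GC.stat`. This is a rough sketch only: it uses process-wide totals, whereas the GC may evaluate the ratio per heap, which is an assumption not verified here. `free_slot_ratio` is a hypothetical helper.

```ruby
# Fraction of heap slots that are currently free, from process-wide totals.
def free_slot_ratio(stat = GC.stat)
  stat[:heap_free_slots].fdiv(stat[:heap_available_slots])
end

# Applied to the "after" GC.stat output above: 3808 free of 735520 slots.
reported = free_slot_ratio({ heap_free_slots: 3808, heap_available_slots: 735_520 })
puts format("reported: %.4f, current process: %.4f", reported, free_slot_ratio)
```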
## Additional Information
* I applied the same change to Array, but it does not cause this problem.
  * I guess the cause is the difference in Size Pool. An empty Array uses 40 bytes, like an ordinary Ruby object, but an empty Hash uses 160 bytes.
  * The Size Pool for 160-byte objects holds fewer objects than the 40-byte one, so reducing allocations affects performance more sensitively.
* I tried it on Ruby 3.2. There, this change does not degrade the execution time.
  * VWA for Hash was introduced in Ruby 3.3. https://github.com/ruby/ruby/blob/73c39a5f93d3ad4514a06158e2bb7622496372b9/doc/NEWS/NEWS-3.3.0.md#gc--memory-management
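The size difference mentioned above can be checked on a given Ruby with `ObjectSpace.memsize_of` from the `objspace` standard library. This is a minimal sketch; `empty_sizes` is a hypothetical helper, and the 40 and 160 byte figures come from the report (Ruby 3.3.4 on arm64-darwin), so other versions and platforms may print different values.

```ruby
require "objspace"

# Reported size in bytes of an empty Array and an empty Hash on this Ruby.
def empty_sizes
  { array: ObjectSpace.memsize_of([]), hash: ObjectSpace.memsize_of({}) }
end

puts "#{RUBY_VERSION}: #{empty_sizes}"
```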
## Acknowledgement
@mame, @ko1, and @soutaro helped with the investigation. I would like to thank them.
-- 
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/