[ruby-core:93807] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.
From: samuel@...
Date: 2019-07-16 04:48:07 UTC
List: ruby-core #93807
Issue #15997 has been updated by ioquatix (Samuel Williams).
On Darwin, comparing Fiber-pool with master:
```
> make benchmark COMPARE_RUBY="../../ruby/build/ruby --disable-gems" ITEM=vm2_fiber RUBY_SHARED_FIBER_POOL_FREE_STACKS=0
Calculating -------------------------------------
                       master  fiber-pool
vm2_fiber_allocate    99.329k    124.488k i/s - 100.000k times in  1.006759s 0.803293s
vm2_fiber_count        3.621k     82.447k i/s - 100.000k times in 27.620062s 1.212895s
vm2_fiber_reuse        55.039     615.402 i/s -  200.000 times in  3.633812s 0.324991s
vm2_fiber_switch       8.803M      8.591M i/s -  20.000M times in  2.272063s 2.328041s
Comparison:
vm2_fiber_allocate
built-ruby: 124487.6 i/s
compare-ruby: 99328.6 i/s - 1.25x slower
vm2_fiber_count
built-ruby: 82447.4 i/s
compare-ruby: 3620.6 i/s - 22.77x slower
vm2_fiber_reuse
built-ruby: 615.4 i/s
compare-ruby: 55.0 i/s - 11.18x slower
vm2_fiber_switch
compare-ruby: 8802572.8 i/s
built-ruby: 8590914.0 i/s - 1.02x slower
> make benchmark COMPARE_RUBY="../../ruby/build/ruby --disable-gems" ITEM=vm2_fiber RUBY_SHARED_FIBER_POOL_FREE_STACKS=1
Calculating -------------------------------------
                       master  fiber-pool
vm2_fiber_allocate    96.834k    121.823k i/s - 100.000k times in  1.032698s 0.820865s
vm2_fiber_count        3.027k     80.419k i/s - 100.000k times in 33.035732s 1.243489s
vm2_fiber_reuse        56.275     449.230 i/s -  200.000 times in  3.553979s 0.445206s
vm2_fiber_switch       8.640M      8.255M i/s -  20.000M times in  2.314890s 2.422917s
Comparison:
vm2_fiber_allocate
built-ruby: 121822.7 i/s
compare-ruby: 96833.7 i/s - 1.26x slower
vm2_fiber_count
built-ruby: 80418.9 i/s
compare-ruby: 3027.0 i/s - 26.57x slower
vm2_fiber_reuse
built-ruby: 449.2 i/s
compare-ruby: 56.3 i/s - 7.98x slower
vm2_fiber_switch
compare-ruby: 8639719.4 i/s
built-ruby: 8254513.1 i/s - 1.05x slower
```
----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79676
* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
----------------------------------------
https://github.com/ruby/ruby/pull/2224
This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.
The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains one or more stacks (typically more, e.g. 512). The pool grows by doubling rather than by a fixed step: the initial allocation holds 8 stacks, and subsequent allocations hold 8, 16, 32, etc.
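To make the growth pattern concrete, here is a minimal Ruby model of the sizes listed above. It is only a sketch, not the C implementation in the PR: the method name `allocation_sizes` and the rule "each new allocation matches the number of stacks already in the pool" are assumptions inferred from the sequence 8, 8, 16, 32.

```ruby
# Hypothetical model of the pool's growth; not the C code from the PR.
INITIAL_STACKS = 8 # initial allocation size stated in the description

# Returns the number of stacks in each of the first `expansions` allocations.
def allocation_sizes(expansions, initial: INITIAL_STACKS)
  sizes = []
  total = 0
  expansions.times do
    # Assumption: the first allocation uses the initial count, and each later
    # one matches the stacks already in the pool, so the total doubles.
    count = total.zero? ? initial : total
    sizes << count
    total += count
  end
  sizes
end

p allocation_sizes(5)     # => [8, 8, 16, 32, 64]
p allocation_sizes(5).sum # => 128
```

Under this model, the number of allocations (and so the length of the linked list) grows only logarithmically with the number of live fibers.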
```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |     .  .  .  .        |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```
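As a rough illustration of the diagram, the sketch below computes where each stack slot would sit inside one allocation. The byte sizes and the names `STRIDE`, `stack_range`, and `allocation_bytes` are hypothetical and chosen only for the example; the real sizes depend on the platform and VM settings, and the order of the regions inside a slot is deliberately not modeled.

```ruby
# Hypothetical sizes for illustration only.
PAGE_SIZE          = 4096
VM_STACK_SIZE      = 128 * 1024
MACHINE_STACK_SIZE = 512 * 1024

# One slot holds a VM stack, a machine stack and a guard page
# (the "size" axis in the diagram).
STRIDE = VM_STACK_SIZE + MACHINE_STACK_SIZE + PAGE_SIZE

# Byte range, relative to `base`, occupied by the stack slot at `index`.
def stack_range(index)
  first = index * STRIDE
  first...(first + STRIDE)
end

# Total bytes needed for an allocation of `count` slots
# (the "count" axis in the diagram).
def allocation_bytes(count)
  count * STRIDE
end

p stack_range(0)      # => 0...659456
p stack_range(1)      # => 659456...1318912
p allocation_bytes(8) # => 5275648
```

Because every slot has the same stride, finding the stack at a given index is a single multiplication and needs no per-stack bookkeeping.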
The performance improvement depends on usage:
```
Calculating -------------------------------------
                    compare-ruby  built-ruby
vm2_fiber_allocate      132.900k    180.852k i/s - 100.000k times in  0.752447s 0.552939s
vm2_fiber_count           5.317k    110.724k i/s - 100.000k times in 18.806479s 0.903145s
vm2_fiber_reuse          160.128     347.663 i/s -  200.000 times in  1.249003s 0.575269s
vm2_fiber_switch         13.429M     13.490M i/s -  20.000M times in  1.489303s 1.482549s
Comparison:
vm2_fiber_allocate
built-ruby: 180851.6 i/s
compare-ruby: 132899.7 i/s - 1.36x slower
vm2_fiber_count
built-ruby: 110724.3 i/s
compare-ruby: 5317.3 i/s - 20.82x slower
vm2_fiber_reuse
built-ruby: 347.7 i/s
compare-ruby: 160.1 i/s - 2.17x slower
vm2_fiber_switch
built-ruby: 13490282.4 i/s
compare-ruby: 13429100.0 i/s - 1.00x slower
```
This test was run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.
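For readers who want a quick local comparison without the full `make benchmark` setup, a rough sketch along these lines separates fiber creation cost from switching cost. It is not the `vm2_fiber_*` benchmark code; the iteration count and variable names are assumptions, and the numbers it prints are not comparable to the tables above.

```ruby
require "benchmark"

N = 10_000 # hypothetical iteration count, much smaller than the real benchmarks

# Creation cost: allocate a fiber, run it to its first yield, then drop it.
allocate_time = Benchmark.realtime do
  N.times do
    fiber = Fiber.new { Fiber.yield }
    fiber.resume
  end
end

# Switching cost: resume one long-lived fiber repeatedly.
switcher = Fiber.new { loop { Fiber.yield } }
switch_time = Benchmark.realtime do
  N.times { switcher.resume }
end

puts format("allocate: %.1f i/s", N / allocate_time)
puts format("switch:   %.1f i/s", N / switch_time)
```

A stack cache should mainly affect the first measurement, which is consistent with `vm2_fiber_switch` being roughly unchanged in the results above.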
Additionally, we conservatively call `madvise(free)` on unused fiber stacks so that they do not consume swap space. Without this requirement, the `vm2_fiber_reuse` benchmark improves by 6x to 10x. There are some options for dealing with this (e.g. moving the call to `GC.compact`), but as the change is still a net win, I'd like to merge this PR as is.