[ruby-core:93802] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

From: samuel@...
Date: 2019-07-16 02:44:58 UTC
List: ruby-core #93802
Issue #15997 has been updated by ioquatix (Samuel Williams).


Here is a comparison on Linux:

```
/home/samuel/.rvm/rubies/ruby-2.6.3/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rvm/rubies/ruby-2.6.3/bin/ruby --disable=gems -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            $(find ../benchmark -maxdepth 1 -name '*vm2_fiber*.yml' -o -name '*vm2_fiber*.rb' | sort) 
Calculating -------------------------------------
                     compare-ruby  built-ruby 
  vm2_fiber_allocate     123.108k    171.839k i/s -    100.000k times in 0.812295s 0.581938s
     vm2_fiber_count       2.548k     82.950k i/s -    100.000k times in 39.248735s 1.205547s
     vm2_fiber_reuse      158.703     953.842 i/s -     200.000 times in 1.260218s 0.209678s
    vm2_fiber_switch      10.127M     13.016M i/s -     20.000M times in 1.974979s 1.536628s

Comparison:
               vm2_fiber_allocate
          built-ruby:    171839.5 i/s 
        compare-ruby:    123108.0 i/s - 1.40x  slower

                  vm2_fiber_count
          built-ruby:     82949.9 i/s 
        compare-ruby:      2547.9 i/s - 32.56x  slower

                  vm2_fiber_reuse
          built-ruby:       953.8 i/s 
        compare-ruby:       158.7 i/s - 6.01x  slower

                 vm2_fiber_switch
          built-ruby:  13015509.6 i/s 
        compare-ruby:  10126692.4 i/s - 1.29x  slower
```

With `#define FIBER_POOL_ALLOCATION_FREE`:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby 
  vm2_fiber_allocate     123.144k    170.006k i/s -    100.000k times in 0.812060s 0.588216s
     vm2_fiber_count       2.528k     76.265k i/s -    100.000k times in 39.560078s 1.311221s
     vm2_fiber_reuse      149.002     446.903 i/s -     200.000 times in 1.342268s 0.447525s
    vm2_fiber_switch      10.112M     13.104M i/s -     20.000M times in 1.977840s 1.526270s

Comparison:
               vm2_fiber_allocate
          built-ruby:    170005.5 i/s 
        compare-ruby:    123143.6 i/s - 1.38x  slower

                  vm2_fiber_count
          built-ruby:     76264.8 i/s 
        compare-ruby:      2527.8 i/s - 30.17x  slower

                  vm2_fiber_reuse
          built-ruby:       446.9 i/s 
        compare-ruby:       149.0 i/s - 3.00x  slower

                 vm2_fiber_switch
          built-ruby:  13103837.6 i/s 
        compare-ruby:  10112039.4 i/s - 1.30x  slower
```

One unexpected benefit is that, due to better/simpler stack management, `vm2_fiber_switch` became about 30% faster too.

----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79668

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains one or more stacks (typically more, e.g. 512). It uses an exponential (doubling) allocation strategy: the initial allocation holds 8 stacks, and subsequent allocations hold 8, 16, 32, etc.

```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               | .  .  .  .            |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby 
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s 
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s 
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s 
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s 
        compare-ruby:  13429100.0 i/s - 1.00x  slower
```

This test was run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.
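
For a quick sanity check without benchmark-driver, a minimal sketch along these lines can be run against both builds. The workload (create a short-lived fiber and resume it once) is an assumption about what an allocation-heavy test looks like, not the contents of the `vm2_fiber_allocate` benchmark file, and `allocate_fibers` is a hypothetical helper.

```ruby
require "benchmark"

# Hypothetical workload: create `count` short-lived fibers and run each
# one to completion. Returns how many fiber bodies actually executed.
def allocate_fibers(count)
  completed = 0
  count.times do
    fiber = Fiber.new { completed += 1 }
    fiber.resume
  end
  completed
end

count = 100_000
elapsed = Benchmark.realtime { allocate_fibers(count) }
puts format("%d fibers in %.3fs (%.1f i/s)", count, elapsed, count / elapsed)
```

Absolute numbers from a loop like this will differ from the tables above; only the ratio between the two builds on the same machine is meaningful.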

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if this requirement is removed, we can get a 6x - 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options for dealing with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.





-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
