[ruby-core:93698] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

From: ko1@...
Date: 2019-07-12 01:54:02 UTC
List: ruby-core #93698
Issue #15997 has been updated by ko1 (Koichi Sasada).


ioquatix (Samuel Williams) wrote:
> @ko1 asked:
> 
> > (1) stack size assumption
> 
> The fiber pool stack size is (guard page + vm_stack_size + fiber_machine_stack_size).

which size (xx KB, etc)?

> > (2) maximum allocatable size
> 
> On 64-bit platform it's effectively the same, although in some situations it can be better due to reduced number of `mmap`s required.
> 
> On 32-bit platform, it's slightly worse, because I didn't bother implementing fallback on `mmap` failure. In current implementation, worst case difference is 128 fiber stacks. That being said, if you are allocating fibers up to the limit of the 32-bit address space you will quickly run into other issues, so I don't consider this a bug, it's just natural limit of 32-bit address space.

I know you got measurements. please share us.

> > (3) GC.enable/disable usage (edited) 
> 
> - `vm2_fiber_count` is running with normal GC, but due to using alloca on fiber pool stack, GC pressure/count is significantly reduced. It is not expected to represent expected improvement of real world code, but shows that fiber pool code in isolation avoids GC overheads.

In general, we should tell this memory usage to GC with `rb_gc_adjust_memory_usage()`. I don't think it is needed in this case.


----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79316

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache. As per @ko1's request, we also increased fiber stack size to be the same as thread stack size.

The fiber pool manages a singly linked list of fiber pool allocations. The fiber pool allocation contains 1 or more stack (typically more, e.g. 512). It uses N^2 allocation strategy.

```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               | .  .  .  .            |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby 
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s 
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s 
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s 
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s 
        compare-ruby:  13429100.0 i/s - 1.00x  slower
```

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if you remove this requirement, we can get 6x - 10x performance improvement in `vm2_fiber_reuse` benchmark. There are some options to deal with this (e.g. moving it to `GC.compact`) but as this is still a net win, I'd like to merge this PR as is.





-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread

Prev Next