From: ko1@...
Date: 2019-07-12T01:54:02+00:00
Subject: [ruby-core:93698] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

Issue #15997 has been updated by ko1 (Koichi Sasada).

ioquatix (Samuel Williams) wrote:
> @ko1 asked:
>
> > (1) stack size assumption
>
> The fiber pool stack size is (guard page + vm_stack_size + fiber_machine_stack_size).

Which size (xx KB, etc.)?

> > (2) maximum allocatable size
>
> On 64-bit platform it's effectively the same, although in some situations it can be better due to reduced number of `mmap`s required.
>
> On 32-bit platform, it's slightly worse, because I didn't bother implementing fallback on `mmap` failure. In current implementation, worst case difference is 128 fiber stacks. That being said, if you are allocating fibers up to the limit of the 32-bit address space you will quickly run into other issues, so I don't consider this a bug, it's just natural limit of 32-bit address space.

I know you have measurements. Please share them with us.

> > (3) GC.enable/disable usage (edited)
>
> - `vm2_fiber_count` is running with normal GC, but due to using alloca on fiber pool stack, GC pressure/count is significantly reduced. It is not expected to represent expected improvement of real world code, but shows that fiber pool code in isolation avoids GC overheads.

In general, we should report this memory usage to the GC with `rb_gc_adjust_memory_usage()`. I don't think it is needed in this case.

----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79316

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

As per @ko1's request, we also increased the fiber stack size to be the same as the thread stack size.

The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains 1 or more stacks (typically more, e.g. 512). It uses an N^2 allocation strategy.

```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |     .  .  .  .        |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s
        compare-ruby:  13429100.0 i/s - 1.00x  slower
```

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if you remove this requirement, we can get a 6x - 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options to deal with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.
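
To make the pool structure described above easier to picture (a singly linked list of allocations, each holding several stacks, with released stacks kept for reuse), here is a toy model in Ruby. It is only a sketch under stated assumptions, not the C code in the PR: `ToyStackPool`, `Allocation`, `acquire` and `release` are hypothetical names, integer offsets stand in for the addresses `mmap` would return, and doubling the pool on every expansion is an assumed reading of the growth strategy.

```ruby
# Toy model of a stack pool. All names are hypothetical and no real
# memory is mapped; offsets stand in for mmap'd addresses.
class ToyStackPool
  # One contiguous region holding `count` stacks laid out side by side.
  Allocation = Struct.new(:base, :count, :next_allocation)

  attr_reader :allocations, :total

  def initialize(initial_count:, stride:)
    @initial_count = initial_count
    @stride = stride     # bytes per stack: VM stack + machine stack + guard page
    @head = nil          # singly linked list of allocations, newest first
    @vacant = []         # base offsets of stacks that are free to use
    @next_base = 0       # stand-in for the next address mmap would return
    @total = 0           # number of stacks created so far
    @allocations = 0     # number of (simulated) mmap calls so far
  end

  # Hand out a free stack, growing the pool only when none is vacant.
  def acquire
    expand if @vacant.empty?
    @vacant.pop
  end

  # Return a stack to the pool so that a later acquire can reuse it.
  def release(stack)
    @vacant.push(stack)
  end

  private

  # Assumption: each expansion adds as many stacks as the pool already has,
  # so the pool size doubles and the number of allocations stays small.
  def expand
    count = @total.zero? ? @initial_count : @total
    @head = Allocation.new(@next_base, count, @head)
    count.times { |i| @vacant.push(@next_base + i * @stride) }
    @next_base += count * @stride
    @total += count
    @allocations += 1
  end
end
```

In this model, acquiring a stack after a `release` does not trigger a new allocation, which is the effect the `vm2_fiber_reuse` benchmark is meant to measure. The `madvise` step and the guard page protection are deliberately left out.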