From: samuel@...
Date: 2019-07-17T01:57:06+00:00
Subject: [ruby-core:93818] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

Issue #15997 has been updated by ioquatix (Samuel Williams).

There is some kind of performance regression between 2.6.3 and 2.7.0-master, so I'm testing with 2.7.0-preview1 to see whether it is better or worse.

```
Server Software:
Server Hostname:        localhost
Server Port:            9294

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   17.464 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    5726.11 [#/sec] (mean)
Time per request:       44.708 [ms] (mean)
Time per request:       0.175 [ms] (mean, across all concurrent requests)
Transfer rate:          7045.80 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   20 137.8      1    1029
Processing:     4   24   7.8     21     428
Waiting:        0   10   8.5      9     420
Total:          4   44 138.5     23    1452

Percentage of the requests served within a certain time (ms)
  50%     23
  66%     24
  75%     28
  80%     30
  90%     34
  95%     36
  98%     45
  99%   1032
 100%   1452 (longest request)
```

2.7.0-preview1 is much worse, relatively speaking.

----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79685

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains one or more stacks (typically more, e.g. 512). Allocation sizes grow by doubling (2^N): the initial allocation holds 8 stacks, and subsequent allocations hold 8, 16, 32, etc.
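The growth pattern described above can be sketched as a small Ruby model. This is only an illustration, not the code in the PR (which is C inside the VM): the class `FiberPoolModel`, its methods, and the rule "each new allocation is as large as the pool's current total" are assumptions chosen to reproduce the 8, 8, 16, 32 sequence.

```ruby
# Illustrative model only: the real pool is implemented in C in the Ruby VM.
# Every class, method, and field name here is hypothetical.
class FiberPoolModel
  # One allocation: a block of stacks plus a link to the previously
  # created allocation, forming a singly linked list.
  Allocation = Struct.new(:stack_count, :next_allocation)

  def initialize(initial_count = 8)
    @initial_count = initial_count
    @total = 0    # stacks allocated so far, across all allocations
    @head = nil   # most recently created allocation
    @vacant = []  # stack ids that are free for reuse
  end

  # Hand out a stack, growing the pool only when nothing is vacant.
  def acquire
    expand if @vacant.empty?
    @vacant.pop
  end

  # Return a stack to the pool so that a later fiber can reuse it.
  def release(stack)
    @vacant.push(stack)
  end

  # Sizes of all allocations, oldest first.
  def allocation_sizes
    sizes = []
    node = @head
    while node
      sizes.unshift(node.stack_count)
      node = node.next_allocation
    end
    sizes
  end

  private

  # Assumed growth rule: the first allocation holds `initial_count`
  # stacks; each later one matches the current total, doubling the pool.
  def expand
    count = @total.zero? ? @initial_count : @total
    @head = Allocation.new(count, @head)
    (@total...(@total + count)).each { |id| @vacant.push(id) }
    @total += count
  end
end
```

Under these assumptions, acquiring 64 stacks from a fresh `FiberPoolModel` without releasing any leaves four allocations of 8, 8, 16, and 32 stacks; releasing a stack and acquiring again returns that same stack without growing the pool.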
```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |     .  .  .  .        |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s
        compare-ruby:  13429100.0 i/s - 1.00x  slower
```

This test was run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if this requirement is removed, the `vm2_fiber_reuse` benchmark improves by 6x to 10x. There are some options to deal with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.

---Files--------------------------------
Screen Shot 2019-07-16 at 8.30.59 PM.png (138 KB)

--
https://bugs.ruby-lang.org/