From: samuel@...
Date: 2019-07-17T01:16:36+00:00
Subject: [ruby-core:93817] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

Issue #15997 has been updated by ioquatix (Samuel Williams).

Here is some testing using falcon and `ab`. `ab` is an HTTP/1.0 client, so each connection/request creates a new fiber, which should show whether there are improvements or regressions in performance.

```
Server Software:        2.7.0-fiber-pool FREE_STACKS=0
Server Hostname:        localhost
Server Port:            9292

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   14.174 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    7055.11 [#/sec] (mean)
Time per request:       36.286 [ms] (mean)
Time per request:       0.142 [ms] (mean, across all concurrent requests)
Transfer rate:          8681.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17 122.8      2    3038
Processing:     4   19   5.7     18     231
Waiting:        0    8   6.6      7     225
Total:         10   36 123.1     19    3056

Percentage of the requests served within a certain time (ms)
  50%     19
  66%     21
  75%     23
  80%     24
  90%     27
  95%     28
  98%     31
  99%   1022
 100%   3056 (longest request)


Server Software:        2.7.0-fiber-pool FREE_STACKS=1
Server Hostname:        localhost
Server Port:            9292

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   14.676 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6813.71 [#/sec] (mean)
Time per request:       37.571 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          8384.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17 124.6      1    1030
Processing:     4   20   9.3     18     416
Waiting:        0    8  10.0      7     412
Total:          7   37 126.9     20    1437

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     22
  75%     23
  80%     24
  90%     27
  95%     29
  98%     35
  99%   1027
 100%   1437 (longest request)


Server Software:        2.7.0-master
Server Hostname:        localhost
Server Port:            9293

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   16.170 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6184.15 [#/sec] (mean)
Time per request:       41.396 [ms] (mean)
Time per request:       0.162 [ms] (mean, across all concurrent requests)
Transfer rate:          7609.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   19 133.4      1    3223
Processing:     4   22   7.4     21     432
Waiting:        0    9   8.3      8     422
Total:          5   41 134.3     22    3246

Percentage of the requests served within a certain time (ms)
  50%     22
  66%     23
  75%     25
  80%     27
  90%     31
  95%     33
  98%     39
  99%   1029
 100%   3246 (longest request)


Server Software:        2.6.3
Server Hostname:        localhost
Server Port:            9294

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   15.600 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6410.16 [#/sec] (mean)
Time per request:       39.937 [ms] (mean)
Time per request:       0.156 [ms] (mean, across all concurrent requests)
Transfer rate:          7887.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   18 130.2      1    3132
Processing:     4   21   8.4     20     432
Waiting:        0    9   9.2      8     428
Total:          9   39 131.6     21    3143

Percentage of the requests served within a certain time (ms)
  50%     21
  66%     22
  75%     23
  80%     25
  90%     31
  95%     33
  98%     34
  99%   1029
 100%   3143 (longest request)
```

----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79684

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains 1 or more stacks (typically more, e.g. 512). Allocation sizes double as the pool grows: it starts at 8 initial stacks, and the next allocations hold 8, 16, 32, etc.

```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |     .  .  .  .        |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s
        compare-ruby:  13429100.0 i/s - 1.00x  slower
```

This test was run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if this requirement is removed, we can get a 6x - 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options to deal with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.

---Files--------------------------------
Screen Shot 2019-07-16 at 8.30.59 PM.png (138 KB)
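To illustrate the growth pattern described in the issue text (8 initial stacks, then 8, 16, 32, etc., kept in a singly linked list of allocations), here is a minimal Ruby sketch. It is only a toy model and not the C code from the PR: `PoolSketch`, `Allocation`, `acquire`, `release` and `allocation_counts` are hypothetical names, a "stack" is just an integer id rather than mapped memory, and the rule "each new allocation is as large as the current total" is an assumption chosen to reproduce the sequence quoted above.

```ruby
# Toy model of the fiber pool growth strategy. Not the real implementation:
# every name here is hypothetical and a "stack" is just an integer id.
class PoolSketch
  # One node in the singly linked list of allocations.
  Allocation = Struct.new(:stack_count, :next_allocation)

  attr_reader :total

  def initialize(initial_count = 8)
    @initial_count = initial_count
    @allocations = nil # head of the singly linked list (newest first)
    @vacant = []       # stacks that are ready for reuse
    @total = 0         # number of stacks allocated so far
  end

  # Take a vacant stack, growing the pool only when none are left.
  def acquire
    expand if @vacant.empty?
    @vacant.pop
  end

  # Return a stack to the pool so that a later fiber can reuse it.
  def release(stack)
    @vacant.push(stack)
  end

  # Sizes of all allocations, oldest first.
  def allocation_counts
    counts = []
    node = @allocations
    while node
      counts.unshift(node.stack_count)
      node = node.next_allocation
    end
    counts
  end

  private

  # Assumption: the first allocation holds initial_count stacks and each
  # later one matches the current total, so the pool doubles in size.
  def expand
    count = @total.zero? ? @initial_count : @total
    @allocations = Allocation.new(count, @allocations)
    count.times { |i| @vacant.push(@total + i) }
    @total += count
  end
end

pool = PoolSketch.new
stacks = Array.new(64) { pool.acquire } # forces four allocations
p pool.allocation_counts                # => [8, 8, 16, 32]
```

Releasing a stack and acquiring again hands back a vacant stack without growing the pool, which is the reuse path the `vm2_fiber_reuse` benchmark exercises.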