[ruby-core:93817] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.
From: samuel@...
Date: 2019-07-17 01:16:36 UTC
List: ruby-core #93817
Issue #15997 has been updated by ioquatix (Samuel Williams).
Here are some test results using falcon and `ab`. `ab` is an HTTP/1.0 client, so every request opens a new connection and the server creates a new fiber for it. Any improvement or regression in fiber creation performance should therefore show up in these numbers.
```
Server Software: 2.7.0-fiber-pool FREE_STACKS=0
Server Hostname: localhost
Server Port: 9292
Document Path: /small
Document Length: 1200 bytes
Concurrency Level: 256
Time taken for tests: 14.174 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 126000000 bytes
HTML transferred: 120000000 bytes
Requests per second: 7055.11 [#/sec] (mean)
Time per request: 36.286 [ms] (mean)
Time per request: 0.142 [ms] (mean, across all concurrent requests)
Transfer rate: 8681.10 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17 122.8      2    3038
Processing:     4   19   5.7     18     231
Waiting:        0    8   6.6      7     225
Total:         10   36 123.1     19    3056
Percentage of the requests served within a certain time (ms)
50% 19
66% 21
75% 23
80% 24
90% 27
95% 28
98% 31
99% 1022
100% 3056 (longest request)

Server Software: 2.7.0-fiber-pool FREE_STACKS=1
Server Hostname: localhost
Server Port: 9292
Document Path: /small
Document Length: 1200 bytes
Concurrency Level: 256
Time taken for tests: 14.676 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 126000000 bytes
HTML transferred: 120000000 bytes
Requests per second: 6813.71 [#/sec] (mean)
Time per request: 37.571 [ms] (mean)
Time per request: 0.147 [ms] (mean, across all concurrent requests)
Transfer rate: 8384.06 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17 124.6      1    1030
Processing:     4   20   9.3     18     416
Waiting:        0    8  10.0      7     412
Total:          7   37 126.9     20    1437
Percentage of the requests served within a certain time (ms)
50% 20
66% 22
75% 23
80% 24
90% 27
95% 29
98% 35
99% 1027
100% 1437 (longest request)

Server Software: 2.7.0-master
Server Hostname: localhost
Server Port: 9293
Document Path: /small
Document Length: 1200 bytes
Concurrency Level: 256
Time taken for tests: 16.170 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 126000000 bytes
HTML transferred: 120000000 bytes
Requests per second: 6184.15 [#/sec] (mean)
Time per request: 41.396 [ms] (mean)
Time per request: 0.162 [ms] (mean, across all concurrent requests)
Transfer rate: 7609.41 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   19 133.4      1    3223
Processing:     4   22   7.4     21     432
Waiting:        0    9   8.3      8     422
Total:          5   41 134.3     22    3246
Percentage of the requests served within a certain time (ms)
50% 22
66% 23
75% 25
80% 27
90% 31
95% 33
98% 39
99% 1029
100% 3246 (longest request)

Server Software: 2.6.3
Server Hostname: localhost
Server Port: 9294
Document Path: /small
Document Length: 1200 bytes
Concurrency Level: 256
Time taken for tests: 15.600 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 126000000 bytes
HTML transferred: 120000000 bytes
Requests per second: 6410.16 [#/sec] (mean)
Time per request: 39.937 [ms] (mean)
Time per request: 0.156 [ms] (mean, across all concurrent requests)
Transfer rate: 7887.51 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   18 130.2      1    3132
Processing:     4   21   8.4     20     432
Waiting:        0    9   9.2      8     428
Total:          9   39 131.6     21    3143
Percentage of the requests served within a certain time (ms)
50% 21
66% 22
75% 23
80% 25
90% 31
95% 33
98% 34
99% 1029
100% 3143 (longest request)
```
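To make the four runs easier to compare, here is a small sketch that expresses each build's mean requests per second relative to `2.7.0-master`. The figures are copied from the output above; the constant and method names are made up for illustration.

```ruby
# Requests per second (mean) copied from the four `ab` runs above.
RESULTS = {
  "2.7.0-fiber-pool FREE_STACKS=0" => 7055.11,
  "2.7.0-fiber-pool FREE_STACKS=1" => 6813.71,
  "2.7.0-master"                   => 6184.15,
  "2.6.3"                          => 6410.16,
}.freeze

# Build used as the point of comparison (an arbitrary choice for this sketch).
BASELINE = "2.7.0-master"

# Throughput of each build divided by the baseline's throughput.
def relative_throughput(results, baseline)
  base = results.fetch(baseline)
  results.transform_values { |rps| rps / base }
end

RATIOS = relative_throughput(RESULTS, BASELINE)

RATIOS.each do |name, ratio|
  puts format("%-32s %8.2f req/s  %+6.1f%% vs %s",
              name, RESULTS[name], (ratio - 1) * 100, BASELINE)
end
```

On these figures the fiber-pool builds come out roughly 10% to 14% ahead of master. With one run per configuration, the gap should be read as indicative rather than precise.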
----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79684
* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
----------------------------------------
https://github.com/ruby/ruby/pull/2224
This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.
The fiber pool manages a singly linked list of fiber pool allocations. Each allocation contains one or more stacks (typically more, e.g. 512). It uses an exponential (2^N) allocation strategy: it starts at 8 initial stacks, and the next allocations are 8, 16, 32, etc.
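The growth pattern can be modelled in a few lines of Ruby. This is only a sketch, not the C code from the PR: the class and method names are hypothetical, and the growth rule (each new allocation holds as many stacks as the pool already has) is an assumption chosen because it reproduces the 8, 8, 16, 32 sequence.

```ruby
# Hypothetical model of the pool described above; not the code from the PR.
class FiberPoolModel
  # One allocation: a block holding `count` stacks, linked to the
  # previously created allocation (singly linked list).
  Allocation = Struct.new(:count, :next_allocation)

  def initialize(initial_count: 8)
    @initial_count = initial_count
    @total = 0               # stacks allocated so far, across all allocations
    @allocations_head = nil  # most recent allocation
    @free_stacks = []        # stand-in for the list of unused stacks
  end

  # Take a stack from the free list, growing the pool when it is empty.
  def acquire
    expand if @free_stacks.empty?
    @free_stacks.pop
  end

  # Return a stack to the pool so that a later fiber can reuse it.
  def release(stack)
    @free_stacks.push(stack)
  end

  # Sizes of the allocations, oldest first.
  def allocation_sizes
    sizes = []
    node = @allocations_head
    while node
      sizes.unshift(node.count)
      node = node.next_allocation
    end
    sizes
  end

  private

  # Assumed rule: a new allocation matches the number of stacks already
  # allocated, which doubles the pool each time (8, 8, 16, 32, ...).
  def expand
    count = @total.zero? ? @initial_count : @total
    @allocations_head = Allocation.new(count, @allocations_head)
    count.times { |i| @free_stacks.push([@allocations_head, i]) }
    @total += count
  end
end

pool = FiberPoolModel.new
stacks = Array.new(64) { pool.acquire }
p pool.allocation_sizes
```

Doubling keeps the number of allocations logarithmic in the number of live fibers, so a workload with many fibers triggers only a handful of large mappings.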
```
//
// base = +-------------------------------+-----------------------+ +
// |VM Stack |VM Stack | | |
// | | | | |
// | | | | |
// +-------------------------------+ | |
// |Machine Stack |Machine Stack | | |
// | | | | |
// | | | | |
// | | | . . . . | | size
// | | | | |
// | | | | |
// | | | | |
// | | | | |
// | | | | |
// +-------------------------------+ | |
// |Guard Page |Guard Page | | |
// +-------------------------------+-----------------------+ v
//
// +------------------------------------------------------->
//
// count
//
```
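Read as address arithmetic, the diagram says that an allocation is `count` slots of `size` bytes each, starting at `base`. The sketch below computes region offsets for a slot. The byte sizes are hypothetical values picked for illustration, and it assumes the regions sit in the order drawn (VM stack, then machine stack, then guard page).

```ruby
# Hypothetical sizes chosen for illustration; the real values are not given above.
PAGE_SIZE          = 4096
VM_STACK_SIZE      = 128 * 1024
MACHINE_STACK_SIZE = 512 * 1024

# Bytes occupied by one slot ("size" in the diagram).
STRIDE = VM_STACK_SIZE + MACHINE_STACK_SIZE + PAGE_SIZE

# Byte offsets, relative to `base`, of the regions of slot `index`.
# Assumes the regions are laid out in the order drawn in the diagram.
def slot_layout(index)
  start = index * STRIDE
  {
    vm_stack:      start,
    machine_stack: start + VM_STACK_SIZE,
    guard_page:    start + VM_STACK_SIZE + MACHINE_STACK_SIZE,
    slot_end:      start + STRIDE,
  }
end

# Total bytes to reserve for an allocation of `count` slots.
def allocation_bytes(count)
  count * STRIDE
end

p slot_layout(0)
p slot_layout(1)
p allocation_bytes(8)
```

Because every slot has the same stride, locating a stack inside an allocation is a single multiplication, and each guard page lands on a page boundary as long as the stack sizes are multiples of the page size.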
The performance improvement depends on usage:
```
Calculating -------------------------------------
compare-ruby built-ruby
vm2_fiber_allocate 132.900k 180.852k i/s - 100.000k times in 0.752447s 0.552939s
vm2_fiber_count 5.317k 110.724k i/s - 100.000k times in 18.806479s 0.903145s
vm2_fiber_reuse 160.128 347.663 i/s - 200.000 times in 1.249003s 0.575269s
vm2_fiber_switch 13.429M 13.490M i/s - 20.000M times in 1.489303s 1.482549s
Comparison:
vm2_fiber_allocate
built-ruby: 180851.6 i/s
compare-ruby: 132899.7 i/s - 1.36x slower
vm2_fiber_count
built-ruby: 110724.3 i/s
compare-ruby: 5317.3 i/s - 20.82x slower
vm2_fiber_reuse
built-ruby: 347.7 i/s
compare-ruby: 160.1 i/s - 2.17x slower
vm2_fiber_switch
built-ruby: 13490282.4 i/s
compare-ruby: 13429100.0 i/s - 1.00x slower
```
These tests were run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.
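For readers who want to try a similar comparison on their own builds, here is a minimal sketch using Ruby's standard `benchmark` library. It is not the benchmark definition from the Ruby repository: the two workloads, the method names, and the iteration count `N` are assumptions meant only to contrast short-lived fibers with many fibers alive at once.

```ruby
require "benchmark"

# Create `count` fibers and run each to completion, so every stack can be
# released before the next fiber is created.
def allocate_and_finish(count)
  completed = 0
  count.times do
    Fiber.new { completed += 1 }.resume
  end
  completed
end

# Create `count` fibers that are all suspended at the same time, so `count`
# stacks are live simultaneously, then resume each one so it finishes.
def allocate_many_live(count)
  fibers = Array.new(count) { Fiber.new { Fiber.yield; :done } }
  fibers.each(&:resume)  # run each fiber up to Fiber.yield
  fibers.map(&:resume)   # finish each fiber; collects the block results
end

# Illustrative iteration count; raise it for more stable timings.
N = 1_000

Benchmark.bm(22) do |bm|
  bm.report("allocate and finish") { allocate_and_finish(N) }
  bm.report("many live at once")   { allocate_many_live(N) }
end
```

Running the same script under both Ruby builds gives a rough side-by-side view. Absolute timings will differ from the table above because the workloads are not the same.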
Additionally, we conservatively call `madvise(free)` on unused fiber stacks so that they do not consume swap space. However, if this requirement is removed, we can get a 6x to 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options for dealing with this cost (e.g. moving it to `GC.compact`), but as the change is still a net win, I'd like to merge this PR as is.
---Files--------------------------------
Screen Shot 2019-07-16 at 8.30.59 PM.png (138 KB)