[ruby-core:93817] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

From: samuel@...
Date: 2019-07-17 01:16:36 UTC
List: ruby-core #93817
Issue #15997 has been updated by ioquatix (Samuel Williams).


Here is some testing using falcon and `ab`. `ab` is an HTTP/1.0 client. Because of that, each connection/request creates a new fiber, so the results should show any improvements or regressions in fiber creation performance.

```
Server Software:        2.7.0-fiber-pool FREE_STACKS=0
Server Hostname:        localhost
Server Port:            9292

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   14.174 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    7055.11 [#/sec] (mean)
Time per request:       36.286 [ms] (mean)
Time per request:       0.142 [ms] (mean, across all concurrent requests)
Transfer rate:          8681.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17 122.8      2    3038
Processing:     4   19   5.7     18     231
Waiting:        0    8   6.6      7     225
Total:         10   36 123.1     19    3056

Percentage of the requests served within a certain time (ms)
  50%     19
  66%     21
  75%     23
  80%     24
  90%     27
  95%     28
  98%     31
  99%   1022
 100%   3056 (longest request)



Server Software:        2.7.0-fiber-pool FREE_STACKS=1
Server Hostname:        localhost
Server Port:            9292

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   14.676 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6813.71 [#/sec] (mean)
Time per request:       37.571 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          8384.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17 124.6      1    1030
Processing:     4   20   9.3     18     416
Waiting:        0    8  10.0      7     412
Total:          7   37 126.9     20    1437

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     22
  75%     23
  80%     24
  90%     27
  95%     29
  98%     35
  99%   1027
 100%   1437 (longest request)



Server Software:        2.7.0-master
Server Hostname:        localhost
Server Port:            9293

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   16.170 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6184.15 [#/sec] (mean)
Time per request:       41.396 [ms] (mean)
Time per request:       0.162 [ms] (mean, across all concurrent requests)
Transfer rate:          7609.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   19 133.4      1    3223
Processing:     4   22   7.4     21     432
Waiting:        0    9   8.3      8     422
Total:          5   41 134.3     22    3246

Percentage of the requests served within a certain time (ms)
  50%     22
  66%     23
  75%     25
  80%     27
  90%     31
  95%     33
  98%     39
  99%   1029
 100%   3246 (longest request)



Server Software:        2.6.3
Server Hostname:        localhost
Server Port:            9294

Document Path:          /small
Document Length:        1200 bytes

Concurrency Level:      256
Time taken for tests:   15.600 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6410.16 [#/sec] (mean)
Time per request:       39.937 [ms] (mean)
Time per request:       0.156 [ms] (mean, across all concurrent requests)
Transfer rate:          7887.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   18 130.2      1    3132
Processing:     4   21   8.4     20     432
Waiting:        0    9   9.2      8     428
Total:          9   39 131.6     21    3143

Percentage of the requests served within a certain time (ms)
  50%     21
  66%     22
  75%     23
  80%     25
  90%     31
  95%     33
  98%     34
  99%   1029
 100%   3143 (longest request)
```
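
To make the four runs easier to compare, the "Requests per second" figures can be normalised against `2.7.0-master`. The following is only a small arithmetic sketch: the numbers are copied from the `ab` output above, while the constant and the `relative_to` helper are names made up for this illustration.

```ruby
# Requests per second (mean), copied from the `ab` output above.
REQUESTS_PER_SECOND = {
  "2.7.0-fiber-pool FREE_STACKS=0" => 7055.11,
  "2.7.0-fiber-pool FREE_STACKS=1" => 6813.71,
  "2.7.0-master"                   => 6184.15,
  "2.6.3"                          => 6410.16,
}.freeze

# Hypothetical helper: express every result as a ratio of the baseline run.
def relative_to(baseline, results)
  base = results.fetch(baseline)
  results.transform_values { |rps| rps / base }
end

relative_to("2.7.0-master", REQUESTS_PER_SECOND).each do |name, ratio|
  puts format("%-32s %.3fx", name, ratio)
end
```

By this measure the two fiber-pool builds serve roughly 14% and 10% more requests per second than master. Each configuration was only run once here, so small differences may be noise.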

----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79684

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains 1 or more stacks (typically more, e.g. 512). It uses an exponential (doubling) allocation strategy: it starts with 8 initial stacks, and the following allocations are 8, 16, 32, etc.

```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               | .  .  .  .            |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```
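
The growth pattern described above can be modelled in a few lines of Ruby. This is a toy model, not the C implementation from the PR: the class and method names are hypothetical, stacks are represented by plain integers, and it assumes each new allocation holds as many stacks as the pool already contains, which is one reading of the 8, 8, 16, 32 sequence.

```ruby
# Toy model of the fiber pool; NOT the code from the PR.
class ToyFiberPool
  # One node in the singly linked list of allocations.
  Allocation = Struct.new(:count, :next_allocation)

  attr_reader :total

  def initialize(initial_count = 8)
    @allocations = nil # head of the linked list (most recent allocation)
    @total = 0         # number of stacks across all allocations
    @free = []         # stacks that are currently unused
    expand(initial_count)
  end

  # Take a free stack, growing the pool first if none is available.
  def acquire
    expand(@total) if @free.empty?
    @free.pop
  end

  # Return a stack to the pool so a later fiber can reuse it.
  def release(stack)
    @free.push(stack)
  end

  # Sizes of all allocations, oldest first.
  def allocation_sizes
    sizes = []
    node = @allocations
    while node
      sizes.unshift(node.count)
      node = node.next_allocation
    end
    sizes
  end

  private

  # Add one allocation holding `count` stacks to the front of the list.
  def expand(count)
    @allocations = Allocation.new(count, @allocations)
    count.times { |i| @free.push(@total + i) }
    @total += count
  end
end

pool = ToyFiberPool.new
64.times { pool.acquire }
p pool.allocation_sizes # prints [8, 8, 16, 32]
```

Because released stacks go back onto the free list, a workload that creates and finishes fibers at a steady rate stops growing the pool once it holds enough stacks for the peak number of live fibers.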

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby 
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s 
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s 
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s 
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s 
        compare-ruby:  13429100.0 i/s - 1.00x  slower
```

These tests were run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.
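
For a rough local check of fiber creation cost, something like the following can be run under two Ruby builds. It is only a sketch and not one of the `vm2_fiber_*` benchmark files listed above; the helper name and the iteration count are arbitrary choices for illustration.

```ruby
require "benchmark"

# Hypothetical helper, not part of the Ruby benchmark suite:
# create `count` short-lived fibers and run each one to completion.
def time_fiber_allocation(count)
  completed = 0
  elapsed = Benchmark.realtime do
    count.times do
      Fiber.new { completed += 1 }.resume
    end
  end
  [completed, elapsed]
end

completed, elapsed = time_fiber_allocation(10_000)
puts format("%d fibers in %.4fs (%.1f fibers/s)", completed, elapsed, completed / elapsed)
```

Absolute numbers from a loop like this vary a lot between machines, so only the ratio between two builds on the same machine is meaningful.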

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if this requirement is removed, we can get a 6x - 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options to deal with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.



---Files--------------------------------
Screen Shot 2019-07-16 at 8.30.59 PM.png (138 KB)


-- 
https://bugs.ruby-lang.org/

