From: "ioquatix (Samuel Williams) via ruby-core" <ruby-core@...>
Date: 2023-08-25T04:28:13+00:00
Subject: [ruby-core:114525] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers

Issue #17263 has been updated by ioquatix (Samuel Williams).

Status changed from Open to Closed

My current conclusion is this:

Based on the `perf` `cpu-cycles:k` measurements (kernel-mode CPU cycles), we see overhead increase in proportion to the number of fibers, even though the total number of context switches is the same in every run. This is unfortunate, but not exactly unexpected, as we are stressing virtual memory.

Here are the results of my testing:

```
| fibers           | elapsed time (s) | rate (t/s)       |
| ---------------- | ---------------- | ---------------- |
|                1 |             0.91 |      10998609.17 |
|                2 |             0.82 |      12239077.16 |
|                4 |             0.77 |      12930013.16 |
|                8 |             0.79 |      12678091.91 |
|               16 |             0.79 |      12578625.99 |
|               32 |             0.79 |      12598729.93 |
|               64 |             0.79 |      12597254.54 |
|              128 |             0.79 |      12643086.20 |
|              256 |             0.83 |      12116891.53 |
|              512 |             0.94 |      10654248.57 |
|             1024 |             1.01 |       9865286.58 |
|             2048 |             1.04 |       9644781.53 |
|             4096 |             1.06 |       9455585.41 |
|             8192 |             1.10 |       9070485.29 |
|            16384 |             1.98 |       5054997.19 |
|            32768 |             3.14 |       3189286.37 |
|            65536 |             3.39 |       2949265.02 |
|           131072 |             3.39 |       2951698.03 |
|           262144 |             3.44 |       2910388.50 |
|           524288 |             3.43 |       2915666.38 |
|          1048576 |             3.43 |       2917077.46 |
```
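
For anyone who wants to reproduce a table of this shape, here is a minimal sketch. It is not the script used for the numbers above: the helper names `measure` and `report` are hypothetical, it uses `Fiber#resume`/`Fiber.yield` rather than `Fiber#transfer`, and it assumes the total number of switches is held roughly constant while the fiber count doubles.

```ruby
# Hypothetical helper (not from the original report): performs about
# `total_switches` resumes spread evenly over `fiber_count` fibers and
# returns [number of resumes performed, elapsed seconds].
def measure(fiber_count, total_switches)
  fibers = Array.new(fiber_count) { Fiber.new { loop { Fiber.yield } } }
  rounds = total_switches / fiber_count
  switches = 0

  # A monotonic clock avoids skew from wall-clock adjustments.
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  rounds.times do
    fibers.each do |fiber|
      fiber.resume
      switches += 1
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

  [switches, elapsed]
end

# Hypothetical helper: prints one table row per power-of-two fiber count.
def report(max_exponent: 10, total_switches: 1 << 20)
  puts "| fibers           | elapsed time (s) | rate (t/s)       |"
  puts "| ---------------- | ---------------- | ---------------- |"
  (0..max_exponent).each do |exponent|
    fiber_count = 1 << exponent
    switches, elapsed = measure(fiber_count, total_switches)
    puts format("| %16d | %16.2f | %16.2f |", fiber_count, elapsed, switches / elapsed)
  end
end
```

Calling `report` with the defaults stops at 1024 fibers. Going much higher may run into the fiber limit described in the original report unless the system is configured to allow it.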


----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104323

* Author: ciconia (Sharon Rosner)
* Status: Closed
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing
highly-concurrent Ruby programs with fibers. In the course of my work I have
come up against two problems using Ruby fibers:

1. Fiber context switching performance seems to degrade as the number of fibers
   is increased. This is both with `Fiber#transfer` and
   `Fiber#resume/Fiber.yield`.
2. The number of concurrent fibers that can exist at any time seems to be
   limited. Once a certain number is reached (on my system this seems to be
   31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the
   message `can't set a guard page: Cannot allocate memory`. This is not due to
   RAM being saturated. With 10000 fibers, my test program hovers at around 150MB
   RSS (on Ruby 2.7.1).
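
One possible explanation for the second problem, offered here as an assumption rather than something established in this report, is the Linux per-process limit on memory mappings (`vm.max_map_count`), since each fiber stack plus its guard page would consume mappings rather than much resident memory. The sketch below reads that limit and the current mapping count from `/proc` and estimates the remaining headroom. The helper names and the figure of two mappings per fiber are hypothetical.

```ruby
# Assumption: Linux only. Returns the per-process mapping limit, or nil
# when the file is not available (for example on other platforms).
def max_map_count(path = "/proc/sys/vm/max_map_count")
  File.readable?(path) ? File.read(path).to_i : nil
end

# Assumption: Linux only. Each line of /proc/self/maps is one mapping.
def current_mappings(path = "/proc/self/maps")
  File.readable?(path) ? File.foreach(path).count : nil
end

# Rough estimate of how many more fibers could be created, assuming each
# fiber costs `mappings_per_fiber` mappings (stack plus guard page). The
# value 2 is a guess for illustration, not a measured figure.
def estimated_fiber_headroom(limit, used, mappings_per_fiber: 2)
  (limit - used) / mappings_per_fiber
end

limit = max_map_count
used = current_mappings
if limit && used
  puts "max_map_count: #{limit}, mappings in use: #{used}"
  puts "estimated fiber headroom: #{estimated_fiber_headroom(limit, used)}"
else
  puts "mapping information is not available on this platform"
end
```

If the estimate lands near the fiber count at which the `FiberError` appears, that would support the assumption. If not, the cause lies elsewhere.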

Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first
  
  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions
of Ruby, going from ~3.4M context switches per second for 100 fibers to less
than 1M context switches per second for 10000 fibers. Running with 100000 fibers
fails to complete.

Here's a program for testing the performance of `Fiber#resume/Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, theoretically at least switching between fibers should have
a constant cost in terms of CPU cycles, irrespective of the number of fibers
currently existing in memory. I am completely ignorant of the implementation
details of Ruby fibers, so at least for now I don't have any idea where this
problem is coming from.

---Files--------------------------------
clipboard-202308251514-grqb1.png (81.3 KB)
clipboard-202308251514-r7g4l.png (81 KB)
clipboard-202308251538-kmofk.png (13.8 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/