From: "ioquatix (Samuel Williams) via ruby-core" <ruby-core@...> Date: 2023-08-25T04:28:13+00:00 Subject: [ruby-core:114525] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers Issue #17263 has been updated by ioquatix (Samuel Williams). Status changed from Open to Closed My current conclusion is this: Based on the `perf` `cpu-cycles:k`, we see proportional increase in overhead related to the number of fibers, despite ultimately having the same total number of context switches. This is unfortunate, but not exactly unexpected as we are stressing virtual memory. Here are the results of my testing: ``` | fibers | elapsed time (s) | rate (t/s) | | ---------------- | ---------------- | ---------------- | | 1 | 0.91 | 10998609.17 | | 2 | 0.82 | 12239077.16 | | 4 | 0.77 | 12930013.16 | | 8 | 0.79 | 12678091.91 | | 16 | 0.79 | 12578625.99 | | 32 | 0.79 | 12598729.93 | | 64 | 0.79 | 12597254.54 | | 128 | 0.79 | 12643086.20 | | 256 | 0.83 | 12116891.53 | | 512 | 0.94 | 10654248.57 | | 1024 | 1.01 | 9865286.58 | | 2048 | 1.04 | 9644781.53 | | 4096 | 1.06 | 9455585.41 | | 8192 | 1.10 | 9070485.29 | | 16384 | 1.98 | 5054997.19 | | 32768 | 3.14 | 3189286.37 | | 65536 | 3.39 | 2949265.02 | | 131072 | 3.39 | 2951698.03 | | 262144 | 3.44 | 2910388.50 | | 524288 | 3.43 | 2915666.38 | | 1048576 | 3.43 | 2917077.46 | ``` ---------------------------------------- Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers https://bugs.ruby-lang.org/issues/17263#change-104323 * Author: ciconia (Sharon Rosner) * Status: Closed * Priority: Normal * ruby -v: 2.7.1 * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN ---------------------------------------- I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing highly-concurrent Ruby programs with fibers. In the course of my work I have come up against two problems using Ruby fibers: 1. Fiber context switching performance seem to degrade as the number of fibers is increased. This is both with `Fiber#transfer` and `Fiber#resume/Fiber.yield`. 2. The number of concurrent fibers that can exist at any time seems to be limited. Once a certain number is reached (on my system this seems to be 31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the message `can't set a guard page: Cannot allocate memory`. This is not due to RAM being saturated. With 10000 fibers, my test program hovers at around 150MB RSS (on Ruby 2.7.1). 
Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0
  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current

  # Build a ring of fibers, each transferring control to the next one.
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end

    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first

  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions of Ruby, going from ~3.4M context switches per second for 100 fibers to less than 1M context switches per second for 10000 fibers. Running with 100000 fibers fails to complete.

Here's a program for testing the performance of `Fiber#resume`/`Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases
def run(num_fibers)
  count = 0
  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, theoretically at least, switching between fibers should have a constant cost in terms of CPU cycles, irrespective of the number of fibers currently existing in memory. I am completely ignorant of the implementation details of Ruby fibers, so at least for now I have no idea where this problem is coming from.

---Files--------------------------------
clipboard-202308251514-grqb1.png (81.3 KB)
clipboard-202308251514-r7g4l.png (81 KB)
clipboard-202308251538-kmofk.png (13.8 KB)

--
https://bugs.ruby-lang.org/

______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/