From: samuel@... Date: 2020-10-15T11:25:49+00:00 Subject: [ruby-core:100403] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers

Issue #17263 has been updated by ioquatix (Samuel Williams).

I can confirm I can reproduce your problem. As an experiment, I did:

```
first.transfer
count = 0
```

as a warmup to see if it was some kind of lazy allocation issue, but it didn't seem to help. I'll have to take a look at the implementation. Thanks for bringing this to my attention.

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-88016

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
I'm working on developing [Polyphony](), a Ruby gem for writing highly concurrent Ruby programs with fibers. In the course of my work I have come up against two problems with Ruby fibers:

1. Fiber context switching performance seems to degrade as the number of fibers increases. This happens both with `Fiber#transfer` and with `Fiber#resume`/`Fiber.yield`.
2. The number of fibers that can exist concurrently seems to be limited. Once a certain number is reached (on my system this seems to be 31744 fibers), calling `Fiber#transfer` raises a `FiberError` with the message `can't set a guard page: Cannot allocate memory`. This is not due to RAM being saturated: with 10000 fibers, my test program hovers at around 150MB RSS (on Ruby 2.7.1).
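A hedged aside on point 2, which is speculation rather than anything confirmed in this report: on Linux, each fiber stack is typically an `mmap`ed region with a guard page, so the kernel's `vm.max_map_count` limit (default 65530) can cap the number of fibers well before RAM runs out; at roughly two mappings per fiber, that default is in the same ballpark as the observed 31744-fiber limit. A minimal Ruby check of the current limit (read-only; Linux assumed):

```ruby
# Read the kernel's per-process memory-mapping limit, which may be what caps
# the fiber count (assumption: each fiber stack costs ~2 mappings).
path = "/proc/sys/vm/max_map_count"

if File.exist?(path)
  max_maps = File.read(path).to_i
  # Rough upper-bound estimate, not an exact formula.
  puts "vm.max_map_count: #{max_maps} (~#{max_maps / 2} fiber stacks at most)"
else
  puts "vm.max_map_count not available (not Linux?)"
end
```

If this is indeed the cause, raising the sysctl (e.g. `sysctl -w vm.max_map_count=...`) should move the failure point; that would be a useful experiment to confirm or rule out this explanation.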
Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current

  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first

  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1. As you can see, the performance degradation is similar in all three versions of Ruby, going from ~3.4M context switches per second with 100 fibers to less than 1M context switches per second with 10000 fibers. Running with 100000 fibers fails to complete.
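For readers less familiar with the `Fiber#transfer` semantics the ring above relies on, here's a minimal two-fiber ping-pong sketch: with `transfer`, control moves explicitly from fiber to fiber, and there is no implicit return to the caller as there is with `resume`/`Fiber.yield`. (The `require` guard is for Ruby 2.x, where `Fiber#transfer` lives in the `fiber` stdlib; on 3.x it's built in.)

```ruby
# Load the fiber extension only where Fiber#transfer isn't already built in.
require 'fiber' unless Fiber.instance_methods.include?(:transfer)

main    = Fiber.current
counter = 0
ping = pong = nil  # forward declarations so each block can see the other fiber

ping = Fiber.new do
  3.times do
    counter += 1
    pong.transfer   # hand control directly to pong
  end
  main.transfer     # return control to the main fiber when done
end

pong = Fiber.new do
  loop do
    counter += 1
    ping.transfer   # hand control back to ping
  end
end

ping.transfer       # kick off the ping-pong
puts counter        # => 6 (three switches each way)
```

Note that `pong` is still suspended when the program ends; with `transfer` there is no notion of a fiber "finishing back into" its caller, which is exactly why the benchmark above needs an explicit `supervisor.transfer` to stop.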
Here's a program for testing the performance of `Fiber#resume`/`Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases
def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, switching between fibers should, at least theoretically, have a constant cost in terms of CPU cycles, irrespective of the number of fibers currently existing in memory. I am completely ignorant of the implementation details of Ruby fibers, so at least for now I don't have any idea where this problem is coming from.

-- 
https://bugs.ruby-lang.org/
Unsubscribe:
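For contrast with the `transfer` example, here is a minimal sketch of the `resume`/`Fiber.yield` round trip used by the second benchmark: `resume` enters the fiber, and `Fiber.yield` suspends it and returns a value to the resumer, so control always bounces back to the caller.

```ruby
# A fiber that yields successive Fibonacci numbers; each resume runs the body
# until the next Fiber.yield, which hands a value back to the resumer.
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a
    a, b = b, a + b
  end
end

values = 6.times.map { fib.resume }
puts values.inspect # => [0, 1, 1, 2, 3, 5]
```

This caller/callee pairing is why the benchmark can simply iterate `fibers.each { |f| f.resume }`: every switch into a fiber is matched by a switch back out, with no need for the explicit ring that the `transfer` version builds.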