From: "shyouhei (Shyouhei Urabe)" Date: 2022-04-04T02:02:49+00:00 Subject: [ruby-core:108168] [Ruby master Bug#18455] `IO#close` has poor performance and difficult to understand semantics. Issue #18455 has been updated by shyouhei (Shyouhei Urabe). ioquatix (Samuel Williams) wrote in #note-4: > @shyouhei what was the point of https://github.com/ioquatix/ruby/commit/b121cfde5fbc8cd20684a5fd94047f40323a8353 - Performance? Consistency? Something else? Hello, `rb_io_fptr_finalize_internal` this is a very tiny peephole optimisation to omit return value when possible. I didn't indent to make it public. If it is now that isn't what I want. ---------------------------------------- Bug #18455: `IO#close` has poor performance and difficult to understand semantics. https://bugs.ruby-lang.org/issues/18455#change-97142 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal * Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN ---------------------------------------- `IO#close` should be responsible for closing the file descriptor referred to by the IO instance. When dealing with buffered IO, one can also expect this to flush the internal buffers if possible. Currently, all blocking IO operations release the GVL and perform the blocking system call using `rb_thread_io_blocking_region`. The current implementation takes a file descriptor and adds an entry to the VM global `waiting_fds` list. When the operation is completed, the entry is removed from `waiting_fds`. When calling `IO#close`, this list is traversed and any threads performing blocking operations with a matching file descriptor are interrupted. The performance of this is O(number of blocking IO operations) which in practice the performance of `IO#close` can take milliseconds with 10,000 threads performing blocking IO. This performance is unacceptable. ``` ruby #!/usr/bin/env ruby require 'benchmark' class Reading def initialize @r, @w = IO.pipe @thread = Thread.new do @r.read rescue IOError # Ignore. end end attr :r attr :w attr :thread def join @thread.join end end def measure(count = 10) readings = count.times.map do Reading.new end sleep 10 duration = Benchmark.measure do readings.each do |reading| reading.r.close reading.w.close end end average = (duration.total / count) * 1000.0 pp count: count, average: sprintf("%0.2fms", average) readings.each(&:join) end measure( 10) measure( 100) measure( 1000) measure(10000) ``` In addition, the semantics of this operation are confusing at best. While Ruby programs are dealing with IO instances, the VM is dealing with file descriptors, in effect performing some internal de-duplication of IO state. In practice, this leads to strange behaviour: ``` ruby #!/usr/bin/env ruby r, w = IO.pipe r2 = IO.for_fd(r.to_i) pp r: r, r2: r2 t = Thread.new do r2.read rescue nil r2.read # EBADF end sleep 0.5 r.close t.join rescue nil pp r: r, r2: r2 # r is closed, r2 is valid but will raise EBADF on any operation. ``` In addition, this confusing behaviour extends to Ractor and state is leaked between the two: ``` ruby r, w = IO.pipe ractor = Ractor.new(r.to_i) do |fd| r2 = IO.for_fd(fd) r2.read # r2.read # EBADF end sleep 0.5 r.close pp take: ractor.take ``` I propose the following changes to simplify the semantics and improve performance: - Move the semantics of `waiting_fds` from per-fd to per-IO. This means that `IO#close` only interrupts blocking operations performed on the same IO instance rather than ANY IO which refers to the same file descriptor. 
## Benchmarks

Before:

```
{:count=>10, :average=>"0.19ms"}
{:count=>100, :average=>"0.11ms"}
{:count=>1000, :average=>"0.18ms"}
{:count=>10000, :average=>"1.16ms"}
```

After:

```
{:count=>10, :average=>"0.20ms"}
{:count=>100, :average=>"0.11ms"}
{:count=>1000, :average=>"0.15ms"}
{:count=>10000, :average=>"0.68ms"}
```

After investigating this further, I found that `rb_thread_io_blocking_region`, when using `ubf_select`, can be incredibly slow, with a cost proportional to the number of threads. I don't know whether it's advisable but:

``` c
BLOCKING_REGION(blocking_node.thread, {
    val = func(data1);
    saved_errno = errno;
}, NULL /* ubf_select */, blocking_node.thread, FALSE);
```

Disabling the UBF function and relying on blocking `read(fd, ...)`/`write(fd, ...)` operations failing once `close(fd)` is invoked might be sufficient? This needs more investigation, but after making this change, we have constant-time `IO#close`:

```
{:count=>10, :average=>"0.13ms"}
{:count=>100, :average=>"0.06ms"}
{:count=>1000, :average=>"0.04ms"}
{:count=>10000, :average=>"0.09ms"}
```

Which is ideally what we want.
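For context on the UBF being disabled above: `rb_thread_io_blocking_region` and `ubf_select` are VM internals, but the public C-extension API exposes the same pattern through `rb_thread_call_without_gvl`, which pairs a blocking function with an unblocking function that the VM calls when the blocked thread must be interrupted. The sketch below is a hypothetical extension (the names `blocking_read_args`, `ext_read` and `BlockingDemo` are made up) that only illustrates this pairing; it is not the patch discussed above.

``` c
/* Hypothetical C extension sketch showing how a blocking call is paired
 * with an unblocking function (UBF) via the public API. Illustrative only. */
#include <unistd.h>
#include <ruby.h>
#include <ruby/thread.h>

struct blocking_read_args {
    int fd;
    char *buf;
    size_t len;
    ssize_t result;
};

/* Runs without the GVL and may block inside read(2). */
static void *blocking_read(void *ptr)
{
    struct blocking_read_args *args = ptr;
    args->result = read(args->fd, args->buf, args->len);
    return NULL;
}

/* The VM calls this when the thread blocked in blocking_read() has to be
 * interrupted (Thread#kill, IO#close, VM shutdown, ...). A real UBF would
 * nudge read(2) into returning, e.g. by signalling the blocked thread.
 * Passing no UBF at all, as in the experiment above, instead relies on the
 * raw system call noticing the closed fd, which is platform-dependent. */
static void unblock_read(void *ptr)
{
    (void)ptr;
}

static VALUE ext_read(VALUE self, VALUE io)
{
    char buf[4096];
    struct blocking_read_args args;

    args.fd  = NUM2INT(rb_funcall(io, rb_intern("fileno"), 0));
    args.buf = buf;
    args.len = sizeof(buf);

    rb_thread_call_without_gvl(blocking_read, &args, unblock_read, &args);

    if (args.result < 0) rb_sys_fail("read");
    return rb_str_new(args.buf, args.result);
}

void Init_blocking_demo(void)
{
    VALUE mod = rb_define_module("BlockingDemo");
    rb_define_module_function(mod, "read_from", ext_read, 1);
}
```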