[#107765] [Ruby master Bug#18605] Fails to run on (newer) 32bit Windows with ucrt — "lazka (Christoph Reiter)" <noreply@...>

Issue #18605 has been reported by lazka (Christoph Reiter).

8 messages 2022/03/03

[#107769] [Ruby master Misc#18609] keyword decomposition in enumerable (question/guidance) — "Ethan (Ethan -)" <noreply@...>

Issue #18609 has been reported by Ethan (Ethan -).

10 messages 2022/03/04

[#107784] [Ruby master Feature#18611] Promote best practice for combining multiple values into a hash code — "chrisseaton (Chris Seaton)" <noreply@...>

Issue #18611 has been reported by chrisseaton (Chris Seaton).

12 messages 2022/03/07

[#107791] [Ruby master Bug#18614] Error (busy loop) inTestGemCommandsSetupCommand#test_destdir_flag_does_not_try_to_write_to_the_default_gem_home — duerst <noreply@...>

Issue #18614 has been reported by duerst (Martin D端rst).

7 messages 2022/03/08

[#107794] [Ruby master Feature#18615] Use -Werror=implicit-function-declaration by deault for building C extensions — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18615 has been reported by Eregon (Benoit Daloze).

11 messages 2022/03/08

[#107832] [Ruby master Bug#18622] const_get still looks in Object, while lexical constant lookup no longer does — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18622 has been reported by Eregon (Benoit Daloze).

16 messages 2022/03/10

[#107847] [Ruby master Bug#18625] ruby2_keywords does not unmark the hash if the receiving method has a *rest parameter — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18625 has been reported by Eregon (Benoit Daloze).

13 messages 2022/03/11

[#107886] [Ruby master Feature#18630] Introduce general `IO#timeout` and `IO#timeout=`for all (non-)blocking operations. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18630 has been reported by ioquatix (Samuel Williams).

28 messages 2022/03/14

[#108026] [Ruby master Feature#18654] Enhancements to prettyprint — "kddeisz (Kevin Newton)" <noreply@...>

Issue #18654 has been reported by kddeisz (Kevin Newton).

9 messages 2022/03/22

[#108039] [Ruby master Feature#18655] Merge `IO#wait_readable` and `IO#wait_writable` into core — "byroot (Jean Boussier)" <noreply@...>

Issue #18655 has been reported by byroot (Jean Boussier).

10 messages 2022/03/23

[#108056] [Ruby master Bug#18658] Need openssl 3 support for Ubuntu 22.04 (Ruby 2.7.x and 3.0.x) — "schneems (Richard Schneeman)" <noreply@...>

Issue #18658 has been reported by schneems (Richard Schneeman).

19 messages 2022/03/24

[#108075] [Ruby master Bug#18663] Autoload doesn't work with fiber context switch. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18663 has been reported by ioquatix (Samuel Williams).

10 messages 2022/03/25

[#108117] [Ruby master Feature#18668] Merge `io-nonblock` gems into core — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18668 has been reported by Eregon (Benoit Daloze).

22 messages 2022/03/30

[ruby-core:108141] [Ruby master Bug#18455] `IO#close` has poor performance and difficult to understand semantics.

From: "ioquatix (Samuel Williams)" <noreply@...>
Date: 2022-03-31 12:03:03 UTC
List: ruby-core #108141
Issue #18455 has been updated by ioquatix (Samuel Williams).


@shyouhei what was the point of https://github.com/ioquatix/ruby/commit/b121cfde5fbc8cd20684a5fd94047f40323a8353 - Performance? Consistency? Something else?

Even though it's called `rb_io_fptr_finalize_internal` it now seems part of public Ruby API? Can you confirm that was your desire?

----------------------------------------
Bug #18455: `IO#close` has poor performance and difficult to understand semantics.
https://bugs.ruby-lang.org/issues/18455#change-97117

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
`IO#close` should be responsible for closing the file descriptor referred to by the IO instance. When dealing with buffered IO, one can also expect this to flush the internal buffers if possible.

Currently, all blocking IO operations release the GVL and perform the blocking system call using `rb_thread_io_blocking_region`. The current implementation takes a file descriptor and adds an entry to the VM global `waiting_fds` list. When the operation is completed, the entry is removed from `waiting_fds`.

When calling `IO#close`, this list is traversed and any threads performing blocking operations with a matching file descriptor are interrupted. The performance of this is O(number of blocking IO operations) which in practice the performance of `IO#close` can take milliseconds with 10,000 threads performing blocking IO. This performance is unacceptable.

``` ruby
#!/usr/bin/env ruby

require 'benchmark'

class Reading
  def initialize
    @r, @w = IO.pipe

    @thread = Thread.new do
      @r.read
    rescue IOError
      # Ignore.
    end
  end

  attr :r
  attr :w

  attr :thread

  def join
    @thread.join
  end
end

def measure(count = 10)
  readings = count.times.map do
    Reading.new
  end

  sleep 10

  duration = Benchmark.measure do
    readings.each do |reading|
      reading.r.close
      reading.w.close
    end
  end

  average = (duration.total / count) * 1000.0
  pp count: count, average: sprintf("%0.2fms", average)

  readings.each(&:join)
end

measure(   10)
measure(  100)
measure( 1000)
measure(10000)
```

In addition, the semantics of this operation are confusing at best. While Ruby programs are dealing with IO instances, the VM is dealing with file descriptors, in effect performing some internal de-duplication of IO state. In practice, this leads to strange behaviour:

``` ruby
#!/usr/bin/env ruby

r, w = IO.pipe
r2 = IO.for_fd(r.to_i)
pp r: r, r2: r2

t = Thread.new do
  r2.read rescue nil
  r2.read # EBADF
end

sleep 0.5
r.close
t.join rescue nil

pp r: r, r2: r2
# r is closed, r2 is valid but will raise EBADF on any operation.
```

In addition, this confusing behaviour extends to Ractor and state is leaked between the two:

``` ruby
r, w = IO.pipe

ractor = Ractor.new(r.to_i) do |fd|
  r2 = IO.for_fd(fd)
  r2.read
  # r2.read # EBADF
end

sleep 0.5
r.close

pp take: ractor.take
```

I propose the following changes to simplify the semantics and improve performance:

- Move the semantics of `waiting_fds` from per-fd to per-IO. This means that `IO#close` only interrupts blocking operations performed on the same IO instance rather than ANY IO which refers to the same file descriptor. I think this behaviour is easier to understand and still protects against the vast majority of incorrect usage.
- Move the details of `struct rb_io_t` to `internal/io.h` so that the implementation details are not part of the public interface.

## Benchmarks

Before:

```
{:count=>10, :average=>"0.19ms"}
{:count=>100, :average=>"0.11ms"}
{:count=>1000, :average=>"0.18ms"}
{:count=>10000, :average=>"1.16ms"}
```

After:

```
{:count=>10, :average=>"0.20ms"}
{:count=>100, :average=>"0.11ms"}
{:count=>1000, :average=>"0.15ms"}
{:count=>10000, :average=>"0.68ms"}
```

After investigating this further I found that the `rb_thread_io_blocking_region` using `ubf_select` can be incredibly slow, proportional to the number of threads. I don't know whether it's advisable but:

``` c
        BLOCKING_REGION(blocking_node.thread, {
            val = func(data1);
            saved_errno = errno;
        }, NULL /* ubf_select */, blocking_node.thread, FALSE);
```

Disabling the UBF function and relying on `read(fd, ...)`/`write(fd, ...)` blocking operations to fail when `close(fd)` is invoked might be sufficient? This needs more investigation but after making this change, we have constant-time IO#close.

```
{:count=>10, :average=>"0.13ms"}
{:count=>100, :average=>"0.06ms"}
{:count=>1000, :average=>"0.04ms"}
{:count=>10000, :average=>"0.09ms"}
```

Which is ideally what we want.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread