[#109844] [Ruby master Feature#18996] Proposal: Introduce new APIs to reline for changing dialog UI colours — "st0012 (Stan Lo)" <noreply@...>

Issue #18996 has been reported by st0012 (Stan Lo).

14 messages 2022/09/07

[#109850] [Ruby master Feature#19000] Data: Add "Copy with changes method" [Follow-on to #16122 Data: simple immutable value object] — "RubyBugs (A Nonymous)" <noreply@...>

Issue #19000 has been reported by RubyBugs (A Nonymous).

42 messages 2022/09/08

[#109905] [Ruby master Bug#19005] Ruby interpreter compiled XCode 14 cannot build some native gems on macOS — "stanhu (Stan Hu)" <noreply@...>

Issue #19005 has been reported by stanhu (Stan Hu).

28 messages 2022/09/15

[#109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 — "nobu (Nobuyoshi Nakada)" <noreply@...>

Issue #19007 has been reported by nobu (Nobuyoshi Nakada).

8 messages 2022/09/17

[#109937] [Ruby master Feature#19008] Introduce coverage support for `eval`. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #19008 has been reported by ioquatix (Samuel Williams).

23 messages 2022/09/17

[#109961] [Ruby master Bug#19012] BasicSocket#recv* methods return an empty packet instead of nil on closed connections — "byroot (Jean Boussier)" <noreply@...>

Issue #19012 has been reported by byroot (Jean Boussier).

8 messages 2022/09/20

[#109985] [Ruby master Feature#19015] Language extension by a heredoc — "ko1 (Koichi Sasada)" <noreply@...>

Issue #19015 has been reported by ko1 (Koichi Sasada).

14 messages 2022/09/22

[#109995] [Ruby master Bug#19016] syntax_suggest is not working with Ruby 3.2.0-preview2 — "hsbt (Hiroshi SHIBATA)" <noreply@...>

Issue #19016 has been reported by hsbt (Hiroshi SHIBATA).

9 messages 2022/09/22

[#110097] [Ruby master Feature#19024] Proposal: Import Modules — "shioyama (Chris Salzberg)" <noreply@...>

Issue #19024 has been reported by shioyama (Chris Salzberg).

27 messages 2022/09/27

[#110119] [Ruby master Bug#19026] Add `Coverage.supported?(x)` to detect support for `eval` coverage flag. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #19026 has been reported by ioquatix (Samuel Williams).

10 messages 2022/09/28

[#110133] [Ruby master Bug#19028] GCC12 Introduces new warn flags `-Wuse-after-free` — "eightbitraptor (Matthew Valentine-House)" <noreply@...>

Issue #19028 has been reported by eightbitraptor (Matthew Valentine-House).

8 messages 2022/09/28

[#110145] [Ruby master Misc#19030] [ANN] Migrate lists.ruby-lang.org to Google Groups — "hsbt (Hiroshi SHIBATA)" <noreply@...>

Issue #19030 has been reported by hsbt (Hiroshi SHIBATA).

12 messages 2022/09/29

[#110154] [Ruby master Bug#19033] One-liner pattern match as Boolean arg syntax error — "baweaver (Brandon Weaver)" <noreply@...>

Issue #19033 has been reported by baweaver (Brandon Weaver).

7 messages 2022/09/30

[ruby-core:109901] [Ruby master Feature#18885] End of boot advisory API for RubyVM

From: "byroot (Jean Boussier)" <noreply@...>
Date: 2022-09-15 13:16:19 UTC
List: ruby-core #109901
Issue #18885 has been updated by byroot (Jean Boussier).


So I wrote a reproduction script to showcase the effect of constant caches on Copy on Write performance:

```ruby
class MemInfo
  def initialize(pid = "self")
    @info = parse(File.read("/proc/#{pid}/smaps_rollup"))
  end

  def pss
    @info[:Pss]
  end

  def rss
    @info[:Rss]
  end

  def shared_memory
    @info[:Shared_Clean] + @info[:Shared_Dirty]
  end

  def cow_efficiency
    shared_memory.to_f / MemInfo.new(Process.ppid).rss * 100.0
  end

  private

  def parse(rollup)
    fields = {}
    rollup.each_line do |line|
      if (matchdata = line.match(/(?<field>\w+)\:\s+(?<size>\d+) kB$/))
        fields[matchdata[:field].to_sym] = matchdata[:size].to_i
      end
    end
    fields
  end
end

CONST_NUM = Integer(ENV.fetch("NUM", 100_000))

module App
  CONST_NUM.times do |i|
    class_eval(<<~RUBY, __FILE__, __LINE__ + 1)
      Const#{i} = Module.new

      def self.lookup_#{i}
        Const#{i}
      end
    RUBY
  end

  class_eval(<<~RUBY, __FILE__, __LINE__ + 1)
    def self.warmup
      #{CONST_NUM.times.map { |i| "lookup_#{i}"}.join("\n")}
    end
  RUBY
end

puts "=== fresh parent stats ==="
puts "RSS: #{MemInfo.new.rss} kB"
puts

def print_child_meminfo
  meminfo = MemInfo.new
  puts "PSS: #{meminfo.pss} kB"
  puts "Shared #{meminfo.shared_memory} kB"
  puts "CoW efficiency: #{meminfo.cow_efficiency.round(1)}%"
  puts
end

fork do
  puts "=== fresh fork stats ==="
  print_child_meminfo

  App.warmup

  print_child_meminfo
end

Process.wait

App.warmup

puts "=== warmed parent stats ==="
puts "RSS: #{MemInfo.new.rss} kB"
puts

fork do
  puts "=== warmed fork stats ==="
  print_child_meminfo

  App.warmup

  print_child_meminfo
end

Process.wait
```

Results:

```
$ docker run -v $PWD:/app -it ruby:3.1 ruby /app/app.rb
=== fresh parent stats ===
RSS: 236104 kB

=== fresh fork stats ===
PSS: 117198 kB
Shared 233828 kB
CoW efficiency: 99.0%

PSS: 199734 kB
Shared 72740 kB
CoW efficiency: 30.8%

=== warmed parent stats ===
RSS: 237128 kB

=== warmed fork stats ===
PSS: 117632 kB
Shared 234880 kB
CoW efficiency: 99.1%

PSS: 118318 kB
Shared 235444 kB
CoW efficiency: 99.3%
```

### What this shows

When we first fork the process, the memory cost is close to 0. The parent process has ~230MiB RSS, but 99% of that is shared with the first child, putting the actual cost of the fork at barely a couple MiB.

However, as soon as the child starts executing code that wasn't warmed up in the parent, the inline caches get filled, which invalidates the shared pages. After that, only about a third of the parent's memory is still shared, putting the cost of the child at about 160MiB (236104 kB - 72740 kB = 163364 kB).

The second part of the reproduction first warms up these caches in the parent before forking. As a result, the child doesn't invalidate shared memory when it executes the code, and the cost of the child remains negligible.
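For reference, here is a small sketch that redoes the arithmetic behind those statements using the figures from the first (unwarmed) run above. It assumes the child's total footprint is roughly the parent's RSS, so the extra cost of a child is approximated as parent RSS minus shared memory. The helper names are made up for this illustration.

```ruby
# Figures copied from the "fresh" part of the results above, in kB.
PARENT_RSS_KB = 236_104

# Same formula as MemInfo#cow_efficiency in the reproduction script.
def cow_efficiency(shared_kb, parent_rss_kb)
  (shared_kb.to_f / parent_rss_kb * 100.0).round(1)
end

# Approximation (assumption): whatever is not shared is the child's own cost.
def private_cost_kb(shared_kb, parent_rss_kb)
  parent_rss_kb - shared_kb
end

puts cow_efficiency(233_828, PARENT_RSS_KB)  # right after fork: 99.0
puts private_cost_kb(233_828, PARENT_RSS_KB) # right after fork: 2276
puts cow_efficiency(72_740, PARENT_RSS_KB)   # after warmup in the child: 30.8
puts private_cost_kb(72_740, PARENT_RSS_KB)  # after warmup in the child: 163364
```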

### What it means for the real world

Of course, this repro is specially crafted to show the impact of constant caches, and there are other sources of invalidation such as method caches. But as mentioned, now that https://github.com/ruby/ruby/pull/6187 has been merged, it should be easy to prewarm the constant caches when the proposed API is called.

I guess all we need is a name. Maybe `ObjectSpace.optimize`?

----------------------------------------
Feature #18885: End of boot advisory API for RubyVM
https://bugs.ruby-lang.org/issues/18885#change-99144

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context

Many optimizations in the Ruby VM rely on lazily computed caches: String coderanges, constant caches, method caches, etc.
As such, even without JIT, some operations need a bit of warm-up, and these caches might be flushed if new constants are defined, new code is loaded, or some objects are mutated.

Additionally these lazily computed caches can cause increased memory usage for applications relying on Copy-on-Write memory.
Whenever one of these caches is updated post fork, the entire memory page is invalidated. Precomputing these caches at the end of boot,
even if based on heuristics, could improve Copy-on-Write performance.

The classic example is object generations: young objects must be promoted to the old generation before forking, otherwise they'll get invalidated on the next GC run. That's what https://github.com/ko1/nakayoshi_fork addresses.

But there are other sources of CoW invalidation that could be addressed by MRI if it had a clear notification when it needs to be done.

### Proposal

If applications had an API to notify the virtual machine that they're done loading code and are about to start processing user input,
it would give the VM a good point in time to perform optimizations on the existing code and objects.

For example, it could be something like `RubyVM.prepare` or `RubyVM.ready`.

It's somewhat similar to [Matz's static barrier idea from RubyConf 2020](https://youtu.be/JojpqfaPhjI?t=1908), except that it wouldn't disable any feature.
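To make the intended usage concrete, here is a minimal sketch of how an application might call such a hook at the end of boot. `RubyVM.prepare` is only a name suggested in this proposal and does not exist, so the call is guarded and the sketch simply reports whether a hook was found. `notify_end_of_boot` is a hypothetical helper name.

```ruby
# Hypothetical: RubyVM.prepare is a proposed name, not an existing API.
# Returns true if the VM exposed the hook and it was called, false otherwise.
def notify_end_of_boot
  if defined?(RubyVM) && RubyVM.respond_to?(:prepare)
    RubyVM.prepare
    true
  else
    false
  end
end

# Typical placement: after all application code has been loaded, and
# before the server starts forking workers or accepting requests.
hook_called = notify_end_of_boot
puts "end-of-boot hook called: #{hook_called}"
```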

### Potential optimizations

`nakayoshi_fork` already does the following:

  - Do a major GC run to get rid of as many dangling objects as possible.
  - Promote all surviving objects to the highest generation.
  - Compact the heap.

But it would be much simpler to do this from inside the VM rather than do cryptic things such as `4.times { GC.start }` from the Ruby side.

It's also not good to do this on every fork: once you have forked the first long-lived child, you shouldn't run it again. So decorating `fork` is not a good hook point.
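As an illustration of the points above, here is a rough Ruby-side sketch of a preparation step that runs only once, regardless of how many times the application forks afterwards. It is not the `nakayoshi_fork` implementation. The `EndOfBoot` module is a hypothetical name, and the assumption that repeated `GC.start` calls are enough to age surviving objects into the old generation is taken from the idiom quoted above.

```ruby
# Hypothetical helper, not part of Ruby or nakayoshi_fork.
module EndOfBoot
  @done = false

  # Returns true the first time it runs, false on every later call.
  def self.prepare
    return false if @done

    # Repeated major GCs, as in the `4.times { GC.start }` idiom quoted above,
    # with the aim of aging surviving objects into the old generation.
    4.times { GC.start }

    # Compaction is not available on every platform or Ruby version.
    begin
      GC.compact if GC.respond_to?(:compact)
    rescue NotImplementedError
      # Skip compaction where the platform does not support it.
    end

    @done = true
  end
end

EndOfBoot.prepare # does the work
EndOfBoot.prepare # no-op: already prepared
```

Calling this explicitly at the end of boot, rather than from a `fork` wrapper, avoids repeating the work for every child.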

Also after discussing with @jhawthorn, @tenderlovemaking and @alanwu, we believe this would open the door to several other CoW optimizations:

#### Precompute inline caches

Even though we don't have hard data to prove it, we are convinced that inline caches are a big source of CoW invalidation. Most ISeqs are never invoked during initialization, so child processes are forked with mostly cold caches. As a result, the first time a method is executed in the child, many memory pages holding ISeqs are invalidated as caches get updated.

We think MRI could try to precompute these caches before forking children. Constant caches in particular should be resolvable statically; see https://github.com/ruby/ruby/pull/6187.

Method caches are harder to resolve statically, but we can probably apply some heuristics to at least reduce the cache misses.

#### Copy on Write aware GC

We could also keep some metadata about which memory pages are shared, or even introduce a "permanent" generation. [The Instagram engineering team introduced something like that in Python](https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf) ([ticket](https://bugs.python.org/issue31558), [PR](https://github.com/python/cpython/pull/3705)).

That makes the GC aware of which objects live on a shared page. With this information, the GC can decide not to free dangling objects living on these pages, not to compact these pages, etc.

#### Scan the coderange of all strings

Strings have a lazily computed `coderange` attribute in their flags. So if a string is allocated at boot but only used after fork, its coderange may need to be computed on first use, mutating the string.

Using https://github.com/ruby/ruby/pull/6076, I noticed that 58% of the strings retained at the end of the boot sequence had an `UNKNOWN` coderange.

So eagerly scanning the coderange of all strings could also improve Copy on Write performance.
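Until the VM does this itself, a Ruby-level approximation might look like the sketch below. It relies on the assumption that, in CRuby, `String#valid_encoding?` computes and caches the coderange as a side effect. The method name `scan_string_coderanges` is made up for this example.

```ruby
# Walks every live String and forces its coderange to be computed.
# Assumption: in CRuby, valid_encoding? scans the string and caches the
# resulting coderange in the string's flags.
# Returns the number of strings visited.
def scan_string_coderanges
  count = 0
  ObjectSpace.each_object(String) do |str|
    str.valid_encoding?
    count += 1
  end
  count
end

puts "scanned #{scan_string_coderanges} strings"
```

This would be run in the parent once boot is complete, so that the writes to the string flags happen before any child is forked.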

#### malloc_trim

This hook will also be a good point to release unused pages to the system with `malloc_trim`.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
