ruby-core

Issue #19406 has been reported by eightbitraptor (Matthew Valentine-House).

----------------------------------------
Feature #19406: Allow declarative reference definition for rb_typed_data_struct
https://bugs.ruby-lang.org/issues/19406

* Author: eightbitraptor (Matthew Valentine-House)
* Status: Open
* Priority: Normal
----------------------------------------
[GitHub PR 7153](https://github.com/ruby/ruby/pull/7153)

## Summary

This PR proposes an additional API for C extension authors to define wrapped
struct members that point to Ruby objects, when the struct being wrapped
contains only members with primitive types (i.e. no arrays or unions). The new
interface passes an offset from the top of the data structure, rather than the
reference `VALUE` itself, allowing the GC to manipulate both the reference edge
(the address holding the pointer), as well as the underlying object.

This allows Ruby's GC to handle marking, object movement and reference updating
independently without calling back into user supplied code.

## Implementation

When a wrapped struct contains a simple list of members (such as the
`struct enumerator` in `enumerator.c`), we can declare all of the struct members that
may point to valid Ruby objects as `RUBY_REF_EDGE` in a static array.

If we choose to do this, then we can mark the corresponding `rb_data_type_t` as
`RUBY_TYPED_DECL_MARKING` and pass a pointer to the references array in the
`data` field.

To avoid having to also find space in the `rb_data_type_t` to define a length for
the references list, I've chosen to require list termination
with `RUBY_REF_END` - defined as `UINTPTR_MAX`. My assumption is that no
single wrapped struct will ever be large enough that `UINTPTR_MAX` is actually a
valid member offset.
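
As a rough illustration of the sentinel-terminated list described above, here is a minimal, self-contained sketch. It does not use Ruby's headers: `toy_value`, `TOY_REF_EDGE`, `TOY_REF_END`, `struct toy_pair` and `toy_ref_count` are hypothetical stand-ins invented for this example, and the real macros in the PR may be defined differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not Ruby's definitions. */
typedef uintptr_t toy_value;                      /* plays the role of VALUE */
#define TOY_REF_EDGE(type, member) offsetof(struct type, member)
#define TOY_REF_END ((size_t)UINTPTR_MAX)         /* sentinel, never a real offset */

struct toy_pair {
    toy_value left;
    int       flags;    /* not a reference, so it is not listed below */
    toy_value right;
};

/* Offsets of the members that may hold references, closed by the sentinel. */
static const size_t toy_pair_refs[] = {
    TOY_REF_EDGE(toy_pair, left),
    TOY_REF_EDGE(toy_pair, right),
    TOY_REF_END
};

/* No separate length field is needed: scan until the sentinel is found. */
static size_t
toy_ref_count(const size_t *refs)
{
    size_t n = 0;
    while (refs[n] != TOY_REF_END) n++;
    return n;
}
```

Because the sentinel is the largest representable value, it can only collide with a real offset if a struct spans the entire address space, which is the assumption stated above.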

We then don't have to define `dmark` or `dcompact` callback functions. Marking,
object movement, and reference updating will be handled for us by the GC.

```C
struct enumerator {
    VALUE obj;
    ID    meth;
    VALUE args;
    VALUE fib;
    VALUE dst;
    VALUE lookahead;
    VALUE feedvalue;
    VALUE stop_exc;
    VALUE size;
    VALUE procs;
    rb_enumerator_size_func *size_fn;
    int kw_splat;
};

static const size_t enumerator_refs[] = {
    RUBY_REF_EDGE(enumerator, obj),
    RUBY_REF_EDGE(enumerator, args),
    RUBY_REF_EDGE(enumerator, fib),
    RUBY_REF_EDGE(enumerator, dst),
    RUBY_REF_EDGE(enumerator, lookahead),
    RUBY_REF_EDGE(enumerator, feedvalue),
    RUBY_REF_EDGE(enumerator, stop_exc),
    RUBY_REF_EDGE(enumerator, size),
    RUBY_REF_EDGE(enumerator, procs),
    RUBY_REF_END
};

static const rb_data_type_t enumerator_data_type = {
    "enumerator",
    {
        NULL,
        enumerator_free,
        enumerator_memsize,
        NULL,
    },
    0, (void *)enumerator_refs, RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_DECL_MARKING
};
```
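
To make it clearer why a list of offsets is enough for both marking and reference updating, the following sketch shows one way a collector could walk such a list. It is a toy model under stated assumptions, not the PR's implementation: `toy_value`, `toy_each_edge`, `toy_mark_edge`, `toy_update_edge` and `struct toy_obj` are hypothetical names, "marking" is reduced to counting non-zero references, and "moving" is reduced to adding a fixed delta.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not Ruby's definitions. */
typedef uintptr_t toy_value;                      /* plays the role of VALUE */
#define TOY_REF_END ((size_t)UINTPTR_MAX)

struct toy_obj {
    toy_value a;
    int       n;        /* plain data, not declared as a reference */
    toy_value b;
};

static const size_t toy_obj_refs[] = {
    offsetof(struct toy_obj, a),
    offsetof(struct toy_obj, b),
    TOY_REF_END
};

typedef void (*toy_edge_fn)(toy_value *edge, void *ctx);

/* Hand the ADDRESS of every declared reference (the edge) to fn. */
static void
toy_each_edge(void *data, const size_t *refs, toy_edge_fn fn, void *ctx)
{
    for (size_t i = 0; refs[i] != TOY_REF_END; i++) {
        toy_value *edge = (toy_value *)((char *)data + refs[i]);
        fn(edge, ctx);
    }
}

/* Toy "marking": only reads through the edge and counts live references. */
static void
toy_mark_edge(toy_value *edge, void *ctx)
{
    if (*edge != 0) (*(size_t *)ctx)++;
}

/* Toy "reference updating": writes the new location back through the edge. */
static void
toy_update_edge(toy_value *edge, void *ctx)
{
    if (*edge != 0) *edge += *(const uintptr_t *)ctx;
}
```

The same walk serves both phases; only the visitor differs. No user-supplied callback runs at any point, which is the property the declarative style is after.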

### Benchmarking

Benchmarking shows that this reference declaration style does not degrade
performance when compared to the callback style.

To benchmark this we created a C extension that initialized a struct with 20
`VALUE` members, all set to point to Ruby strings. We wrapped each struct using
`TypedData_Make_Struct` in an object. One object was configured with callback
functions and one was configured with declarative references.

In separate scripts we then created 500,000 of these objects, added them to a
list so they would be marked and not swept, and used
`GC.verify_compaction_references` to make sure everything that could move, did.

Finally, we created a wrapper script that used separate processes to run each GC
type (to ensure that the GCs were completely independent), ran each benchmark
50 times, and collected the results of `GC.stat[:time]`.

We did this on an M1 Pro MacBook (aarch64) and a Ryzen 3600 (x86_64). We then
plotted the results:

![chart showing GC time between callback and declarative marking on arm64 and
x86_64](https://user-images.githubusercontent.com/31869/216573409-ddafa3bd-9af7-4b60-ba61-355da7e71910.png)

As we can see from this, there has been no real impact to GC performance in our
benchmarks.

Benchmark code and harnesses are [available in this GitHub
repo](https://github.com/eightbitraptor/test_decl_marking).

## Justification

Requiring extension authors to implement separate `dmark` and `dcompact`
callbacks can be error-prone, and pushes GC responsibilities from the GC into
user supplied code. This can be a source of bugs arising from the `dmark` and
`dcompact` functions being implemented incorrectly, or becoming out of sync with each other.
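
The kind of drift described above can be sketched with a toy model. Everything here is hypothetical and written without Ruby's headers: `toy_location` stands in for the role `rb_gc_location` plays in a real `dcompact` callback, `toy_node_mark` for a `dmark` callback that reports members as movable, and the "heap" is a single old-address to new-address mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not Ruby's definitions. */
typedef uintptr_t toy_value;                      /* plays the role of VALUE */
#define TOY_REF_END ((size_t)UINTPTR_MAX)

struct toy_node { toy_value key; toy_value val; };

/* Toy "compaction": the object at old_addr now lives at new_addr. */
struct toy_move { toy_value old_addr; toy_value new_addr; };

static toy_value
toy_location(const struct toy_move *mv, toy_value v)
{
    return v == mv->old_addr ? mv->new_addr : v;
}

/* Callback style, marking: both members are reported as movable. */
static void
toy_node_mark(const struct toy_node *n, toy_value *seen, size_t *count)
{
    seen[(*count)++] = n->key;
    seen[(*count)++] = n->val;
}

/* Callback style, compaction: `val` was forgotten, so it goes stale. */
static void
toy_node_compact_buggy(struct toy_node *n, const struct toy_move *mv)
{
    n->key = toy_location(mv, n->key);
}

/* Declarative style: one list is the single source of truth. */
static const size_t toy_node_refs[] = {
    offsetof(struct toy_node, key),
    offsetof(struct toy_node, val),
    TOY_REF_END
};

static void
toy_compact_declared(void *data, const size_t *refs, const struct toy_move *mv)
{
    for (size_t i = 0; refs[i] != TOY_REF_END; i++) {
        toy_value *edge = (toy_value *)((char *)data + refs[i]);
        *edge = toy_location(mv, *edge);
    }
}
```

In the callback version nothing forces the two functions to agree on the member list. In the declarative version there is only one list, so this particular mismatch cannot occur.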

There has already been work done by @Peterzhu2118 [to try and unify these
callbacks](https://github.com/ruby/ruby/pull/7140), so that authors can define a
single function that will be used for both marking and compacting, removing the
risk of these callbacks becoming out of sync.

This proposal works alongside Peter's earlier work to eliminate the
callbacks entirely for the "simple reference" case.

This means that extension authors with simple structs to wrap can declare which
of their struct members point to Ruby objects to get GC marking and compaction
support. And extension authors with more complex requirements will only have to
implement a single function, using Peter's work.

In addition to this, passing the GC the address of a reference rather than the
reference itself (edge based, rather than object based enqueuing) allows the GC
itself to have more control over how it manipulates that reference.

This means that when considering alternative GC implementations for Ruby (such
as our [ongoing work integrating MMTk into
Ruby](https://github.com/mmtk/mmtk-ruby)[^1]), we don't need to call from Ruby
into library code, and then back into Ruby code as often, which can increase
performance and allow more complex algorithms to be implemented.

[^1]: [MMTk](https://www.mmtk.io/) is the Memory Management Toolkit, a framework
    for implementing automatic memory management strategies.

## Trade-offs

This PR provides another method for defining references in C extensions, in
addition to the callback based approach, effectively widening the extension API.
Extension authors will now need to choose whether to use the declarative
approach, or a callback based approach depending on their use case. This is more
complex for extension authors.

However, because the callbacks do still exist, extension authors can migrate
their own code to this new, faster approach at their
leisure.

## Further work

As part of this work we inspected all uses of `rb_data_type_t` in the Ruby
source code. Of 134 separate instances, 60 wrapped structs that contained
`VALUE` members that could point to Ruby objects. Of these 60, 27 were "simple"
structs that would benefit from this approach, 28 contained complex references
(unions, loops, etc.) that won't work with this approach, and 5 were cases we
were unsure about, but that we believe we could make work with some slight
refactoring.



