[#108771] [Ruby master Bug#18816] Ractor segfaulting MacOS 12.4 (aarch64 / M1 processor) — "brodock (Gabriel Mazetto)" <noreply@...>

Issue #18816 has been reported by brodock (Gabriel Mazetto).

8 messages 2022/06/05

[#108802] [Ruby master Feature#18821] Expose Pattern Matching interfaces in core classes — "baweaver (Brandon Weaver)" <noreply@...>

Issue #18821 has been reported by baweaver (Brandon Weaver).

9 messages 2022/06/08

[#108822] [Ruby master Feature#18822] Ruby lack a proper method to percent-encode strings for URIs (RFC 3986) — "byroot (Jean Boussier)" <noreply@...>

Issue #18822 has been reported by byroot (Jean Boussier).

18 messages 2022/06/09

[#108937] [Ruby master Bug#18832] Suspicious superclass mismatch — "fxn (Xavier Noria)" <noreply@...>

Issue #18832 has been reported by fxn (Xavier Noria).

16 messages 2022/06/15

[#108976] [Ruby master Misc#18836] DevMeeting-2022-07-21 — "mame (Yusuke Endoh)" <noreply@...>

Issue #18836 has been reported by mame (Yusuke Endoh).

12 messages 2022/06/17

[#109043] [Ruby master Bug#18876] OpenSSL is not available with `--with-openssl-dir` — "Gloomy_meng (Gloomy Meng)" <noreply@...>

Issue #18876 has been reported by Gloomy_meng (Gloomy Meng).

18 messages 2022/06/23

[#109052] [Ruby master Bug#18878] parse.y: Foo::Bar {} is inconsistently rejected — "qnighy (Masaki Hara)" <noreply@...>

Issue #18878 has been reported by qnighy (Masaki Hara).

9 messages 2022/06/26

[#109055] [Ruby master Bug#18881] IO#read_nonblock raises IOError when called following buffered character IO — "javanthropus (Jeremy Bopp)" <noreply@...>

Issue #18881 has been reported by javanthropus (Jeremy Bopp).

9 messages 2022/06/26

[#109063] [Ruby master Bug#18882] File.read cuts off a text file with special characters when reading it on MS Windows — magynhard <noreply@...>

Issue #18882 has been reported by magynhard (Matth辰us Johannes Beyrle).

15 messages 2022/06/27

[#109081] [Ruby master Feature#18885] Long lived fork advisory API (potential Copy on Write optimizations) — "byroot (Jean Boussier)" <noreply@...>

Issue #18885 has been reported by byroot (Jean Boussier).

23 messages 2022/06/28

[#109083] [Ruby master Bug#18886] Struct aref and aset don't trigger any tracepoints. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18886 has been reported by ioquatix (Samuel Williams).

8 messages 2022/06/29

[#109095] [Ruby master Misc#18888] Migrate ruby-lang.org mail services to Google Domains and Google Workspace — "shugo (Shugo Maeda)" <noreply@...>

Issue #18888 has been reported by shugo (Shugo Maeda).

16 messages 2022/06/30

[ruby-core:109042] [Ruby master Feature#18875] Speed up ISeq marking by introducing a bitmap and rearranging inline caches

From: "tenderlovemaking (Aaron Patterson)" <noreply@...>
Date: 2022-06-22 22:55:37 UTC
List: ruby-core #109042
Issue #18875 has been reported by tenderlovemaking (Aaron Patterson).

----------------------------------------
Feature #18875: Speed up ISeq marking by introducing a bitmap and rearranging inline caches
https://bugs.ruby-lang.org/issues/18875

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Priority: Normal
----------------------------------------
A large percentage of major GC time is spent marking instruction sequence objects.  This PR aims to speed up major GC by speeding up marking instruction sequence objects.

## Marking ISeq objects

Today we have to disassemble instruction sequences in order to mark them.  The disassembly process looks for GC allocated objects and marks them.  To disassemble an iseq, we have to iterate over each instruction, convert the instruction from an address back to the original op code integer, then look up the parameters for the op code.  Once we know the parameter types, we can iterate though them and mark "interesting" references.  We can see this process in [the `iseq_extract_values` function](https://github.com/ruby/ruby/blob/744d17ff6c33b09334508e8110007ea2a82252f5/iseq.c#L247-L313).

According to profile results, the biggest bottleneck in this function is converting addresses back to instruction ids.

## Speeding up ISeq marking

To speed up ISeq marking, this PR introduces two changes.  The first change is adding a bitmap, and the second change is rearranging inline caches to be more "convenient".

## Bitmaps

At compilation time, we allocate a bitmap along side of the iseq object.  The bitmap indicates offsets of VALUE objects inside the instruction sequences.  When marking an instruction, we can simply iterate over the bitmap to find VALUE objects that need to be marked.

## Inline Cache Rearrangement

Inline cache types `IC`, `IVC`, `ICVARC`, and `ISE` are allocated from [a buffer that is stored on the iseq constant body](https://github.com/ruby/ruby/blob/744d17ff6c33b09334508e8110007ea2a82252f5/vm_core.h#L447).  These caches are a union type.  Unfortunately, these union types don't have a "type" field, so they can only be distinguished by looking at the parameter types of an instruction.

Take the following Ruby code for example:

```
Foo =~ /#{foo}/o;
```

The instruction sequences for this code are as follows:

```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,17)> (catch: FALSE)
0000 opt_getinlinecache                     9, <is:0>                 (   1)[Li]
0003 putobject                              true
0005 getconstant                            :Foo
0007 opt_setinlinecache                     <is:0>
0009 once                                   block in <main>, <is:1>
0012 opt_regexpmatch2                       <calldata!mid:=~, argc:1, ARGS_SIMPLE>[CcCr]
0014 leave
```

The ISeq object contains two entries in the `is_entries` buffer, one for the `ISE` cache associated with the `once` instruction, and one for the `IC` cache associated with the `opt_getinlinecache` and `opt_setinlinecache` instructions.

Unfortunately we cannot iterate through the caches in the `is_entries` list because the union types don't have the same layout.  [Marking an `ISE`](https://github.com/ruby/ruby/blob/744d17ff6c33b09334508e8110007ea2a82252f5/iseq.c#L296-L306) is very different than [marking an `IC`](https://github.com/ruby/ruby/blob/744d17ff6c33b09334508e8110007ea2a82252f5/iseq.c#L270-L280), and we can only differentiate them by disassembling and checking the instruction sequences themselves.

To solve this problem, this PR introduces [3 counters for the different types of inline caches](https://github.com/ruby/ruby/compare/master...tenderlove:iseq-bitmap?expand=1#diff-2989766b46aac389cc58ca9c83fac2bb85c03f3a3d37b176b4be673c9d56e0e4R463-R465).  Then, we group inline cache types within the `is_entries` buffer.
Since the inline cache types are grouped, we can use the counters to iterate over the buffer and we know what type is being used.

Combining bitmap marking and inline cache arrangement means that we can mark instruction sequences without disassembling the instructions.

## Speed impact

I benchmarked this change with a basic Rails application using the following script:

```ruby
puts RUBY_DESCRIPTION

require "benchmark/ips"

Benchmark.ips do |x|
  x.report("major GC") { GC.start }
end
```

Here are the results with the master version of Ruby:

```
$ RAILS_ENV=production gel exec bin/rails r test.rb
ruby 3.2.0dev (2022-06-22T12:30:39Z master 744d17ff6c) [arm64-darwin21]
Warming up --------------------------------------
            major GC     4.000  i/100ms
Calculating -------------------------------------
            major GC     47.748  (賊 2.1%) i/s -    240.000  in   5.028520s
```

Here it is with these patches applied:

```
$ RAILS_ENV=production gel exec bin/rails r test.rb
ruby 3.2.0dev (2022-06-22T20:52:13Z iseq-bitmap 2ba736a7f9) [arm64-darwin21]
Warming up --------------------------------------
            major GC     7.000  i/100ms
Calculating -------------------------------------
            major GC     77.208  (賊 1.3%) i/s -    392.000  in   5.079023s
```

With these patches applied, major GC is about 60% faster.

## Memory impact

The memory increase is proportional to the number of instructions stored on an iseq.  This works about to be about 1% increase in the size of an iseq (`ceil(iseq_length / 64)` on 64 bit platforms).

## Future work

This PR always mallocs a bitmap table.  We can eliminate the malloc when:

1. There is nothing to mark
2. iseq_length is <= 64


I posted a [pull request on GitHub](https://github.com/ruby/ruby/pull/6053), and I've attached a squashed version of the patches.  The diff includes some unused code which I used to verify the bitmaps.  If we decide to merge this I'll remove the unused code.

---Files--------------------------------
0001-Squashed-commit-of-the-following.patch (23.6 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread

Prev Next