From: ko1@...
Date: 2019-10-17T08:44:54+00:00
Subject: [ruby-core:95391] [Ruby master Misc#16258] [PATCH] Combine call info and cache to speed up method invocation

Issue #16258 has been updated by ko1 (Koichi Sasada).

Assignee set to ko1 (Koichi Sasada)

Thank you for your patch.

Conclusion: OK.

Points:

* The current implementation separates ci and cc for CoW friendliness (ci is immutable data and cc is mutable data). However, there are no measurements of how this affects CoW friendliness. Because ci is immutable data, we could precompile it to improve startup time. However, there is no implementation of that.
* For Guild, I will rewrite the inline cache (cc) because of atomicity. However, Ruby 2.7 doesn't have this change.

For Ruby 2.7 only this patch is accepted.

----------------------------------------
Misc #16258: [PATCH] Combine call info and cache to speed up method invocation
https://bugs.ruby-lang.org/issues/16258#change-82104

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
----------------------------------------
Proposed change: https://github.com/ruby/ruby/pull/2564

To perform a regular method call, the VM needs two structs, `rb_call_info` and `rb_call_cache`. At the moment, we allocate these two structures in separate buffers. In the worst case, the CPU needs to read 4 cache lines to complete a method call. Putting the two structures together reduces the maximum number of cache line reads to 2.

Combining the structures also saves 8 bytes per call site as the current layout uses separate pointers for the call info and the call cache. This change saves about 2 MiB on Discourse.

The Optcarrot benchmark receives a performance improvement from this patch.
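The layout change described in the proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: `call_info`, `call_cache`, `call_data`, the `operands_*` structs, and both helper functions are hypothetical, simplified stand-ins, not the actual definitions of `rb_call_info` and `rb_call_cache` or the code in the pull request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for `rb_call_info` and
 * `rb_call_cache`. The real structs have different fields and sizes. */
struct call_info {
    uintptr_t mid;        /* method ID */
    unsigned int flag;    /* call flags */
    int orig_argc;        /* argument count at the call site */
};

struct call_cache {
    uintptr_t method_state;
    uintptr_t class_serial;
    const void *me;       /* cached method entry */
    void *call;           /* cached call handler */
};

/* Before: each call site carries two pointers into separate buffers. */
struct operands_before {
    struct call_info *ci;
    struct call_cache *cc;
};

/* After: one combined struct, reached through a single pointer. */
struct call_data {
    struct call_cache cc;
    struct call_info ci;
};

struct operands_after {
    struct call_data *cd;
};

/* Operand bytes saved per call site: the size of one pointer. */
static size_t operand_bytes_saved(void)
{
    return sizeof(struct operands_before) - sizeof(struct operands_after);
}

/* Worst-case number of cache lines touched by an object of `size` bytes
 * with alignment `align`, i.e. when it starts at the last aligned offset
 * inside a cache line of `line` bytes. */
static size_t max_lines_touched(size_t size, size_t align, size_t line)
{
    assert(size > 0 && align > 0 && line % align == 0);
    size_t first = line - align;     /* offset of the first byte */
    size_t last = first + size - 1;  /* offset of the last byte */
    return last / line + 1;
}
```

On a typical 64-bit target with 64-byte cache lines, the 16-byte and 32-byte stand-ins can each straddle 2 lines when allocated separately (4 in total), while the combined 48-byte struct straddles at most 2, and dropping one pointer per call site saves 8 bytes.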
I collected the following results using `make install` binaries compiled with `-DRUBY_NDEBUG`, with a sample size of 50 for each category:

|       | master-a5245c | after patch | speed-up ratio |
|-------|---------------|-------------|----------------|
| plain | 42.39         | 50.17       | 18.35%         |
| jit   | 71.72         | 72.73       | 1.41%          |

These are medium FPS from the benchmark output. For raw benchmark results and basic stats, see https://gist.github.com/XrXr/ce5cb7cf2c3c4d29e58c919fa5c86b33. I took these results with an i7-8750H CPU @ 2.20GHz on a 2018 MacBook Pro. I also ran the benchmark on an AMD 2400G running Arch Linux and observed a 3% improvement without the JIT.

## Complications

- A new instruction attribute `comptime_sp_inc` is introduced to calculate the SP increase at compile time without using call caches. At compile time, a `TS_CALLDATA` operand points to a call info struct, but at runtime, the same operand points to a call data struct. Instructions that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.

I think this patch offers a good general performance boost for a manageable amount of code change.

-- 
https://bugs.ruby-lang.org/
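To illustrate the `comptime_sp_inc` point from the Complications list, here is a minimal sketch of why two entry points are needed for the same calculation. All names here (`CALL_ARGS_BLOCKARG`, `call_info`, `call_cache`, `call_data`, and the three functions) are hypothetical and simplified. The sketch assumes a send-like instruction that pops the receiver, the arguments, and an optional block argument, then pushes one return value. It is not the code from the patch.

```c
/* Hypothetical flag bit marking a call site that passes a block argument. */
#define CALL_ARGS_BLOCKARG 0x02u

/* Hypothetical, simplified stand-ins for the VM structs. */
struct call_info {
    unsigned int flag;
    int orig_argc;
};

struct call_cache {
    const void *me;   /* cached method entry, unused in this sketch */
};

struct call_data {
    struct call_cache cc;
    struct call_info ci;
};

/* Net stack pointer change of a send-like instruction: it pops the
 * receiver, the arguments, and an optional block argument, then pushes
 * one return value. Only call info is needed, never the cache. */
static int sp_inc_of_send(const struct call_info *ci)
{
    const int argb = (ci->flag & CALL_ARGS_BLOCKARG) ? 1 : 0;
    const int recv = 1;
    const int retn = 1;
    return retn - recv - ci->orig_argc - argb;
}

/* At compile time the operand is assumed to point at bare call info. */
static int comptime_sp_inc(const struct call_info *ci)
{
    return sp_inc_of_send(ci);
}

/* At runtime the same operand is assumed to point at combined call data,
 * so the call info has to be reached through it. */
static int runtime_sp_inc(const struct call_data *cd)
{
    return sp_inc_of_send(&cd->ci);
}
```

Both entry points yield the same value for the same call site. The only difference is the type of the operand they are handed, which is why an instruction that defines its own `sp_inc` would need a compile-time counterpart.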