From: XrXr@...
Date: 2019-10-17T01:46:15+00:00
Subject: [ruby-core:95373] [Ruby master Misc#16258] [PATCH] Combine call info and cache to speed up method invocation

Issue #16258 has been reported by alanwu (Alan Wu).

----------------------------------------
Misc #16258: [PATCH] Combine call info and cache to speed up method invocation
https://bugs.ruby-lang.org/issues/16258

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee:
----------------------------------------
Proposed change: https://github.com/ruby/ruby/pull/2564

To perform a regular method call, the VM needs two structs, `rb_call_info` and `rb_call_cache`. At the moment, we allocate these two structures in separate buffers. In the worst case, the CPU needs to read 4 cache lines to complete a method call. Putting the two structures together reduces the maximum number of cache line reads to 2.

Combining the structures also saves 8 bytes per call site. The current layout stores separate pointers to the call info and the call cache, but the combined layout needs only one pointer. This change saves about 2 MiB of memory on Discourse.

The Optcarrot benchmark runs faster with this patch. I collected the following results using `make install` binaries compiled with `-DRUBY_NDEBUG`, with a sample size of 50 for each category:

| Mode  | master-a5245c (FPS) | after patch (FPS) | Speed-up |
|-------|---------------------|-------------------|----------|
| plain | 42.39               | 50.17             | 18.35%   |
| jit   | 71.72               | 72.73             | 1.41%    |

These are median FPS values from the benchmark output. For raw benchmark results and basic stats, see https://gist.github.com/XrXr/ce5cb7cf2c3c4d29e58c919fa5c86b33.

I collected these results on a 2018 MacBook Pro with an i7-8750H CPU @ 2.20GHz. I also ran the benchmark on an AMD 2400G running Arch Linux and observed a 3% improvement without the JIT.

## Complications

- A new instruction attribute, `comptime_sp_inc`, is introduced to calculate the SP increase at compile time without using call caches.
  At compile time, a `TS_CALLDATA` operand points to a call info struct, but at runtime the same operand points to a call data struct. Instructions that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying the call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.

I think this patch offers a good general performance boost for a manageable amount of code change.