From: takashikkbn@...
Date: 2018-09-08T06:50:41+00:00
Subject: [ruby-core:88896] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT

Issue #15085 has been updated by k0kubun (Takashi Kokubun).

As far as I can see from the benchmark result for the improved case, it looks good. But I would at least like to see micro benchmarks for `opt_send_without_block` and `send`, though because of `_mjit_compile_send` they may not be affected much. Also, what were the results for larger benchmarks (optcarrot, discourse, ...)?

> And I guess it is related to memory caching, especially iTLB.
> invokesuper can get faster with exported vm_search_super_method(), but I think it is not enough.

My reasoning for exporting only `rb_vm_search_method_slowpath` was that we should inline as much as possible to exploit compiler optimizations, but compiling (the `rb_vm_search_method_slowpath` part of) `vm_search_method` was too slow to compile many methods within the default Optcarrot measurement period. I did not have the CPU cache in mind when I decided not to compile it, and I assume we should inline everything if compilation finished in 0 seconds.

Why do you think not inlining `vm_search_method` is friendlier to the iTLB? Is the generated code for `vm_search_method` too big, or is loading instructions from `vm_search_method` more efficient when its code is shared with the VM?

----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73937

* Author: wanabe (_ wanabe)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
MJIT ordinarily makes Ruby methods faster, but I have observed some exceptional cases. I guess one of them is caused by the `invokesuper` instruction. And I guess it is related to memory caching, especially iTLB.

The attached "export-big-func.patch" makes the MJIT binary code for `invokesuper` smaller.
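To illustrate the kind of code path in question, here is a minimal sketch of a script that exercises `invokesuper` in a hot loop. It is not the attached super.rb: it uses the standard library's `Benchmark` module instead of benchmark_driver, and the names `Base`, `Derived`, `run_super_calls` and `ITERATIONS` are hypothetical, chosen only for this example.

```ruby
require 'benchmark'

# Hypothetical classes for illustration; not taken from the attached super.rb.
class Base
  def value(n)
    n + 1
  end
end

class Derived < Base
  def value(n)
    super(n) # a super call, compiled by MRI to the invokesuper instruction
  end
end

# Call Derived#value repeatedly so that the super call dominates the loop.
def run_super_calls(obj, iterations)
  sum = 0
  i = 0
  while i < iterations
    sum += obj.value(i)
    i += 1
  end
  sum
end

ITERATIONS = 1_000_000 # assumed iteration count, tune for your machine

elapsed = Benchmark.realtime { run_super_calls(Derived.new, ITERATIONS) }
puts format('%d super calls: %.3fs', ITERATIONS, elapsed)
```

Running such a script with and without `--jit`, optionally under `perf stat -e iTLB-load-misses`, should give a rough comparison, assuming the CPU and kernel expose that event. Absolute numbers will depend heavily on the machine.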
"super.rb" is a benchmark script with benchmark_driver. "benchmark.log" is a result of super.rb. "benchmark-with-perf.log" is another result with `PERF_STAT` environment variable. The results are merely in my environment and depend to a large part on machine specs. `invokesuper` can get faster with exported `vm_search_super_method()`, but I think it is not enough. Because `perf stat` shows that there are still many iTLB-load-misses. I believe MJIT can grow fast with good care for CPU memory cache, not only iTLB but also L1 / L2 and so on. ---Files-------------------------------- export-big-func.patch (934 Bytes) super.rb (897 Bytes) benchmark.log (624 Bytes) benchmark-with-perf.log (7.05 KB) -- https://bugs.ruby-lang.org/ Unsubscribe: