From: the.codefolio.guy@...
Date: 2016-08-25T20:55:14+00:00
Subject: [ruby-core:77070] [Ruby trunk Bug#12599] For CLang, increase inline-threshold to get 7%-10% speedup of optcarrot

Issue #12599 has been updated by Noah Gibbs.

Declaring a function with ALWAYS_INLINE in the header and then defining it in a .c file doesn't seem to successfully inline the function invocations -- they still show up in a GPerfTools profiling listing, for instance.

So I think that using ALWAYS_INLINE successfully is going to require including the function body in the header, like with rb_scan_args_lead_p().

That will be hard in some of these cases - for instance, rb_get_alloc_func uses macro definitions in the function body that aren't always defined when include/ruby/ruby.h is included. So that will require both an extra copy of the function and changes to the function definition. That's possible, but seems like a significant cost to me.

Opinions?

----------------------------------------
Bug #12599: For CLang, increase inline-threshold to get 7%-10% speedup of optcarrot
https://bugs.ruby-lang.org/issues/12599#change-60291

* Author: Noah Gibbs
* Status: Open
* Priority: Normal
* Assignee:
* ruby -v: 2.4.0dev
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Here's a patch to set -inline-threshold where it's supported -- it's only for Clang, so I think this is mostly on Mac OS.

Clang's default inline threshold is 225 (see "https://groups.google.com/forum/#!topic/llvm-dev/GpU79q9JzJI"). By turning it up to 5000, the Ruby binary's size goes from about 3MB to 6MB, but there's an overall speedup of the optcarrot benchmark of about 7%.

Here are roughly the speedups I found, using 500+ runs of the optcarrot benchmark for each check:

Threshold:  Binary size:  Speedup on optcarrot:
5000        6MB           7%
2500        5.5MB         6%
1800        4.8MB         5%
1000        4.4MB         5% (hard to measure diff between 1000 and 1800)

There doesn't seem to be any increase in dynamic memory use - this is only inlining the C code compiled by Clang/LLVM, not changing any Ruby data structures at runtime, so the memory cost seems to only be paid once.

For a desktop Mac in particular, it seems like using 3MB extra for a 7% speedup is a really good deal.

---Files--------------------------------
inline-threshold.patch (1.03 KB)
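
On the ALWAYS_INLINE point in the reply at the top of this message, here is a minimal sketch of the "body in the header" pattern, in the spirit of rb_scan_args_lead_p(). It is not the actual Ruby source: DEMO_ALWAYS_INLINE, demo_lead_p and demo_check_args are hypothetical names invented for illustration, and the macro is only assumed to resemble what a configure script would define. The idea is that the compiler can only inline a call when the function body is visible in the same translation unit (leaving link-time optimization aside), so the attribute on a bare prototype whose definition lives in another .c file has nothing to work with.

```c
/* Hypothetical stand-in for an ALWAYS_INLINE-style macro (assumption:
 * GCC/Clang attribute syntax; other compilers fall back to plain x). */
#if defined(__GNUC__) || defined(__clang__)
# define DEMO_ALWAYS_INLINE(x) __attribute__((always_inline)) x
#else
# define DEMO_ALWAYS_INLINE(x) x
#endif

/* ---- would live in a header ------------------------------------- */

/* Declaration carrying the attribute ... */
DEMO_ALWAYS_INLINE(static inline int demo_lead_p(int argc, int required));

/* ... followed by the body in the same header, so every .c file that
 * includes it sees the definition and the call can be inlined. */
static inline int
demo_lead_p(int argc, int required)
{
    return argc >= required;
}

/* ---- would live in some .c file that includes the header -------- */

/* Hypothetical caller: the call to demo_lead_p() is a candidate for
 * inlining because its body is visible here. */
int
demo_check_args(int argc)
{
    return demo_lead_p(argc, 2) ? 1 : 0;
}
```

The cost described in the reply still applies to this pattern: anything the body uses (macros, types, other helpers) has to be visible wherever the header is included, which is what makes a function like rb_get_alloc_func awkward to move.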