From: normalperson@... Date: 2014-01-22T08:12:53+00:00 Subject: [ruby-core:59968] [ruby-trunk - Feature #9425] [PATCH] st: use power-of-two sizes to avoid slow modulo ops Issue #9425 has been updated by Eric Wong. Urabe Shyouhei wrote: > No, sorry I see no evident speedup. When I wrote the patch I thought the > function was used for Hash#rehash, but it turned out Hash#rehash uses > something different (don't know why). The optimization is valid I > believe but in fact used very rarely. Alright. My understanding is branch mispredict costs are higher than the memory stores which would be avoided. The expensive part is loading memory on cache miss, and that is not avoided. We'll probably need to poke around with perf or similar tools to analyze/confirm this. ---------------------------------------- Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops https://bugs.ruby-lang.org/issues/9425#change-44499 * Author: Eric Wong * Status: Open * Priority: Low * Assignee: * Category: core * Target version: current: 2.2.0 ---------------------------------------- Prime number-sized hash tables are only needed to compensate for bad hash functions. Ruby has good hash functions nowadays, so reduce our code size with power-of-two-sized hash tables which allows us to avoid the slow modulo operation. I expected numhash performance to be worse, but it seems most of those hashes are either too-small-to-matter or well-distributed anyways. If we find problems with some existing numhashes we should start using a proper hash function (Murmur via st_hash_uint) on those. This consistently speeds up the bm_hash_flatten and bm_vm2_bighash. target 0: trunk (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/rrrr/i/bin/ruby --disable=gems" target 1: st-noprime (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/ruby/i/bin/ruby --disable=gems" Benchmarks on a Xeon E3-1230 v3 CPU: minimum results in each 10 measurements. Execution time (sec) name trunk st-noprime hash_flatten 0.500 0.345 hash_keys 0.191 0.192 hash_shift 0.019 0.018 hash_values 0.201 0.200 loop_whileloop2 0.090 0.090 vm2_bighash* 4.457 3.578 Speedup ratio: compare with the result of `trunk' (greater is better) name st-noprime hash_flatten 1.451 hash_keys 0.998 hash_shift 1.046 hash_values 1.003 loop_whileloop2 1.000 vm2_bighash* 1.246 Somewhat less impressive on an AMD FX 8320: minimum results in each 10 measurements. Execution time (sec) name trunk st-noprime hash_flatten 0.633 0.596 hash_keys 0.236 0.232 hash_shift 0.031 0.032 hash_values 0.234 0.238 loop_whileloop2 0.135 0.135 vm2_bighash* 8.198 6.982 Speedup ratio: compare with the result of `trunk' (greater is better) name st-noprime hash_flatten 1.063 hash_keys 1.020 hash_shift 0.976 hash_values 0.982 loop_whileloop2 1.000 vm2_bighash* 1.174 ---------------------------------------------------------------- The following changes since commit 2c3522c3e403dfdadaaf6095564bde364cc4bddf: test_thread.rb: stop at once (2014-01-16 06:34:47 +0000) are available in the git repository at: git://80x24.org/ruby.git st-noprime for you to fetch changes up to ed4f4103f4f407ed99dd6cd25b6c35d3aa9f3479: st: use power-of-two sizes to avoid slow modulo ops (2014-01-18 04:05:21 +0000) ---------------------------------------------------------------- Eric Wong (1): st: use power-of-two sizes to avoid slow modulo ops st.c | 100 +++++++++++++++++-------------------------------------------------- 1 file changed, 25 insertions(+), 75 deletions(-) ---Files-------------------------------- 0001-st-use-power-of-two-sizes-to-avoid-slow-modulo-ops.patch (10.9 KB) -- http://bugs.ruby-lang.org/