[#61424] [REJECT?] xmalloc/xfree: reduce atomic ops w/ thread-locals — Eric Wong <normalperson@...>

I'm unsure about this. I _hate_ the extra branches this adds;

13 messages 2014/03/12

[ruby-core:61575] [ruby-trunk - Feature #9425] [PATCH] st: use power-of-two sizes to avoid slow modulo ops

From: normalperson@...
Date: 2014-03-18 09:12:02 UTC
List: ruby-core #61575
Issue #9425 has been updated by Eric Wong.


 normalperson@yhbt.net wrote:
 >  	hash_aref_sym	1.000
 
 Lack of improvement here was disappointing since symbol keys are common,
 and this showed a regression on my x86 (32-bit) VMs.
 
 I tweaked rb_any_hash to be symbol-aware:
 
 	http://bogomips.org/ruby.git/patch?id=497ed6355
 
 12-30% improvement on this test from trunk depending on CPU so far \o/
 (Phenom II X4 got 30%, newer/faster x86-64 CPUs show less speedup).
 
 I'm comfortable with improvements of this series on x86 VMs running on
 x86-64 (and of course native x86-64).
 
 Can anybody with real 32-bit hardware verify this series?  Not sure I
 can trust VM results; my remaining x86 hardware is on its last legs
 and showing occasional HW errors.
 
 git://80x24.org/ruby.git st-noprime-v4
 
       st: use power-of-two sizes to avoid slow modulo ops
       add hash benchmarks
       st.c: tweak numhash to match common Ruby use cases
       hash.c: improve symbol hash distribution

----------------------------------------
Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops
https://bugs.ruby-lang.org/issues/9425#change-45859

* Author: Eric Wong
* Status: Open
* Priority: Low
* Assignee: 
* Category: core
* Target version: current: 2.2.0
----------------------------------------
Prime number-sized hash tables are only needed to compensate for bad
hash functions.  Ruby has good hash functions nowadays, so reduce our
code size with power-of-two-sized hash tables which allows us to avoid
the slow modulo operation.  I expected numhash performance to be worse,
but it seems most of those hashes are either too-small-to-matter or
well-distributed anyways.  If we find problems with some existing
numhashes we should start using a proper hash function (Murmur via
st_hash_uint) on those.

This consistently speeds up the bm_hash_flatten and bm_vm2_bighash.

target 0: trunk (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/rrrr/i/bin/ruby --disable=gems"
target 1: st-noprime (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/ruby/i/bin/ruby --disable=gems"

Benchmarks on a Xeon E3-1230 v3 CPU:

	minimum results in each 10 measurements.
	Execution time (sec)
	name	trunk	st-noprime
	hash_flatten	0.500	0.345
	hash_keys	0.191	0.192
	hash_shift	0.019	0.018
	hash_values	0.201	0.200
	loop_whileloop2	0.090	0.090
	vm2_bighash*	4.457	3.578

	Speedup ratio: compare with the result of `trunk' (greater is better)
	name	st-noprime
	hash_flatten	1.451
	hash_keys	0.998
	hash_shift	1.046
	hash_values	1.003
	loop_whileloop2	1.000
	vm2_bighash*	1.246

Somewhat less impressive on an AMD FX 8320:

	minimum results in each 10 measurements.
	Execution time (sec)
	name    trunk   st-noprime
	hash_flatten    0.633   0.596
	hash_keys       0.236   0.232
	hash_shift      0.031   0.032
	hash_values     0.234   0.238
	loop_whileloop2 0.135   0.135
	vm2_bighash*    8.198   6.982

	Speedup ratio: compare with the result of `trunk' (greater is better)
	name    st-noprime
	hash_flatten    1.063
	hash_keys       1.020
	hash_shift      0.976
	hash_values     0.982
	loop_whileloop2 1.000
	vm2_bighash*    1.174

----------------------------------------------------------------
The following changes since commit 2c3522c3e403dfdadaaf6095564bde364cc4bddf:

  test_thread.rb: stop at once (2014-01-16 06:34:47 +0000)

are available in the git repository at:

  git://80x24.org/ruby.git st-noprime

for you to fetch changes up to ed4f4103f4f407ed99dd6cd25b6c35d3aa9f3479:

  st: use power-of-two sizes to avoid slow modulo ops (2014-01-18 04:05:21 +0000)

----------------------------------------------------------------
Eric Wong (1):
      st: use power-of-two sizes to avoid slow modulo ops

 st.c | 100 +++++++++++++++++--------------------------------------------------
 1 file changed, 25 insertions(+), 75 deletions(-)


---Files--------------------------------
0001-st-use-power-of-two-sizes-to-avoid-slow-modulo-ops.patch (10.9 KB)


-- 
https://bugs.ruby-lang.org/

In This Thread

Prev Next