[#61424] [REJECT?] xmalloc/xfree: reduce atomic ops w/ thread-locals — Eric Wong <normalperson@...>

I'm unsure about this. I _hate_ the extra branches this adds;

13 messages 2014/03/12

[ruby-core:61577] [ruby-trunk - Feature #9425] [PATCH] st: use power-of-two sizes to avoid slow modulo ops

From: funny.falcon@...
Date: 2014-03-18 09:48:57 UTC
List: ruby-core #61577
Issue #9425 has been updated by Yura Sokolov.


 Can you test this
  + if (SYMBOL_P(a)) a = (a >> RUBY_SPECIAL_SHIFT) ^ (a >>
 (RUBY_SPECIAL_SHIFT + 3)); /* 3=ID_SCOPE_SHIFT */
 
 
 2014-03-18 13:02 GMT+04:00 Eric Wong <normalperson@yhbt.net>:
 
 > normalperson@yhbt.net wrote:
 > >       hash_aref_sym   1.000
 >
 > Lack of improvement here was disappointing since symbol keys are common,
 > and this showed a regression on my x86 (32-bit) VMs.
 >
 > I tweaked rb_any_hash to be symbol-aware:
 >
 >         http://bogomips.org/ruby.git/patch?id=497ed6355
 >
 > 12-30% improvement on this test from trunk depending on CPU so far \o/
 > (Phenom II X4 got 30%, newer/faster x86-64 CPUs show less speedup).
 >
 > I'm comfortable with improvements of this series on x86 VMs running on
 > x86-64 (and of course native x86-64).
 >
 > Can anybody with real 32-bit hardware verify this series?  Not sure I
 > can trust VM results; my remaining x86 hardware is on its last legs
 > and showing occasional HW errors.
 >
 > git://80x24.org/ruby.git st-noprime-v4
 >
 >       st: use power-of-two sizes to avoid slow modulo ops
 >       add hash benchmarks
 >       st.c: tweak numhash to match common Ruby use cases
 >       hash.c: improve symbol hash distribution
 >

----------------------------------------
Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops
https://bugs.ruby-lang.org/issues/9425#change-45860

* Author: Eric Wong
* Status: Open
* Priority: Low
* Assignee: 
* Category: core
* Target version: current: 2.2.0
----------------------------------------
Prime number-sized hash tables are only needed to compensate for bad
hash functions.  Ruby has good hash functions nowadays, so reduce our
code size with power-of-two-sized hash tables which allows us to avoid
the slow modulo operation.  I expected numhash performance to be worse,
but it seems most of those hashes are either too-small-to-matter or
well-distributed anyways.  If we find problems with some existing
numhashes we should start using a proper hash function (Murmur via
st_hash_uint) on those.

This consistently speeds up the bm_hash_flatten and bm_vm2_bighash.

target 0: trunk (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/rrrr/i/bin/ruby --disable=gems"
target 1: st-noprime (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/ruby/i/bin/ruby --disable=gems"

Benchmarks on a Xeon E3-1230 v3 CPU:

	minimum results in each 10 measurements.
	Execution time (sec)
	name	trunk	st-noprime
	hash_flatten	0.500	0.345
	hash_keys	0.191	0.192
	hash_shift	0.019	0.018
	hash_values	0.201	0.200
	loop_whileloop2	0.090	0.090
	vm2_bighash*	4.457	3.578

	Speedup ratio: compare with the result of `trunk' (greater is better)
	name	st-noprime
	hash_flatten	1.451
	hash_keys	0.998
	hash_shift	1.046
	hash_values	1.003
	loop_whileloop2	1.000
	vm2_bighash*	1.246

Somewhat less impressive on an AMD FX 8320:

	minimum results in each 10 measurements.
	Execution time (sec)
	name    trunk   st-noprime
	hash_flatten    0.633   0.596
	hash_keys       0.236   0.232
	hash_shift      0.031   0.032
	hash_values     0.234   0.238
	loop_whileloop2 0.135   0.135
	vm2_bighash*    8.198   6.982

	Speedup ratio: compare with the result of `trunk' (greater is better)
	name    st-noprime
	hash_flatten    1.063
	hash_keys       1.020
	hash_shift      0.976
	hash_values     0.982
	loop_whileloop2 1.000
	vm2_bighash*    1.174

----------------------------------------------------------------
The following changes since commit 2c3522c3e403dfdadaaf6095564bde364cc4bddf:

  test_thread.rb: stop at once (2014-01-16 06:34:47 +0000)

are available in the git repository at:

  git://80x24.org/ruby.git st-noprime

for you to fetch changes up to ed4f4103f4f407ed99dd6cd25b6c35d3aa9f3479:

  st: use power-of-two sizes to avoid slow modulo ops (2014-01-18 04:05:21 +0000)

----------------------------------------------------------------
Eric Wong (1):
      st: use power-of-two sizes to avoid slow modulo ops

 st.c | 100 +++++++++++++++++--------------------------------------------------
 1 file changed, 25 insertions(+), 75 deletions(-)


---Files--------------------------------
0001-st-use-power-of-two-sizes-to-avoid-slow-modulo-ops.patch (10.9 KB)


-- 
https://bugs.ruby-lang.org/

In This Thread

Prev Next