From: vmakarov@...
Date: 2016-04-29T22:36:06+00:00
Subject: [ruby-core:75257] [Ruby trunk Feature#12142] Hash tables with open	addressing

Issue #12142 has been updated by Vladimir Makarov.

File base.patch added
File hash.patch added
File strong_hash.patch added
File city.patch added

  Since submitting my first patch for new Ruby hash tables, a lot of
code was reworked and a lot of new implementation details were tried.
I feel that I reached a point from where I can not improve the hash
table performance anymore.  So I am submitting the final patches (of
course I am ready to change code if/after people review it and propose
meaningful changes).

  I broke the code into 4 patches.
  
  The first patch is the base patch.  The most work I did for new hash
table implementation is in this patch.  The patch alone gives 32-48%
average performance improvement on Ruby hash table benchmarks

  The second patch changes some simple hash functions to make them
faster.  I gives 5-8% speedup additionally.

  The third patch might seem controversial.  It implements a code to
use faster but not strong hash functions and to recognize hash table
denial attacks and switch to stronger hash function (currently to Ruby
siphash).  It gives additional 6-7% average performance improvement.

  The forth patch changes MurMur hash to Google City64 hash.  Although
on very long keys City64 is two times faster than MurMur in Ruby, its
usage actually makes Ruby hash table benchmarks slower.  So I **do
not** propose to add this patch to Ruby trunk.  I put this patch here
only for people who might be interesting to play with it.

  All patches were tested on x86/x86-64, ARM, and PPC64 with `make
check`.  I did not find any additional test regressions.

  The patches were also benchmarked on Ruby hash table benchmarks on
Intel 4.2GHz i7-4970K, 2GHz ARM Exynos5422, and 3.55GHz Power7.  This
time I used more accurate measurements using the following script:

```
ruby ../ruby/benchmark/driver.rb -p hash -r 3 -e trunk::<trunk-miniruby> -e current::<miniruby-with-patches> 2>/dev/null|awk 'NF==2 && /hash/ {s+=$2;n++;print} END{print s/n}'
```

Here are the *average* performance results of the patches relative to
the old hash tables (old execution time / new execution time):

```
                         x86-64       ARM        PPC64
base patch               1.32326      1.48507    1.36359
above+hash patch         1.39067      1.53537    1.44748
above+strong hash patch  1.45615      1.59833    1.50193
above+City64 patch       1.43611      1.58185    1.48878
```

  If somebody is interesting in the results for each benchmark, I put
them at the end of the message.

  Are the first three patches OK for the trunk?  If it is so, I don't
know how to commit them to the trunk.  So could somebody commit them
or explain the procedure how to do it in this case.


```
base patch:         x86-64       ARM        PPC64
bighash             1.640        1.523      1.062
hash_aref_dsym      0.929        0.924      0.950
hash_aref_dsym_long 1.419        1.445      1.366
hash_aref_fix       0.857        0.885      1.002
hash_aref_flo       1.433        1.078      1.136
hash_aref_miss      1.011        0.897      0.989
hash_aref_str       0.975        0.979      0.940
hash_aref_sym       1.002        0.910      0.999
hash_aref_sym_long  1.035        0.923      1.025
hash_flatten        1.184        1.282      1.377
hash_ident_flo      0.930        1.087      1.054
hash_ident_num      0.911        0.931      0.987
hash_ident_obj      0.950        0.937      0.992
hash_ident_str      0.950        0.908      1.002
hash_ident_sym      0.936        0.907      0.990
hash_keys           2.793        1.365      1.786
hash_long           0.993        0.949      1.008
hash_shift          1.265        2.615      1.735
hash_shift_u16      1.292        2.208      1.724
hash_shift_u24      1.286        2.192      1.672
hash_shift_u32      1.287        2.003      1.651
hash_small2         0.991        1.018      0.946
hash_small4         0.993        0.990      0.992
hash_small8         2.186        2.195      2.301
hash_to_proc        1.036        1.032      1.026
hash_values         2.781        1.303      1.751
vm2_bighash*        2.663        6.611      4.354
                    1.32326      1.48507    1.36359


base+hash patches:
bighash             1.594        1.193      1.126
hash_aref_dsym      0.942        0.920      0.983
hash_aref_dsym_long 1.421        1.438      1.362
hash_aref_fix       1.031        0.961      1.139
hash_aref_flo       1.921        1.093      1.455
hash_aref_miss      1.047        1.010      1.058
hash_aref_str       1.039        0.981      1.030
hash_aref_sym       1.000        0.897      1.000
hash_aref_sym_long  1.039        0.899      1.041
hash_flatten        1.122        1.290      1.326
hash_ident_flo      0.962        0.966      1.091
hash_ident_num      0.957        0.935      1.035
hash_ident_obj      0.952        0.959      0.990
hash_ident_str      0.942        0.931      0.976
hash_ident_sym      0.967        0.859      0.990
hash_keys           2.733        1.364      1.681
hash_long           1.012        0.970      1.028
hash_shift          1.402        2.973      1.894
hash_shift_u16      1.332        2.367      1.720
hash_shift_u24      1.317        2.374      1.735
hash_shift_u32      1.330        2.169      1.705
hash_small2         1.010        0.965      0.938
hash_small4         1.006        0.962      1.033
hash_small8         2.252        2.158      2.341
hash_to_proc        1.031        1.045      1.010
hash_values         2.864        1.305      1.626
vm2_bighash*        3.323        7.471      5.769
                    1.39067      1.53537    1.44748

base+hash+strong hash patches:
bighash             1.580        1.169      1.097
hash_aref_dsym      0.968        0.962      1.015
hash_aref_dsym_long 1.428        1.501      1.333
hash_aref_fix       1.047        0.977      1.146
hash_aref_flo       1.917        1.052      1.398
hash_aref_miss      1.236        1.424      1.168
hash_aref_str       1.346        1.451      1.123
hash_aref_sym       1.000        0.913      0.989
hash_aref_sym_long  1.049        0.936      1.055
hash_flatten        1.133        1.287      1.374
hash_ident_flo      0.943        1.024      1.091
hash_ident_num      0.968        0.934      1.007
hash_ident_obj      0.968        0.888      0.968
hash_ident_str      0.939        0.918      0.993
hash_ident_sym      0.984        0.868      0.990
hash_keys           2.979        1.372      1.746
hash_long           1.594        2.374      1.487
hash_shift          1.442        2.683      1.962
hash_shift_u16      1.391        2.291      1.841
hash_shift_u24      1.359        2.240      1.794
hash_shift_u32      1.366        2.076      1.791
hash_small2         1.024        1.005      1.012
hash_small4         1.080        0.965      1.104
hash_small8         2.272        2.167      2.459
hash_to_proc        1.016        1.026      1.016
hash_values         2.889        1.291      1.760
vm2_bighash*        3.398        7.361      5.833
                    1.45615      1.59833    1.50193

base+hash+strong hash+City64 patches:
bighash             1.584        1.174      1.158
hash_aref_dsym      0.941        0.872      0.996
hash_aref_dsym_long 1.428        1.449      1.345
hash_aref_fix       1.041        0.842      1.153
hash_aref_flo       1.928        1.025      1.386
hash_aref_miss      1.191        1.210      1.140
hash_aref_str       1.264        1.269      1.182
hash_aref_sym       0.970        0.835      1.007
hash_aref_sym_long  1.079        0.899      1.044
hash_flatten        1.118        1.282      1.371
hash_ident_flo      0.953        0.995      1.089
hash_ident_num      0.963        0.914      1.012
hash_ident_obj      0.965        0.778      0.981
hash_ident_str      0.968        0.851      0.980
hash_ident_sym      0.962        0.849      0.968
hash_keys           2.778        1.372      1.713
hash_long           1.524        1.907      1.367
hash_shift          1.412        3.038      1.930
hash_shift_u16      1.391        2.446      1.812
hash_shift_u24      1.353        2.452      1.799
hash_shift_u32      1.329        2.278      1.807
hash_small2         1.025        1.004      0.992
hash_small4         1.007        0.994      1.069
hash_small8         2.302        2.258      2.425
hash_to_proc        1.050        1.079      0.993
hash_values         2.805        1.292      1.687
vm2_bighash*        3.444        7.346      5.791
```

----------------------------------------
Feature #12142: Hash tables with open addressing
https://bugs.ruby-lang.org/issues/12142#change-58387

* Author: Vladimir Makarov
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
~~~
 Hello, the following patch contains a new implementation of hash
tables (major files st.c and include/ruby/st.h).

  Modern processors have several levels of cache.  Usually,the CPU
reads one or a few lines of the cache from memory (or another level of
cache).  So CPU is much faster at reading data stored close to each
other.  The current implementation of Ruby hash tables does not fit
well to modern processor cache organization, which requires better
data locality for faster program speed.

The new hash table implementation achieves a better data locality
mainly by

  o switching to open addressing hash tables for access by keys.
    Removing hash collision lists lets us avoid *pointer chasing*, a
    common problem that produces bad data locality.  I see a tendency
    to move from chaining hash tables to open addressing hash tables
    due to their better fit to modern CPU memory organizations.
    CPython recently made such switch
    (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c).
    PHP did this a bit earlier
    https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html.
    GCC has widely-used such hash tables
    (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally
    for more than 15 years.

  o removing doubly linked lists and putting the elements into an array
    for accessing to elements by their inclusion order.  That also
    removes pointer chaising on the doubly linked lists used for
    traversing elements by their inclusion order.

A more detailed description of the proposed implementation can be
found in the top comment of the file st.c.

The new implementation was benchmarked on 21 MRI hash table benchmarks
for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM
(Exynos 5410 - 1.6GHz Cortex-A15):

make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='<trunk ruby>'

Here the results for x86-64:

hash_aref_dsym       1.094
hash_aref_dsym_long          1.383
hash_aref_fix        1.048
hash_aref_flo        1.860
hash_aref_miss       1.107
hash_aref_str        1.107
hash_aref_sym        1.191
hash_aref_sym_long           1.113
hash_flatten         1.258
hash_ident_flo       1.627
hash_ident_num       1.045
hash_ident_obj       1.143
hash_ident_str       1.127
hash_ident_sym       1.152
hash_keys            2.714
hash_shift           2.209
hash_shift_u16       1.442
hash_shift_u24       1.413
hash_shift_u32       1.396
hash_to_proc         2.831
hash_values          2.701

The average performance improvement is more 50%.  ARM results are
analogous -- no any benchmark performance degradation and about the
same average improvement.

The patch can be seen as

https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch

or in a less convenient way as pull request changes

https://github.com/ruby/ruby/pull/1264/files


This is my first patch for MRI and may be my proposal and
implementation have pitfalls.  But I am keen to learn and work on
inclusion of this code into MRI.

~~~

---Files--------------------------------
0001-st.c-change-st_table-implementation.patch (59.4 KB)
st-march31.patch (114 KB)
base.patch (93.8 KB)
hash.patch (4.48 KB)
strong_hash.patch (8.08 KB)
city.patch (19.4 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>