From: vmakarov@... Date: 2016-04-29T22:36:06+00:00 Subject: [ruby-core:75257] [Ruby trunk Feature#12142] Hash tables with open addressing Issue #12142 has been updated by Vladimir Makarov. File base.patch added File hash.patch added File strong_hash.patch added File city.patch added Since submitting my first patch for new Ruby hash tables, a lot of code was reworked and a lot of new implementation details were tried. I feel that I reached a point from where I can not improve the hash table performance anymore. So I am submitting the final patches (of course I am ready to change code if/after people review it and propose meaningful changes). I broke the code into 4 patches. The first patch is the base patch. The most work I did for new hash table implementation is in this patch. The patch alone gives 32-48% average performance improvement on Ruby hash table benchmarks The second patch changes some simple hash functions to make them faster. I gives 5-8% speedup additionally. The third patch might seem controversial. It implements a code to use faster but not strong hash functions and to recognize hash table denial attacks and switch to stronger hash function (currently to Ruby siphash). It gives additional 6-7% average performance improvement. The forth patch changes MurMur hash to Google City64 hash. Although on very long keys City64 is two times faster than MurMur in Ruby, its usage actually makes Ruby hash table benchmarks slower. So I **do not** propose to add this patch to Ruby trunk. I put this patch here only for people who might be interesting to play with it. All patches were tested on x86/x86-64, ARM, and PPC64 with `make check`. I did not find any additional test regressions. The patches were also benchmarked on Ruby hash table benchmarks on Intel 4.2GHz i7-4970K, 2GHz ARM Exynos5422, and 3.55GHz Power7. This time I used more accurate measurements using the following script: ``` ruby ../ruby/benchmark/driver.rb -p hash -r 3 -e trunk:: -e current:: 2>/dev/null|awk 'NF==2 && /hash/ {s+=$2;n++;print} END{print s/n}' ``` Here are the *average* performance results of the patches relative to the old hash tables (old execution time / new execution time): ``` x86-64 ARM PPC64 base patch 1.32326 1.48507 1.36359 above+hash patch 1.39067 1.53537 1.44748 above+strong hash patch 1.45615 1.59833 1.50193 above+City64 patch 1.43611 1.58185 1.48878 ``` If somebody is interesting in the results for each benchmark, I put them at the end of the message. Are the first three patches OK for the trunk? If it is so, I don't know how to commit them to the trunk. So could somebody commit them or explain the procedure how to do it in this case. ``` base patch: x86-64 ARM PPC64 bighash 1.640 1.523 1.062 hash_aref_dsym 0.929 0.924 0.950 hash_aref_dsym_long 1.419 1.445 1.366 hash_aref_fix 0.857 0.885 1.002 hash_aref_flo 1.433 1.078 1.136 hash_aref_miss 1.011 0.897 0.989 hash_aref_str 0.975 0.979 0.940 hash_aref_sym 1.002 0.910 0.999 hash_aref_sym_long 1.035 0.923 1.025 hash_flatten 1.184 1.282 1.377 hash_ident_flo 0.930 1.087 1.054 hash_ident_num 0.911 0.931 0.987 hash_ident_obj 0.950 0.937 0.992 hash_ident_str 0.950 0.908 1.002 hash_ident_sym 0.936 0.907 0.990 hash_keys 2.793 1.365 1.786 hash_long 0.993 0.949 1.008 hash_shift 1.265 2.615 1.735 hash_shift_u16 1.292 2.208 1.724 hash_shift_u24 1.286 2.192 1.672 hash_shift_u32 1.287 2.003 1.651 hash_small2 0.991 1.018 0.946 hash_small4 0.993 0.990 0.992 hash_small8 2.186 2.195 2.301 hash_to_proc 1.036 1.032 1.026 hash_values 2.781 1.303 1.751 vm2_bighash* 2.663 6.611 4.354 1.32326 1.48507 1.36359 base+hash patches: bighash 1.594 1.193 1.126 hash_aref_dsym 0.942 0.920 0.983 hash_aref_dsym_long 1.421 1.438 1.362 hash_aref_fix 1.031 0.961 1.139 hash_aref_flo 1.921 1.093 1.455 hash_aref_miss 1.047 1.010 1.058 hash_aref_str 1.039 0.981 1.030 hash_aref_sym 1.000 0.897 1.000 hash_aref_sym_long 1.039 0.899 1.041 hash_flatten 1.122 1.290 1.326 hash_ident_flo 0.962 0.966 1.091 hash_ident_num 0.957 0.935 1.035 hash_ident_obj 0.952 0.959 0.990 hash_ident_str 0.942 0.931 0.976 hash_ident_sym 0.967 0.859 0.990 hash_keys 2.733 1.364 1.681 hash_long 1.012 0.970 1.028 hash_shift 1.402 2.973 1.894 hash_shift_u16 1.332 2.367 1.720 hash_shift_u24 1.317 2.374 1.735 hash_shift_u32 1.330 2.169 1.705 hash_small2 1.010 0.965 0.938 hash_small4 1.006 0.962 1.033 hash_small8 2.252 2.158 2.341 hash_to_proc 1.031 1.045 1.010 hash_values 2.864 1.305 1.626 vm2_bighash* 3.323 7.471 5.769 1.39067 1.53537 1.44748 base+hash+strong hash patches: bighash 1.580 1.169 1.097 hash_aref_dsym 0.968 0.962 1.015 hash_aref_dsym_long 1.428 1.501 1.333 hash_aref_fix 1.047 0.977 1.146 hash_aref_flo 1.917 1.052 1.398 hash_aref_miss 1.236 1.424 1.168 hash_aref_str 1.346 1.451 1.123 hash_aref_sym 1.000 0.913 0.989 hash_aref_sym_long 1.049 0.936 1.055 hash_flatten 1.133 1.287 1.374 hash_ident_flo 0.943 1.024 1.091 hash_ident_num 0.968 0.934 1.007 hash_ident_obj 0.968 0.888 0.968 hash_ident_str 0.939 0.918 0.993 hash_ident_sym 0.984 0.868 0.990 hash_keys 2.979 1.372 1.746 hash_long 1.594 2.374 1.487 hash_shift 1.442 2.683 1.962 hash_shift_u16 1.391 2.291 1.841 hash_shift_u24 1.359 2.240 1.794 hash_shift_u32 1.366 2.076 1.791 hash_small2 1.024 1.005 1.012 hash_small4 1.080 0.965 1.104 hash_small8 2.272 2.167 2.459 hash_to_proc 1.016 1.026 1.016 hash_values 2.889 1.291 1.760 vm2_bighash* 3.398 7.361 5.833 1.45615 1.59833 1.50193 base+hash+strong hash+City64 patches: bighash 1.584 1.174 1.158 hash_aref_dsym 0.941 0.872 0.996 hash_aref_dsym_long 1.428 1.449 1.345 hash_aref_fix 1.041 0.842 1.153 hash_aref_flo 1.928 1.025 1.386 hash_aref_miss 1.191 1.210 1.140 hash_aref_str 1.264 1.269 1.182 hash_aref_sym 0.970 0.835 1.007 hash_aref_sym_long 1.079 0.899 1.044 hash_flatten 1.118 1.282 1.371 hash_ident_flo 0.953 0.995 1.089 hash_ident_num 0.963 0.914 1.012 hash_ident_obj 0.965 0.778 0.981 hash_ident_str 0.968 0.851 0.980 hash_ident_sym 0.962 0.849 0.968 hash_keys 2.778 1.372 1.713 hash_long 1.524 1.907 1.367 hash_shift 1.412 3.038 1.930 hash_shift_u16 1.391 2.446 1.812 hash_shift_u24 1.353 2.452 1.799 hash_shift_u32 1.329 2.278 1.807 hash_small2 1.025 1.004 0.992 hash_small4 1.007 0.994 1.069 hash_small8 2.302 2.258 2.425 hash_to_proc 1.050 1.079 0.993 hash_values 2.805 1.292 1.687 vm2_bighash* 3.444 7.346 5.791 ``` ---------------------------------------- Feature #12142: Hash tables with open addressing https://bugs.ruby-lang.org/issues/12142#change-58387 * Author: Vladimir Makarov * Status: Open * Priority: Normal * Assignee: ---------------------------------------- ~~~ Hello, the following patch contains a new implementation of hash tables (major files st.c and include/ruby/st.h). Modern processors have several levels of cache. Usually,the CPU reads one or a few lines of the cache from memory (or another level of cache). So CPU is much faster at reading data stored close to each other. The current implementation of Ruby hash tables does not fit well to modern processor cache organization, which requires better data locality for faster program speed. The new hash table implementation achieves a better data locality mainly by o switching to open addressing hash tables for access by keys. Removing hash collision lists lets us avoid *pointer chasing*, a common problem that produces bad data locality. I see a tendency to move from chaining hash tables to open addressing hash tables due to their better fit to modern CPU memory organizations. CPython recently made such switch (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c). PHP did this a bit earlier https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html. GCC has widely-used such hash tables (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally for more than 15 years. o removing doubly linked lists and putting the elements into an array for accessing to elements by their inclusion order. That also removes pointer chaising on the doubly linked lists used for traversing elements by their inclusion order. A more detailed description of the proposed implementation can be found in the top comment of the file st.c. The new implementation was benchmarked on 21 MRI hash table benchmarks for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM (Exynos 5410 - 1.6GHz Cortex-A15): make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='' Here the results for x86-64: hash_aref_dsym 1.094 hash_aref_dsym_long 1.383 hash_aref_fix 1.048 hash_aref_flo 1.860 hash_aref_miss 1.107 hash_aref_str 1.107 hash_aref_sym 1.191 hash_aref_sym_long 1.113 hash_flatten 1.258 hash_ident_flo 1.627 hash_ident_num 1.045 hash_ident_obj 1.143 hash_ident_str 1.127 hash_ident_sym 1.152 hash_keys 2.714 hash_shift 2.209 hash_shift_u16 1.442 hash_shift_u24 1.413 hash_shift_u32 1.396 hash_to_proc 2.831 hash_values 2.701 The average performance improvement is more 50%. ARM results are analogous -- no any benchmark performance degradation and about the same average improvement. The patch can be seen as https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch or in a less convenient way as pull request changes https://github.com/ruby/ruby/pull/1264/files This is my first patch for MRI and may be my proposal and implementation have pitfalls. But I am keen to learn and work on inclusion of this code into MRI. ~~~ ---Files-------------------------------- 0001-st.c-change-st_table-implementation.patch (59.4 KB) st-march31.patch (114 KB) base.patch (93.8 KB) hash.patch (4.48 KB) strong_hash.patch (8.08 KB) city.patch (19.4 KB) -- https://bugs.ruby-lang.org/ Unsubscribe: