From: bruno@...
Date: 2015-07-27T19:51:51+00:00
Subject: [ruby-core:70135] [Ruby trunk - Bug #11396] Bad performance in ruby >= 2.2 for Hash with many symbol keys

Issue #11396 has been updated by Bruno Escherl.


Eric Wong wrote:
> Possible fix is to memoize hashval inside struct RSymbol:
> 
> http://80x24.org/spew/m/1437992270-20549-1-git-send-email-e@80x24.org.txt
> 
> Much better than before, but still slower than 2.1, I think.
> 
> Only lightly-tested, and on hardware which doesn't benefit from
> power-of-two sizing anyways.  Sorry busy and don't have access to
> better HW for benchmarking for a few days.

Hi Eric,

I compiled the ruby_2_2 branch with your patch and got the following results:

ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-darwin14.0]
    string    144.345  (± 3.5%) i/s -    728.000
    symbol    506.609  (± 2.4%) i/s -      2.550k

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] without patch
    string    138.830  (± 6.5%) i/s -    700.000
    symbol     75.236  (± 4.0%) i/s -    378.000

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] with patch
    string    147.566  (± 4.7%) i/s -    742.000
    symbol    495.675  (± 6.9%) i/s -      2.494k

The patch works and gets quite close to 2.1.6 :-)

For more realistic hash sizes, I also ran the script with 2000 keys and 100 lookups:

ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-darwin14.0]
    string     43.020k (± 1.1%) i/s -    217.406k
    symbol     72.882k (± 0.9%) i/s -    367.565k

ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
    string     43.348k (± 2.8%) i/s -    219.402k
    symbol     22.336k (± 3.6%) i/s -    113.049k

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] without patch
    string     44.412k (± 3.9%) i/s -    224.773k
    symbol     41.240k (± 3.3%) i/s -    209.721k

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] with patch
    string     44.537k (± 2.3%) i/s -    224.561k
    symbol     85.511k (± 1.7%) i/s -    427.952k

So performance-wise this looks great! Of course, I can't judge whether this change could have other side effects.
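Eric's patch works in C, inside struct RSymbol, and is only linked above. As a loose Ruby-level analogy (not the patch itself), the idea of memoizing a key's hash value can be sketched with a hypothetical MemoizedKey class. Ruby's Hash calls a key's #hash on every insert and lookup and uses #eql? to confirm matches, so a key that caches its hash value pays for the computation only once:

```ruby
# Hypothetical analogy for "memoize hashval": the real patch caches the value
# in C inside struct RSymbol; this class only mimics the idea in plain Ruby.
class MemoizedKey
  attr_reader :name, :hash_computations

  def initialize(name)
    @name = name.dup.freeze
    @hash_computations = 0
    @hash_value = nil
  end

  # Hash calls #hash on every insert and lookup, so compute the value
  # only the first time and return the cached Integer afterwards.
  def hash
    @hash_value ||= begin
      @hash_computations += 1
      @name.hash
    end
  end

  # Keys that are eql? must also return equal #hash values.
  def eql?(other)
    other.is_a?(MemoizedKey) && other.name == name
  end
  alias_method :==, :eql?
end

if __FILE__ == $PROGRAM_NAME
  keys = Array.new(3) { |i| MemoizedKey.new("key_#{i}") }
  table = {}
  keys.each_with_index { |key, i| table[key] = i }
  5.times { keys.each { |key| table.fetch(key) } }
  keys.each { |key| puts "#{key.name}: hash computed #{key.hash_computations} time(s)" }
end
```

However many lookups run, each key's hash_computations stays at 1; the patch aims for roughly the same effect for Symbol keys at the C level.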
----------------------------------------
Bug #11396: Bad performance in ruby >= 2.2 for Hash with many symbol keys
https://bugs.ruby-lang.org/issues/11396#change-53566

* Author: Bruno Escherl
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
This started out as an issue on Stack Overflow, where I found strange performance anomalies when comparing Set.include? and Array.include? across Ruby versions:

http://stackoverflow.com/questions/31631284/performance-anomaly-in-ruby-set-include-with-symbols-2-2-2-vs-2-1-6

In the end it came down to problems with the lookup of Hash keys. While the performance issues went away for smaller Hashes on the ruby_2_2 branch, they remained for bigger Hashes.

I'll attach the benchmark script I used (hash_bench_3.rb); it creates a Hash with 200000 keys and looks up 10000 of them. Here are my results:

ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-darwin14.0]
    string    142.818  (± 2.8%) i/s -    714.000
    symbol    505.831  (± 3.0%) i/s -      2.550k

ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
    string    143.404  (± 3.5%) i/s -    728.000
    symbol     76.945  (± 6.5%) i/s -    385.000

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] self-compiled
    string    138.349  (± 2.2%) i/s -    702.000
    symbol     77.495  (± 3.9%) i/s -    392.000

As you can see, 2.2 is much slower than 2.1.6 for symbol keys.

It was suggested that I disable garbage collection for Symbols for testing, so I did that on the ruby_2_2 branch:

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] self-compiled, USE_SYMBOL_GC=0
    string    145.179  (± 3.4%) i/s -    728.000
    symbol    602.008  (± 7.6%) i/s -      3.050k

I expected symbol GC to have some performance impact, but this looks too big. I can't say exactly at which point garbage collection really starts to hurt, but the bigger the Hash and the more include? calls, the slower it gets.
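The attached hash_bench_3.rb is not reproduced in this message. As a rough sketch only, a benchmark of the kind described (build a Hash with 200000 keys, look up 10000 of them, once with String keys and once with Symbol keys) might look like the following. The helper names, the "key_N" key format, the ITERATIONS setting and the KEYS/LOOKUPS environment variables are all assumptions; the "i/s (± x%)" output above looks like the benchmark-ips gem, but that is a guess, so this sketch uses the standard library's Benchmark.realtime instead. Keys built with String#to_sym are dynamically created symbols, which are the ones Ruby 2.2's symbol GC can collect.

```ruby
require 'benchmark'

# Hypothetical sketch: hash_bench_3.rb is not shown, so the key format,
# sizes and iteration count below are assumptions.
KEY_COUNT    = Integer(ENV.fetch('KEYS', '200000'))
LOOKUP_COUNT = Integer(ENV.fetch('LOOKUPS', '10000'))
ITERATIONS   = Integer(ENV.fetch('ITERATIONS', '10'))

# Build one Hash keyed by Strings and one keyed by Symbols with the same names.
# String#to_sym creates dynamic symbols, which symbol GC (Ruby >= 2.2) can collect.
def build_hashes(size)
  string_hash = {}
  symbol_hash = {}
  size.times do |i|
    name = "key_#{i}"
    string_hash[name] = i
    symbol_hash[name.to_sym] = i
  end
  [string_hash, symbol_hash]
end

# Pick `count` key names spread evenly across 0...size.
def lookup_keys(size, count)
  raise ArgumentError, 'count must be positive' unless count.positive?
  step = [size / count, 1].max
  Array.new(count) { |i| "key_#{i * step}" }
end

# Look up every key and return the number of hits.
def lookup_all(hash, keys)
  keys.count { |k| hash.key?(k) }
end

# Report full lookup passes per second, loosely mirroring the "i/s" figures.
def measure(label, hash, keys, iterations)
  elapsed = Benchmark.realtime { iterations.times { lookup_all(hash, keys) } }
  printf("%-7s %12.3f i/s\n", label, iterations / elapsed)
end

if __FILE__ == $PROGRAM_NAME
  string_hash, symbol_hash = build_hashes(KEY_COUNT)
  string_keys = lookup_keys(KEY_COUNT, LOOKUP_COUNT)
  symbol_keys = string_keys.map(&:to_sym)

  puts RUBY_DESCRIPTION
  measure('string', string_hash, string_keys, ITERATIONS)
  measure('symbol', symbol_hash, symbol_keys, ITERATIONS)
end
```

Running it with KEYS=2000 LOOKUPS=100 approximates the smaller configuration from the follow-up message; absolute numbers will not match benchmark-ips output.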

---Files--------------------------------
hash_bench_3.rb (605 Bytes)


-- 
https://bugs.ruby-lang.org/