From: mame@... Date: 2020-10-22T09:18:22+00:00 Subject: [ruby-core:100497] [Ruby master Feature#16851] Ruby hashing algorithm could be improved using Tabulation Hashing Issue #16851 has been updated by mame (Yusuke Endoh). Status changed from Open to Feedback By replacing `rb_obj_id` with `other_func`, I cannot see any significant difference between the proposed patch and the current master. I believe that the large part of the performance improvement comes from the wrong omission of `#hash` call. So, I set the ticket status as "Feedback". Some advices: * The current hash function does not simply return object_id. Instead, it mixes a random salt to make it difficult for attackers to predict hash values for security reason. You may use [the initialization function](https://github.com/ruby/ruby/blob/d23d5c3130a0944e7e591370cbf8599009318a7c/random.c#L1578-L1585) to fill the random matrix. * Not only it mixes the salt, but also it scrambles a hash value with some big primes. See the function [mult_and_mix](https://github.com/ruby/ruby/blob/d23d5c3130a0944e7e591370cbf8599009318a7c/hash.c#L256-L286). You may want to try replacing the function with tabulation hashing. * Before any performance measurement, please make sure it pass the test suite. ---------------------------------------- Feature #16851: Ruby hashing algorithm could be improved using Tabulation Hashing https://bugs.ruby-lang.org/issues/16851#change-88117 * Author: ana06 (Ana Maria Martinez Gomez) * Status: Feedback * Priority: Normal ---------------------------------------- I have implemented Linear Probing and Simple tabulation in Ruby: https://github.com/Ana06/ruby/compare/master...Ana06:tabulation I tested it using the following code: https://github.com/Ana06/ruby-tabulation/blob/master/benchmark_tabulation.rb It benchmarks the insertion of 600000 elements (by hashing their 64 bits ids) in an empty hash 100 times. Below are the times (in seconds) I got executing this code for different versions of the Ruby code: - master (without Simple Tabulation): 29.68 - master with Linear Probing (without Simple Tabulation): 99.76 - master with Quadratic Probing (without Simple Tabulation): 29.97 - master with Simple Tabulation: 15.76 - master with Linear Probing and Simple Tabulation: 24.23 - **master with Quadratic Probing and Simple Tabulation: 13.73** `master` refers to ruby 2.8.0dev: (2020-05-07T16:22:38Z master 7ded8fd) [x86_64-linux]. I tried with 8 x Intel i5-8265U. Although this is just a proof of concept and not something that could be already use in production, I think it proves the potential of Simple Tabulation to improve current Ruby implementation. It may be worthwhile to explore this idea further. What do you think? I did this as part of a project for my [Advanced Algorithms course](http://www.cs.columbia.edu/~andoni/advancedS20/index.html). For more details check the project report: https://github.com/Ana06/ruby-tabulation/blob/master/latex/RubyTabulation_Project.pdf -- https://bugs.ruby-lang.org/ Unsubscribe: