[#78633] ruby/spec needs help from CRuby committers — Benoit Daloze <eregontp@...>

Currently, ruby/spec is maintained mostly by individuals and enjoys the

13 messages 2016/12/13

[ruby-core:78493] Re: [Ruby trunk Bug#13002] Hash calculations no longer using universal hashing

From: Martin J. Dürst <duerst@...>
Date: 2016-12-05 10:22:22 UTC
List: ruby-core #78493
Hello Victor, others,

On 2016/12/04 15:28, vmakarov@redhat.com wrote:
> Issue #13002 has been updated by Vladimir Makarov.

> Nobuyoshi Nakada wrote:
>> `test_wrapper_of_special_const` failed, that is, it's impossible to emulate switching weak and strong versions under the hood.
>> I think this behavior is quirky.
>
> You are right, the behavior with strong/weak hashes is wrong for Ruby.  This test is not guaranteed to work.  After reading the documentation thoroughly, I got a conclusion that hash value used for the hash tables should be *always* the same.  So the hash tables can not use two different hash functions at all.

Over the weekend, I started to think about this issue some more and to 
write an email. First, I should apologize for not having remembered the 
strong/weak hash proposal, because I had read it.

But I came up with a very simple example where I had doubts. After 
Nobu's mail, I saw that this is very similar to the example in Bug #9381.

> Although strong/weak hash approach gave about 5-6% improvement out of 45% on Ruby hash table benchmarks, I think we should not use it for Ruby.  I will provide a patch to get rid of it in 2 days.

I think it may still be somewhat too early to completely give up on the 
strong/weak distinction. I really like using CPU cycles only when 
necessary, and that's exactly what this proposal is about.

I think we have to distinguish three different cases:

1) Calls to #hash inside a class (as e.g. in Bug #9381):
This always has to use strong (in the sense of universal, not 
necessarily in the sense of cryptographically strong) hashing.

2) Calls to #hash for general objects/classes from inside st.c:
These have to always use strong hashing, because the hash is defined as 
a method on the class, and can't be switched between weak and strong.

3) Calculations of hash for special objects such as String, Symbol, 
Integer,... from directly inside st.c: In this case, I think it would be 
possible to use the weak/strong distinction. I also think that the 
majority, probably even a big majority, of hash keys are Strings, 
Symbols, and Integers, so that would keep a big part of the savings.
I haven't studied the code in detail, but I could imagine that 
special-casing for String, Symbol, and (the old Fixnum part of) Integer 
e.g. in do_hash could do the job. Something along the lines of the 
following pseudo-code:

/* The reserved hash value and its substitution.  */
#define RESERVED_HASH_VAL (~(st_hash_t) 0)
#define RESERVED_HASH_SUBSTITUTION_VAL ((st_hash_t) 0)

/* Return hash value of KEY for table TAB.  */
static inline st_hash_t
do_hash(st_data_t key, st_table *tab) {
     st_index_t h;
     switch (key.class) {
       case String:
       case Symbol:
       case Fixnum:
         # use weak or strong for basic types
         st_index_t h = (st_index_t)(tab->curr_hash)(key);
         break;
       default:
         # always use strong for all other types
         st_index_t h = (st_index_t)(strong_hash)(key);
     }
#### aside: Don't see why we need conditional compilation for the
####        following 5 lines.
#if SIZEOF_INT == SIZEOF_VOIDP
     st_hash_t hash = h;
#else
     st_hash_t hash = h;
#endif

     /* RESERVED_HASH_VAL is used for a deleted entry.  Map it into
        another value.  Such mapping should be extremely rare.  */
     return hash == RESERVED_HASH_VAL ? RESERVED_HASH_SUBSTITUTION_VAL : 
hash;
}

I understand that we don't want to 'pollute' st.c with Ruby-specific 
stuff, so this is just pseudocode. It adds a switch, but we can maybe 
balance that by reducing the switch in any_hash_general in hash.c. Even 
if not, the switch may still be faster than a complicated hash 
calculation on a string or symbol, which will loop per character.

Hope this helps,    Martin.

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread

Prev Next