[ruby-core:93962] [Ruby master Feature#15602] Eliminate recording full-width hash value for small Hash
From:
ko1@...
Date:
2019-07-29 05:01:32 UTC
List:
ruby-core #93962
Issue #15602 has been updated by ko1 (Koichi Sasada).
Assignee set to ko1 (Koichi Sasada)
Status changed from Open to Assigned
Patch is updated:
https://github.com/ko1/ruby/tree/hash_small_ar
NEWS:
* Support 32bit CPU
* Support > 127 Hash iteration level
* Fix bugs
# Evaluation
## Discourse benchmark
### overall benchmark
With discourse benchmark, there is no speed improvement.
```
# master
ruby 2.7.0dev (2019-07-26T07:34:15Z master 51f22deadb) [x86_64-linux]
categories:
50: 37
75: 38
90: 46
99: 76
home:
50: 38
75: 39
90: 47
99: 83
topic:
50: 38
75: 39
90: 41
99: 63
categories_admin:
50: 60
75: 64
90: 75
99: 113
home_admin:
50: 63
75: 64
90: 77
99: 111
topic_admin:
50: 64
75: 66
90: 73
99: 102
timings:
load_rails: 3593
ruby-version: 2.7.0-p-1
rss_kb: 316540
pss_kb: 307648
# this patch
ruby 2.7.0dev (2019-07-26T08:11:10Z :detached: 8f8d83fa3f) [x86_64-linux]
categories:
50: 36
75: 37
90: 44
99: 70
home:
50: 37
75: 38
90: 47
99: 82
topic:
50: 37
75: 38
90: 45
99: 71
categories_admin:
50: 62
75: 65
90: 74
99: 130
home_admin:
50: 62
75: 64
90: 76
99: 125
topic_admin:
50: 63
75: 65
90: 75
99: 118
timings:
load_rails: 3674
ruby-version: 2.7.0-p-1
rss_kb: 269296
pss_kb: 260475
```
However, it achieves 50MB memory efficiency.
```
master:
rss_kb: 316540
pss_kb: 307648
this patch:
rss_kb: 269296
pss_kb: 260475
```
### conflict measurement
I added new debug counters:
* hit: the count hint match and `eql?() #=> true`
* miss: the count hint match but `eql?() #=> false`
* notfound: the count that there are no hint match.
In otherwords, lookup count is "hit + notfound".
```
[RUBY_DEBUG_COUNTER] artable_hint_hit 81,394,995
[RUBY_DEBUG_COUNTER] artable_hint_miss 968,533
[RUBY_DEBUG_COUNTER] artable_hint_notfound 84,984,795
```
With discourse benchmark, we can see 160M lookup and 1M miss.
Hint values will be conflict in 1/256 (because hint is 1B). So not strange result (*1).
(*1) 0.6M is ideal, so there is a room to improve. However, making hint algorithm more complicated introduce additional overhead.
1M times usless `eql?` can be a matter.
### hint algorithm
To make 1B from hash value (8B), now I only use `(unsigned char)hash_value`, the lowest 8 bits.
There are several algorithm:
* (1) lowest 8 bit
* (2) xor with least 4B
* (3) xor with least 2B
* (4) using 15 to 8 bits (`(unsigned char)hash_value >> 8`)
However, (1) got high-score (1M misses. others > 2M misses).
## Rdoc benchmark
(`make gcbench-rdoc`)
```
master
{:count=>179,
:heap_allocated_pages=>9008,
:heap_sorted_length=>9008,
:heap_allocatable_pages=>0,
:heap_available_slots=>3671670,
:heap_live_slots=>2609737,
:heap_free_slots=>1061933,
:heap_final_slots=>0,
:heap_marked_slots=>2447202,
:heap_eden_pages=>9008,
:heap_tomb_pages=>0,
:total_allocated_pages=>9008,
:total_freed_pages=>0,
:total_allocated_objects=>33443045,
:total_freed_objects=>30833308,
:malloc_increase_bytes=>222056,
:malloc_increase_bytes_limit=>33554432,
:minor_gc_count=>151,
:object_id_collisions=>0,
:major_gc_count=>28,
:remembered_wb_unprotected_objects=>2490,
:remembered_wb_unprotected_objects_limit=>4976,
:old_objects=>2443098,
:old_objects_limit=>4848924,
:oldmalloc_increase_bytes=>8658088,
:oldmalloc_increase_bytes_limit=>71306460}
ruby 2.7.0dev (2019-07-26T07:34:15Z master 51f22deadb) [x86_64-linux] ["USE_RGENGC", "RGENGC_DEBUG", "RGENGC_ESTIMATE_OLDMALLOC", "GC_ENABLE_LAZY_SWEEP"]
/home/ko1/ruby/v2/src/trunk/benchmark/gc/rdoc.rb
user system total real
25.753310 0.308549 26.061859 ( 26.065280)
GC total time (sec): 0
VmHWM: 467852 kB
Summary of rdoc on 2.7.0dev 26.065279869362712 0 179
(real time in sec, GC time in sec, GC count)
small_hash
{:count=>175,
:heap_allocated_pages=>10295,
:heap_sorted_length=>10295,
:heap_allocatable_pages=>0,
:heap_available_slots=>4196252,
:heap_live_slots=>2687349,
:heap_free_slots=>1508903,
:heap_final_slots=>0,
:heap_marked_slots=>2445290,
:heap_eden_pages=>10295,
:heap_tomb_pages=>0,
:total_allocated_pages=>10295,
:total_freed_pages=>0,
:total_allocated_objects=>33443475,
:total_freed_objects=>30756126,
:malloc_increase_bytes=>8655184,
:malloc_increase_bytes_limit=>33554432,
:minor_gc_count=>146,
:object_id_collisions=>0,
:major_gc_count=>29,
:remembered_wb_unprotected_objects=>2490,
:remembered_wb_unprotected_objects_limit=>4976,
:old_objects=>2441968,
:old_objects_limit=>4848944,
:oldmalloc_increase_bytes=>16372800,
:oldmalloc_increase_bytes_limit=>89024695}
ruby 2.7.0dev (2019-07-26T08:11:10Z :detached: 8f8d83fa3f) [x86_64-linux] ["USE_RGENGC", "RGENGC_DEBUG", "RGENGC_ESTIMATE_OLDMALLOC", "GC_ENABLE_LAZY_SWEEP"]
../../src/hash_small_ar/benchmark/gc/rdoc.rb
user system total real
25.876454 0.392925 26.269379 ( 26.273089)
GC total time (sec): 0
VmHWM: 495984 kB
Summary of rdoc on 2.7.0dev 26.273089297115803 0 175
(real time in sec, GC time in sec, GC count)
```
VmHWM is corrupted :(
Maybe because GC count is not so high because of low `xmalloc()` consumption.
----------------------------------------
Feature #15602: Eliminate recording full-width hash value for small Hash
https://bugs.ruby-lang.org/issues/15602#change-80153
* Author: ko1 (Koichi Sasada)
* Status: Assigned
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
----------------------------------------
# Abstract
Let's shape up small hash value (1 to 8 entries) from 192B to 128B on 64bit ptr environments.
# Data structure proposal
(step 1) Record only key and value pairs.
Now Ruby 2.6, 1 to 8 entry Hash objects allocate 192 byte (8B * 3 (key, value and hash value triple) * 8 entry = 192B) with ar_table (instead of st_table).
Eliminating to record hash value will reduce this allocation from 192B to 128B (8 * 2 * 8).
(step 2) 1 byte hash value
For 1 to 8 entries, full-width Hash value (8 bytes) may be too long to lookup the entry.
1 byte hash value can be generated from 8 byte hash value.
(`hash_value & 0xff` is most simple way to get it, but not sure it is enough)
Name 1 byte hash value as "hash hint" on my patch.
(step 3) Embed hash hint into RHash
`RHash::iter_lev` is used to recognize nesting level of a hash (`h.each{ "h's iter_lev is 1 here" }`).
However, there are only a few cases that this value is bigger than 1.
So we can put this value into flags (if it exceeds the limit, we can store this value into hidden attribute).
8 hash hints becomese `8B == sizeof(VALUE)`, so we can embed this value into RHash.
# Discussion
* Pros.
* We can reduce allocation size of small Hash.
* Increase cache locality on hash lookup
* We don't need to touch ar_table (allocate memory) if hash hints doesn't match.
* We can access correct ar_table entry directly.
* Cons.
* hash hints can conflict more than full-width hash value => may increase `eql?` call.
* performance down
* incompatibility
# Evaluation
I tested this patch and it slightly increase performance (not so big, on my micro-benchmark).
Memory consumption is reduced theoretically.
# Patch
https://github.com/ko1/ruby/tree/hash_small_ar
--
https://bugs.ruby-lang.org/
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>