From: ko1@... Date: 2018-08-23T07:25:34+00:00 Subject: [ruby-core:88613] [Ruby trunk Feature#14989] Add Hash support for transient heap Issue #14989 has been updated by ko1 (Koichi Sasada). Assignee set to ko1 (Koichi Sasada) ---------------------------------------- Feature #14989: Add Hash support for transient heap https://bugs.ruby-lang.org/issues/14989#change-73674 * Author: tacinight (Yimin Zhao) * Status: Open * Priority: Normal * Assignee: ko1 (Koichi Sasada) * Target version: ---------------------------------------- Hi, there I am a GSoC student following @ko1 to do some work for Ruby, here I want to make a brief introduction for one of my work in this period. ## Transient Heap The first discussion of transient heap was opened by @ko1 at [Introduce 2nd GC heap named Transient heap](https://bugs.ruby-lang.org/issues/14858). In his prototype, he completed the implement for arrays, and the future work should also support the type String, Hash as well as Extend(shrink) policy for transient heap. My work here is adding support for Hash so that transient heap can allocate space for Hash. I encounted some difficults that had to be handled. 1. Hash type objects use st_table to store elements, the correlation between those two is weak. For transient heap, it has to know the ruby type of the objects. 2. Transient heap is based on the hypothesis that most of objects die at a young age. To understand the issue with first need to survey the distribution of hash size. ### The distribution of hash size #### Implement The capacity of st_table is always a power of 2, we recorded the exponent into a global array before gc or functions explicitly frees st_table. Look at the [patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/st_table_size_statistics.patch) for more details. #### Results We measured 6 applications in order to find a real distribution of the hash size. Respectively, those applications are: 1. `make check` command under ruby source code. 2. `make gcbench_rdoc` command under ruby source code. 3. `gem uninstall --all` to uninstall over 300 gems 4. `bundle install` commmand under project [discourse](https://github.com/discourse/discourse) 5. `discourse` manually explore the web pages and simulate the different events for about 20 minutes (development environment) 6. `rails_ruby_bench` project [rails_ruby_bench](https://github.com/noahgibbs/rails_ruby_bench) with parameters : 8 workers, 1500 iterations 7. `rails_ruby_bench 2` 8 workers, 12000 iterations 8. `rails_ruby_bench 3` 20 workders, 20000 iterations The results show that objects under size 8(area of exponent 2 to 3) account for more than 80% of the total data in most situations. The raw data can be found at [st_table size statistics](https://docs.google.com/spreadsheets/d/1xAjO_qb5K49aLnvk8SypGwO5Avtbm2X12cYb1d-n6Xs/edit?usp=sharing). ### Add Hash support According to the previous survey result, we use a array-like fixed-size table to store the elements. We name it as LinearTable. In this way, we have two benefits: 1. We don't have to create st_table for small hash(but it will when necessary) 2. the code is written into hash.c (while st_table code is in st.c). its easy to bind small hash with transient heap, and by default, when using st_table, it uses the malloced space. So the basic data structure look like this. ```c #define LINEAR_TABLE_MAX_SIZE 8 typedef struct li_table_entry { VALUE hash; VALUE key; VALUE record; } li_table_entry; typedef struct LinearTable { const struct st_hash_type *type; li_table_entry entries[LINEAR_TABLE_MAX_SIZE]; } li_table; struct RHash { struct RBasic basic; union { struct st_table *ntbl; /* possibly 0 */ struct LinearTable *ltbl; } as; int iter_lev; const VALUE ifnone; }; ``` #### The layout of li_table_entry Considering the size of 8 hashes/keys/records is just the size of a cache line, I also tried different data layouts. ```c #define LINEAR_TABLE_MAX_SIZE 8 // the original layout typedef struct li_table_entry { VALUE hash; VALUE key; VALUE record; } li_table_entry; typedef struct LinearTable { const struct st_hash_type *type; li_table_entry entries[LINEAR_TABLE_MAX_SIZE]; } li_table; ``` ```c #define LINEAR_TABLE_MAX_SIZE 8 // the original layout typedef struct li_table_entry { VALUE hash; VALUE key; VALUE record; } li_table_entry; typedef struct LinearTable { const struct st_hash_type *type; li_table_entry entries[LINEAR_TABLE_MAX_SIZE]; } li_table; ``` ```c #define LINEAR_TABLE_MAX_SIZE 8 // The Variant 1 typedef struct li_table_entry { VALUE key; VALUE record; } li_table_entry; typedef struct LinearTable { const struct st_hash_type *type; VALUE hashes[LINEAR_TABLE_MAX_SIZE]; li_table_entry pairs[LINEAR_TABLE_MAX_SIZE]; } li_table; ``` ```c #define LINEAR_TABLE_MAX_SIZE 8 // The Variant 2 typedef struct li_table_entry { VALUE hash; VALUE key; } li_table_entry; typedef struct LinearTable { const struct st_hash_type *type; li_table_entry entries[LINEAR_TABLE_MAX_SIZE]; VALUE records[LINEAR_TABLE_MAX_SIZE]; } li_table; ``` ```c #define LINEAR_TABLE_MAX_SIZE 8 // The Variant 3 typedef struct LinearTable { const struct st_hash_type *type; VALUE hashes[LINEAR_TABLE_MAX_SIZE]; VALUE keys[LINEAR_TABLE_MAX_SIZE]; VALUE records[LINEAR_TABLE_MAX_SIZE]; } li_table; ``` The [microbench results](https://docs.google.com/spreadsheets/d/1Ag6DoAsmTNJkt3nmHyfXRQgy6fLhEnZJQF2G5GbykLc/edit?usp=sharing) show the original edition and variant 1 have better performance than the other two. The patches can be found in the repository as [linear_table_v1.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v1.patch), [linear_table_v2.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v2.patch), [linear_table_v3.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v3.patch), [linear_table_v4.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v4.patch); #### Integrate data into flag bits Considering the maximum size of entries is 8. We can store the num_entries and num_bound into RBasic(hash)->flag; Related patch is [here](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/0003-integrate-data-to-hash-flag.patch) which based on [linear_table_v1.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v1.patch) ### Benchmark results I execuated comprehensive benchmark test based on official benchmark tool - [benchmark-driver](https://github.com/benchmark-driver/benchmark-driver). 1. [linear_table_benchmark](https://docs.google.com/spreadsheets/d/18JzY-q-boOZFu_GTwraRXSTJJVbt9Wmg7M2P4-8g41o/edit?usp=sharing), includes the `make benchmark` results of trunk vs. v1(the original version) vs. v2(the variant 1). 2. [transient hash benchmark](https://docs.google.com/spreadsheets/d/19074A0H0nwBQoumTb0aF-k9_a3GBBLA9NnFIpPGraPU/edit?usp=sharing), includes the `make benchmark` results of trunk vs. transient_heap(based on v1) vs. integrated flag 3. [linear_table_variant_benchmark](https://docs.google.com/spreadsheets/d/1Ag6DoAsmTNJkt3nmHyfXRQgy6fLhEnZJQF2G5GbykLc/edit?usp=sharing), includes the micro-benchmark results of trunk vs. v1 vs. v2 vs. v3 vs. v4. The used benchrmark scripts are located at microbench directory. 4. [linear_table real app bench](https://docs.google.com/spreadsheets/d/1WoT2zRxm16DA-0fOf55HZKmRN9v2YJK4HoaMi6yq_0k/edit?usp=sharing), includes the test results of the six applications noted above. 5. [transient heap microbench](https://docs.google.com/spreadsheets/d/1guaP_93ds1eSbdSg87gBrtiuJ8JzSFHJb_xcQsxn8eg/edit?usp=sharing), include the micro-benchrmark resutls of trunk vs. v1 vs. transient-hash vs integrated-flag. This time I also add memory usage comparison. 6. [linear_table_vary_size_microbench](https://docs.google.com/spreadsheets/d/1I7NFc893H7cUh0CAFgBupT259pVMqRFjC9CQwlE8Kyw/edit?usp=sharing), we once worried the size of linear table will impact on the benchmark results because of the linear search. So we test the linear table with different size and the results show the table size make no relevant difference. ### Future work 1. In [transient hash benchmark](https://docs.google.com/spreadsheets/d/19074A0H0nwBQoumTb0aF-k9_a3GBBLA9NnFIpPGraPU/edit?usp=sharing), we saw a preformance degradation compare to the preformance improvement in [linear_table_benchmark](https://docs.google.com/spreadsheets/d/18JzY-q-boOZFu_GTwraRXSTJJVbt9Wmg7M2P4-8g41o/edit?usp=sharing) in same items. It needs to be analyzed and to be solved. ### My GitHub Repository I will update my status at a GitHub repository: https://github.com/Tacinight/ruby-gsoc-2018. More explanatory images and other of my work also can be found at this repository. Welcome to review the patches and leave your valuable comments. -- https://bugs.ruby-lang.org/ Unsubscribe: