[ruby-core:108324] [Ruby master Feature#18683] Allow to create hashes with a specific capacity.
From: "mame (Yusuke Endoh)" <noreply@...>
Date: 2022-04-21 03:06:55 UTC
List: ruby-core #108324
Issue #18683 has been updated by mame (Yusuke Endoh).
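I confirmed that the proposed API does improve performance, at least in a micro-benchmark. With the patch below, filling 1000 hashes with 100,000 keys each went from about 8.4 s to about 7.6 s of wall-clock time: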
```
$ time ./miniruby -e '1000.times { h = {}; 100000.times {|x| h[x] = true } }'
real 0m8.403s
user 0m8.343s
sys 0m0.060s
$ time ./miniruby -e '1000.times { h = Hash.new_with_capacity(100000); 100000.times {|x| h[x] = true } }'
real 0m7.603s
user 0m7.533s
sys 0m0.070s
```
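The one-liners above call `Hash.new_with_capacity`, which exists only with the prototype patch further down. The sketch below is a rough stock-Ruby equivalent that uses the standard `benchmark` library. It assumes a Ruby where `Hash.new(capacity:)` exists (it was added in Ruby 3.4). On older versions it falls back to `{}`, so both timings measure the same thing there. `fill_hash`, `time_fill` and `CAPACITY_SUPPORTED` are hypothetical names, and the sketch does not imply any particular speedup.

```ruby
require "benchmark"

# Hypothetical flag: true when the running Ruby understands Hash.new(capacity:)
# (Ruby 3.4+). Older Rubies would turn the keyword into a default value instead.
CAPACITY_SUPPORTED = (RUBY_VERSION.split(".").map(&:to_i) <=> [3, 4]) >= 0

# Fills a hash with `size` integer keys, like the one-liners above,
# optionally asking Ruby to preallocate room for all of them.
def fill_hash(size, presize:)
  h = presize && CAPACITY_SUPPORTED ? Hash.new(capacity: size) : {}
  size.times { |x| h[x] = true }
  h
end

# Wall-clock seconds spent building `rounds` hashes of `size` entries.
def time_fill(rounds:, size:, presize:)
  Benchmark.realtime { rounds.times { fill_hash(size, presize: presize) } }
end

empty_t   = time_fill(rounds: 10, size: 100_000, presize: false)
presize_t = time_fill(rounds: 10, size: 100_000, presize: true)
printf("empty: %.3fs  presized: %.3fs\n", empty_t, presize_t)
```

Results depend heavily on the machine and the Ruby build, so only a repeated comparison on the same interpreter is meaningful.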
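My preferred API style is `Hash.new(capacity: 100000)`. Today, keyword arguments passed to `Hash.new` are converted into a positional Hash that becomes the default value. Can we first deprecate passing keyword arguments to `Hash.new` and then introduce the `capacity` keyword? The prototype patch used for the benchmark above, which adds `Hash.new_with_capacity`, is: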
```diff
diff --git a/hash.c b/hash.c
index da85fd35c6..0d0faf6ecc 100644
--- a/hash.c
+++ b/hash.c
@@ -1559,10 +1559,10 @@ copy_compare_by_id(VALUE hash, VALUE basis)
return hash;
}
-MJIT_FUNC_EXPORTED VALUE
-rb_hash_new_with_size(st_index_t size)
+static VALUE
+hash_alloc_with_size(VALUE klass, st_index_t size)
{
- VALUE ret = rb_hash_new();
+ VALUE ret = hash_alloc(klass);
if (size == 0) {
/* do nothing */
}
@@ -1575,6 +1575,12 @@ rb_hash_new_with_size(st_index_t size)
return ret;
}
+MJIT_FUNC_EXPORTED VALUE
+rb_hash_new_with_size(st_index_t size)
+{
+ return hash_alloc_with_size(rb_cHash, size);
+}
+
static VALUE
hash_copy(VALUE ret, VALUE hash)
{
@@ -1904,6 +1910,15 @@ rb_hash_s_create(int argc, VALUE *argv, VALUE klass)
return hash;
}
+static VALUE
+rb_hash_s_new_with_capa(VALUE klass, VALUE size)
+{
+ VALUE hash;
+ hash = hash_alloc_with_size(klass, NUM2LONG(size));
+ hash_verify(hash);
+ return hash;
+}
+
MJIT_FUNC_EXPORTED VALUE
rb_to_hash_type(VALUE hash)
{
@@ -7155,6 +7170,7 @@ Init_Hash(void)
rb_define_alloc_func(rb_cHash, empty_hash_alloc);
rb_define_singleton_method(rb_cHash, "[]", rb_hash_s_create, -1);
rb_define_singleton_method(rb_cHash, "try_convert", rb_hash_s_try_convert, 1);
+ rb_define_singleton_method(rb_cHash, "new_with_capacity", rb_hash_s_new_with_capa, 1);
rb_define_method(rb_cHash, "initialize", rb_hash_initialize, -1);
rb_define_method(rb_cHash, "initialize_copy", rb_hash_replace, 1);
rb_define_method(rb_cHash, "rehash", rb_hash_rehash, 0);
```
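The deprecation step suggested above matters because, on a Ruby without `capacity:` support, `Hash.new(capacity: 1000)` does not fail. The keywords silently become the hash's default value instead. The following sketch uses a hypothetical helper, `describe_hash_new_keyword`, to report which interpretation the running Ruby applies. It also shows that explicit braces always select the default-value form.

```ruby
# Hypothetical helper: reports how the running Ruby interprets
# Hash.new(capacity: ...).
def describe_hash_new_keyword
  h = Hash.new(capacity: 1000)
  if h.default.nil?
    # The keyword is understood as a size hint: the hash is empty and
    # missing keys still return nil.
    "capacity: is a size hint"
  else
    # The keywords were turned into a positional Hash, i.e. the default
    # value, so h[:missing] returns that Hash instead of nil.
    "capacity: became the default value #{h.default.inspect}"
  end
end

puts describe_hash_new_keyword

# With explicit braces the argument is a positional default value on every
# Ruby version, so missing keys return { capacity: 1000 }.
with_braces = Hash.new({ capacity: 1000 })
p with_braces[:missing]
```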
----------------------------------------
Feature #18683: Allow to create hashes with a specific capacity.
https://bugs.ruby-lang.org/issues/18683#change-97345
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Various protocol parsers, such as those for Redis `RESP3` or `msgpack`, have to create hashes whose size they know in advance.
For efficiency, it would be preferable if they could directly allocate a Hash of the necessary size, so that building large hashes wouldn't cause repeated reallocations.
Examples of code that would benefit:
- [`hiredis` bindings](https://github.com/redis-rb/redis-client/blob/830d586b665bc9569335d70e82c41377f18e0c16/ext/redis_client/hiredis/hiredis_connection.c#L157-L162)
- [Ruby `redis RESP3` parser](https://github.com/redis-rb/redis-client/blob/830d586b665bc9569335d70e82c41377f18e0c16/lib/redis_client/resp3.rb#L173-L175)
- [msgpack-ruby](https://github.com/msgpack/msgpack-ruby/blob/c46bb60f79312cab902356e89f3f6035d7cad03f/ext/msgpack/unpacker.c#L641-L644)
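The parsers listed above share one pattern: the entry count arrives before the entries. The following is a minimal sketch of that pattern for a tiny subset of RESP3, covering maps (`%<count>`), simple strings (`+...`) and integers (`:...`). It is not redis-client's code. The module name `Resp3Sketch` and its methods are hypothetical, other RESP3 types and error replies are omitted, and the hash is presized only when the running Ruby supports `Hash.new(capacity:)` (Ruby 3.4+).

```ruby
require "stringio"

# Hypothetical, simplified RESP3 reader: maps, simple strings and integers only.
module Resp3Sketch
  # True when Hash.new(capacity:) is available (Ruby 3.4+).
  CAPACITY_KW = (RUBY_VERSION.split(".").map(&:to_i) <=> [3, 4]) >= 0

  module_function

  # Allocates a hash for `count` entries, presized when supported.
  def new_hash(count)
    CAPACITY_KW ? Hash.new(capacity: count) : {}
  end

  # Reads one value; each RESP3 line is "<type byte><payload>\r\n".
  def read_value(io)
    line = io.gets("\r\n") or raise EOFError, "unexpected end of input"
    type = line[0]
    body = line[1..-3] # drop the type byte and the trailing CRLF
    case type
    when "+" then body
    when ":" then Integer(body)
    when "%" then read_map(io, Integer(body))
    else raise ArgumentError, "unsupported RESP3 type: #{type.inspect}"
    end
  end

  # The map header gives the number of key/value pairs up front, so the
  # final size is known before the first insertion.
  def read_map(io, count)
    hash = new_hash(count)
    count.times do
      key = read_value(io)
      hash[key] = read_value(io)
    end
    hash
  end
end

p Resp3Sketch.read_value(StringIO.new("%2\r\n+a\r\n:1\r\n+b\r\n:2\r\n"))
```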
`String` and `Array` both already offer similar APIs:
```ruby
String.new(capacity: XXX)
Array.new(XX) / rb_ary_new_capa(long)
```
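The two existing APIs do not reserve space in quite the same way. The illustration below uses arbitrary variable names. `String.new(capacity:)` only sizes the internal buffer and leaves the string empty. At the Ruby level, `Array.new(n)` sets the length to `n` elements. A pure capacity reservation for arrays is only available from C, through `rb_ary_new_capa(long)`.

```ruby
# String.new(capacity:) reserves buffer space; the string is still empty.
buf = String.new(capacity: 4096)
puts buf.empty?
buf << "hello"

# Array.new(n) creates n elements (nil by default), not just room for them.
arr = Array.new(3)
p arr
```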
However, there's no such public API for hashes, either in Ruby or in the C extension API.
### Proposal
I think `Hash.new` should accept a `capacity:` keyword argument:
```ruby
hash = Hash.new(capacity: 1000)
```
Additionally, I think the internal `rb_hash_new_with_size` function should be exposed to C extensions as `rb_hash_new_capa(long)`, for consistency with `rb_ary_new_capa(long)`.