[#108176] [Ruby master Bug#18679] Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8 — "taf2 (Todd Fisher)" <noreply@...>

Issue #18679 has been reported by taf2 (Todd Fisher).

8 messages 2022/04/05

[#108185] [Ruby master Feature#18683] Allow to create hashes with a specific capacity. — "byroot (Jean Boussier)" <noreply@...>

Issue #18683 has been reported by byroot (Jean Boussier).

13 messages 2022/04/06

[#108198] [Ruby master Feature#18685] Enumerator.product: Cartesian product of enumerators — "knu (Akinori MUSHA)" <noreply@...>

Issue #18685 has been reported by knu (Akinori MUSHA).

8 messages 2022/04/08

[#108201] [Ruby master Misc#18687] [ANN] Upgraded bugs.ruby-lang.org to Redmine 5.0 — "hsbt (Hiroshi SHIBATA)" <noreply@...>

Issue #18687 has been reported by hsbt (Hiroshi SHIBATA).

10 messages 2022/04/09

[#108216] [Ruby master Misc#18691] An option to run `make rbconfig.rb` in a different directory — "jaruga (Jun Aruga)" <noreply@...>

Issue #18691 has been reported by jaruga (Jun Aruga).

14 messages 2022/04/12

[#108225] [Ruby master Misc#18726] CI Error on c99 and c2x — "znz (Kazuhiro NISHIYAMA)" <noreply@...>

Issue #18726 has been reported by znz (Kazuhiro NISHIYAMA).

11 messages 2022/04/14

[#108235] [Ruby master Bug#18729] Method#owner and UnboundMethod#owner are incorrect after using Module#public/protected/private — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18729 has been reported by Eregon (Benoit Daloze).

28 messages 2022/04/14

[#108237] [Ruby master Bug#18730] Double `return` event handling with different tracepoints — "hurricup (Alexandr Evstigneev)" <noreply@...>

Issue #18730 has been reported by hurricup (Alexandr Evstigneev).

8 messages 2022/04/14

[#108294] [Ruby master Bug#18743] Enumerator#next / peek re-use each others stacktraces — sos4nt <noreply@...>

Issue #18743 has been reported by sos4nt (Stefan Schテシテ殕er).

20 messages 2022/04/19

[#108301] [Ruby master Bug#18744] I used Jazzy to generate the doc for my iOS library, but it showed me a bug — "zhaoxinqiang (marc steven)" <noreply@...>

Issue #18744 has been reported by zhaoxinqiang (marc steven).

8 messages 2022/04/20

[ruby-core:108324] [Ruby master Feature#18683] Allow to create hashes with a specific capacity.

From: "mame (Yusuke Endoh)" <noreply@...>
Date: 2022-04-21 03:06:55 UTC
List: ruby-core #108324
Issue #18683 has been updated by mame (Yusuke Endoh).


I confirmed the proposed API actually brings performance improvements at least in a micro benchmark.

```
$ time ./miniruby -e '1000.times { h = {}; 100000.times {|x| h[x] = true } }'

real    0m8.403s
user    0m8.343s
sys     0m0.060s

$ time ./miniruby -e '1000.times { h = Hash.new_with_capacity(100000); 100000.times {|x| h[x] = true } }'

real    0m7.603s
user    0m7.533s
sys     0m0.070s
```

My preference of its API style is `Hash.new(capacity: 100000)`. Can we first deprecate any keyword arguments for Hash.new and then introduce the capacity keyword?

```ruby
diff --git a/hash.c b/hash.c
index da85fd35c6..0d0faf6ecc 100644
--- a/hash.c
+++ b/hash.c
@@ -1559,10 +1559,10 @@ copy_compare_by_id(VALUE hash, VALUE basis)
     return hash;
 }

-MJIT_FUNC_EXPORTED VALUE
-rb_hash_new_with_size(st_index_t size)
+static VALUE
+hash_alloc_with_size(VALUE klass, st_index_t size)
 {
-    VALUE ret = rb_hash_new();
+    VALUE ret = hash_alloc(klass);
     if (size == 0) {
         /* do nothing */
     }
@@ -1575,6 +1575,12 @@ rb_hash_new_with_size(st_index_t size)
     return ret;
 }

+MJIT_FUNC_EXPORTED VALUE
+rb_hash_new_with_size(st_index_t size)
+{
+    return hash_alloc_with_size(rb_cHash, size);
+}
+
 static VALUE
 hash_copy(VALUE ret, VALUE hash)
 {
@@ -1904,6 +1910,15 @@ rb_hash_s_create(int argc, VALUE *argv, VALUE klass)
     return hash;
 }

+static VALUE
+rb_hash_s_new_with_capa(VALUE klass, VALUE size)
+{
+    VALUE hash;
+    hash = hash_alloc_with_size(klass, NUM2LONG(size));
+    hash_verify(hash);
+    return hash;
+}
+
 MJIT_FUNC_EXPORTED VALUE
 rb_to_hash_type(VALUE hash)
 {
@@ -7155,6 +7170,7 @@ Init_Hash(void)
     rb_define_alloc_func(rb_cHash, empty_hash_alloc);
     rb_define_singleton_method(rb_cHash, "[]", rb_hash_s_create, -1);
     rb_define_singleton_method(rb_cHash, "try_convert", rb_hash_s_try_convert, 1);
+    rb_define_singleton_method(rb_cHash, "new_with_capacity", rb_hash_s_new_with_capa, 1);
     rb_define_method(rb_cHash, "initialize", rb_hash_initialize, -1);
     rb_define_method(rb_cHash, "initialize_copy", rb_hash_replace, 1);
     rb_define_method(rb_cHash, "rehash", rb_hash_rehash, 0);
```

----------------------------------------
Feature #18683: Allow to create hashes with a specific capacity.
https://bugs.ruby-lang.org/issues/18683#change-97345

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Various protocol parsers such as Redis `RESP3` or `msgpack`, have to create hashes, and they know the size in advance.
For efficiency, it would be preferable if they could directly allocate a Hash of the necessary size, so that large hashes wouldn't cause many re-alloccations.

Example of code that would benefit:

 - [`hiredis` bindings](https://github.com/redis-rb/redis-client/blob/830d586b665bc9569335d70e82c41377f18e0c16/ext/redis_client/hiredis/hiredis_connection.c#L157-L162)
 - [Ruby `redis RESP3` parser](https://github.com/redis-rb/redis-client/blob/830d586b665bc9569335d70e82c41377f18e0c16/lib/redis_client/resp3.rb#L173-L175)
 - [magpack-ruby](https://github.com/msgpack/msgpack-ruby/blob/c46bb60f79312cab902356e89f3f6035d7cad03f/ext/msgpack/unpacker.c#L641-L644)

`String` and `Array` both already offer similar APIs:

```ruby
String.new(capacity: XXX)
Array.new(XX) / rb_ary_new_capa(long)
```

However there's no such public API for Hashes, neither in Ruby land not in the C extension API.

### Proposal

I think `Hash.new` should accept a `capacity:` named parameter:

```ruby
hash = Hash.new(capacity: 1000)
```

Additionally I think the internal `rb_hash_new_with_size` function should be exposed to C extensions as `rb_hash_new_capa(long)`, for consistency with `rb_ary_new_capa(long)`.




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread