[ruby-core:117819] [Ruby master Feature#20415] Precompute literal String hash code during compilation

From: "shyouhei (Shyouhei Urabe) via ruby-core" <ruby-core@...>
Date: 2024-05-09 12:48:56 UTC
List: ruby-core #117819
Issue #20415 has been updated by shyouhei (Shyouhei Urabe).


The benchmark seems great.  But I'm not yet sure if this is worth the hassle.  Is using a string _literal_ as a hash key very common?  It would be much more convincing to me if there were any non-micro benchmarks.

----------------------------------------
Feature #20415: Precompute literal String hash code during compilation
https://bugs.ruby-lang.org/issues/20415#change-108228

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
I worked on a proof of concept with @etienne which I think has some potential, but I'm looking for feedback on what would be the best implementation.


The proof of concept is here: https://github.com/Shopify/ruby/pull/553

### Idea

Most string literals are relatively short, hence embedded, and have some wasted bytes at the end of their slot. We could use that wasted space to store the string hash.
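Whether a given literal has room for its hash is simple arithmetic on the embedded buffer. A minimal sketch, assuming a hypothetical buffer capacity passed in as a parameter and using `hash_code_t` as a stand-in for `st_index_t`; the numbers are illustrative, not CRuby's actual slot layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for st_index_t: an unsigned, pointer-sized integer. */
typedef uintptr_t hash_code_t;

/* Bytes left unused in an embedded buffer of `capacity` bytes once it holds
 * `len` bytes of string data plus a `termlen`-byte terminator. */
static size_t leftover_bytes(size_t capacity, size_t len, size_t termlen) {
    assert(len + termlen <= capacity); /* the string must fit, i.e. be embedded */
    return capacity - (len + termlen);
}

/* A precomputed hash can only be stored when the leftover space is at
 * least as large as the hash code itself. */
static bool can_store_hash(size_t capacity, size_t len, size_t termlen) {
    return leftover_bytes(capacity, len, termlen) >= sizeof(hash_code_t);
}
```

With a hypothetical 16-byte buffer and a 1-byte terminator, a 5-byte literal leaves 10 spare bytes (enough for an 8-byte hash on a 64-bit platform), while a 15-byte literal leaves none.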

The goal is to make **looking up a literal String key in a hash as fast as looking up a Symbol key**. The goal isn't to memoize the hash code of all strings, but to **only selectively precompute the hash code of literal strings in the compiler**. The compiler could even do this selectively, only when a literal string is used to look up a hash (`opt_aref`).
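The selective variant can be modeled as a toy peephole pass that precomputes a hash only for a string literal immediately consumed by a lookup. Everything here is hypothetical: the instruction set is a simplified stand-in (not YARV's real instructions or operands), and FNV-1a replaces Ruby's actual string hash function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified instructions, for illustration only. */
typedef enum { INSN_PUT_STRING, INSN_OPT_AREF, INSN_OTHER } insn_type;

typedef struct {
    insn_type type;
    const char *literal; /* string operand for INSN_PUT_STRING, else NULL */
    bool has_hash;       /* true once the hash has been precomputed */
    uint64_t hash;
} insn;

/* FNV-1a (64-bit), a stand-in for Ruby's real string hash. */
static uint64_t fnv1a(const char *s, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Precompute hashes only for string literals that are directly followed
 * by a hash lookup; returns the number of literals marked. */
static size_t precompute_lookup_keys(insn *code, size_t n) {
    assert(code != NULL || n == 0);
    size_t marked = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (code[i].type == INSN_PUT_STRING && code[i + 1].type == INSN_OPT_AREF) {
            code[i].hash = fnv1a(code[i].literal, strlen(code[i].literal));
            code[i].has_hash = true;
            marked++;
        }
    }
    return marked;
}
```

Literals that never reach a lookup (for example, strings passed to `puts`) are left untouched, so the compile-time cost is only paid where it can help.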

Here's the benchmark we used:

```ruby
require "benchmark/ips"

hash = 10.times.to_h do |i|
  [i, i]
end

dyn_sym = "dynamic_symbol".to_sym
hash[:some_symbol] = 1
hash[dyn_sym] = 1
hash["small"] = 2
hash["frozen_string_literal"] = 2

Benchmark.ips do |x|
  x.report("symbol") { hash[:some_symbol] }
  x.report("dyn_symbol") { hash[dyn_sym] }
  x.report("small_lit") { hash["small"] }
  x.report("frozen_lit") { hash["frozen_string_literal"] }
  x.compare!(order: :baseline)
end
```

On Ruby 3.3.0, looking up a String key is a bit slower than looking up a Symbol key, and the gap grows with key size:

```
Calculating -------------------------------------
              symbol     24.175M (± 1.7%) i/s -    122.002M in   5.048306s
          dyn_symbol     24.345M (± 1.6%) i/s -    122.019M in   5.013400s
           small_lit     21.252M (± 2.1%) i/s -    107.744M in   5.072042s
          frozen_lit     20.095M (± 1.3%) i/s -    100.489M in   5.001681s

Comparison:
              symbol: 24174848.1 i/s
          dyn_symbol: 24345476.9 i/s - same-ish: difference falls within error
           small_lit: 21252403.2 i/s - 1.14x  slower
          frozen_lit: 20094766.0 i/s - 1.20x  slower
```

With the proof of concept, performance is pretty much identical across all four keys:

```
Calculating -------------------------------------
              symbol     23.528M (± 6.9%) i/s -    117.584M in   5.033231s
          dyn_symbol     23.777M (± 4.7%) i/s -    120.231M in   5.071734s
           small_lit     23.066M (± 2.9%) i/s -    115.376M in   5.006947s
          frozen_lit     22.729M (± 1.1%) i/s -    115.693M in   5.090700s

Comparison:
              symbol: 23527823.6 i/s
          dyn_symbol: 23776757.8 i/s - same-ish: difference falls within error
           small_lit: 23065535.3 i/s - same-ish: difference falls within error
          frozen_lit: 22729351.6 i/s - same-ish: difference falls within error
```

### Possible implementation

The reason I'm opening this issue early is to get feedback on which would be the best implementation.

#### Store hashcode after the string terminator

Right now the proof of concept simply stores the `st_index_t` after the string's null terminator, and only when the string is embedded and has enough leftover space.
Strings with a precomputed hash are marked with a user flag.

Pros:

  - Very simple implementation: no need to change a lot of code, and very easy to strip out if we want to.
  - Doesn't use any extra memory. If the string doesn't have enough leftover bytes, the optimization simply isn't applied.
  - The worst-case overhead is a single `FL_TEST_RAW` in `rb_str_hash`.

Cons:

  - The optimization won't apply to certain string sizes, e.g. strings between `17` and `23` bytes won't have a precomputed hash code.
  - Extracting the hash code requires some not-so-nice pointer arithmetic.
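The mechanics of this option (hash stored after the terminator, a flag marking the string, pointer arithmetic to read it back, and a flag test as the only cost for other strings) can be sketched in a self-contained model. All names and sizes are hypothetical: `EMBED_CAPA` and `FLAG_PRECOMPUTED_HASH` stand in for the real slot capacity and user flag, and FNV-1a stands in for Ruby's actual string hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an embedded string; not CRuby's real layout. */
#define EMBED_CAPA 24
#define FLAG_PRECOMPUTED_HASH 0x1u

typedef uint64_t hash_code_t; /* stand-in for st_index_t */

typedef struct {
    unsigned flags;
    size_t len;
    char ary[EMBED_CAPA];
} embedded_str;

/* FNV-1a (64-bit), a stand-in for Ruby's real string hash. */
static hash_code_t fnv1a(const char *s, size_t len) {
    hash_code_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* "Compile time": copy the literal, NUL-terminate it, and store its hash
 * after the terminator only if enough leftover bytes remain. */
static void embed_literal_init(embedded_str *str, const char *src, size_t len) {
    assert(len + 1 <= EMBED_CAPA); /* the string must be embeddable */
    str->flags = 0;
    str->len = len;
    memcpy(str->ary, src, len);
    str->ary[len] = '\0';
    if (EMBED_CAPA - (len + 1) >= sizeof(hash_code_t)) {
        hash_code_t h = fnv1a(src, len);
        /* Offset len + 1 is usually unaligned, so copy bytes instead of
         * dereferencing a hash_code_t pointer. */
        memcpy(str->ary + len + 1, &h, sizeof h);
        str->flags |= FLAG_PRECOMPUTED_HASH;
    }
}

/* "Runtime": strings without the flag pay only for the flag test. */
static hash_code_t str_hash(const embedded_str *str) {
    if (str->flags & FLAG_PRECOMPUTED_HASH) {
        hash_code_t h;
        memcpy(&h, str->ary + str->len + 1, sizeof h);
        return h;
    }
    return fnv1a(str->ary, str->len);
}
```

In this model the 5-byte literal `"small"` gets a stored hash, while a 20-byte literal leaves only 3 spare bytes in the 24-byte buffer and falls back to hashing at lookup time, which mirrors the "certain string sizes" limitation above.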


#### Create another RString union

Another possibility would be to add another member to the `RString` struct's `as` union, so that we'd have:

```c
struct RString {
    struct RBasic basic;
    long len;
    union {
        // ... existing members
        struct {
            st_index_t hash;  // hash code precomputed by the compiler
            char ary[1];      // string bytes follow the hash
        } embedded_literal;
    } as;
};
```

Pros:

  - The optimization can now be applied to all string sizes.
  - The hashcode is always at the same offset and properly aligned.

Cons:

  - Some strings would be bumped to the next slot size, so they would use marginally more memory.
  - Adds more complexity to the code base: a lot more string-related code would need to change (e.g. `RSTRING_PTR` and many others).
  - When compiling such a string, if an equal string already exists in the `fstring` table, we'd need to replace it; we can't just mutate it in place to add the hash code.


### Prior art

[Feature #15331] is somewhat similar in its idea, but it does it lazily for all strings. Here it's much simpler because it's limited to string literals, which are the strings most likely to be used as Hash keys, and the overhead is paid at compile time, not at runtime (aside from a single flag check). So I think most of the caveats of that original implementation don't apply here.




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
