[ruby-core:118693] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
From:
"Eregon (Benoit Daloze) via ruby-core" <ruby-core@...>
Date:
2024-07-26 11:09:46 UTC
List:
ruby-core #118693
Issue #20652 has been updated by Eregon (Benoit Daloze).
FWIW, what TruffleRuby does for this is to store `$~` as a frame-local thread-local variable, but thread-local only if more than 1 thread has been seen, otherwise it's stored directly in the frame:
https://github.com/oracle/truffleruby/blob/3cd422433deebe3fa664f8c4540811c42ca02e93/src/main/java/org/truffleruby/language/threadlocal/ThreadAndFrameLocalStorage.java
I'm not sure how it works on CRuby, but if `$~` is stored directly in the frame, then threads might see a different `$~` than they expect, which could lead to very subtle bugs.
I don't really like a Regexp flag for this because a Regexp might be used in different contexts and some usages might want `$~` and some might not.
I think in general a good fix to simplify this and avoid this kind of race would be to store `$~` in the caller frame (even if that's a block's frame) but not higher.
In this case it would be stored in the `lambda`'s frame and not outside.
Storing it there is also quite a bit faster.
Of course it would be somewhat incompatible, but how much code uses a `$~` outside a block when the Regexp call is made inside a block?
We could warn that such code should not rely on that for a release or so, before changing it.
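To make the compatibility question concrete, here is a minimal sketch of the pattern that would be affected. The method names are made up for illustration, and the comments describe current CRuby behavior as I understand it: a match inside a block sets `$~` in the enclosing method's frame, while a match in a called method does not leak to its caller.

```ruby
# Illustrative method names; none of these come from the issue.
def match_inside_block
  [1].each { "abc" =~ /b/ }  # the match happens inside the block
  $~                          # read in the enclosing method body
end

def helper_match
  "abc" =~ /b/
end

def match_in_callee
  helper_match  # sets $~ only in helper_match's own frame
  $~            # this method performed no match itself
end

p match_inside_block  # a MatchData on current CRuby
p match_in_callee     # nil
```

With `$~` stored in the block's own frame, `match_inside_block` would presumably return `nil` instead, which is the kind of code a deprecation warning would need to catch.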
----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109229
* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------
I recently upgraded from ruby 2.7.7 to 3.3.1 and noticed that the GC load increased.
When I used the allocation profiler to investigate, I found that memory allocation from gsub had increased.
The problem was code like this:
```ruby
s = "foo "
s.gsub(/ (\s+)/) { " #{' ' * Regexp.last_match(1).length}" }
```
When I compared the results of heap-profiler between 2.7.7 and 3.3.1, I found that MatchData was increasing.
https://gist.github.com/orisano/98792dee260106e9b6fcb45bbabeb1e6
https://github.com/ruby/ruby/commit/abc0304cb28cb9dcc3476993bc487884c139fd11
I discovered that the cause is this commit, which stopped reusing backref to avoid race conditions.
Is there a way to reuse backref while still avoiding race conditions?
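For anyone who wants to compare versions without a full heap profiler, a rough sketch like the following counts `MatchData` slots created by a `gsub` call. It assumes CRuby (it relies on the `:T_MATCH` key of `ObjectSpace.count_objects`); the helper name and the sample input string are my own, not from the application code.

```ruby
# Hypothetical helper: counts T_MATCH (MatchData) slots created while the block runs.
def match_data_allocations
  GC.start    # finish any pending sweep before counting
  GC.disable  # nothing is freed while we count
  before = ObjectSpace.count_objects[:T_MATCH] || 0
  yield
  (ObjectSpace.count_objects[:T_MATCH] || 0) - before
ensure
  GC.enable
end

s = "a  b  c  d  "  # sample input with several runs of two spaces
allocs = match_data_allocations do
  s.gsub(/ (\s+)/) { " #{'-' * Regexp.last_match(1).length}" }
end
puts "MatchData objects allocated: #{allocs}"
```

Running the same script under both Ruby versions should show whether the per-match allocation differs; the absolute numbers depend on the input and the interpreter.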
--
https://bugs.ruby-lang.org/