[ruby-core:96979] [Ruby master Feature#16557] Deduplicate Regexp literals
From: jean.boussier@...
Date: 2020-01-23 11:49:47 UTC
List: ruby-core #96979
Issue #16557 has been reported by byroot (Jean Boussier).
----------------------------------------
Feature #16557: Deduplicate Regexp literals
https://bugs.ruby-lang.org/issues/16557
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
Pull Request: https://github.com/ruby/ruby/pull/2859
### Context
Real-world applications contain many duplicated Regexp literals.
From a `rails console` session in Redmine:
```
>> ObjectSpace.each_object(Regexp).count
=> 6828
>> ObjectSpace.each_object(Regexp).uniq.count
=> 4162
>> ObjectSpace.each_object(Regexp).to_a.map { |r| ObjectSpace.memsize_of(r) }.sum
=> 4611957 # 4.4 MB total
>> ObjectSpace.each_object(Regexp).to_a.map { |r| ObjectSpace.memsize_of(r) }.sum - ObjectSpace.each_object(Regexp).to_a.uniq.map { |r| ObjectSpace.memsize_of(r) }.sum
=> 1490601 # 1.42 MB could be saved
```
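The measurement above can be packaged as a small script. This is a minimal sketch, assuming Ruby 2.4+ (for `Array#sum` with a block). `ObjectSpace.memsize_of` is provided by the `objspace` extension, which a plain script must `require`. The method name `duplicate_regexp_bytes` is illustrative and not part of any API.

```ruby
require "objspace" # ObjectSpace.memsize_of comes from the objspace extension

# Hypothetical helper: bytes that would be reclaimed if every group of
# equal Regexps (same source, options and encoding, per Regexp#eql?)
# shared a single object. Defaults to all live Regexps in the process.
def duplicate_regexp_bytes(regexps = ObjectSpace.each_object(Regexp).to_a)
  total  = regexps.sum { |re| ObjectSpace.memsize_of(re) }
  unique = regexps.uniq.sum { |re| ObjectSpace.memsize_of(re) }
  total - unique
end

puts "#{duplicate_regexp_bytes} bytes are held by duplicate Regexps"
```

`uniq` keeps the first occurrence of each value, so the difference is exactly the memory held by the extra copies.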
Here are the top 10 duplicated regexps in Redmine (count: regexp):
```
147: /"/
107: /\s+/
103: //
89: /\n/
83: /'/
76: /\s+/m
37: /\d+/
35: /\[/
33: /./
33: /\\./
```
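A list like the one above can be produced with `Enumerable#tally` (Ruby 2.7+). This is a sketch; `top_duplicate_regexps` is a hypothetical helper name, and ties in the count may come back in any order.

```ruby
# Hypothetical helper: tally Regexps by value (Regexp#hash/#eql? compare
# source, options and encoding) and return the `limit` most duplicated
# ones as [regexp, count] pairs, most frequent first.
def top_duplicate_regexps(regexps = ObjectSpace.each_object(Regexp), limit: 10)
  regexps.tally
         .select { |_re, count| count > 1 }
         .max_by(limit) { |_re, count| count }
end

top_duplicate_regexps.each { |re, count| puts "#{count}: #{re.inspect}" }
```

Because options take part in equality, `/\s+/` and `/\s+/m` are counted as separate entries, as in the list above.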
Any Rails application, even an empty one, will have a similar number of regexps.
### The feature
Since https://bugs.ruby-lang.org/issues/16377 made Regexp literals frozen, it is possible to deduplicate them without changing any semantics, saving a decent amount of resident memory.
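To illustrate the premise: on a Ruby that includes #16377 (Ruby 3.0 and later), a regexp literal is frozen, so no code can attach per-object state such as instance variables to it. A literal without interpolation is also created once per call site, so the method below returns the same object on every call; deduplication would extend that sharing across call sites. The method name is illustrative.

```ruby
# Requires Ruby 3.0+, where regexp literals are frozen (Feature #16377).
def whitespace_regexp
  /\s+/ # no interpolation: one object per call site
end

re = whitespace_regexp
puts re.frozen?                   # => true on Ruby 3.0+
puts re.equal?(whitespace_regexp) # => true: same object on every call

begin
  re.instance_variable_set(:@note, "per-object state")
rescue FrozenError => e
  puts "cannot attach state: #{e.class}"
end
```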
### The patch
I implemented this feature in a way very similar to the `frozen_strings` table. It works, but I'm running into a segfault on Linux: https://github.com/ruby/ruby/pull/2859
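The general idea of such an intern table can be sketched in plain Ruby. This is a user-space analogue only and assumes nothing about the actual C patch: `RegexpTable.intern` is a hypothetical name, the table is an ordinary Hash keyed by Regexp (so lookups use `Regexp#hash`/`#eql?`), and, unlike a VM-level table, it holds strong references, so entries are never garbage collected.

```ruby
# Hypothetical user-space analogue of an fstring-style intern table.
module RegexpTable
  TABLE = {}

  # Return the canonical frozen Regexp equal to `re` (same source,
  # options and encoding), registering a frozen copy on first sight.
  def self.intern(re)
    TABLE.fetch(re) do
      canonical = re.frozen? ? re : re.dup.freeze
      TABLE[canonical] = canonical
    end
  end
end

a = RegexpTable.intern(Regexp.new("\\s+"))
b = RegexpTable.intern(Regexp.new("\\s+"))
m = RegexpTable.intern(Regexp.new("\\s+", Regexp::MULTILINE))
puts a.equal?(b) # => true: one shared object
puts a.equal?(m) # => false: /\s+/m is a different regexp
```

Only frozen objects are shared, which is why freezing the literals first is what makes this safe.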