[ruby-core:61683] [ruby-trunk - Bug #9676] [Closed] String#gsub shouldn't allocate so many Strings in its loop
Issue #9676 has been updated by Charlie Somerville.
Status changed from Open to Closed
% Done changed from 0 to 100
Applied in changeset r45414.
----------
Stop allocating backref strings within gsub's search loop
* internal.h: add prototype for rb_reg_search0
* re.c: rename rb_reg_search to rb_reg_search0, add set_backref_str
argument to allow callers to indicate that they don't require the
backref string to be allocated
* string.c: don't allocate backref str if replacement string is provided
Closes GH-578. [Bug #9676] [ruby-core:61682]
----------------------------------------
Bug #9676: String#gsub shouldn't allocate so many Strings in its loop
https://bugs.ruby-lang.org/issues/9676#change-45934
* Author: Sam Rawlins
* Status: Closed
* Priority: Normal
* Assignee:
* Category:
* Target version:
* ruby -v: trunk
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
`rb_reg_search()` allocates (dups) a String to attach to the backreference object ( `RMATCH(match)->str = rb_str_new4(str);` ). If #gsub has been passed 2 arguments (not the Enumerator form) and the second argument is a String, then it shouldn't make these allocations when calling `rb_reg_search()` inside its loop.
Here's an example:
    # gsub-allocates-too-much.rb
    require File.join(__dir__, "lib", "allocation_stats")

    def puts_object_list(name, stats)
      objects = stats.allocations.group_by(:sourcefile, :sourceline, :class).all.
        values.flatten.map(&:object).
        map {|o| o.is_a?(String) ? "#{o.inspect}<#{o.encoding.to_s}>" : o.inspect }
      puts "#{name} #{objects.flatten.size} new objects:"
      objects.group_by(&:hash).values.each { |ary| puts "#{ary.join(", ")}" }
    end

    slash = '/'; underscore = '_'; colon = ':' # allocate before the trace
    str = "12:34:45:67"
    stats = AllocationStats.trace { str.gsub(colon, underscore) }
    puts '> "12:34:45:67".gsub(":", "_")'
    puts_object_list("gsub substitutes 3x times:", stats)
    $ ruby ../allocation_stats/gsub-allocates-too-much.rb
    > "12:34:45:67".gsub(":", "_")
    gsub substitutes 3x times: 12 new objects:
    "12:34:45:67"<UTF-8>, "12:34:45:67"<UTF-8>, "12:34:45:67"<UTF-8>, "12:34:45:67"<UTF-8>
    "12_34_45_67"<UTF-8>
    ":"<ASCII-8BIT>, ":"<ASCII-8BIT>
    ":"<US-ASCII>, ":"<US-ASCII>
    #<MatchData ":">
    #<MatchData nil>
    /:/
All of the Strings that are copies of the original String are unnecessary, except the last one.
I have a fix (attached and at [1]) that allocates the str attribute of the backreference object only when necessary. To do this without changing the signature of `rb_reg_search()`, the patch turns `rb_reg_search()` into a wrapper around a new function, `rb_reg_search0()`. No existing calls to `rb_reg_search()` need to change, and `str_gsub()` switches two of its calls to `rb_reg_search0()` to avoid the allocations. (I believe String#split suffers from the same extra allocations, and could make a similar call to `rb_reg_search0()`.)
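For readers who want to see the shape of that change, here is a minimal Ruby sketch of the wrapper pattern. It is only a model: the real functions are C code in re.c, and `Backref`, `reg_search`, and `reg_search0` below are hypothetical stand-ins rather than the patch itself. The only detail taken from the changeset is the idea of a flag that says whether the backref string is needed.

```ruby
# Hypothetical model of the wrapper pattern; not the actual C implementation.
Backref = Struct.new(:pos, :str)

# Extended search: copies the subject string into the backref only when the
# caller asks for it via set_backref_str.
def reg_search0(pattern, str, pos, set_backref_str)
  idx = str.index(pattern, pos)
  return [nil, nil] if idx.nil?
  backref = Backref.new(idx, set_backref_str ? str.dup : nil)
  [idx, backref]
end

# Original entry point: same signature as before, always requests the copy,
# so existing callers behave exactly as they did.
def reg_search(pattern, str, pos)
  reg_search0(pattern, str, pos, true)
end

idx, backref = reg_search(":", "12:34:45:67", 0)
idx_fast, backref_fast = reg_search0(":", "12:34:45:67", 0, false)
```

A caller that never reads the backref string, such as a gsub loop with a String replacement, would use the second form and skip one String copy per match.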
The impact of this fix is primarily faster garbage collection. I have two "real world" examples:
* ActiveRecord sqlite3 specs: total time in GC reduced from 11.2s to 10.4s (7% savings).
* Mail gem specs: total time in GC reduced from 0.220s to 0.215s (2% savings).
These numbers bounced around a lot though. I'm open to better benchmarking suggestions. I used ActiveRecord and Mail for real world examples of #gsub, where realistic Strings are gsubbed.
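One possibly less noisy measurement is to count allocations directly instead of timing GC. The sketch below assumes a Ruby where `GC.stat` accepts the `:total_allocated_objects` key (Ruby 2.2 and later; older versions may name it differently). `count_allocations` is a hypothetical helper, not part of the allocation_stats gem used above.

```ruby
# Returns the number of objects allocated while the block runs.
# Assumes GC.stat(:total_allocated_objects) is available.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Allocate the arguments before measuring, as in the example script.
str = "12:34:45:67"
colon = ":"
underscore = "_"

result = nil
allocated = count_allocations { result = str.gsub(colon, underscore) }
puts "gsub allocated #{allocated} objects"
```

Running this before and after the patch on the same build configuration should show the per-call difference without depending on GC timing.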
[1] https://github.com/ruby/ruby/pull/578
---Files--------------------------------
ruby-578.diff (2.56 KB)
--
https://bugs.ruby-lang.org/