[ruby-core:61706] [ruby-trunk - Bug #9680] [Open] String#sub and siblings should not use regex when String pattern is passed
Issue #9680 has been reported by Sam Rawlins.
----------------------------------------
Bug #9680: String#sub and siblings should not use regex when String pattern is passed
https://bugs.ruby-lang.org/issues/9680
* Author: Sam Rawlins
* Status: Open
* Priority: Normal
* Assignee:
* Category:
* Target version:
* ruby -v: trunk
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
Currently `String#sub`, `#sub!`, `#gsub`, and `#gsub!` all accept a String pattern, but immediately create a Regexp from it and use the regex engine to search for the pattern. This is not performant. For example, `"123:456".gsub(":", "_")` creates the following objects, most of which are immediately up for GC:
* dup of the original String
* result String
* 2x `":"<US-ASCII>`
* 2x `":"<ASCII-8BIT>`
* Regexp from pattern: `/:/`
* `#<MatchData ":">`
* `#<MatchData nil>`
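One rough way to observe this allocation overhead is to count how many objects the interpreter allocates while a block runs. This is only a sketch, not the method used to produce the list above: `allocations_during` is a hypothetical helper, and the `:total_allocated_objects` key is assumed to be the spelling used by Ruby 2.2 and later (older versions may spell it differently).

```ruby
# Hypothetical helper (not part of the patch): returns the number of objects
# allocated while the given block runs, using the cumulative GC counter.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# The count includes the receiver literal and the result String, so it is
# always at least 1; the exact number depends on the Ruby version.
count = allocations_during { "123:456".gsub(":", "_") }
puts "objects allocated by one gsub call: #{count}"
```

Running this on trunk and on a patched build should show a noticeably smaller count for the patched build, though the exact figures will vary.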
I have a solution which is not too complicated, at https://github.com/ruby/ruby/pull/579 and attached. Calls to `rb_reg_search()` are replaced with calls to a new function, `rb_pat_search()`, which conditionally calls `rb_reg_search()` or `rb_str_index()`, depending on whether the pattern is a String. Calculating the substring that needs to be replaced is also different when the pattern is a String.
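The dispatch described above can be modelled in Ruby roughly as follows. This is an illustrative sketch only: the real change is in C, and the Ruby methods `pat_search` and `sub_first` are hypothetical stand-ins that treat the replacement as a literal String (no backreference expansion).

```ruby
# Hypothetical Ruby model of the dispatch; returns [start, length] of the
# first match at or after pos, or nil when nothing matches.
def pat_search(str, pat, pos = 0)
  if pat.is_a?(String)
    # Plain substring search: no Regexp or MatchData is created.
    beg = str.index(pat, pos)
    beg && [beg, pat.length]
  else
    # Regexp patterns still go through the regex engine.
    m = pat.match(str, pos)
    m && [m.begin(0), m.end(0) - m.begin(0)]
  end
end

# Replace the first match of pat in str with the literal String repl.
def sub_first(str, pat, repl)
  beg, len = pat_search(str, pat)
  return str.dup unless beg
  str[0, beg] + repl + str[(beg + len)..-1]
end

puts sub_first("123:456", ":", "_")
puts sub_first("123:456", /\d+/, "x")
```

Note how the matched length comes from the pattern itself in the String case, and from the match in the Regexp case; that is the "calculating the substring" difference mentioned above.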
Runtime of each method is dramatically reduced:
    require 'benchmark'

    n = 4_000_000
    Benchmark.bm(7) do |bm|
      str1 = "123:456"; str2 = "123_456"
      colon = ":"; underscore = "_"
      # each benchmark runs the substring method twice so that the bang methods can
      # perform the same number of substitutions to str1 each go around.
      bm.report("sub")   { n.times { str1.sub(colon, underscore);   str2.sub(underscore, colon) } }
      bm.report("sub!")  { n.times { str1.sub!(colon, underscore);  str1.sub!(underscore, colon) } }
      bm.report("gsub")  { n.times { str1.gsub(colon, underscore);  str2.gsub(underscore, colon) } }
      bm.report("gsub!") { n.times { str1.gsub!(colon, underscore); str1.gsub!(underscore, colon) } }
    end

    # trunk
                  user     system      total        real
    sub      40.450000   0.580000  41.030000 ( 41.209658)
    sub!     39.780000   0.580000  40.360000 ( 40.656789)
    gsub     58.500000   0.820000  59.320000 ( 59.603923)
    gsub!    59.400000   0.770000  60.170000 ( 60.435687)

    # this patch
                  user     system      total        real
    sub       3.060000   0.010000   3.070000 (  3.091920)
    sub!      2.380000   0.010000   2.390000 (  2.390769)
    gsub      7.130000   0.130000   7.260000 (  7.299139)
    gsub!     7.660000   0.150000   7.810000 (  7.846190)

When using a String pattern, runtime is reduced by 87% to 94%.
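For anyone who wants to check that range, the per-method reduction can be recomputed from the "real" column of the results above. The constant and method names below are just illustrative; the numbers are copied from the benchmark output.

```ruby
# Real (wall-clock) seconds copied from the benchmark results above.
TRUNK   = { "sub" => 41.209658, "sub!" => 40.656789,
            "gsub" => 59.603923, "gsub!" => 60.435687 }
PATCHED = { "sub" => 3.091920, "sub!" => 2.390769,
            "gsub" => 7.299139, "gsub!" => 7.846190 }

# Percentage by which runtime dropped, rounded to one decimal place.
def reduction_percent(before, after)
  ((1.0 - after / before) * 100).round(1)
end

REDUCTIONS = TRUNK.map { |name, before|
  [name, reduction_percent(before, PATCHED[name])]
}.to_h

REDUCTIONS.each { |name, pct| puts format("%-6s %.1f%%", name, pct) }
```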
There is only one incompatibility that I am aware of: `$&` will not be set after using a sub method with a String pattern. (Subgroups (`$1`, ...) will not be available either, but weren't before, since String patterns are escaped before being used.)
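To illustrate the point about subgroups: a String pattern is matched literally, so parentheses in it never act as capture groups. The sketch below (with a hypothetical helper name, `literal_sub_demo`) only checks `$1`; what `$&` holds after the call depends on whether the interpreter includes this patch, so it is deliberately not shown.

```ruby
# Returns the substitution result together with the first capture group ($1)
# as seen immediately after the call.
def literal_sub_demo
  result = "a(b)c".sub("(b)", "x") # "(b)" is matched as literal text
  [result, $1]
end

result, group1 = literal_sub_demo
puts result              # the literal "(b)" was replaced
p group1                 # no capture group exists for a String pattern
p Regexp.escape("(b)")   # the escaping that makes the pattern literal
```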
Only three other methods still use `get_pat()`, the function that creates a Regexp from the String pattern: `#split`, `#scan`, and `#match`. I think this fix could be applied to them as well in the future.
---Files--------------------------------
ruby-579.diff (5.12 KB)
--
https://bugs.ruby-lang.org/