[ruby-core:65459] [ruby-trunk - Feature #10333] [PATCH 3/1] optimize: "yoda literal" == string

From: normalperson@...
Date: 2014-10-07 08:02:25 UTC
List: ruby-core #65459
Issue #10333 has been updated by Eric Wong.


 ko1@atdot.net wrote:
 > Comments for this ticket and the following tickets:
 > 
 > > 1) [Feature #10326] optimize: recv << "literal string"
 > > 2) [Feature #10329] optimize: foo == "literal string"
 
 > To continue this kind of hack, we need tons of instructions for each
 > method.  What do you think about it?
 
 I am not completely happy with my current patches because of verbosity
 and also icache footprint in the main VM loop.  Ruby executable sizes
 (even stripped) seem to get bigger with every release :<
 
 However, perhaps the biggest performance problem is still too many
 allocations and garbage objects; so I am willing to trade some code
 size to reduce dynamic allocations.
 
 > Basically, we need to add them more carefully. For example, persuasive
 > explanations are needed, such as statistics (analysis by
 > parser/compiler), benchmark results for major use cases (maybe "<<
 > 'literal'" for templates, but not sure this ticket is for that).
 
 Right, we will need to find more real benchmarks.
 
 Sadly, there are many places where garbage grows.  So maybe this change
 is only 1-2% overall.  We may need a lot of small changes to add up to
 noticeable improvements.
 
 > Another idea is a more general approach that indicates arguments
 > (and a receiver) are string literals. It is called specialization.
 > Specialized instructions (opt_plus and so on) are a kind of
 > specialization by hand.
 
 I've been thinking along these lines, too.  For example, I would like to
 see String#tr! and String#gsub! able to avoid allocations for literal
 strings.  Or even optimize: Time.now.{to_f,to_i,strftime("lit")}
 
 As suggested by akr, users may .freeze (or use constants), but that is
 verbose and requires VM internal knowledge.  My goal is to make
 optimization as transparent as possible so users may write concise,
 idiomatic Ruby code.
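To make the trade-off concrete, here is a minimal sketch that counts allocations for the concise form and for the frozen-constant workaround. It assumes CRuby, since `GC.stat(:total_allocated_objects)` is implementation-specific, and the helper `allocations` and the constant `NEEDLE` are names made up for this illustration. On a build or file where string literals are already frozen or deduplicated, the two counts will converge.

```ruby
# Count objects allocated while the given block runs.
# Relies on a CRuby-specific GC.stat key (assumption: running on CRuby).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

NEEDLE = "literal".freeze # hypothetical constant, the verbose workaround
foo = "literal"
n = 10_000

# Concise form: the literal on the right may be allocated on every iteration.
with_literal = allocations { n.times { foo == "literal" } }

# Workaround: the frozen constant is created once and reused.
with_constant = allocations { n.times { foo == NEEDLE } }

puts "literal:  #{with_literal} objects"
puts "constant: #{with_constant} objects"
```

The constant version should stay well below one allocation per iteration; whether the literal version does depends on the interpreter, which is the point of the proposed optimization.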
 
 It would be great if things like ruby-trunk r47813
 (unicode_norm_gen.rb: optimize concatenation)
 can be done transparently, even.
 
 > Small comments:
 > 
 > (1) iseq_compile_each() should not use opt_* instructions because we
 > should be able to make instructions without opt_* insns (on/off by
 > compile options).
 
 Right.  I'll see about making it optional and doing it more
 idiomatically.  I mainly used existing opt_{aref,aset}_with compilation
 as a guide.
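For reference, the existing on/off switch for opt_* instructions can be observed from Ruby itself. The sketch below assumes CRuby, where `RubyVM::InstructionSequence.compile` accepts a `specialized_instruction` compile option; the file name `sketch.rb` is a placeholder, and the exact disassembly text varies between versions.

```ruby
# CRuby-only sketch: RubyVM is an implementation-specific API.
src = "a = 1; a + 2"

# Default options: the compiler may emit the specialized opt_plus instruction.
specialized = RubyVM::InstructionSequence.compile(src).disasm

# Same source with specialized instructions turned off by a compile option.
# "sketch.rb" is only a placeholder file name and path.
generic = RubyVM::InstructionSequence.compile(
  src, "sketch.rb", "sketch.rb", 1, { specialized_instruction: false }
).disasm

puts "default options use opt_plus:  #{specialized.include?('opt_plus')}"
puts "option disabled uses opt_plus: #{generic.include?('opt_plus')}"
```

Any new literal-string instruction would presumably need to follow the same switch, so that the second compilation falls back to a generic send.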
 
 > (2) Name of instructions should be reconsidered.
 
 OK, I do not mind changing names.
 
 
 I also did informal benchmarks with my system Perl installation
 (Perl 5.14.2 on Debian stable x86-64):
 
 > loop_whileloop2
 
 	use strict;
 	my $i = 0;
 	while ($i < 6_000_000) { # benchmark loop 2
 		$i += 1;
 	}
 
 	Perl 0.228s
 	> trunk	0.10645449301227927
 	> built	0.10581812914460897
 
 	Without the string compare, we're already faster than Perl \o/
 
 > vm2_streq1
 
 	use strict;
 	my $i = 0;
 	my $foo = "literal";
 	while ($i < 6_000_000) { # benchmark loop 2
 		$i += 1;
 		$foo eq "literal";
 	}
 
 	Perl 0.349s
 	> trunk	0.4726782930083573
 	> built	0.18452610215172172
 
 We lose to Perl without the optimization, but win with it :)
 This is just a micro-benchmark, of course, but I think it's an important
 data point to show gains by avoiding allocations when possible.

----------------------------------------
Feature #10333: [PATCH 3/1] optimize: "yoda literal" == string
https://bugs.ruby-lang.org/issues/10333#change-49241

* Author: Eric Wong
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: current: 2.2.0
----------------------------------------
This is a follow-up-to:

1) [Feature #10326] optimize: recv << "literal string"
2) [Feature #10329] optimize: foo == "literal string"

This can be slightly faster than: (string == "literal") because
we can guarantee the "yoda literal" is already a string at
compile time.
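The difference between the two operand orders can be seen with a small sketch. The class `AlwaysEqual` is hypothetical and exists only to show that the receiver decides which `==` runs: with the literal on the left, the receiver is always a String, while with a variable on the left it can be anything.

```ruby
# Hypothetical class, used only to show dynamic dispatch on the receiver.
class AlwaysEqual
  def ==(_other)
    true
  end
end

foo = AlwaysEqual.new

# Receiver is foo: its class is only known at run time,
# so this dispatches to AlwaysEqual#==.
normal_order = (foo == "literal")

# Receiver is the literal: it is known to be a String when the code
# is compiled, so this always dispatches to String#==.
yoda_order = ("literal" == foo)

puts "foo == \"literal\" is #{normal_order}"
puts "\"literal\" == foo is #{yoda_order}"
```

Because the two forms can give different answers for non-String operands, an optimization for `foo == "literal"` still has to check the receiver at run time, whereas the yoda form does not.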

Updated benchmarks from Xeon E3-1230 v3 @ 3.30GHz:

target 0: trunk (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/rrrr/b/i/bin/ruby"
target 1: built (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/ruby/b/i/bin/ruby"

-----------------------------------------------------------
loop_whileloop2

i = 0
while i< 6_000_000 # benchmark loop 2
  i += 1
end

trunk	0.10712811909615993
trunk	0.10693809622898698
trunk	0.10645449301227927
trunk	0.10646287119016051
built	0.10612367931753397
built	0.10581812914460897
built	0.10592922195792198
built	0.10595094738528132

-----------------------------------------------------------
vm2_streq1

i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  foo == "literal"
end

trunk	0.47250875690951943
trunk	0.47325073881074786
trunk	0.4726782930083573
trunk	0.4727754699997604
built	0.185972370672971
built	0.1850820742547512
built	0.18558283289894462
built	0.18452610215172172

-----------------------------------------------------------
vm2_streq2

i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  "literal" == foo
end

trunk	0.4719057851471007
trunk	0.4715963830240071
trunk	0.47177061904221773
trunk	0.4724834677763283
built	0.18247668212279677
built	0.18143231887370348
built	0.18060296680778265
built	0.17929687118157744

-----------------------------------------------------------
raw data:

[["loop_whileloop2",
  [[0.10712811909615993,
    0.10693809622898698,
    0.10645449301227927,
    0.10646287119016051],
   [0.10612367931753397,
    0.10581812914460897,
    0.10592922195792198,
    0.10595094738528132]]],
 ["vm2_streq1",
  [[0.47250875690951943,
    0.47325073881074786,
    0.4726782930083573,
    0.4727754699997604],
   [0.185972370672971,
    0.1850820742547512,
    0.18558283289894462,
    0.18452610215172172]]],
 ["vm2_streq2",
  [[0.4719057851471007,
    0.4715963830240071,
    0.47177061904221773,
    0.4724834677763283],
   [0.18247668212279677,
    0.18143231887370348,
    0.18060296680778265,
    0.17929687118157744]]]]

Elapsed time: 6.097474559 (sec)
-----------------------------------------------------------
benchmark results:
minimum results in each 4 measurements.
Execution time (sec)
name	trunk	built
loop_whileloop2	0.106	0.106
vm2_streq1*	0.366	0.079
vm2_streq2*	0.365	0.073

Speedup ratio: compare with the result of `trunk' (greater is better)
name	built
loop_whileloop2	1.006
vm2_streq1*	4.651
vm2_streq2*	4.969
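The starred rows are consistent with taking the minimum of the four runs and subtracting the loop_whileloop2 minimum as loop overhead. That reading is inferred from the numbers, not stated in the report. A short sketch that redoes the arithmetic from the raw data above (the names `RAW` and `summary` are made up for this illustration):

```ruby
# Raw timings copied from the report: name => [trunk runs, built runs].
RAW = {
  "loop_whileloop2" => [
    [0.10712811909615993, 0.10693809622898698, 0.10645449301227927, 0.10646287119016051],
    [0.10612367931753397, 0.10581812914460897, 0.10592922195792198, 0.10595094738528132]
  ],
  "vm2_streq1" => [
    [0.47250875690951943, 0.47325073881074786, 0.4726782930083573, 0.4727754699997604],
    [0.185972370672971, 0.1850820742547512, 0.18558283289894462, 0.18452610215172172]
  ],
  "vm2_streq2" => [
    [0.4719057851471007, 0.4715963830240071, 0.47177061904221773, 0.4724834677763283],
    [0.18247668212279677, 0.18143231887370348, 0.18060296680778265, 0.17929687118157744]
  ]
}.freeze

# Loop overhead: best run of the empty loop for each build.
loop_trunk, loop_built = RAW["loop_whileloop2"].map(&:min)

summary = RAW.to_h do |name, (trunk, built)|
  t = trunk.min
  b = built.min
  # Assumption: starred rows have the empty-loop time subtracted.
  unless name == "loop_whileloop2"
    t -= loop_trunk
    b -= loop_built
  end
  [name, [t, b, t / b]]
end

summary.each do |name, (t, b, ratio)|
  printf("%-16s trunk=%.3f built=%.3f speedup=%.3f\n", name, t, b, ratio)
end
```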
---
 benchmark/bm_vm2_streq2.rb |  6 ++++++
 compile.c                  | 20 +++++++++++++++++++-
 insns.def                  | 20 ++++++++++++++++++++
 test/ruby/test_string.rb   | 12 ++++++++----
 4 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 benchmark/bm_vm2_streq2.rb


---Files--------------------------------
0001-optimize-yoda-literal-string.patch (6.23 KB)


-- 
https://bugs.ruby-lang.org/
