From: sam.saffron@... Date: 2021-03-02T06:19:45+00:00 Subject: [ruby-core:102695] [Ruby master Feature#17610] [PATCH] Reduce RubyVM::InstructionSequence.load_from_binary allocations Issue #17610 has been updated by sam.saffron (Sam Saffron). > I tried measuring the same thing on popular open source applications like Redmine or Discourse, unfortunately I couldn't make them work on Ruby 3.0 yet. I had a small for for it https://github.com/discourse/discourse/tree/ruby-3 not too much needed changing, waiting on some upstream gems. ---------------------------------------- Feature #17610: [PATCH] Reduce RubyVM::InstructionSequence.load_from_binary allocations https://bugs.ruby-lang.org/issues/17610#change-90690 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- Pull Request: https://github.com/ruby/ruby/pull/4119 ### Context While profiling our application allocations, I noticed `load_from_binary` would allocate a string for each method: ``` 305.68 kB 7642 "initialize" 7454 bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19 ``` This lead me to experiment with the following repro script: ```ruby # frozen_string_literal: true require 'objspace' ObjectSpace.trace_object_allocations_start preload = [:some_func, :some_sym, :Foo, :extend, "Foo"] class Foo;end class Bar;end binary = RubyVM::InstructionSequence.compile(<<~RUBY).to_binary class Foo def initialize end end class Bar def initialize end end RUBY 4.times { GC.start } GC.disable gen = GC.count RubyVM::InstructionSequence.load_from_binary(binary) RubyVM::InstructionSequence.load_from_binary(binary) puts ObjectSpace.dump_all(output: :string, since: gen).lines.grep(/"type":"STRING"/) ``` On 3.0.0p0 it allocates 12 strings: ```json {"address":"0x7f90e7027a40", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027ab8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Bar", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027ae0", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027b08", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027b58", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"initialize", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027ba8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Foo", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027cc0", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027d38", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Bar", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027d60", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027d88", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027dd8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"initialize", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} {"address":"0x7f90e7027e28", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Foo", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}} ``` For a good part, these allocated strings are immediately passed to `rb_str_intern` to get a symbol, and since constant and method names are very likely to be referenced elsewhere in the application source, most of the time the symbol already exist and could be looked up without first allocating a String. ### This patch By using `rb_intern3` and `rb_enc_interned_str`, we can lookup existing symbols and fstrings without allocating anything. The same repro script with this patch only allocate a single string: ```json {"address":"0x7f88ec8dfde8", "type":"STRING", "class":"0x7f88ec8be800", "frozen":true, "embedded":true, "fstring":true, "bytesize":10, "value":"", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":25, "method":"load_from_binary", "generation":4, "memsize":40, "flags":{"wb_protected":true}} ``` ### Performance I added a small micro benchmark which show minor gains: ``` compare-ruby: ruby 3.1.0dev (2021-01-21T19:19:44Z master 32b7dcfb56) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-01-25T09:58:02Z iseq-load-symbol 6b0e2c1580) [x86_64-darwin19] # Iteration per second (i/s) | |compare-ruby|built-ruby| |:--------------|-----------:|---------:| |symbol | 447.846k| 489.421k| | | -| 1.09x| |define_method | 113.035k| 117.016k| | | -| 1.04x| |all | 61.421k| 64.382k| | | -| 1.05x| ``` In a more real world scenario, this patch reduce `load_from_binary` number of allocations by 65% ( and number of allocations by `7.7%` during our entire application's boot process) (`~1.7M` allocations avoided): Before: ``` allocated memory by location ----------------------------------- 429.56 MB bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19 allocated objects by location ----------------------------------- 2748272 bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19 ``` After: ``` allocated memory by location ----------------------------------- 346.06 MB bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19 allocated objects by location ----------------------------------- 960451 bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19 ``` And WALL profiling show much less time spent in `load_from_binary`: 3.0.0-p0: ``` ================================== Mode: wall(1000) Samples: 57745 (13.29% miss rate) GC: 16369 (28.35%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 6500 (11.3%) 6497 (11.3%) RubyVM::InstructionSequence.load_from_binary ``` 3.0.0-p0 + this patch: ``` ================================== Mode: wall(1000) Samples: 46137 (13.32% miss rate) GC: 14094 (30.55%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 3333 (7.2%) 3331 (7.2%) RubyVM::InstructionSequence.load_from_binary ``` I tried measuring the same thing on popular open source applications like Redmine or Discourse, unfortunately I couldn't make them work on Ruby 3.0 yet. -- https://bugs.ruby-lang.org/ Unsubscribe: