From: nagayama15@...
Date: 2019-09-11T09:03:28+00:00
Subject: [ruby-core:94898] [Ruby master Feature#16163] Reduce the output of `RubyVM::InstructionSequence#to_binary`

Issue #16163 has been reported by NagayamaRyoga (Nagayama Ryoga).

----------------------------------------
Feature #16163: Reduce the output of `RubyVM::InstructionSequence#to_binary`
https://bugs.ruby-lang.org/issues/16163

* Author: NagayamaRyoga (Nagayama Ryoga)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
## Abstract

The output of `RubyVM::InstructionSequence#to_binary` is extremely large. We have reduced the output of `#to_binary` by more than 70%.

`RubyVM::InstructionSequence.load_from_binary` becomes about 7% slower. However, when the binary is read from a file, reading plus loading can be faster than on master.

Since the Bootsnap gem uses `#to_binary`, this proposal reduces the compilation cache of Rails projects to about 30% of its current size.

## Background

`#to_binary` and `.load_from_binary` are used by the Bootsnap gem, which has been installed by default in Rails projects since Rails 5.2. Shrinking the `#to_binary` output therefore also shrinks the compilation cache that Bootsnap generates.

## Implementation

https://github.com/ruby/ruby/pull/2450

### Techniques

1. We stopped outputting unnecessary structure fields, e.g. the MJIT information in `struct rb_iseq_constant_body`.
2. We output integer values in a variable-length format, similar in spirit to UTF-8.

```
/*
 * Small uint serialization
 * 0x00000000_00000000 - 0x00000000_0000007f: 1byte | XXXX XXX1 |
 * 0x00000000_00000080 - 0x00000000_00003fff: 2byte | XXXX XX10 | XXXX XXXX |
 * 0x00000000_00004000 - 0x00000000_001fffff: 3byte | XXXX X100 | XXXX XXXX | XXXX XXXX |
 * 0x00000000_00200000 - 0x00000000_0fffffff: 4byte | XXXX 1000 | XXXX XXXX | XXXX XXXX | XXXX XXXX |
 * ...
 * 0x00020000_00000000 - 0x00ffffff_ffffffff: 8byte | 1000 0000 | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX |
 * 0x01000000_00000000 - 0xffffffff_ffffffff: 9byte | 0000 0000 | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX |
 */
```

3. We integrated the ID output mechanism into object serialization.

## Evaluation

### Environment

- OS: Ubuntu 16.04 LTS
- CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
- Memory: 32GB

### Simple benchmark

First, we concatenated the files in `benchmark/` into a single large `.rb` file of 5400 lines. We then measured the size of its `#to_binary` output and the time taken to load that output.

The benchmark code: https://gist.github.com/NagayamaRyoga/d482938f3a03c4556d297bb09c03e1fa

- master (`ruby 2.7.0dev (2019-08-17T11:20:04Z master 2a65498ca2) [x86_64-linux]`)

```
size: 1963764B
                                   user     system      total        real
load_from_binary               4.276000   0.000000   4.276000 (  4.277652)
File.read + load_from_binary   5.060000   0.536000   5.596000 (  5.593620)
```

- This proposal

```
size: 463776B
                                   user     system      total        real
load_from_binary               4.576000   0.004000   4.580000 (  4.580691)
File.read + load_from_binary   4.856000   0.080000   4.936000 (  4.934168)
```

The `#to_binary` output is about 24% of master's (about 4 times smaller). `.load_from_binary` alone is about 7% slower. However, when the binary is read from a file and then decoded (`File.read + load_from_binary`), it is about 12% faster than master: the smaller file takes less time to read, which more than offsets the slower decoding.

### A Rails project with Bootsnap

Next, we measured the startup time of a simple Rails project generated with `$ rails new`. Bootsnap caches the compilation results on the first boot and uses them to load the application on subsequent boots.
Settings:

```
RAILS_ENV=production
DISABLE_SPRING=1
```

- master
  - Cache (`tmp/`): 32MB
  - First boot: average 1.700s (N=10)
  - Boot from cache: average 0.588s (N=10)
- proposal
  - Cache (`tmp/`): 9.4MB
  - First boot: average 1.684s (N=10)
  - Boot from cache: average 0.592s (N=10)

The cache is now about 30% of its original size (9.4MB vs. 32MB). There was no measurable impact on startup time.

## Tests

Passed `make test-all` with `RUBY_ISEQ_DUMP_DEBUG=to_binary`.

```sh
$ make test-all -j8 RUBY_ISEQ_DUMP_DEBUG=to_binary
../../ruby-dev/revision.h unchanged
Run options: "--ruby=./miniruby -I../../ruby-dev/lib -I. -I.ext/common ../../ruby-dev/tool/runruby.rb --extout=.ext -- --disable-gems" --excludes-dir=../../ruby-dev/test/excludes --name=!/memory_leak/

# Running tests:

Finished tests in 46.252333s, 452.6258 tests/s, 57576.1656 assertions/s.
20935 tests, 2663032 assertions, 0 failures, 0 errors, 92 skips

ruby -v: ruby 2.7.0dev (2019-09-05T09:20:11Z alt-bytecode/load_.. 8aa0a1cc4c) [x86_64-linux]
```

## Conclusion

The output of `RubyVM::InstructionSequence#to_binary` is about 1/4 the size of master's. The impact on speed is negligible. All tests pass.