[ruby-core:94898] [Ruby master Feature#16163] Reduce the output of `RubyVM::InstructionSequence#to_binary`
From: nagayama15@...
Date: 2019-09-11 09:03:28 UTC
List: ruby-core #94898
Issue #16163 has been reported by NagayamaRyoga (Nagayama Ryoga).
----------------------------------------
Feature #16163: Reduce the output of `RubyVM::InstructionSequence#to_binary`
https://bugs.ruby-lang.org/issues/16163
* Author: NagayamaRyoga (Nagayama Ryoga)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
## Abstract
The output of `RubyVM::InstructionSequence#to_binary` is extremely large.
This proposal reduces the output of `#to_binary` by more than 70%.
`RubyVM::InstructionSequence.load_from_binary` becomes about 7% slower, but when the binary is read from a file, loading can be faster than on master.
Since the Bootsnap gem uses `#to_binary`, this proposal reduces the compilation cache size of Rails projects to about 30% of its current size.
## Background
`#to_binary` and `.load_from_binary` are used by the Bootsnap gem,
which has been installed by default in Rails projects since Rails 5.2.
Reducing the `#to_binary` output therefore also reduces the size of the compilation cache that Bootsnap generates.
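For readers unfamiliar with the two methods, the following is a minimal sketch of the round trip they provide. The source string is an arbitrary example and is not taken from Bootsnap.

```ruby
# Compile a small piece of Ruby source into an instruction sequence.
iseq = RubyVM::InstructionSequence.compile("1 + 2")

# Serialize it. This binary string is what a compilation cache would store.
binary = iseq.to_binary
puts "serialized size: #{binary.bytesize} bytes"

# Later (typically in another process), restore the iseq without recompiling.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
puts loaded.eval # prints 3
```

The serialized size depends on the Ruby version, which is why no concrete number is shown here.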
## Implementation
https://github.com/ruby/ruby/pull/2450
### Techniques
1. Stopped outputting unnecessary structure fields, such as the MJIT information in `struct rb_iseq_constant_body`.
2. Output integer values in a variable-length format, similar to UTF-8.
```
/*
* Small uint serialization
* 0x00000000_00000000 - 0x00000000_0000007f: 1byte | XXXX XXX1 |
* 0x00000000_00000080 - 0x00000000_00003fff: 2byte | XXXX XX10 | XXXX XXXX |
* 0x00000000_00004000 - 0x00000000_001fffff: 3byte | XXXX X100 | XXXX XXXX | XXXX XXXX |
* 0x00000000_00200000 - 0x00000000_0fffffff: 4byte | XXXX 1000 | XXXX XXXX | XXXX XXXX | XXXX XXXX |
* ...
* 0x00020000_00000000 - 0x00ffffff_ffffffff: 8byte | 1000 0000 | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX |
* 0x01000000_00000000 - 0xffffffff_ffffffff: 9byte | 0000 0000 | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX | XXXX XXXX |
*/
```
3. Integrated the ID output mechanism with object serialization.
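The variable-length integer format of technique 2 can be illustrated with a short Ruby sketch. The `SmallUint` module and its method names are hypothetical and exist only for this illustration. The sketch assumes that the bytes after the tagged first byte are stored most significant byte first; the table only fixes the tag layout of the first byte, and the actual C implementation in the pull request may differ in detail.

```ruby
# Hypothetical illustration of the "small uint" layout; not the code from the PR.
module SmallUint
  MAX = (1 << 64) - 1

  # Encode an integer in 0..2**64-1 into a 1..9 byte binary String.
  def self.encode(x)
    raise ArgumentError, "out of range" unless x.between?(0, MAX)

    # An n-byte encoding (n < 9) carries 7 * n payload bits.
    n = 1
    n += 1 while n < 9 && x >= (1 << (7 * n))

    # The low n - 1 bytes of x follow the first byte (assumed big-endian).
    low_bytes = (n - 1).times.map { |i| (x >> (8 * (n - 2 - i))) & 0xff }

    # First byte: remaining high bits, then a 1 bit, then n - 1 zero bits.
    # For n == 9 the first byte is all zeros and carries no payload.
    high = x >> (8 * (n - 1))
    first = n == 9 ? 0 : (((high << 1) | 1) << (n - 1)) & 0xff

    ([first] + low_bytes).pack("C*")
  end

  # Decode from the start of a binary String; returns [value, bytes_consumed].
  def self.decode(bytes)
    first = bytes.getbyte(0)
    # The position of the lowest set bit in the first byte gives the length.
    n = first.zero? ? 9 : (0..7).find { |i| first[i] == 1 } + 1
    value = n == 9 ? 0 : first >> n
    (1...n).each { |i| value = (value << 8) | bytes.getbyte(i) }
    [value, n]
  end
end

p SmallUint.encode(0x7f).unpack("C*") # => [255]
p SmallUint.encode(0x80).unpack("C*") # => [2, 128]
p SmallUint.decode(SmallUint.encode(123_456)) # => [123456, 3]
```

Because most integers in an instruction sequence are small (operand indexes, offsets, lengths), most of them fit in one or two bytes instead of a fixed-width field.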
## Evaluation
### Environment
OS: Ubuntu 16.04 LTS
CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Memory: 32GB
### Simple benchmark
First, we combined the files in `benchmark/` into one huge `.rb` file of 5400 lines.
Then we measured the output size of `#to_binary` and the time taken to load it.
The benchmark code: https://gist.github.com/NagayamaRyoga/d482938f3a03c4556d297bb09c03e1fa
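For orientation, a measurement of this shape can be sketched as follows. This is not the code from the gist: the generated source, the iteration count, and the temporary file are assumptions made so that the example is self-contained. Only the two report labels are taken from the output shown in this ticket.

```ruby
require "benchmark"
require "tempfile"

# Stand-in source. The original benchmark used a 5400-line file built from benchmark/.
source = (1..200).map { |i| "def m#{i}(x); x + #{i}; end\n" }.join
binary = RubyVM::InstructionSequence.compile(source).to_binary

# Write the binary to disk so that the second report includes file I/O.
file = Tempfile.new(["iseq", ".bin"])
file.binmode
file.write(binary)
file.flush

iterations = 100 # hypothetical; the original iteration count is not stated here

puts "size: #{binary.bytesize}B"
Benchmark.bm(30) do |bm|
  bm.report("load_from_binary") do
    iterations.times { RubyVM::InstructionSequence.load_from_binary(binary) }
  end
  bm.report("File.read + load_from_binary") do
    iterations.times do
      RubyVM::InstructionSequence.load_from_binary(File.binread(file.path))
    end
  end
end

file.close! # close and delete the temporary file
```

The second report is the more realistic one for a cache such as Bootsnap's, since a cached binary always has to be read from disk before it is decoded.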
- master (`ruby 2.7.0dev (2019-08-17T11:20:04Z master 2a65498ca2) [x86_64-linux]`)
```
size: 1963764B
                                   user     system      total        real
load_from_binary               4.276000   0.000000   4.276000 (  4.277652)
File.read + load_from_binary   5.060000   0.536000   5.596000 (  5.593620)
```
- This proposal
```
size: 463776B
                                   user     system      total        real
load_from_binary               4.576000   0.004000   4.580000 (  4.580691)
File.read + load_from_binary   4.856000   0.080000   4.936000 (  4.934168)
```
The output size of `#to_binary` is about 24% of master's (roughly 4 times smaller).
`.load_from_binary` alone is about 7% slower.
However, when the binary is loaded from a file and then decoded (`File.read + load_from_binary`), the proposal is about 12% faster than master.
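The percentages above follow directly from the measured sizes and the `real` column. As a quick check, the arithmetic can be reproduced like this (the variable names are ours; the numbers are copied from the two result blocks):

```ruby
# Sizes in bytes and wall-clock ("real") times in seconds from the results above.
master   = { size: 1_963_764, load: 4.277652, read_and_load: 5.593620 }
proposal = { size: 463_776,   load: 4.580691, read_and_load: 4.934168 }

size_ratio   = (proposal[:size].to_f / master[:size] * 100).round(1)
load_slower  = ((proposal[:load] / master[:load] - 1) * 100).round(1)
total_faster = ((1 - proposal[:read_and_load] / master[:read_and_load]) * 100).round(1)

puts "size relative to master:     #{size_ratio}%"   # 23.6
puts "load_from_binary slower by:  #{load_slower}%"  # 7.1
puts "File.read + load faster by:  #{total_faster}%" # 11.8
```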
### A Rails project with Bootsnap
Next, we measured the startup time of a simple Rails project generated with `$ rails new`.
Bootsnap caches the compilation results on the first boot and uses them to load the application on subsequent boots.
Settings:
```
RAILS_ENV=production
DISABLE_SPRING=1
```
- master
  - Cache (`tmp/`): 32MB
  - The first boot: Average 1.700s (N=10)
  - Boot from cache: Average 0.588s (N=10)
- proposal
  - Cache (`tmp/`): 9.4MB
  - The first boot: Average 1.684s (N=10)
  - Boot from cache: Average 0.592s (N=10)
The cache size is now about 30% of master's (9.4MB versus 32MB).
There was no measurable impact on project startup time.
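The kind of compilation cache measured here can be sketched in a few lines. This is only an illustration of the idea (compile and store on the first load, deserialize afterwards). It is not Bootsnap's actual implementation, and the helper name `load_iseq_cached` and the SHA-256 cache key are assumptions.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper: return the iseq for `path`, using a binary cache in `cache_dir`.
def load_iseq_cached(path, cache_dir)
  source = File.read(path)
  # Key the cache on the file content so that edited files are recompiled.
  cache_path = File.join(cache_dir, "#{Digest::SHA256.hexdigest(source)}.bin")

  if File.exist?(cache_path)
    # Cache hit: skip parsing and compilation, only deserialize.
    RubyVM::InstructionSequence.load_from_binary(File.binread(cache_path))
  else
    # Cache miss: compile, then store the #to_binary output for the next boot.
    iseq = RubyVM::InstructionSequence.compile(source, path)
    File.binwrite(cache_path, iseq.to_binary)
    iseq
  end
end

Dir.mktmpdir do |dir|
  script = File.join(dir, "example.rb")
  File.write(script, "6 * 7\n")

  first  = load_iseq_cached(script, dir).eval # compiles and writes the cache
  second = load_iseq_cached(script, dir).eval # served from the cache
  puts [first, second].inspect # prints [42, 42]
end
```

Every cached file costs one `#to_binary` output on disk, so the size reduction of this proposal applies to each entry of such a cache.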
## Tests
Passed `make test-all` with `RUBY_ISEQ_DUMP_DEBUG='to_binary'`.
```sh
$ make test-all -j8 RUBY_ISEQ_DUMP_DEBUG=to_binary
../../ruby-dev/revision.h unchanged
Run options: "--ruby=./miniruby -I../../ruby-dev/lib -I. -I.ext/common ../../ruby-dev/tool/runruby.rb --extout=.ext -- --disable-gems" --excludes-dir=../../ruby-dev/test/excludes --name=!/memory_leak/
# Running tests:
Finished tests in 46.252333s, 452.6258 tests/s, 57576.1656 assertions/s.
20935 tests, 2663032 assertions, 0 failures, 0 errors, 92 skips
ruby -v: ruby 2.7.0dev (2019-09-05T09:20:11Z alt-bytecode/load_.. 8aa0a1cc4c) [x86_64-linux]
```
## Conclusion
The output size of `RubyVM::InstructionSequence#to_binary` is about 1/4 of master's.
The impact on speed is negligible.
All tests pass.