From: nagayama15@...
Date: 2020-01-12T13:42:19+00:00
Subject: [ruby-core:96801] [Ruby master Feature#16505] Improve performance of `RubyVM::InstructionSequence#to_binary`

Issue #16505 has been reported by NagayamaRyoga (Nagayama Ryoga).

----------------------------------------
Feature #16505: Improve performance of `RubyVM::InstructionSequence#to_binary`
https://bugs.ruby-lang.org/issues/16505

* Author: NagayamaRyoga (Nagayama Ryoga)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
## Abstract

`#to_binary` deduplicates the objects it writes to the binary, but the current implementation does so with a linear search over an array of objects (`obj_list`).
(https://github.com/ruby/ruby/blob/e288632f22b18b29efd20a1469292b0a3ba9b74c/compile.c#L9699-L9701)

Iseq deduplication, on the other hand, is faster because it is implemented with a hash.
(https://github.com/ruby/ruby/blob/e288632f22b18b29efd20a1469292b0a3ba9b74c/compile.c#L9744-L9745)

This proposal speeds up object deduplication by using a hash as well.
The patch does not change the output binary.

## Implementation

https://github.com/ruby/ruby/pull/2835

## Evaluation

Environment:
- OS: macOS Catalina
- CPU: Intel Core i5
- Memory: 16GB

### address_lists_parser.rb

[`address_lists_parser.rb`](https://github.com/mikel/mail/blob/master/lib/mail/parsers/address_lists_parser.rb) in the [`mail` gem](https://github.com/mikel/mail) contains an extremely large array.
Call `#to_binary` on the iseq of this file, then check the execution time and the MD5 of the output binary.
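
As a rough illustration of the change described in the Abstract, the two lookup strategies can be sketched in plain Ruby. This is not the C code in `compile.c`: the class names `LinearObjectTable` and `HashedObjectTable` are hypothetical, and comparing objects by identity is an assumption made for the sketch. It only shows that both strategies assign the same indices, while the hashed variant avoids rescanning the list on every lookup, which presumably matters most for a file with a very large array.

```ruby
# Hypothetical sketch; these classes do not exist in Ruby or in compile.c.

# Linear variant: every lookup scans the list of objects seen so far.
class LinearObjectTable
  attr_reader :objects

  def initialize
    @objects = []
  end

  # Returns the index of obj, appending it first if it has not been seen.
  def index_of(obj)
    idx = @objects.index { |o| o.equal?(obj) } # O(n) scan per lookup
    return idx if idx

    @objects << obj
    @objects.size - 1
  end
end

# Hashed variant: a side table maps each object to its index.
class HashedObjectTable
  attr_reader :objects

  def initialize
    @objects = []
    @index = {}.compare_by_identity # assumption: keys compared by identity
  end

  # Same contract as above, but the lookup is a single hash access.
  def index_of(obj)
    @index.fetch(obj) do
      @objects << obj
      @index[obj] = @objects.size - 1
    end
  end
end

objs = Array.new(1_000) { |i| "obj#{i}" }
stream = objs + objs.reverse # every object is referenced twice

linear = LinearObjectTable.new
hashed = HashedObjectTable.new
linear_ids = stream.map { |o| linear.index_of(o) }
hashed_ids = stream.map { |o| hashed.index_of(o) }

puts linear_ids == hashed_ids # both strategies assign identical indices
puts hashed.objects.size      # number of unique objects stored
```

Because the indices are identical in both variants, anything serialized from them would be the same, which is consistent with the claim that the patch leaves the output binary unchanged.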

The benchmark code:

```rb
require 'benchmark'
require 'digest/md5'

F = 'address_lists_parser.rb'
N = 100

# Compile the file once and print the MD5 of its binary form.
iseq = RubyVM::InstructionSequence.compile_file(F)
bin = iseq.to_binary
puts "md5 hash: #{Digest::MD5.hexdigest(bin)}"

# Time N repeated calls to #to_binary on the same iseq.
Benchmark.bm(12) do |x|
  x.report("to_binary x#{N}") {
    N.times do
      iseq.to_binary
    end
  }
end
```

- master (`ruby 2.8.0dev (2020-01-12T10:54:59Z master e288632f22) [x86_64-darwin19]`)

```
md5 hash: fd80e7c0c8da7a9044e89139c6078137
                   user     system      total        real
to_binary x100 27.162084   0.078262  27.240346 ( 27.675089)
```

- Proposal (`ruby 2.8.0dev (2020-01-12T12:39:10Z improve-performanc.. e05ad5ef81) [x86_64-darwin19]`)

```
md5 hash: fd80e7c0c8da7a9044e89139c6078137
                   user     system      total        real
to_binary x100  0.989403   0.036869   1.026272 (  1.063335)
```

The output binary is identical before and after the change (the MD5 hashes match).
Execution is about 26 times faster (27.68 s vs. 1.06 s real time for 100 calls).

-- 
https://bugs.ruby-lang.org/