From: "nickborromeo (Nick Borromeo) via ruby-core" Date: 2023-03-06T18:10:07+00:00 Subject: [ruby-core:112701] [Ruby master Bug#19438] Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark Issue #19438 has been updated by nickborromeo (Nick Borromeo). > However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time. I definitely agree! > It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower. before I did the change to `tally` I also broke it down to smaller steps and ran the benchmark, apologies for the wall of text to follow; a direct quote from a discussion I had with others looking at this --- I may have found something interesting while playing around with the benchmark tool. I wanted to see if we can narrow it down to a specific thing that might be slow. so I got rid of the the `sort_by_freq` and `find_seq` methods for now and focused mostly on the `frecuency` method since stackprof identified that as the _slowest_ one Just so we have a baseline here is the code and the benchmark of the `frecuency` method untouched ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) frequencies[seq[i, length]] += 1 end [ns, frequencies] end ``` ``` Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 7.530 6.524 6.056 6.294 i/s - 1.000 times in 0.132808s 0.153277s 0.165117s 0.158884s Comparison: so_k_nucleotide 2.7.5: 7.5 i/s 3.0.5: 6.5 i/s - 1.15x slower 3.2.1: 6.3 i/s - 1.20x slower 3.1.2: 6.1 i/s - 1.24x slower ``` Now lets look at `String#[]` as the only operation ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) seq[i, length] # frequencies[seq[i, length]] += 1 end [ns, frequencies] end ``` ``` ��� benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 14.000 14.225 13.797 14.096 i/s - 1.000 times in 0.071427s 0.070300s 0.072481s 0.070944s Comparison: so_k_nucleotide 3.0.5: 14.2 i/s 3.2.1: 14.1 i/s - 1.01x slower 2.7.5: 14.0 i/s - 1.02x slower 3.1.2: 13.8 i/s - 1.03x slower ``` Now lets look at `String#[]` and `Hash#[]=` ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) frequencies[seq[i, length]] end [ns, frequencies] end ``` ``` Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 12.516 12.248 12.058 13.034 i/s - 1.000 times in 0.079900s 0.081645s 0.082931s 0.076725s Comparison: so_k_nucleotide 3.2.1: 13.0 i/s 2.7.5: 12.5 i/s - 1.04x slower 3.0.5: 12.2 i/s - 1.06x slower 3.1.2: 12.1 i/s - 1.08x slower ``` However the interesting bit is when we assign things to variables or when we increment the counter for a hash value which I think is `Hash#[]=` ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) x = seq[i, length] end [ns, frequencies] end ``` ``` ��� benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 13.641 12.170 11.857 12.001 i/s - 1.000 times in 0.073306s 0.082168s 0.084335s 0.083325s Comparison: so_k_nucleotide 2.7.5: 13.6 i/s 3.0.5: 12.2 i/s - 1.12x slower 3.2.1: 12.0 i/s - 1.14x slower 3.1.2: 11.9 i/s - 1.15x slower ``` This is the same as the baseline in the start of the comment ``` ��� benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 7.626 6.474 6.135 6.239 i/s - 1.000 times in 0.131126s 0.154468s 0.163005s 0.160283s Comparison: so_k_nucleotide 2.7.5: 7.6 i/s 3.0.5: 6.5 i/s - 1.18x slower 3.2.1: 6.2 i/s - 1.22x slower 3.1.2: 6.1 i/s - 1.24x slower ``` I think we might need to look into the assignment methods instead of the access methods for string and hash ���� ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark https://bugs.ruby-lang.org/issues/19438#change-102159 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark ![Screenshot 2023-02-15 at 13 16 10](https://user-images.githubusercontent.com/7865030/219038430-4a124cc6-0d23-46e2-9794-d89d1f26e227.png) ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/