From: "Freaky (Thomas Hurst) via ruby-core" <ruby-core@...> Date: 2023-09-19T23:15:21+00:00 Subject: [ruby-core:114823] [Ruby master Bug#19875] Ruby 3.0 -> 3.1 Performance regression in String#count Issue #19875 has been updated by Freaky (Thomas Hurst). File bytecount.c added nobu (Nobuyoshi Nakada) wrote in #note-13: > Freaky (Thomas Hurst) wrote in #note-12: > > I see a difference if I configure with cflags=-msse4.2 - it's off by default. We probably want some runtime CPU feature detection if people are actually going to use it. > > Who/what do you mean by "we"? > It feels like a compiler's (or optimizer's) job. If only! I've added an AVX2 path and worked on getting runtime dispatch working on both clang and gcc. I've combined the result into a single file suitable to drop straight over the top of your `missing/bytecount.c`, which yields this on my Zen 3: ``` './ruby test.rb' ran 22.37 � 1.03 times faster than './ruby.master test.rb' ``` And this - using exactly the same binary - on an old K8 machine which lacks AVX2 and SSE4: ``` ./ruby test.rb ran 2.97 � 0.02 times faster than ./ruby.master test.rb ``` ---------------------------------------- Bug #19875: Ruby 3.0 -> 3.1 Performance regression in String#count https://bugs.ruby-lang.org/issues/19875#change-104667 * Author: iz (Illia Zub) * Status: Open * Priority: Normal * ruby -v: 3.2.2 * Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- `String#count` became slower since Ruby 3.1. Originally found by `@Freaky`: https://github.com/ruby/ruby/pull/4001#issuecomment-1714779781 Compared using the [`benchmark-driver` gem](https://github.com/benchmark-driver/benchmark-driver). ``` $ benchmark-driver tmp/string_count_benchmark_driver.yml --rbenv '3.1.1;3.1.4;2.7.2;3.2.2;3.0.6' Calculating ------------------------------------- 3.1.1 3.1.4 2.7.2 3.2.2 3.0.6 count 465.804 463.741 865.783 462.711 857.395 i/s - 10.000k times in 21.468251s 21.563768s 11.550239s 21.611783s 11.663235s Comparison: count 2.7.2: 865.8 i/s 3.0.6: 857.4 i/s - 1.01x slower 3.1.1: 465.8 i/s - 1.86x slower 3.1.4: 463.7 i/s - 1.87x slower 3.2.2: 462.7 i/s - 1.87x slower ``` Benchmark: ```yml $ cat ./tmp/string_count_benchmark_driver.yml loop_count: 10_000 prelude: | html = "\nruby\n" * 1024 * 1024 benchmark: count: html.count($/) ``` --- *Initially, I noticed the difference between `str.count($/)` and `str.lines.size` when working on the performance improvement: https://serpapi.com/blog/lines-count-failed-deployments/* ---Files-------------------------------- rb_str_len.fast (31.9 KB) rb_str_len.slow (34 KB) revert-4001.patch (1.71 KB) rb_str_count.S (11.8 KB) bytecount.c (7.23 KB) -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/