From: "Freaky (Thomas Hurst) via ruby-core" <ruby-core@...>
Date: 2023-09-19T23:15:21+00:00
Subject: [ruby-core:114823] [Ruby master Bug#19875] Ruby 3.0 -> 3.1 Performance regression in String#count

Issue #19875 has been updated by Freaky (Thomas Hurst).



File bytecount.c added



nobu (Nobuyoshi Nakada) wrote in #note-13:

> Freaky (Thomas Hurst) wrote in #note-12:

> > I see a difference if I configure with cflags=-msse4.2 - it's off by default.  We probably want some runtime CPU feature detection if people are actually going to use it.

> 

> Who/what do you mean by "we"?

> It feels like a compiler's (or optimizer's) job.
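The two quoted points differ in when the decision is made. A `-msse4.2` build fixes the choice at compile time. Runtime detection asks the CPU that actually runs the binary. Below is a minimal sketch of the two checks, assuming GCC or Clang on x86. The function names are hypothetical and not taken from Ruby's source.

```c
#include <stdbool.h>

/* Fixed when this file is compiled: GCC and Clang define __SSE4_2__
 * only if the target allows SSE4.2 (e.g. -msse4.2, or a -march that
 * includes it). A baseline x86-64 build does not define it. */
bool sse42_enabled_at_compile_time(void)
{
#if defined(__SSE4_2__)
    return true;
#else
    return false;
#endif
}

/* Decided on the machine that is actually running the binary. */
bool sse42_supported_at_runtime(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return false; /* non-x86 or non-GCC/Clang: not checked in this sketch */
#endif
}
```

A code path guarded only by `#if defined(__SSE4_2__)` is absent from default builds. A runtime check lets one binary pick the faster path where the CPU supports it.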



If only!



I've added an AVX2 path and got runtime dispatch working with both Clang and GCC. I've combined the result into a single file that can be dropped in as a replacement for your `missing/bytecount.c`, which yields this on my Zen 3:





```

  './ruby test.rb' ran

   22.37 ± 1.03 times faster than './ruby.master test.rb'

```



And this, using exactly the same binary, on an old K8 machine that lacks AVX2 and SSE4:



```

  ./ruby test.rb ran

    2.97 ± 0.02 times faster than ./ruby.master test.rb

```
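The "same binary, different CPU" result depends on choosing the implementation at runtime. Below is a minimal sketch of that pattern, assuming GCC or Clang on x86. It is not the attached `bytecount.c`, and the names `count_byte`, `count_byte_avx2` and `count_byte_scalar` are hypothetical. The AVX2 function is compiled with `__attribute__((target("avx2")))` regardless of global `-m` flags, and `__builtin_cpu_supports` picks it only where the CPU supports it.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: compare one byte at a time. */
static size_t count_byte_scalar(const uint8_t *p, size_t n, uint8_t c)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (p[i] == c);
    return count;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_COUNT_BYTE_AVX2 1

/* Built for AVX2 via the target attribute; only reached after the
 * runtime CPU check in select_count_byte() passes. */
__attribute__((target("avx2")))
static size_t count_byte_avx2(const uint8_t *p, size_t n, uint8_t c)
{
    const __m256i needle = _mm256_set1_epi8((char)c);
    size_t count = 0, i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        /* Matching lanes become 0xFF; movemask packs one bit per lane. */
        unsigned mask = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(v, needle));
        count += (size_t)__builtin_popcount(mask);
    }
    /* Fewer than 32 bytes left: finish with the scalar loop. */
    return count + count_byte_scalar(p + i, n - i, c);
}
#endif

typedef size_t (*count_byte_fn)(const uint8_t *, size_t, uint8_t);

/* Pick an implementation based on what the running CPU supports. */
static count_byte_fn select_count_byte(void)
{
#ifdef HAVE_COUNT_BYTE_AVX2
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return count_byte_avx2;
#endif
    return count_byte_scalar;
}

/* Entry point: resolves the implementation on first use. */
size_t count_byte(const void *buf, size_t n, unsigned char c)
{
    /* Simplification: a plain static is not synchronized one-time
     * initialization; real code might resolve this at startup. */
    static count_byte_fn impl;
    if (!impl)
        impl = select_count_byte();
    return impl((const uint8_t *)buf, n, c);
}
```

On a CPU without AVX2, such as the K8 above, the check fails and the scalar loop runs from the same binary. An SSE path could be added as a middle tier in the same way. Which tiers the attached file uses is not assumed here.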







----------------------------------------

Bug #19875: Ruby 3.0 -> 3.1 Performance regression in String#count

https://bugs.ruby-lang.org/issues/19875#change-104667



* Author: iz (Illia Zub)

* Status: Open

* Priority: Normal

* ruby -v: 3.2.2

* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN

----------------------------------------

`String#count` has been slower since Ruby 3.1. This was originally found by `@Freaky`: https://github.com/ruby/ruby/pull/4001#issuecomment-1714779781



Compared using the [`benchmark-driver` gem](https://github.com/benchmark-driver/benchmark-driver).



```

$ benchmark-driver tmp/string_count_benchmark_driver.yml --rbenv '3.1.1;3.1.4;2.7.2;3.2.2;3.0.6'                                                 

Calculating -------------------------------------

                          3.1.1       3.1.4       2.7.2       3.2.2       3.0.6

               count    465.804     463.741     865.783     462.711     857.395 i/s -     10.000k times in 21.468251s 21.563768s 11.550239s 21.611783s 11.663235s



Comparison:

                            count

               2.7.2:       865.8 i/s 

               3.0.6:       857.4 i/s - 1.01x  slower

               3.1.1:       465.8 i/s - 1.86x  slower

               3.1.4:       463.7 i/s - 1.87x  slower

               3.2.2:       462.7 i/s - 1.87x  slower

```



Benchmark definition (`tmp/string_count_benchmark_driver.yml`):

```yml

loop_count: 10_000

prelude: |

  html = "\nruby\n" * 1024 * 1024

benchmark:

  count: html.count($/)

```
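For scale, the benchmark input and the reported rates work out to roughly the following byte throughput. This is plain arithmetic on the figures above, assuming each iteration is one `count` call over the whole string with the default `$/` of `"\n"`:

$$
6 \times 1024 \times 1024 = 6\,291\,456\ \text{bytes per call},\qquad 2 \times 1024 \times 1024 = 2\,097\,152\ \text{newlines counted}
$$

$$
\text{2.7.2: } 865.8\ \text{s}^{-1} \times 6\,291\,456\ \text{B} \approx 5.45\ \text{GB/s},\qquad \text{3.2.2: } 462.7\ \text{s}^{-1} \times 6\,291\,456\ \text{B} \approx 2.91\ \text{GB/s}
$$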



---



*Initially, I noticed the difference between `str.count($/)` and `str.lines.size` when working on the performance improvement: https://serpapi.com/blog/lines-count-failed-deployments/*



---Files--------------------------------

rb_str_len.fast (31.9 KB)

rb_str_len.slow (34 KB)

revert-4001.patch (1.71 KB)

rb_str_count.S (11.8 KB)

bytecount.c (7.23 KB)





-- 

https://bugs.ruby-lang.org/

 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org