[ruby-core:99559] [Ruby master Feature#17115] Optimize String#casecmp? for ASCII strings
From: jean.boussier@...
Date: 2020-08-11 15:16:13 UTC
List: ruby-core #99559
Issue #17115 has been reported by byroot (Jean Boussier).
----------------------------------------
Feature #17115: Optimize String#casecmp? for ASCII strings
https://bugs.ruby-lang.org/issues/17115
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Patch: https://github.com/ruby/ruby/pull/3369
`casecmp?` is something of a performance trap: it runs at roughly half the speed of a case-insensitive regexp or plain `casecmp == 0`.
```
require "benchmark/ips" # provided by the benchmark-ips gem
str = "Connection"
cmp = "connection"
Benchmark.ips do |x|
  x.report('/\A\z/i.match?') { /\Aconnection\z/i.match?(str) }
  x.report('casecmp?') { cmp.casecmp?(str) }
  x.report('casecmp') { cmp.casecmp(str) == 0 }
  x.compare!
end
Calculating -------------------------------------
/\A\z/i.match? 11.447M (± 1.3%) i/s - 57.814M in 5.051489s
casecmp? 6.197M (± 0.9%) i/s - 31.138M in 5.025252s
casecmp 12.753M (± 1.2%) i/s - 64.636M in 5.069195s
Comparison:
casecmp: 12752791.6 i/s
/\A\z/i.match?: 11446996.1 i/s - 1.11x (± 0.00) slower
casecmp?: 6196886.0 i/s - 2.06x (± 0.00) slower
```
This is because, unlike the other two, it tries to be correct with regard to Unicode case folding.
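To make the difference concrete: `casecmp` only folds the ASCII letters A-Z and compares everything else byte by byte, while `casecmp?` applies full Unicode case folding. The sample strings below are arbitrary:

```ruby
# casecmp folds only A-Z/a-z, then compares the remaining bytes as-is,
# so non-ASCII letters that differ only in case are NOT considered equal.
ascii_only_result = "äöü".casecmp("ÄÖÜ")

# casecmp? uses Unicode case folding, so the same pair compares equal.
unicode_result = "äöü".casecmp?("ÄÖÜ")

# For pure ASCII input both methods agree.
ascii_pair_equal = "Connection".casecmp("connection").zero? &&
                   "Connection".casecmp?("connection")

p ascii_only_result # => 1
p unicode_result    # => true
p ascii_pair_equal  # => true
```

That extra Unicode work is what the ASCII-only callers are paying for.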
However, there are cases where a fast case-insensitive equality check on strings known to be ASCII is useful, for instance when matching HTTP headers.
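As an illustration of that use case, here is a header lookup that compares field names case-insensitively. The `find_header` helper and the sample header list are hypothetical; since HTTP field names are ASCII tokens, Unicode folding buys nothing here, yet `casecmp?` is the most readable way to express the check:

```ruby
# Hypothetical helper: return the value of the first header whose name
# matches `name` case-insensitively, or nil if none does.
def find_header(headers, name)
  pair = headers.find { |field, _value| field.casecmp?(name) }
  pair && pair[1]
end

# Header names arrive with arbitrary casing from different clients.
headers = [
  ["Host", "example.com"],
  ["CONNECTION", "keep-alive"],
  ["content-type", "text/html"],
]

p find_header(headers, "Connection")   # => "keep-alive"
p find_header(headers, "Content-Type") # => "text/html"
p find_header(headers, "Accept")       # => nil
```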
This patch checks whether both strings use a single-byte encoding and, if so, delegates most of the work to strncasecmp(3).
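The idea behind such a fast path can be sketched in plain Ruby. This is not the patch's C code: as an assumption for illustration, it guards on `String#ascii_only?` (the patch, as described above, checks for single-byte encodings instead) and uses a byte loop with ASCII-only folding in place of strncasecmp(3). `ascii_fold` and `fast_casecmp?` are hypothetical names:

```ruby
# Hypothetical helper: fold an ASCII uppercase byte (A-Z) to lowercase and
# leave every other byte unchanged, as strncasecmp(3) does in the C locale.
def ascii_fold(byte)
  byte.between?(0x41, 0x5A) ? byte + 0x20 : byte
end

# Hypothetical fast path: when both strings are pure ASCII, Unicode case
# folding reduces to ASCII folding, so a byte-by-byte comparison suffices.
def fast_casecmp?(a, b)
  if a.ascii_only? && b.ascii_only?
    # Strings of different lengths can never match after ASCII folding.
    return false unless a.bytesize == b.bytesize

    a.bytesize.times.all? { |i| ascii_fold(a.getbyte(i)) == ascii_fold(b.getbyte(i)) }
  else
    # Anything non-ASCII goes through the full Unicode-aware comparison.
    a.casecmp?(b)
  end
end

p fast_casecmp?("Connection", "connection")  # => true
p fast_casecmp?("Connection", "connections") # => false
p fast_casecmp?("äöü", "ÄÖÜ")                # => true (slow path)
```

The guard matters: folding is only safe to simplify when both sides are ASCII, because some non-ASCII characters (such as the Kelvin sign) fold to ASCII letters.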
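This makes `casecmp?` slightly faster than `casecmp == 0` when both strings are ASCII. In the results below, `compare-ruby` is the baseline without the patch and `built-ruby` is the patched build; the ratio row shows how much faster the winning side is.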
```
| |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|casecmp-1 | 11.618M| 10.757M|
| | 1.08x| -|
|casecmp-10 | 1.849M| 1.723M|
| | 1.07x| -|
|casecmp-100 | 204.490k| 186.798k|
| | 1.09x| -|
|casecmp-1000 | 20.413k| 20.184k|
| | 1.01x| -|
|casecmp-nonascii1 | 19.541M| 20.100M|
| | -| 1.03x|
|casecmp-nonascii10 | 19.489M| 19.914M|
| | -| 1.02x|
|casecmp-nonascii100 | 19.479M| 20.155M|
| | -| 1.03x|
|casecmp-nonascii1000 | 19.462M| 20.064M|
| | -| 1.03x|
|casecmp_p-1 | 2.214M| 12.030M|
| | -| 5.43x|
|casecmp_p-10 | 1.373M| 2.150M|
| | -| 1.57x|
|casecmp_p-100 | 249.292k| 231.041k|
| | 1.08x| -|
|casecmp_p-1000 | 16.173k| 23.592k|
| | -| 1.46x|
|casecmp_p-nonascii1 | 651.921k| 650.572k|
| | 1.00x| -|
|casecmp_p-nonascii10 | 108.253k| 109.006k|
| | -| 1.01x|
|casecmp_p-nonascii100 | 11.749k| 11.889k|
| | -| 1.01x|
|casecmp_p-nonascii1000 | 1.140k| 1.138k|
| | 1.00x| -|
```
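For a rough local comparison without the benchmark-ips gem, the ASCII rows can be approximated with the standard library's `Benchmark.realtime`. The inputs (N repetitions of one letter, differing only in case), the iteration count, and the `ascii_pair`/`time_comparisons` helpers are assumptions for illustration; the PR's actual benchmark definitions may differ, and absolute numbers depend on the machine and Ruby version:

```ruby
require "benchmark"

# Hypothetical inputs: two ASCII strings of length n that differ only in case.
def ascii_pair(n)
  ["a" * n, "A" * n]
end

# Time `iterations` calls of each comparison for every length and return
# { length => { casecmp: seconds, casecmp_p: seconds } }.
def time_comparisons(lengths, iterations)
  lengths.map do |n|
    a, b = ascii_pair(n)
    casecmp_time = Benchmark.realtime { iterations.times { a.casecmp(b) == 0 } }
    casecmp_p_time = Benchmark.realtime { iterations.times { a.casecmp?(b) } }
    [n, { casecmp: casecmp_time, casecmp_p: casecmp_p_time }]
  end.to_h
end

results = time_comparisons([1, 10, 100, 1000], 5_000)
results.each do |n, t|
  printf("n=%-5d casecmp==0: %.4fs  casecmp?: %.4fs\n", n, t[:casecmp], t[:casecmp_p])
end
```

Running it on Ruby builds with and without the patch gives a coarse view of the same trend; benchmark-ips or benchmark_driver give more reliable numbers.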