[ruby-core:105662] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
From: "byroot (Jean Boussier)" <noreply@...>
Date: 2021-10-18 08:32:56 UTC
List: ruby-core #105662
Issue #18254 has been updated by byroot (Jean Boussier).
Ah, I didn't know about it, but with that approach you still allocate a string and convert an integer to a string, so it's even slower than the `slice` pattern:
```ruby
# frozen_string_literal: true
require 'benchmark/ips'
STRING = Random.bytes(200)
POS = 12
Benchmark.ips do |x|
  x.report("no-offset") { STRING.unpack1("N") }
  x.report("slice-offset") { STRING.slice(POS, 4).unpack1("N") }
  x.report("unpack-offset") { STRING.unpack1("@#{POS}N") }
  x.compare!
end
```
```
# Ruby 2.7.2
Warming up --------------------------------------
           no-offset     1.016M i/100ms
        slice-offset   532.173k i/100ms
       unpack-offset   321.805k i/100ms
Calculating -------------------------------------
           no-offset     10.090M (± 1.2%) i/s -     50.782M in   5.033549s
        slice-offset      5.318M (± 2.1%) i/s -     26.609M in   5.005346s
       unpack-offset      3.205M (± 1.8%) i/s -     16.090M in   5.021922s
Comparison:
           no-offset: 10090269.9 i/s
        slice-offset:  5318453.9 i/s - 1.90x  (± 0.00) slower
       unpack-offset:  3205017.9 i/s - 3.15x  (± 0.00) slower
```
Based on this, an `offset` parameter could make the current code almost 2x more efficient.
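For readers unfamiliar with the two existing workarounds, here is a minimal sketch showing that they read the same value and where each one allocates. The buffer contents and the `via_slice` / `via_at` names are made up for illustration:

```ruby
# Hypothetical sample buffer: three zero words, then 0xDEADBEEF,
# all packed as 32-bit big-endian integers (16 bytes total).
buf = [0, 0, 0, 0xDEADBEEF].pack("N4")
pos = 12

# Slice pattern: copies 4 bytes into a new String, then unpacks it.
via_slice = buf.slice(pos, 4).unpack1("N")

# "@" directive: builds a new format String ("@12N") on every call,
# which includes converting the Integer `pos` to text.
via_at = buf.unpack1("@#{pos}N")

puts via_slice == via_at
```

Both calls return the same integer; they differ only in which temporary string gets created along the way.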
----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94161
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.
For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):
```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)
  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break
  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0
    pos = pos + 24 + body_length
    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end
  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```
### Proposal
If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as many strings, e.g.:
```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```
could be:
```ruby
buf.unpack1("N", offset: pos + 24)
```
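As a rough sketch of how calling code could adopt this while still running on interpreters without the keyword, the hypothetical helpers below (`offset_keyword_supported?` and `unpack1_at` are illustrative names, not part of the proposal) probe for `offset:` and fall back to a byte slice otherwise. The 24-byte header layout mirrors the Dalli snippet; the buffer contents are made up:

```ruby
# Hypothetical probe: true when String#unpack1 accepts `offset:`.
# Interpreters without the keyword raise ArgumentError for the extra argument.
def offset_keyword_supported?
  "\x00\x01".unpack1("C", offset: 1) == 1
rescue ArgumentError
  false
end

# Hypothetical helper: unpack a single value starting at byte `offset`.
# The probe runs on every call here for brevity; real code would check once.
def unpack1_at(buf, format, offset)
  if offset_keyword_supported?
    buf.unpack1(format, offset: offset)
  else
    # Fallback: copy the tail of the buffer, then unpack from its start.
    buf.byteslice(offset, buf.bytesize - offset).unpack1(format)
  end
end

# Made-up record: 24 zeroed header bytes followed by a 32-bit flags field.
pos = 0
buf = ("\x00".b * 24) + [42].pack("N")
flags = unpack1_at(buf, "N", pos + 24)
puts flags
```

The fallback branch still allocates, so the benefit only shows up where the keyword is available; the helper just keeps the call site identical in both cases.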
--
https://bugs.ruby-lang.org/