From: duerst <noreply@...>
Date: 2021-10-25T09:26:23+00:00
Subject: [ruby-core:105783] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1

Issue #18254 has been updated by duerst (Martin Dürst).


mame (Yusuke Endoh) wrote in #note-5:
> Just a confirmation: the offset is byte-oriented, not character-oriented, right? There is a format "U" for UTF-8 characters, so the behavior should be explained clearly in the document.

This is not only a problem of "explain it in the document". For this offset to work well, there should be a way to know how many bytes an invocation of String#unpack consumes. In many cases that's easy to calculate from the format string, but in others, in particular for UTF-8, it isn't, because a single UTF-8 character can occupy anywhere from 1 to 4 bytes depending on the data.
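To illustrate why the byte count depends on the data for UTF-8, here is a small sketch in plain Ruby. `utf8_widths` is a hypothetical helper written for this illustration only: it decodes one character at a time with the existing `"U"` directive and re-encodes it to measure how many bytes were consumed. For a fixed-width directive such as `"N"`, the answer would always be 4.

```ruby
# Hypothetical helper (illustration only): walks a UTF-8 buffer one "U"
# directive at a time and records how many bytes each character occupied.
def utf8_widths(buf)
  pos = 0
  widths = []
  while pos < buf.bytesize
    # Decode one character starting at byte offset `pos`;
    # a UTF-8 character is at most 4 bytes long.
    codepoint = buf.byteslice(pos, 4).unpack1("U")
    # Re-encode it to learn how many bytes the directive consumed.
    width = [codepoint].pack("U").bytesize
    widths << width
    pos += width
  end
  widths
end

utf8_widths("a\u00e9\u20ac\u{1F600}") # => [1, 2, 3, 4]
```

The same format string, `"U"`, consumed 1, 2, 3 and 4 bytes on successive calls, so a caller advancing an offset cannot derive the next position from the format alone.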

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94299

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols, it's common to first unpack some kind of header or type prefix and then, based on that, unpack another part of the string.

For instance, here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, this kind of code could extract the fields it needs without allocating and copying as many intermediate strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/