[ruby-core:81372] Re: [Ruby trunk Misc#13597] Does read_nonblock call remalloc for the buffer if does it just set the size attribute

From: Eric Wong <normalperson@...>
Date: 2017-05-24 19:21:47 UTC
List: ruby-core #81372
emily@mongodb.com wrote:
> Hello
> 
> I've observed that a lot of memory gets allocated and wasted
> when read_nonblock is called for a number of bytes much larger
> than is actually read from the socket.  This line
> https://github.com/ruby/ruby/blob/0130bb01baed404c0e3c75bd5db472415a6da1d3/io.c#L2686
> appears to eventually only change the heap size value here
> https://github.com/ruby/ruby/blob/144e06700705a3f067582682567bc77b429c4fca/string.c#L104
> but does not call remalloc.

Correct.  We do not realloc here, since there is a good chance the
buffer will be reused soon after and will need the larger size again.
realloc can be very expensive.
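
As a rough sketch (assuming `sock` is an already-connected socket
and using the objspace extension), you can watch the retained
capacity yourself:

  require 'socket'
  require 'objspace' # provides ObjectSpace.memsize_of

  buf = ''
  ret = sock.read_nonblock(16_000_000, buf, exception: false)
  if ret.is_a?(String)
    p bytes_read: buf.bytesize              # what actually arrived
    p capacity: ObjectSpace.memsize_of(buf) # still reflects the 16MB request
  end

The string's length is set to what was read, but the malloc'ed
capacity behind it stays at roughly what was requested.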

> I see this request to allow an offset to be passed to read_nonblock:
> https://bugs.ruby-lang.org/issues/11484

Thanks for pinging on that, I guess I'll try implementing it at
some point (but I will need matz's approval to make API changes).

> but until that is implemented, how do you recommend
> efficiently asking to read a large number of bytes from a
> socket? If I'm not mistaken, if I request 16000000, but only
> read 1000000, the buffer that has been allocated in
> io_read_nonblock for 16000000 doesn't seem to be resized.

You can use String#clear right away on the result:

  rbuf = ''
  tmp = ''
  case ret = io.read_nonblock(16384, tmp, exception: false)
  when String
    # tmp.object_id == ret.object_id at this point
    rbuf << ret
    ret.clear # calls free(3) internally
  else
    ...
  end while true

And you can also clear the bigger rbuf when you're done.
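
Continuing that sketch (assuming rbuf has accumulated data you no
longer need), clearing releases the heap buffer while keeping the
object around for reuse:

  require 'objspace'

  p before: ObjectSpace.memsize_of(rbuf) # large after accumulating data
  rbuf.clear                             # frees the malloc'ed buffer
  p after: ObjectSpace.memsize_of(rbuf)  # should drop sharply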

Coincidentally, I made a similar change to net/protocol for
net/http in the stdlib this weekend:

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=58840

But of course, I expect a destination offset [Feature #11484]
to be more helpful.

> Would you recommend instead requesting a more predictable
> number of bytes, closer to the default system value
> (SO_RCVBUF, for example) in each call to read_nonblock?

That might be too complicated and a waste of syscalls in the
general case.  I'm not sure I saw value in going with sizes
larger than 1MB, and usually 16K is fine.  Using giant values
like 16MB will blow away your CPU cache.  Maybe (just maybe)
16MB helps with really big transfers across LFNs
(long-fat-networks), but I doubt that's a common case for DBs
:)
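
If you do want to size reads off the kernel receive buffer, a quick
sketch (assuming `sock` is a TCPSocket or similar):

  require 'socket'

  rcvbuf = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF).int
  p so_rcvbuf: rcvbuf  # Linux reports roughly double the usable size

But again, a fixed 16K request is usually good enough.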

> For context, this pull request against the MongoDB Ruby driver has led me to this investigation: https://github.com/mongodb/mongo-ruby-driver/pull/864

I don't agree with GitHub's Terms-of-Service, nor do I run
Javascript or look at images; but I dumped that text and read
it, so I'll add some notes here:

  In my experience, 4K is too small for even 70ms latency
  connections, but that might've just been on the writing
  side...  I would choose 8K, at least, but usually 16K.  It
  also depends on network latency and hardware.

  Choosing 16K also has a good side effect with current CRuby:
  a malloc implementation can internally reuse the space Ruby
  uses for these buffers, potentially reducing fragmentation
  and helping cache latency.  And we (CRuby) have been using
  16K for most IO buffers for a long time...
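
  A fuller sketch of that 16K pattern (the socket, the stop
  condition, and io/wait here are just assumptions for
  illustration):

    require 'socket'
    require 'io/wait' # for IO#wait_readable

    rbuf = ''
    tmp = ''
    loop do
      case ret = sock.read_nonblock(16_384, tmp, exception: false)
      when String
        rbuf << ret
        ret.clear          # release the chunk buffer back to malloc
      when :wait_readable
        sock.wait_readable # block until more data arrives
      when nil             # EOF
        break
      end
    end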

Anyways, I'll be glad to help with further network-related
Ruby stuff on here as long as everything is plain text.

