From: Eric Wong
Date: 2017-05-24T19:21:47+00:00
Subject: [ruby-core:81372] Re: [Ruby trunk Misc#13597] Does read_nonblock call remalloc for the buffer if does it just set the size attribute

emily@mongodb.com wrote:
> Hello
>
> I've observed that a lot of memory gets allocated and wasted
> when read_nonblock is called for a number of bytes much larger
> than is actually read from the socket. This line
> https://github.com/ruby/ruby/blob/0130bb01baed404c0e3c75bd5db472415a6da1d3/io.c#L2686
> appears to eventually only change the heap size value here
> https://github.com/ruby/ruby/blob/144e06700705a3f067582682567bc77b429c4fca/string.c#L104
> but does not call remalloc.

Correct.  We do not realloc here since there is a good chance the
buffer can be reused soon after and will need the larger size.
realloc can be very expensive.

> I see this request to allow an offset to be passed to read_nonblock:
> https://bugs.ruby-lang.org/issues/11484

Thanks for pinging on that, I guess I'll try implementing it at some
point (but I will need matz's approval to make API changes).

> but until that is implemented, how do you recommend
> efficiently asking to read a large number of bytes from a
> socket? If I'm not mistaken, if I request 16000000, but only
> read 1000000, the buffer that has been allocated in
> io_read_nonblock for 16000000 doesn't seem to be resized.

You can use String#clear right away on the result:

  rbuf = ''
  tmp = ''
  case ret = io.read_nonblock(16384, tmp, exception: false)
  when String
    # tmp.object_id == ret.object_id at this point
    rbuf << ret
    ret.clear # calls free(3) internally
  else
    ...
  end while true

And you can also clear the bigger rbuf when you're done.

Coincidentally, I made a similar change to net/protocol for net/http
in the stdlib this weekend:

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=58840

But of course, I expect a destination offset [Feature #11484] to be
more helpful.

> Would you recommend instead requesting a more predictable
> number of bytes, closer to the default system value
> (SO_RCVBUF, for example) in each call to read_nonblock?

That might be too complicated and a waste of syscalls in the general
case.  I'm not sure I see value in going with sizes larger than 1MB,
and usually 16K is fine.  Using giant values like 16MB will blow away
your CPU cache.  Maybe (just maybe) 16MB helps with really big
transfers across LFNs (long-fat-networks), but I doubt that's a
common case for DBs :)

> For context, this pull request against the MongoDB Ruby driver has
> led me to this investigation.
> https://github.com/mongodb/mongo-ruby-driver/pull/864

I don't agree with GitHub's Terms-of-Service nor do I run Javascript
or look at images; but I dumped that text and read it, so I'll add
some notes here:

In my experience, 4K is too small for even 70ms latency connections,
but that might've just been on the writing side...  I would choose
8K, at least, but usually 16K.  It also depends on network latency
and hardware.

Choosing 16K also has a good side effect with current CRuby: a malloc
implementation can internally reuse the space Ruby uses for its own
IO buffers, potentially reducing fragmentation and improving cache
locality.  And we (CRuby) have been using 16K for most IO buffers
for a long time...

Anyways, I'll be glad to help with further network-related Ruby stuff
on here as long as everything is plain text.
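
A fuller sketch of the buffer-reuse loop above, in case it helps: it
assumes `io` is an already-connected socket and that the caller wants
`expected` bytes.  The helper name read_exactly and the CHUNK constant
are illustrative only, not part of any Ruby API:

  require 'socket'
  require 'io/wait'

  CHUNK = 16_384  # the 16K size recommended above

  def read_exactly(io, expected)
    rbuf = ''.b   # accumulated result
    tmp  = ''.b   # reusable chunk buffer, filled in place on each call
    while rbuf.bytesize < expected
      case ret = io.read_nonblock(CHUNK, tmp, exception: false)
      when String          # tmp and ret are the same object here
        rbuf << ret
        ret.clear          # release the chunk's heap memory right away
      when :wait_readable
        io.wait_readable   # block until the socket is readable again
      when nil
        break              # EOF before `expected` bytes arrived
      end
    end
    rbuf
  end

Calling rbuf.clear once the caller is done with the result releases
the larger allocation, as noted above.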
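
And since SO_RCVBUF came up: the kernel's receive-buffer size for a
given socket can be inspected with getsockopt, if you want to compare
it against the read size you pass in.  The host and port below are
placeholders:

  require 'socket'

  sock = TCPSocket.new('example.com', 80)   # placeholder host/port
  rcvbuf = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF).int
  puts "SO_RCVBUF = #{rcvbuf} bytes"
  sock.close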