From: emily@...
Date: 2017-05-29T10:16:41+00:00
Subject: [ruby-core:81450] [Ruby trunk Misc#13597] Does read_nonblock call remalloc for the buffer if does it just set the size attribute

Issue #13597 has been updated by emilys (Emily Stolfo).


Hi Eric,

Thank you so much for your response; it gave me a lot of useful information I didn't know before. I've pointed the user who opened the pull request to your response so he can update his code based on it. I haven't heard back from him yet, but in the meantime I'll do some testing and see which solution works best.

I'll certainly ping you again if I have questions, and I look forward to perhaps being able to pass an offset to read_nonblock in the future.

Thanks again,
Emily

normalperson (Eric Wong) wrote:
> emily@mongodb.com wrote:
> > Hello
> >
> > I've observed that a lot of memory gets allocated and wasted
> > when read_nonblock is called for a number of bytes much larger
> > than is actually read from the socket. This line
> > https://github.com/ruby/ruby/blob/0130bb01baed404c0e3c75bd5db472415a6da1d3/io.c#L2686
> > appears to eventually only change the heap size value here
> > https://github.com/ruby/ruby/blob/144e06700705a3f067582682567bc77b429c4fca/string.c#L104
> > but does not call realloc.
>
> Correct. We do not realloc here since there is a good chance
> the buffer can be reused soon after and need the larger size.
> realloc can be very expensive.
>
> > I see this request to allow an offset to be passed to read_nonblock:
> > https://bugs.ruby-lang.org/issues/11484
>
> Thanks for pinging on that; I guess I'll try implementing it at
> some point (but I will need matz approval to make API changes).
>
> > but until that is implemented, how do you recommend
> > efficiently asking to read a large number of bytes from a
> > socket?
> > If I'm not mistaken, if I request 16000000, but only
> > read 1000000, the buffer that has been allocated in
> > io_read_nonblock for 16000000 doesn't seem to be resized.
>
> You can use String#clear right away on the result:
>
>     rbuf = ''
>     tmp = ''
>     case ret = io.read_nonblock(16384, tmp, exception: false)
>     when String
>       # tmp.object_id == ret.object_id at this point
>       rbuf << ret
>       ret.clear # calls free(3) internally
>     else
>       ...
>     end while true
>
> And you can also clear the bigger rbuf when you're done.
>
> Coincidentally, I made a similar change to net/protocol for
> net/http in the stdlib this weekend:
>
>     https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=58840
>
> But of course, I expect a destination offset [Feature #11484]
> to be more helpful.
>
> > Would you recommend instead requesting a more predictable
> > number of bytes, closer to the default system value
> > (SO_RCVBUF, for example) in each call to read_nonblock?
>
> That might be too complicated and a waste of syscalls in the
> general case. I'm not sure I saw value in going with sizes
> larger than 1MB, and usually 16K is fine. Using giant values
> like 16MB will blow away your CPU cache. Maybe (just maybe)
> 16MB helps with really big transfers across LFNs
> (long fat networks), but I doubt that's a common case for DBs
> :)
>
> > For context, this pull request against the MongoDB Ruby driver has led me to this investigation. https://github.com/mongodb/mongo-ruby-driver/pull/864
>
> I don't agree with GitHub's Terms of Service, nor do I run
> JavaScript or look at images, but I dumped that text and read
> it, so I'll add some notes here:
>
> In my experience, 4K is too small for even 70ms latency
> connections, but that might've just been on the writing
> side... I would choose 8K, at least, but usually 16K. It
> also depends on network latency and hardware.
>
> Choosing 16K also has a good side effect with current CRuby: a
> malloc implementation can internally reuse space which Ruby
> uses internally for buffers, potentially reducing
> fragmentation and helping cache latency. And we (CRuby) have
> been using 16K for most IO buffers for a long time...
>
> Anyways, I'll be glad to help with further network-related
> Ruby stuff on here as long as everything is plain text.

----------------------------------------
Misc #13597: Does read_nonblock call remalloc for the buffer if does it just set the size attribute
https://bugs.ruby-lang.org/issues/13597#change-65153

* Author: emilys (Emily Stolfo)
* Status: Open
* Priority: Normal
* Assignee:
----------------------------------------
Hello

I've observed that a lot of memory gets allocated and wasted when read_nonblock is called for a number of bytes much larger than is actually read from the socket. This line https://github.com/ruby/ruby/blob/0130bb01baed404c0e3c75bd5db472415a6da1d3/io.c#L2686 appears to eventually only change the heap size value here https://github.com/ruby/ruby/blob/144e06700705a3f067582682567bc77b429c4fca/string.c#L104 but does not call realloc.

I see this request to allow an offset to be passed to read_nonblock: https://bugs.ruby-lang.org/issues/11484

but until that is implemented, how do you recommend efficiently asking to read a large number of bytes from a socket? If I'm not mistaken, if I request 16000000, but only read 1000000, the buffer that has been allocated in io_read_nonblock for 16000000 doesn't seem to be resized.

Would you recommend instead requesting a more predictable number of bytes, closer to the default system value (SO_RCVBUF, for example) in each call to read_nonblock?

For context, this pull request against the MongoDB Ruby driver has led me to this investigation. https://github.com/mongodb/mongo-ruby-driver/pull/864

Thank you in advance
Emily

-- 
https://bugs.ruby-lang.org/
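The read loop quoted in Eric's reply can be expanded into a self-contained sketch. This is an illustration under stated assumptions, not code from the thread: the helper name drain_nonblock and the CHUNK constant are hypothetical, the 16K chunk size follows Eric's recommendation, and the loop simply stops on :wait_readable or EOF instead of waiting with IO.select as a real driver would. The statement that String#clear frees the scratch buffer's storage comes from the thread and describes CRuby.

```ruby
# Hypothetical sketch, not taken from the thread or from any library.
# Chunk size per the 16K recommendation in the thread.
CHUNK = 16_384

# Hypothetical helper: read whatever is currently available on +io+ in
# fixed-size chunks, reusing one scratch buffer, and return it as one String.
def drain_nonblock(io, chunk = CHUNK)
  rbuf = String.new # accumulated result (mutable even under frozen_string_literal)
  tmp = String.new  # scratch buffer handed to read_nonblock
  loop do
    case ret = io.read_nonblock(chunk, tmp, exception: false)
    when String
      # read_nonblock filled and returned tmp itself (ret.equal?(tmp))
      rbuf << ret # copy the bytes into the accumulator
      ret.clear   # empty the scratch buffer; per the thread, CRuby frees its storage
    when :wait_readable
      break # nothing more right now; a real client would IO.select and retry
    when nil
      break # EOF: the writer closed its end
    else
      break # any other wait symbol is not handled in this sketch
    end
  end
  rbuf
end

# Usage, with a pipe standing in for a socket:
r, w = IO.pipe
w.write("x" * 40_000) # more than two chunks' worth
w.close
data = drain_nonblock(r)
puts data.bytesize # prints 40000
```

Using String.new rather than '' keeps both buffers mutable even when a file enables frozen string literals. Once the caller has consumed the returned String, it can be cleared in the same way, as Eric suggests for rbuf.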