From: janko.marohnic@...
Date: 2018-05-09T12:50:10+00:00
Subject: [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream

Issue #14745 has been reported by janko (Janko Marohnić).

----------------------------------------
Bug #14745: High memory usage when using String#replace with IO.copy_stream
https://bugs.ruby-lang.org/issues/14745

* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I'm using custom IO-like objects that implement #read as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object where #read uses String#clear on the output buffer:

~~~ ruby
GC.disable

require "stringio"

class FakeIO
  def initialize(content)
    @io = StringIO.new(content)
  end

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.clear
      outbuf << chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end
end

io = FakeIO.new("a" * 50*1024*1024) # 50MB

IO.copy_stream(io, File::NULL)

system "top -pid #{Process.pid}"
~~~

This program outputs memory usage of 50MB at the end, as expected: 50MB was loaded into memory at the beginning and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:

~~~ ruby
def read(length, outbuf)
  chunk = @io.read(length)

  if chunk
    outbuf.replace chunk
    chunk.clear
  else
    outbuf.clear
  end

  outbuf unless outbuf.empty?
end
~~~

the memory usage has now doubled to 100MB at the end of the program, indicating that some string bytes weren't successfully deallocated. So, it seems that String#replace has different behaviour compared to String#clear + String#<<.
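To rule out a difference in the data actually copied, the two #read variants can be compared for output equality. The following is only a sketch: `SwitchableFakeIO`, its `strategy` argument and `copy_with` are hypothetical names introduced here, and the destination is a StringIO rather than File::NULL so the result can be inspected.

```ruby
require "stringio"

# Hypothetical helper (not part of the report): a single fake IO whose #read
# strategy is selectable, so both variants can be checked for equal output.
class SwitchableFakeIO
  def initialize(content, strategy)
    @io = StringIO.new(content)
    @strategy = strategy # :clear or :replace
  end

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk.nil?
      outbuf.clear
    elsif @strategy == :clear
      outbuf.clear
      outbuf << chunk
      chunk.clear
    else
      outbuf.replace(chunk)
      chunk.clear
    end

    outbuf unless outbuf.empty?
  end
end

# Copies +content+ through the fake IO and returns what the destination received.
def copy_with(strategy, content)
  dst = StringIO.new
  IO.copy_stream(SwitchableFakeIO.new(content, strategy), dst)
  dst.string
end

content = "abc" * 20_000 # 60,000 bytes of ASCII
puts copy_with(:clear, content) == content
puts copy_with(:replace, content) == content
```

If both comparisons hold, the two variants differ only in how memory is managed, not in what gets written.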
I was *only* able to reproduce this with `IO.copy_stream`; the following program shows 50MB memory usage, regardless of whether the String#clear or String#replace approach is used:

~~~ ruby
GC.disable

buffer = "a" * 50*1024*1024
chunk  = "b" * 50*1024*1024

if ARGV[0] == "clear"
  buffer.clear
  buffer << chunk
else
  buffer.replace chunk
end

chunk.clear

system "top -pid #{Process.pid}"
~~~

With this program I also noticed one interesting thing. If I remove `chunk.clear`, then the "clear" version uses 100MB as expected (because the `buffer` and `chunk` strings are each 50MB), but the "replace" version uses only 50MB, which makes it appear that the `buffer` string doesn't use any memory when in fact it should use 50MB just like the `chunk` string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using `IO.copy_stream`.
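One way to look into the observation above without relying on `top` is to ask Ruby what it reports for each string. This is a sketch, not a diagnosis: `string_stats`, `compare` and `SIZE` are hypothetical names introduced here, the size is reduced from 50MB to 1MB so it runs quickly, and `ObjectSpace.memsize_of` is only an implementation-specific hint. It is an assumption, not something confirmed in this report, that CRuby may let String#replace share the source string's heap buffer instead of copying it, which would be consistent with the "replace" version showing only 50MB.

```ruby
require "objspace"

# Hypothetical helper (not from the report): what Ruby itself reports for a
# string. memsize_of is a hint and may not account for a shared heap buffer.
def string_stats(str)
  { bytesize: str.bytesize, memsize: ObjectSpace.memsize_of(str) }
end

# Smaller than the 50MB used in the report so the sketch runs quickly.
SIZE = 1024 * 1024

# Fills +buffer+ from +chunk+ using the given strategy (:clear or :replace)
# and returns the stats of both strings, without clearing +chunk+ afterwards.
def compare(strategy)
  buffer = "a" * SIZE
  chunk  = "b" * SIZE

  if strategy == :clear
    buffer.clear
    buffer << chunk
  else
    buffer.replace(chunk)
  end

  { buffer: string_stats(buffer), chunk: string_stats(chunk) }
end

puts "clear:   #{compare(:clear).inspect}"
puts "replace: #{compare(:replace).inspect}"
```

The `bytesize` values should match in both cases; any difference in the reported `memsize` values is only a hint about where the bytes are accounted for, and may vary between Ruby versions.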