From: sam.saffron@...
Date: 2018-05-03T05:43:47+00:00
Subject: [ruby-core:86858] [Ruby trunk Feature#14723] [WIP] sleepy GC

Issue #14723 has been updated by sam.saffron (Sam Saffron).

From my testing on the Discourse bench, the difference is barely measurable.

Before patch:

```
Unicorn: (workers: 3)
Include env: false
Iterations: 200, Best of: 1
Concurrency: 1
---
categories:
  50: 58
  75: 65
  90: 73
  99: 123
home:
  50: 62
  75: 70
  90: 86
  99: 139
topic:
  50: 60
  75: 65
  90: 72
  99: 117
categories_admin:
  50: 101
  75: 106
  90: 115
  99: 210
home_admin:
  50: 107
  75: 114
  90: 132
  99: 211
topic_admin:
  50: 115
  75: 123
  90: 134
  99: 201
timings:
  load_rails: 5444
ruby-version: 2.6.0-p-1
rss_kb: 196444
pss_kb: 139514
memorysize: 7.79 GB
virtual: vmware
architecture: amd64
operatingsystem: Ubuntu
processor0: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
physicalprocessorcount: 2
kernelversion: 4.15.0
rss_kb_23779: 309984
pss_kb_23779: 249785
rss_kb_23817: 307056
pss_kb_23817: 246738
rss_kb_23948: 304732
pss_kb_23948: 244364
```

After patch:

```
Iterations: 200, Best of: 1
Concurrency: 1
---
categories:
  50: 56
  75: 61
  90: 70
  99: 116
home:
  50: 63
  75: 70
  90: 77
  99: 170
topic:
  50: 61
  75: 68
  90: 77
  99: 96
categories_admin:
  50: 102
  75: 111
  90: 121
  99: 182
home_admin:
  50: 96
  75: 102
  90: 108
  99: 205
topic_admin:
  50: 109
  75: 118
  90: 130
  99: 192
timings:
  load_rails: 4987
ruby-version: 2.6.0-p-1
rss_kb: 196004
pss_kb: 137541
memorysize: 7.79 GB
virtual: vmware
architecture: amd64
operatingsystem: Ubuntu
processor0: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
physicalprocessorcount: 2
kernelversion: 4.15.0
rss_kb_16393: 306312
pss_kb_16393: 244353
rss_kb_16438: 307052
pss_kb_16438: 244942
rss_kb_16555: 305092
pss_kb_16555: 242997
```

Nothing stands out as a clear improvement across the board, though some of the benches are a bit faster; memory is barely affected.
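To make the two runs easier to compare, here is a small Ruby sketch that subtracts the "before" percentiles from the "after" percentiles for each page. The numbers are copied from the two outputs above; the constant and method names are illustrative only and are not part of script/bench.rb.

```ruby
# Per-percentile timings from the two Discourse bench runs above, copied verbatim.
# Units are whatever script/bench.rb reports; only the differences matter here.
PERCENTILES = [50, 75, 90, 99].freeze

BEFORE = {
  "categories"       => [58, 65, 73, 123],
  "home"             => [62, 70, 86, 139],
  "topic"            => [60, 65, 72, 117],
  "categories_admin" => [101, 106, 115, 210],
  "home_admin"       => [107, 114, 132, 211],
  "topic_admin"      => [115, 123, 134, 201]
}.freeze

AFTER = {
  "categories"       => [56, 61, 70, 116],
  "home"             => [63, 70, 77, 170],
  "topic"            => [61, 68, 77, 96],
  "categories_admin" => [102, 111, 121, 182],
  "home_admin"       => [96, 102, 108, 205],
  "topic_admin"      => [109, 118, 130, 192]
}.freeze

# Returns { page => { percentile => after - before } }.
# Negative values mean the page got faster after the patch.
def deltas(before, after)
  before.map do |page, old_times|
    new_times = after.fetch(page)
    diffs = new_times.zip(old_times).map { |new_t, old_t| new_t - old_t }
    [page, PERCENTILES.zip(diffs).to_h]
  end.to_h
end

# Print one row per page, e.g. "p50   -2  p75   -4 ..."
deltas(BEFORE, AFTER).each do |page, by_pct|
  cols = by_pct.map { |pct, d| format("p%d %+4d", pct, d) }
  puts format("%-17s %s", page, cols.join("  "))
end
```

The signs are mixed: home p99 goes from 139 to 170 (+31) while topic p99 drops from 117 to 96 (-21), which fits the point that a single 200-iteration run is too noisy to draw conclusions from individual rows.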
It is no worse than head, but it is also hard to measure how much better it is; we may need to repeat the run with significantly more iterations to remove noise.

I do want to review Discourse carefully to make sure we are using async_exec everywhere; I will do so later today.

Eric, if you feel like trying out the bench, clone https://github.com/discourse/discourse.git and run: ruby script/bench.rb

I also have some allocator benches you can play with at https://github.com/SamSaffron/allocator_bench.git

----------------------------------------
Feature #14723: [WIP] sleepy GC
https://bugs.ruby-lang.org/issues/14723#change-71823

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
The idea is to use "idle time", when the process is otherwise sleeping and using no CPU time, to perform GC. This makes sense because real-world traffic sees idle time due to network latency and waiting for user input.

Right now, it's Linux-only. Future patches will affect other sleeping functions: IO.select, Kernel#sleep, Thread#join, Process.waitpid, etc.

I don't know if this patch can be implemented for win32; right now it only has dummy functions there, and that will be somebody else's job. But all pthreads platforms should eventually benefit.

Before this patch, the entropy-dependent script below takes 95MB consistently on my system. Now, depending on the amount of entropy on my system, it takes anywhere from 43MB to 75MB. I'm using /dev/urandom to simulate real-world network latency variations. There is no improvement when using /dev/zero because the process is never idle.
```ruby
require 'net/http'
require 'digest/md5'

Thread.abort_on_exception = true

s = TCPServer.new('127.0.0.1', 0)
len = 1024 * 1024 * 1024

# Server thread: answer one request with 1 GiB read from /dev/urandom
th = Thread.new do
  c = s.accept
  c.readpartial(16384)
  c.write("HTTP/1.0 200 OK\r\nContent-Length: #{len}\r\n\r\n")
  IO.copy_stream('/dev/urandom', c, len)
  c.close
end

# Client: stream the body into an MD5 digest instead of buffering it
addr = s.addr
Net::HTTP.start(addr[3], addr[1]) do |http|
  http.request_get('/') do |res|
    dig = Digest::MD5.new
    res.read_body { |buf| dig.update(buf) }
    puts dig.hexdigest
  end
end
```

The script above also depends on net/protocol using read_nonblock. Ordinary IO objects need IO#nonblock = true to see any benefit, because otherwise they never hit rb_wait_for_single_fd.

```
* gc.c (rb_gc_inprogress): new function
  (rb_gc_step): ditto
* internal.h: declare prototypes for new gc.c functions
* thread_pthread.c (gvl_contended_p): new function
* thread_win32.c (gvl_contended_p): ditto (dummy)
* thread.c (rb_wait_for_single_fd w/ ppoll):
  use new functions to perform GC while GVL is uncontended
  and GC is lazy sweeping or incremental marking
  [ruby-core:86265]
```

2-part patch broken out:
https://80x24.org/spew/20180429035007.6499-2-e@80x24.org/raw
https://80x24.org/spew/20180429035007.6499-3-e@80x24.org/raw

Also on my "sleepy-gc" git branch @ git://80x24.org/ruby.git

---Files--------------------------------
sleepy-gc-wip-v1.diff (5.37 KB)
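The ChangeLog's core change, doing GC work inside rb_wait_for_single_fd while the GVL is uncontended and a lazy sweep or incremental mark is in progress, can be modelled in plain Ruby to show the control flow. This is a toy sketch under stated assumptions, not the C patch: gc_inprogress, gc_step and gvl_contended are hypothetical lambdas standing in for rb_gc_inprogress, rb_gc_step and gvl_contended_p, and IO.select with a zero timeout stands in for a non-blocking ppoll.

```ruby
# Toy model of the idle-time GC wait described in the ChangeLog.
# gc_inprogress, gc_step and gvl_contended are HYPOTHETICAL lambdas standing in
# for rb_gc_inprogress, rb_gc_step and gvl_contended_p; this is not the C patch.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Waits up to +timeout+ seconds for +io+ to become readable.
# Returns +io+ if it became readable, nil on timeout.
def sleepy_wait(io, timeout, gc_inprogress:, gc_step:, gvl_contended:)
  deadline = monotonic_now + timeout
  loop do
    remaining = deadline - monotonic_now
    return nil if remaining <= 0

    if gc_inprogress.call && !gvl_contended.call
      # GC has pending work and no other thread wants the GVL:
      # poll without blocking, and if the fd is not ready yet,
      # spend this otherwise idle moment on one GC step.
      return io if IO.select([io], nil, nil, 0)
      gc_step.call
    else
      # Nothing for GC to do, or another thread is waiting: block as usual.
      return IO.select([io], nil, nil, remaining) ? io : nil
    end
  end
end
```

In this model, checking the fd with a zero timeout before every step bounds the extra latency to roughly one GC step once data arrives; when GC has no pending work, the wait falls back to an ordinary blocking select for the remaining time, which is also why a process that is never idle (the /dev/zero case) gains nothing.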