From: sam.saffron@...
Date: 2018-05-02T05:19:36+00:00
Subject: [ruby-core:86820] [Ruby trunk Feature#14723] [WIP] sleepy GC

Issue #14723 has been updated by sam.saffron (Sam Saffron).

I can confirm this has a MAJOR benefit for particular workloads with the pg gem. In particular if you are using async_exec (which most of us should)

```
require 'pg'
require 'benchmark/ips'

$conn = PG.connect(dbname: 'postgres')

Benchmark.ips do |b|
  b.config(time: 10, warmup: 3)

  b.report("exec") do
    $conn.exec("SELECT generate_series(1,10000)").to_a
  end
  b.report("async exec") do
    $conn.async_exec("SELECT generate_series(1,10000)").to_a
  end
end
```

Before:

```
sam@ubuntu pg_perf % ruby test.rb
Warming up --------------------------------------
                exec    20.000  i/100ms
          async exec    21.000  i/100ms
Calculating -------------------------------------
                exec    212.760  (± 1.4%) i/s -      2.140k in  10.060122s
          async exec    214.570  (± 1.9%) i/s -      2.163k in  10.084992s
sam@ubuntu pg_perf % ruby test.rb
Warming up --------------------------------------
                exec    19.000  i/100ms
          async exec    20.000  i/100ms
Calculating -------------------------------------
                exec    202.603  (± 5.9%) i/s -      2.033k in  10.072578s
          async exec    201.516  (± 6.0%) i/s -      2.020k in  10.062116s
```

After:

```
sam@ubuntu pg_perf % ruby test.rb
Warming up --------------------------------------
                exec    21.000  i/100ms
          async exec    23.000  i/100ms
Calculating -------------------------------------
                exec    211.320  (± 2.8%) i/s -      2.121k in  10.044445s
          async exec    240.188  (± 1.7%) i/s -      2.415k in  10.057509s
sam@ubuntu pg_perf % ruby test.rb
Warming up --------------------------------------
                exec    20.000  i/100ms
          async exec    23.000  i/100ms
Calculating -------------------------------------
                exec    209.644  (± 1.4%) i/s -      2.100k in  10.018850s
          async exec    237.100  (± 2.1%) i/s -      2.392k in  10.092435s
```

So this moves us from 200-210 ops/s to 240 ops/s.
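For reference, the relative change implied by the figures above can be worked out with a few lines of Ruby. This is only a sketch: the constants are the "async exec" rates copied from the four runs quoted above, the helper names `mean` and `percent_change` are made up for illustration, and two samples per side are far too few for any statistical claim. With these numbers the change comes out at roughly 15%.

```ruby
# "async exec" iterations per second, copied from the benchmark output above.
ASYNC_EXEC_BEFORE = [214.570, 201.516].freeze
ASYNC_EXEC_AFTER  = [240.188, 237.100].freeze

# Plain arithmetic mean of a list of samples.
def mean(samples)
  samples.sum / samples.size
end

# Relative change of the mean, in percent (positive means faster).
def percent_change(before, after)
  (mean(after) / mean(before) - 1.0) * 100.0
end

puts format("before: %.1f i/s", mean(ASYNC_EXEC_BEFORE))
puts format("after:  %.1f i/s", mean(ASYNC_EXEC_AFTER))
puts format("change: %+.1f%%", percent_change(ASYNC_EXEC_BEFORE, ASYNC_EXEC_AFTER))
```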
This is a major perf boost. It remains to be seen whether it holds on the full Discourse bench, but I expect major improvements because waiting for SQL is very, very common in web apps. I do not expect much benefit in concurrent puma workloads, but for us on unicorn we should get a pretty nice boost.

----------------------------------------
Feature #14723: [WIP] sleepy GC
https://bugs.ruby-lang.org/issues/14723#change-71783

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
The idea is to use "idle time", when the process is otherwise sleeping
and using no CPU time, to perform GC. It makes sense because real-world
traffic sees idle time due to network latency and waiting for user
input.

Right now, it's Linux-only. Future patches will affect other sleeping
functions: IO.select, Kernel#sleep, Thread#join, Process.waitpid, etc...

I don't know if this patch can be implemented for win32; right now
it's just dummy functions, and that will be somebody else's job.
But all pthreads platforms should eventually benefit.

Before this patch, the entropy-dependent script below takes 95MB
consistently on my system. Now, depending on the amount of entropy
on my system, it takes anywhere from 43MB to 75MB.

I'm using /dev/urandom to simulate real-world network latency
variations. There is no improvement when using /dev/zero because
the process is never idle.

  require 'net/http'
  require 'digest/md5'
  Thread.abort_on_exception = true
  s = TCPServer.new('127.0.0.1', 0)
  len = 1024 * 1024 * 1024
  th = Thread.new do
    c = s.accept
    c.readpartial(16384)
    c.write("HTTP/1.0 200 OK\r\nContent-Length: #{len}\r\n\r\n")
    IO.copy_stream('/dev/urandom', c, len)
    c.close
  end
  addr = s.addr
  Net::HTTP.start(addr[3], addr[1]) do |http|
    http.request_get('/') do |res|
      dig = Digest::MD5.new
      res.read_body { |buf|
        dig.update(buf)
      }
      puts dig.hexdigest
    end
  end

The above script is also dependent on net/protocol using read_nonblock.
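As a rough illustration of the read_nonblock pattern mentioned above, the sketch below tries a non-blocking read and, when no data is ready, sleeps in IO#wait_readable until the descriptor becomes readable. The helper name `patient_read` and the pipe used to drive it are hypothetical; this is not the net/protocol source. Whether that wait actually reaches rb_wait_for_single_fd on a given Ruby build is an assumption here.

```ruby
require 'io/wait' # IO#wait_readable (part of core on newer Rubies; harmless to require)

# Hypothetical helper: read up to maxlen bytes, sleeping in IO#wait_readable
# while nothing is available. Returns a String, or nil at end of file.
def patient_read(io, maxlen, timeout)
  loop do
    case chunk = io.read_nonblock(maxlen, exception: false)
    when String
      return chunk
    when nil
      return nil # EOF
    when :wait_readable
      # The process is idle here; this is the kind of wait the patch targets.
      io.wait_readable(timeout) or raise IOError, "timed out waiting for data"
    end
  end
end

# Drive the helper with a pipe whose writer is deliberately slow.
r, w = IO.pipe
slow_writer = Thread.new do
  sleep 0.1
  w.write("hello")
  w.close
end
puts patient_read(r, 16384, 5)
slow_writer.join
```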
Ordinary IO objects will need IO#nonblock=true to see benefits
(because they never hit rb_wait_for_single_fd)

* gc.c (rb_gc_inprogress): new function
  (rb_gc_step): ditto
* internal.h: declare prototypes for new gc.c functions
* thread_pthread.c (gvl_contended_p): new function
* thread_win32.c (gvl_contended_p): ditto (dummy)
* thread.c (rb_wait_for_single_fd w/ ppoll):
  use new functions to perform GC while GVL is uncontended
  and GC is lazy sweeping or incremental marking
  [ruby-core:86265]

2 part patch broken out
https://80x24.org/spew/20180429035007.6499-2-e@80x24.org/raw
https://80x24.org/spew/20180429035007.6499-3-e@80x24.org/raw

Also on my "sleepy-gc" git branch @ git://80x24.org/ruby.git

---Files--------------------------------
sleepy-gc-wip-v1.diff (5.37 KB)
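For the IO#nonblock=true note in the description, a minimal sketch of opting an ordinary IO object into non-blocking mode follows. It uses the io/nonblock standard library extension; the helper name `nonblocking_pipe` is made up for illustration, and a pipe stands in for whatever IO object an application actually waits on. Whether the read below benefits from the patch depends on the Ruby build, which is not verified here.

```ruby
require 'io/nonblock' # provides IO#nonblock= and IO#nonblock?

# Hypothetical helper: returns a pipe whose read end has O_NONBLOCK set.
def nonblocking_pipe
  reader, writer = IO.pipe
  reader.nonblock = true
  [reader, writer]
end

reader, writer = nonblocking_pipe
producer = Thread.new do
  sleep 0.1
  writer.write("ready")
  writer.close
end

# The usual blocking-style call still works: when the descriptor has no data
# yet, Ruby waits for readiness internally instead of blocking in read(2).
puts reader.readpartial(16384)
producer.join
```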