From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" Date: 2023-01-12T02:20:59+00:00 Subject: [ruby-core:111793] [Ruby master Feature#19322] Support spawning "private" child processes Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). > In the example you shown, I believe it's none of Unicorn's business to reap arbitrary processes Firstly, I do want to note that I don't think this is just a Unicorn problem. This is the "classic unix" way of writing a preforking pool of workers of any kind, and I'm sure similar code exists in many deployed Ruby applications. There's no way, when responding to a SIGCHLD, to know what child died until _after_ you actually reap it and steal its status code. If Unicorn (as an example) wanted to avoid reaping children it does not own, it would need to perform O(N) waitpid system calls to wait on each of its known children and see if they've exited. Alternatively, it could do the pass-down-a-pipe trick that @normalperson pointed out above, but then you can't get the exit status. > it doesn't compose That's the problem with the entire UNIX process API - it doesn't compose! Subprocesses exiting raise signals that run in other parts of the program, other parts of the program can accidentally wait for your subprocesses and steal the exit status from you, pid re-use means that a gem can't necessarily even _tell_ that its subprocess exited (is there now some new process with the same pid? you can't know unless you know for sure you didn't reap the previous one). The new APIs I've proposed here (`spawn_private` and `fork_private`) _do_ compose - when a subprocess is created, it can only be reaped by using the unique handle which came from its creation, and not from other random parts of a potentially very large application. My hope is that these APIs let gems (and Ruby itself, e.g. mjit) treat the spawning of subprocesses as an opaque implementation detail. > let's not add on top of it and force every Ruby implementation to have such complexity please. I think other ruby implementations already need this complexity if they want threadsafe implementations of things like `Process.system`. For example, this program will quickly throw an exception under Truffleruby: ```ruby t1 = Thread.new do loop do pid, status = Process.waitpid2 -1 puts "Reaped pid #{pid} status #{status.inspect}" rescue Errno::ECHILD end end t2 = Thread.new do loop do child_success = system "/bin/sh -c 'exit 1'" puts "Child success? #{child_success}" end end t2.join t1.join ``` ``` % ruby bad.rb Reaped pid 8781 status # # terminated with exception (report_on_exception is true): core/errno.rb:48:in `handle': No child processes - No child process: 8781 (Errno::ECHILD) from core/truffle/process_operations.rb:150:in `block in wait' from core/truffle/ffi/pointer.rb:255:in `new' from core/truffle/process_operations.rb:145:in `wait' from core/process.rb:591:in `wait' from core/kernel.rb:593:in `system' from bad.rb:11:in `block (2 levels) in
' from core/kernel.rb:407:in `loop' from bad.rb:10:in `block in
' core/errno.rb:48:in `handle': No child processes - No child process: 8781 (Errno::ECHILD) from core/truffle/process_operations.rb:150:in `block in wait' from core/truffle/ffi/pointer.rb:255:in `new' from core/truffle/process_operations.rb:145:in `wait' from core/process.rb:591:in `wait' from core/kernel.rb:593:in `system' from bad.rb:11:in `block (2 levels) in
' from core/kernel.rb:407:in `loop' from bad.rb:10:in `block in
' ``` It actually works under CRuby, because the direct wait for a specific pid always takes precedence over the wait on -1, and there is no interrupt check between when the child process is spawned and when waitpid is called in the `system` implementation. > KJ: do you need to care about the exit status? I doubt I specifically need it for my use-case (the parent/child process already share a socketpair, and the parent would notice if it closed), but I kind of thought Ruby should offer non-hacky APIs for the use-case of "child processes in gems" in general, so I still wrote up my proposalj. ## Summary: Really, I think there are three ways of looking at this issue: 1. Programs doing `waitpid -1` are bad and wrong, nobody should ever do that, if any code in your program does this anywhere, then Ruby should no longer make any guarantees about subprocess management working correctly in the entire process. 2. Programs doing `waitpid -1` are widely deployed, it would be good if, when writing gems, there were APIs we could use which offer better isolation and composibility than the classic unix APIs, so that our gems work no matter what their containing processes are doing. 3. Gems should never be spawning child processes anyway. My thinking on this issue is camp 2. Like it or not (and really, I don't like it), `waitpid -1` has been part of the unix way of doing preforking worker pools since approximately forever, and it would be good if programs such programs could use gems without carefully checking whether they spawn any subprocesses in their implementation. Perhaps some more data needs to be gathered on just how common `waitpid -1` actually is? If people think this is something that moves the needle on this discussion, I'm happy to do some research on the topic. ---------------------------------------- Feature #19322: Support spawning "private" child processes https://bugs.ruby-lang.org/issues/19322#change-101204 * Author: kjtsanaktsidis (KJ Tsanaktsidis) * Status: Open * Priority: Normal ---------------------------------------- ## Background The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library! Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401). In short, there is no reliable way for a gem to spawn a child process in a way that can���t (unintentionally) be interfered with by other parts of the program. ## Problem statement Consider the following program. ```ruby # Imagine this part of the program is in some top-level application event loop # or something - similar to how Unicorn works. It detects child processes exiting # and takes some action (possibly restarting a crashed worker, for example). Signal.trap(:CHLD) do loop do begin pid, status = Process.waitpid2 -1 puts "Signal handler reaped #{pid} #{status.inspect}" rescue Errno::ECHILD puts "Signal handler reaped nothing" break end end end # Imagine that _this_ part of the program is buried deep in some gem. It knows # nothing about the application SIGCHLD handling, and quite possibly the application # author might not even know this gem spawns a child process to do its work! require 'open3' loop do o, status = Open3.capture2("/bin/sh", "-c", "echo 'hello'") puts "ran command, got #{o.chomp} #{status.inspect}" end ``` In current versions of Ruby, _some_ loop iterations will function correctly, and print something like this. The gem gets the `Process::Status` object from its command and can know if e.g. it exited abnormally. ``` ran command, got ohaithar # Signal handler reaped nothing ``` However, other iterations of the loop print this. The signal handler runs and calls `Process.waitpid2(-1)` before the code in open3 can do so. Then, the gem code does not get a `Process::Status` object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility. ``` Signal handler reaped 1153596 # Signal handler reaped nothing ran command, got ohaithar nil ``` We would like a family of APIs which allow a gem to spawn a child process and guarantees that the gem can wait on it. Some concurrent call to `Process.waitpid2(-1)` (or even `Process.waitpid2($some_lucky_guess_for_the_pid)`) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the `SIGCHLD` signal to avoid the application signal handler needlessly waking up. ## Proposed Ruby-level APIs. I propose we create the following new methods in Ruby. * `Process.spawn_private` * `Process.fork_private` These methods behave identically to their non-_private versions in all respect, except instead of returning a pid, they return an object of type `Process::PrivateHandle`. `Process::PrivateHandle` would have the following methods: * `pid()` - returns the pid for the created process * `wait()` - blocks the caller until the created process has exited, and returns a `Process::Status` object. If the handle has _already_ had `#wait` called on it, it returns the same `Process::Status` object as was returned then immediately. This is unlike `Process.waitpid` and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid). * `wait_nonblock()` - if the created process has exited, behaves like `#wait`; otherwise, it returns a `Process::Status` object for which `#exited?` returns false. * `kill(...)` - if the created process has not been reaped via a call to `#wait`, performs identically to `Process.kill ..., pid`. Otherwise, if the process _has_ been reaped, raises `Errno::ESRCH` immediately without issuing a system call. This ensures that, if pids wrap around, that the wrong process is not signaled by mistake. A call to `Process.wait`, `Process.waitpid`, or `Process.waitpid2` will _never_ return a `Process::Status` for a process started with a `_private` method, even if that call is made with the pid of the child process. The _only_ way to reap a private child process is through `Process::PrivateHandle`. The implementation of `IO.popen`, `Kernel#system`, `Kernel#popen`, backticks, and the `Open3` module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed. (note though - I don't believe `Kernel#system` suffers from the same problem as the `open3` example above, because it does not yield the GVL nor check interrupts in between spawning the child and waiting on it) ## Implementation strategy I believe this can be implemented, in broad strokes, with an approach like this: * Keep a global table mapping pids -> handles for processes created with `fork_private` or `spawn_private`. * When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, and repeat the call to `waitpid`. * If the wait call _was_ made with the handle, we can return the * Once a handle has had the child status saved against it, it is removed from the table. * A subsequent call to wait on that pi the handle will look up the saved information and return it without making a system call. In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun and @normalperson four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a `Process.waitpid2(-1)`, it could wind up reaping the gcc process out from underneath mjit. This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code. In any case, the way waitpid works _currently_, is that... * Ruby actually does all calls to `waitpid` as `WNOHANG` (i.e. nonblocking) internally. * If a call to `waitpid` finds no children, it blocks the thread, representing the state in a structure of type `struct waitpid_state`. * Ruby also keeps a list of all `waitpid_state`'s that are currently being waited for, `vm->waiting_pids` and `vm->waiting_grps`. * These structures are protected with a specific mutex, `vm->waitpid_lock`. * Ruby internally uses the SIGCHLD signal to reap the dead children, and then find a waiting call to `waitpid` (via the two lists) to actually dispatch the reaped status to. * If some caller is waiting for a specific pid, that _always_ takes priority over some other caller that's waiting for a pid-group (e.g. `-1`). mjit's child process is protected, because: * When mjit forks, it uses a method `rb_mjit_fork` to do so. * That calls the actual `fork` implementation _whilst still holding_ `vm->waitpid_lock` * Before yielding the lock, it inserts an entry in `vm->waiting_pids` saying that mjit is waiting for the just-created child. * Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children. I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in `process.c`. ## POC implementation I sketched out a _very_ rough POC to see if what I said above would be possible, and I think it is: https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57 The following script behaves how I expect with this patch: ```ruby pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69" puts "pid -> #{pid}" puts "h -> #{h}" # should ESRCH. sleep 2 begin Process.waitpid2 -1 rescue => e puts "waitpid err -> #{e}" end wpid, status = h.wait puts "wpid -> #{wpid}" puts "status -> #{status.inspect}" ``` ``` ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb pid -> 1154105 h -> # waitpid err -> No child processes wpid -> 1154105 status -> # ``` The child process can be waited on with the handle, and the call to `waitpid2(-1)` finds nothing. ## Previous idea: OS-specific handles My first version of this proposal involved a similar API, but powering it with platform-specific concepts available on Linux, Windows, and FreeBSD which offer richer control than just pids & the `wait` syscall. In particular, I had believed that we could use the `clone` syscall in Linux to create a child process which: * Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid), * Would not generate a signal when it exited (i.e. no SIGCHLD). * Could not be waited on by an unspecting to `waitpid` (except if a special flag `__WCLONE` as passed). Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process exec's, it goes _back_ to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API. So, I don't think we should use pidfds in this proposal. ## Motivation My use-case for this is that I���m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you���re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature). In my case, I'm so deep in linux specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff I thought it would be worth asking the question if this might be useful to other, more normal, gems. -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/