From: Eric Wong Date: 2018-10-29T01:20:21+00:00 Subject: [ruby-core:89611] Re: [Ruby trunk Bug#14867] Process.wait can wait for MJIT compiler process takashikkbn@gmail.com wrote: > Issue #14867 has been updated by k0kubun (Takashi Kokubun). > > > I think we'll have to support non-blocking/event-based waitpid in Windows for auto-fiber/Thread::Light, anyways. So making MJIT worker thread be a Ruby Thread (or Thread::Light) would allow it to hook into process.c APIs. I don't think the existing code is too far off from being able to support Windows... > I see. Dropping MJIT worker by non-blocking/event-based > waitpid would be an option once it's supported for > auto-fiber/Thread::Light. At this moment, I'm not sure which > is larger, synchronization cost between threads or translating > ISeq to C code synchronously. So in a very short term, I wanna > fix the 1 (waitpid) case with postponed_job unless it has a > critical blocker to fix the issue. I suppose MJIT has priority over Thread::Light at the moment since MJIT is already in trunk and people know about it. I can take some time to investigate moving MJIT over to event-based waitpid in a platform-independent way. The other thing is MJIT multi-threading/synchronization seems tricky-to-debug right now, and making it event-based should be beneficial for reliability. > Note that the current synchronization design is already > achieving the better benchmark result than one by copying > inline caches on enqueueing an ISeq. To simplify the situation > by stop using postponed_job, I experimented to have another > global variable for the synchronization job which is to be > checked and invoked by `RUBY_VM_CHECK_INTS`, but the benchmark > result was not good. So for now using postponed_job is the > only possible approach that achieves the current performance. OK, thank you for that information. So checking one extra global variables with RUBY_VM_CHECK_INTS introduces a measurable perf hit? > > rb_postponed_job_register only sets a flag, but doesn't wake up sleeping the thread in 1. or 2. by calling ubf.func (via rb_threadptr_interrupt). > > > > This is a tricky situation... > > > > Calling ubf.func is NOT async-signal-safe, so rb_postponed_job_register may not use it by default, either. > I'm not still understanding why this could be related to the > deadlock 1 (waitpid on #system, #`, or Process.waitpid). I > assume that the sleeping threads "1." and "2." should be waken > up by something prepared in threads "1." and "2.", not by MJIT > worker thread "3.". So the fact that rb_postponed_job_register > (called by MJIT worker thread "3.") may not wakeup the thread > "1." or "2." should not be the cause of this deadlock, in my > understanding. Ah, so the waitpid from #system is on /bin/rm (I missed that earlier) Is `rm' stuck (NFS, drive error, etc)? But yes, I also wonder if there's a bug in the current system+waitpid implementation. Perhaps #system needs to use similar mechanism as mjit_worker to prevent work-stealing. Regardless, ubf.func need to be called after registering postponed job, so there was still a problem, there. > I actually don't understand how threads "1." and "2." are > usually waken up. For waitpid thread "2.", I'm guessing that a > signal handler (of SIGCHLD?) registered by the thread "2." > itself somehow *replaces* native_ppoll_sleep and finishes it, > but not sure. System-level signal handler is only registered at startup, and Ruby Signal.trap handler is process-wide and only runs in main thread. native_ppoll_sleep can be called by any thread (which exclusively acquires sigwait_fd). > > Also ruby_current_execution_context_ptr variable is unstable between setting ec->interrupt_flag (via RUBY_VM_SET_POSTPONED_JOB_INTERRUPT) and ubf.func calls since we make them without GVL > > > > This is a similar situation to [Bug #14939] r64062 > I recognize this issue. For this issue, @ko1 suggested to > repeatedly calling `rb_postponed_job_register_one` (that > registers the same job at most only once) to survive the ec > switches. I can fix this issue, but I'm NOT thinking that this > could be the solution for this deadlock if unblocking thread > "3." waiting for postponed job may not wake up threads "1." > and "2." as I understand. Right. > I *assume* that threads "1." and "2." are not waken up because > something that should wake up "1." and "2." is not working > somehow with MJIT enabled. At first @ko1 guessed that > `rb_postponed_job_register` swaps `ec->interrupt_flag` in a > bad way and a flag for the signal handler for waitpid is > skipped to be set, but actually we're always using `ATOMIC_OR` > for modifying `ec->interrupt_flag`, and so it may not be the > cause either. Agreed, and the self-pipe/eventfd would've woken native_ppoll_sleep, anyways. (But I need to fix a car problem, first :<) Unsubscribe: