From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core"
Date: 2023-03-24T02:04:47+00:00
Subject: [ruby-core:112986] [Ruby master Feature#19541] Proposal: Generate frame unwinding info for YJIT code

Issue #19541 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

> For example, if we rely on frame pointer unwinding, it'd be incorrect when the PC is in sections of the prologue/epilogue, but would cover most crashes

I do agree that this will work pretty much all of the time, yeah. I _want_ to make it work in the prologue/epilogue, but I guess that's more for completeness's sake than for any real utility, so it may not be worth generating metadata for this.

> We could read around the PC to figure out how to unwind from those sections for full robustness later.

Oh interesting - I guess if we can rely on YJIT _not_ generating opcodes like `push %rbp; mov %rsp, %rbp` and `stp x29, x30, [sp,#-0x10]!; mov x29, sp` anywhere _except_ the prologue, then the unwinder (both the in-process one for crash reporting, and the out-of-process one in GDB's Python interface) can nose around the PC and work out whether it's inside the prologue/epilogue or not.

It seems I might be able to spike this out by writing a GDB Python unwinder entirely outside the Ruby tree (for aarch64; I'd need to add frame pointers for x86_64 first before it'd work there). Maybe the way to go is for me to write that, share it around, and once it's mostly working, _then_ port its logic into the Ruby crash reporter as well.

This does leave open the question of how to get some kind of sensible name for the yjit frames that isn't just a random address. I suppose if we're going with an approach of "smart unwinders that understand how YJIT lays out code", maybe I can get the unwinder to figure something out based on the CFP pointer. It's in a callee-saved register, and most unwinding schemes generally make it possible to recover these (I think it might be required for C++ exception unwinding to work).
Otherwise perhaps we can spill it to the stack as well - I'll play around.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-102517

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Assigned
* Priority: Normal
* Assignee: yjit
----------------------------------------
## What is being proposed?

Currently, when Ruby crashes with yjit generated code on the stack, `rb_print_backtrace()` is unable to show any frames underneath the yjit code. For example, if you send SIGSEGV to a Ruby process running yjit, this is what you see:

```
/ruby/miniruby(rb_print_backtrace+0xc) [0xaaaad0276884] /ruby/vm_dump.c:785
/ruby/miniruby(rb_vm_bugreport) /ruby/vm_dump.c:1093
/ruby/miniruby(rb_bug_for_fatal_signal+0xd0) [0xaaaad0075580] /ruby/error.c:813
/ruby/miniruby(sigsegv+0x5c) [0xaaaad01bedac] /ruby/signal.c:919
linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffff91a3e8bc]
/ruby/miniruby(map<(usize, yjit::backend::ir::Insn), (usize, yjit::backend::ir::Insn), yjit::backend::ir::{impl#17}::next_mapped::{closure_env#0}>+0x8c) [0xaaaad03b8b00] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/option.rs:929
/ruby/miniruby(next_mapped+0x3c) [0xaaaad0291dc0] src/backend/ir.rs:1225
/ruby/miniruby(arm64_split+0x114) [0xaaaad0287744] src/backend/arm64/mod.rs:359
/ruby/miniruby(compile_with_regs+0x80) [0xaaaad028bf84] src/backend/arm64/mod.rs:1106
/ruby/miniruby(compile+0xc4) [0xaaaad0291ae0] src/backend/ir.rs:1158
/ruby/miniruby(gen_single_block+0xe44) [0xaaaad02b1f88] src/codegen.rs:854
/ruby/miniruby(gen_block_series_body+0x9c) [0xaaaad03b0250] src/core.rs:1698
/ruby/miniruby(gen_block_series+0x50) [0xaaaad03b0100] src/core.rs:1676
/ruby/miniruby(branch_stub_hit_body+0x80c) [0xaaaad03b1f68] src/core.rs:2021
/ruby/miniruby({closure#0}+0x28) [0xaaaad02eb86c] src/core.rs:1924
/ruby/miniruby(do_call+0x98) [0xaaaad035ba3c] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492
[0xaaaad035c9b4]
```

(n.b. - I compiled Ruby with `-fasynchronous-unwind-tables -rdynamic -g` in cflags to make sure gcc generates appropriate unwind info & keeps the symbol tables).

Likewise, if you attach gdb to a Ruby process with yjit enabled, gdb can't show thread backtraces through yjit-generated code either.

My proposal is that YJIT generate sufficient unwinding and debug information on all platforms to allow both `rb_print_backtrace()` and the platform's debugger (gdb/lldb/WinDbg) to show:

* Full stack traces all the way back to `main`. That is, it should be possible to see frames _underneath_ `[0xaaaad035c9b4]` from the backtrace above.
* Names for the dynamically generated yjit blocks (e.g. instead of `[0xaaaad035c9b4]`, we should see something like `yjit$$name_of_ruby_method`, where `name_of_ruby_method` is the `label` for the iseq this is JIT'd code for).

## Motivation

I have a few motivations for wanting this.

Firstly, I feel this functionality is independently useful. When Ruby crashes, the more information we can get, the more likely we are to find the root cause. The same principle applies to debugging with gdb - you can get a fuller understanding of what the process is doing if you can see the whole stack. I have often found that attaching gdb to the Ruby interpreter helps in understanding problems in Ruby code or C extensions, and it is something I do relatively frequently; yjit breaking that will definitely be inconvenient for me!

## Implementation

I have a draft implementation of this here: https://github.com/ruby/ruby/pull/7567. It's currently missing tests & platform support (it only works on Linux aarch64). Also, it implements unwind info generation, so unwinding can work _through_ yjit code, but it does not currently emit symbols to give _names_ to those yjit frames.
My PR contains a document which explains how the Linux interfaces for registering unwind info for JIT'd code work, so I won't duplicate that information here.

The biggest implementation question I had is around the use of Rust crates. Currently, I prototyped my implementation using the gimli & object crates, for generating DWARF info and ELF binaries. However, the yjit build purposefully does not use cargo & external crates for release builds. There are a few different ways we could go here:

* Don't use the gimli & object crates; instead, re-implement all debug info & object file generation code in yjit.
* Don't use the crates; instead, link against C libraries to provide this functionality & call them from Rust (perhaps some combination of libelf, libdw, libbfd, or llvm might do what we need).
* Use cargo after all for the release build & download the crates at build-time.
* Use cargo for the release build, but vendor everything, so the build doesn't need to download anything.
* Only make unwind info generation available in dev mode where cargo is used, and so mark the gimli/object dependencies as optional in Cargo.toml.

We'd need to decide on one of these approaches for this proposal to work. I don't really have a strong sense of the pros/cons of each.

(Side note - my PR actually depends on a _fork_ of gimli - I've been discussing adding the needed interfaces upstream here: https://github.com/gimli-rs/gimli/issues/648).
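
For what it's worth, the "read around the PC" idea from the reply at the top of this message can be sketched in a few lines of Python. This is only a rough illustration, not code from the PR: the function name `prologue_state`, the state names, and the assumption that YJIT emits exactly these two encodings back-to-back (and nowhere else) are all hypothetical. It only covers the prologue; the epilogue would need the same treatment.

```python
import struct

# Frame-setup instruction encodings, as the bytes appear in memory.
# ASSUMPTION: the JIT emits exactly these encodings. On x86_64,
# "mov %rsp, %rbp" also has an alternative encoding (48 8b ec) that
# this sketch does not look for.
PROLOGUES = {
    "x86_64": (
        bytes([0x55]),              # push %rbp
        bytes([0x48, 0x89, 0xE5]),  # mov %rsp, %rbp
    ),
    "aarch64": (
        struct.pack("<I", 0xA9BF7BFD),  # stp x29, x30, [sp, #-0x10]!
        struct.pack("<I", 0x910003FD),  # mov x29, sp
    ),
}


def prologue_state(code: bytes, pc: int, arch: str) -> str:
    """Classify `pc` (an offset into `code`) relative to the frame setup.

    Returns one of (hypothetical state names):
      "before_save"  - pc is at the first prologue instruction, so the
                       caller's frame pointer is still in the FP register.
      "before_mov"   - the old FP has been saved to the stack, but the FP
                       register has not been pointed at the new frame yet.
      "in_body"      - anywhere else; plain frame-pointer unwinding applies.
    """
    save, mov = PROLOGUES[arch]

    # Case 1: the save+mov pair starts exactly at pc.
    if code[pc:pc + len(save) + len(mov)] == save + mov:
        return "before_save"

    # Case 2: the save instruction sits just before pc and the mov is at pc.
    start = pc - len(save)
    if start >= 0 and code[start:pc] == save and code[pc:pc + len(mov)] == mov:
        return "before_mov"

    return "in_body"


# x86_64: push %rbp; mov %rsp, %rbp; nop; ret
x86_code = bytes.fromhex("554889e590c3")
# aarch64: stp x29, x30, [sp, #-0x10]!; mov x29, sp; nop
arm_code = struct.pack("<III", 0xA9BF7BFD, 0x910003FD, 0xD503201F)

print(prologue_state(x86_code, 1, "x86_64"))   # before_mov
print(prologue_state(arm_code, 8, "aarch64"))  # in_body
```

In a real GDB Python unwinder the `code` bytes would presumably come from reading the inferior's memory around the PC (e.g. via `gdb.selected_inferior().read_memory(...)`), and the returned state would decide where to find the caller's frame pointer and return address.
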
## Benchmarks

I ran the yjit-bench suite on my branch and compared it to Ruby master:

* My branch: https://gist.github.com/KJTsanaktsidis/5741a9f64e5cd75cdf5fedd846091a4f
* Ruby master: https://gist.github.com/KJTsanaktsidis/592d3ebcf98f6745dfa3efbd30a25acf

This is a (very simple) comparison:

```
-------------- ------------ ------------ ---------------
bench          yjit (ms)    branch (ms)  branch/yjit (%)
activerecord   97.5         98.5         101.03%
hexapdf        2415.3       2458.2       101.78%
liquid-c       61.9         63.1         101.94%
liquid-render  135.3        135.0        99.78%
mail           104.6        105.5        100.86%
psych-load     1887.1       1922.0       101.85%
railsbench     1544.4       1556.0       100.75%
ruby-lsp       88.4         89.5         101.24%
sequel         147.5        151.1        102.44%
binarytrees    303          305.6        100.86%
chunky_png     1075.8       1079.4       100.33%
erubi          392.9        392.3        99.85%
erubi_rails    14.7         14.7         100.00%
etanni         792.3        791.4        99.89%
fannkuchredux  3815.9       3813.6       99.94%
lee            1030.2       1039.2       100.87%
nbody          49.2         49.3         100.20%
optcarrot      4142         4143.3       100.03%
ruby-json      2860.7       2874.0       100.46%
rubykon        7906.6       7904.2       99.97%
30k_ifelse     348.7        345.4        99.05%
30k_methods    828.6        831.8        100.39%
cfunc_itself   28.8         28.9         100.35%
fib            34.4         34.5         100.29%
getivar        115.5        109.7        94.98%
keyword_args   37.7         38.0         100.80%
respond_to     26           26.1         100.38%
setivar        33.8         33.5         99.11%
setivar_object 208.7        194.3        93.10%
str_concat     52.6         52.2         99.24%
throw          23.8         24.1         101.26%
-------------- ------------ ------------ ---------------
```

It seems like the performance impact of generating and registering the debug info is marginal: every benchmark is within about 2.5% of master, except `getivar` and `setivar_object`, which ran 5-7% faster on the branch.

-- 
https://bugs.ruby-lang.org/