From: mk
Date: 2022-12-02T09:54:28+00:00
Subject: [ruby-core:111148] [Ruby master Bug#19156] ObjectSpace.dump_all segfault during string inspection

Issue #19156 has been updated by mk (Matthias Käppler).

@byroot I wonder if you could help me understand the underlying issue better. I found a minimal, executable test case that reproduces this issue reliably:

```ruby
Prometheus::Client::MmapedValue.new(:counter, :counter, 'ordered_counter', { label_1: 'x' * 1024**2 })

# This will crash
ObjectSpace.each_object(String, &:valid_encoding?)
```

I am trying to turn this into an automated test to make sure we won't regress on this again. The test does not crash if the metric label is relatively short, perhaps a few characters. It _does_ crash once the label string grows above a certain size, however.

Here is what I don't understand yet:

1. `prometheus-client-mmap` calls into `new_str0`, which, when the string is large enough, will be malloc'ed by MRI, correct? I had to make the string _much_ larger than `sizeof(RVALUE)` for the crash to occur. Aren't strings malloc'ed as soon as they no longer fit into a heap slot?
2. The library proceeds to overwrite the internal pointer to the malloc'ed memory region and lets it point to the mapped file memory instead. But how come MRI does not see this when accessing this string memory through a function like `valid_encoding?`? Shouldn't that result in traversing the same pointer, now pointing to the string data in the memory map? From MRI's perspective, why does it matter where the actual string data resides?

I also don't think it's because of the `MmapedValue` object being GC'ed; I ran this example with `GC.stress` and even nulled out the parent object creating this string, but it never crashed in response to that. In fact, I can completely disable GC in this test case and it will still crash, which leads me to think this is not related to the mmap memory being freed before the Ruby string is, or vice versa.
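For the automated test mentioned above, one option is to run the reproduction in a child interpreter, so that a segfault shows up as a failed exit status instead of taking down the test runner. The following is only a minimal sketch under that assumption: `run_isolated` is a hypothetical helper name, and the snippet passed to it here is a harmless stand-in, since the real reproduction needs `prometheus-client-mmap` loaded in the child process.

```ruby
require 'open3'
require 'rbconfig'

# Runs a Ruby snippet in a fresh interpreter so that a crash in the snippet
# cannot take down the calling process.
# `run_isolated` is a hypothetical helper, not part of any library.
def run_isolated(snippet)
  _out, err, status = Open3.capture3(RbConfig.ruby, '-e', snippet)
  {
    success: status.success? == true, # success? is nil when killed by a signal
    signaled: status.signaled?,       # true when the child died from a signal
    stderr: err                       # holds the [BUG] report after a crash
  }
end

# Stand-in for the crashing snippet: walking all strings is expected to
# succeed when no string points at invalid memory.
result = run_isolated('ObjectSpace.each_object(String, &:valid_encoding?)')
puts result[:success]
```

A regression test could then assert on `result[:success]` and print `result[:stderr]` on failure, which keeps the crash report visible without aborting the whole suite.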
----------------------------------------
Bug #19156: ObjectSpace.dump_all segfault during string inspection
https://bugs.ruby-lang.org/issues/19156#change-100432

* Author: mk (Matthias Käppler)
* Status: Third Party's Issue
* Priority: Normal
* ruby -v: ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
I am working on a feature that would allow our application to capture heap dumps during shutdown for later inspection. These heap dumps are captured via `ObjectSpace.dump_all(output: io)`.

While walking the object space, MRI occasionally segfaults while inspecting string objects in `search_nonascii` of `string.c`:

```
/usr/local/lib/ruby/3.0.0/objspace.rb:87: [BUG] Segmentation fault at 0x00007efee4201000
ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
...

-- Control frame information -----------------------------------------------
c:0053 p:---- s:0312 e:000311 CFUNC  :_dump_all
c:0052 p:0130 s:0305 e:000304 METHOD /usr/local/lib/ruby/3.0.0/objspace.rb:87
c:0051 p:0023 s:0295 e:000294 METHOD /home/git/gitlab/lib/gitlab/memory/reports/heap_dump.rb:26
...

-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.0(rb_print_backtrace+0x11) [0x7efee4ad0c5e] vm_dump.c:758
/usr/local/lib/libruby.so.3.0(rb_vm_bugreport) vm_dump.c:998
/usr/local/lib/libruby.so.3.0(rb_bug_for_fatal_signal+0xf8) [0x7efee48d0b08] error.c:787
/usr/local/lib/libruby.so.3.0(sigsegv+0x55) [0x7efee4a23db5] signal.c:963
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7efee4f12140] ../sysdeps/pthread/funlockfile.c:28
/usr/local/lib/libruby.so.3.0(search_nonascii+0x30) [0x7efee4a3ca60] string.c:552
/usr/local/lib/libruby.so.3.0(coderange_scan) string.c:585
/usr/local/lib/libruby.so.3.0(enc_coderange_scan+0x1b) [0x7efee4a3e28a] string.c:709
/usr/local/lib/libruby.so.3.0(rb_enc_str_coderange) string.c:727
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(is_broken_string+0x8) [0x7efeced9c304] ../../internal/string.h:116
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(dump_object) objspace_dump.c:388
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(heap_i+0x39) [0x7efeced9caa9] objspace_dump.c:521
/usr/local/lib/libruby.so.3.0(objspace_each_objects_without_setup+0xaf) [0x7efee48e878f] gc.c:3232
/usr/local/lib/libruby.so.3.0(objspace_each_objects_protected+0x14) [0x7efee48e87c4] gc.c:3242
/usr/local/lib/libruby.so.3.0(rb_ensure+0x12a) [0x7efee48d96aa] eval.c:1162
/usr/local/lib/libruby.so.3.0(objspace_each_objects+0x28) [0x7efee48fb458] gc.c:3310
/usr/local/lib/libruby.so.3.0(rb_objspace_each_objects) gc.c:3298
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(objspace_dump_all+0x88) [0x7efeced9b068] objspace_dump.c:616
...
```

Unfortunately I couldn't get my hands on that memory region to see which strings are causing this since this doesn't always happen. I suspect this is also a problem with MRI master since the code looks unchanged from 3.0.4.
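For readers unfamiliar with the heap dump call this report is about, here is a minimal sketch of capturing one via `ObjectSpace.dump_all(output: io)`. The helper name `capture_heap_dump` and the use of a temporary file are assumptions made for illustration; this is not the application's actual `heap_dump.rb`.

```ruby
require 'objspace'
require 'json'
require 'tempfile'

# Writes one JSON document per line to `io`, covering the live objects.
# `capture_heap_dump` is a hypothetical helper name.
def capture_heap_dump(io)
  ObjectSpace.dump_all(output: io)
  io.flush
  io
end

Tempfile.create('heap_dump') do |file|
  capture_heap_dump(file)
  file.rewind
  # Each line is a standalone JSON object; inspect the first one.
  first_entry = JSON.parse(file.readline)
  puts first_entry.keys.inspect
end
```

While dumping each string, `dump_all` checks whether it is broken (the `is_broken_string` frame in the backtrace above), which scans the string's bytes. That is why a string whose pointer refers to inaccessible memory can crash the dump.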