From: "byroot (Jean Boussier) via ruby-core"
Date: 2025-04-08T07:50:20+00:00
Subject: [ruby-core:121579] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]

Issue #21220 has been updated by byroot (Jean Boussier).

What I've figured out so far is that the corruption is triggered by:

```ruby
# This require line makes sure the original script file is processed by simplecov
require File.expand_path($0, '.')
```

```
frame #4: 0x00000001003c468c ruby`rb_bug(fmt=) at error.c:1117:5
frame #5: 0x00000001003c57dc ruby`update_line_coverage.cold.1 at thread.c:5666:17
frame #6: 0x000000010021ae64 ruby`update_line_coverage(data=, trace_arg=) at thread.c:5666:17
frame #7: 0x0000000100277fe0 ruby`exec_hooks_unprotected [inlined] exec_hooks_body(ec=0x0000000141f05bf0, list=0x0000000141f05970, trace_arg=0x000000016fdfc4c8) at vm_trace.c:352:17
frame #8: 0x0000000100277fc0 ruby`exec_hooks_unprotected(ec=0x0000000141f05bf0, list=0x0000000141f05970, trace_arg=0x000000016fdfc4c8) at vm_trace.c:381:5
frame #9: 0x0000000100277f30 ruby`rb_exec_event_hooks(trace_arg=, hooks=0x0000000141f05970, pop_p=0) at vm_trace.c:427:13
frame #10: 0x000000010026c01c ruby`vm_trace_hook [inlined] rb_exec_event_hook_orig(ec=0x0000000141f05bf0, hooks=0x0000000141f05970, flag=65536, self=4302425520, id=0, called_id=0, klass=0, data=36, pop_p=0) at vm_core.h:2179:5
frame #11: 0x000000010026bfe8 ruby`vm_trace_hook(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127e78, pc=, pc_events=65537, target_event=65536, global_hooks=0x0000000141f05970, local_hooks_ptr=0x000000011b4d4ca8, val=36) at vm_insnhelper.c:7062:9
frame #12: 0x000000010026b9e4 ruby`vm_trace(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127e78) at vm_insnhelper.c:7170:13
frame #13: 0x000000010024b128 ruby`vm_exec_core(ec=) at vm.inc:4972:5
frame #14: 0x0000000100249860 ruby`rb_vm_exec(ec=0x0000000141f05bf0) at vm.c:2597:22
frame #15: 0x000000010025e978 ruby`rb_iseq_eval(iseq=) at vm.c:2852:11 [artificial]
frame #16: 0x0000000100123298 ruby`load_iseq_eval(ec=0x0000000141f05bf0, fname=4753022720) at load.c:789:5
frame #17: 0x0000000100121350 ruby`require_internal(ec=0x0000000141f05bf0, fname=4757156480, exception=1, warn=) at load.c:1297:21
frame #18: 0x0000000100120608 ruby`rb_require_string_internal(fname=4757156480, resurrect=false) at load.c:1403:22
frame #19: 0x00000001001204d8 ruby`rb_f_require [inlined] rb_require_string(fname=4757156480) at load.c:1389:12
frame #20: 0x00000001001204b8 ruby`rb_f_require(obj=, fname=) at load.c:1029:12
frame #21: 0x0000000100268894 ruby`vm_call_cfunc_with_frame_(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127ee8, calling=, argc=1, argv=0x00000001480280c0, stack_bottom=0x00000001480280b8) at vm_insnhelper.c:3794:11
frame #22: 0x0000000100263af8 ruby`vm_call_alias(ec=, cfp=, calling=0x000000016fdfd140) at vm_insnhelper.c:4181:12
frame #23: 0x000000010024cf4c ruby`vm_exec_core [inlined] vm_sendish(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127ee8, cd=0x00006000022b3110, block_handler=0, method_explorer=mexp_search_method) at vm_insnhelper.c:5964:15
frame #24: 0x000000010024ce48 ruby`vm_exec_core(ec=) at insns.def:898:11
frame #25: 0x0000000100249860 ruby`rb_vm_exec(ec=0x0000000141f05bf0) at vm.c:2597:22
frame #26: 0x000000010025e978 ruby`rb_iseq_eval(iseq=) at vm.c:2852:11 [artificial]
frame #27: 0x0000000100123298 ruby`load_iseq_eval(ec=0x0000000141f05bf0, fname=4755544040) at load.c:789:5
frame #28: 0x0000000100121350 ruby`require_internal(ec=0x0000000141f05bf0, fname=4726729080, exception=1, warn=) at load.c:1297:21
```

But I wasn't able to reduce it yet.

----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112635

* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!

I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.3.x or earlier.

My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36

### Summary

`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses the result as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` writes to index -1 of the array. This corrupts the heap memory just before the Array, resulting in a program crash later during GC.

As I am new to the Ruby codebase, I do not know whether it is normal for `rb_sourceline()` to return 0 (in which case `update_line_coverage()` should handle it), or whether something is wrong in the code that ultimately feeds `rb_sourceline()`.

### Symptom

On Linux, affected processes print one of the following errors and exit:

```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```

or, if preloading libc_malloc_debug.so:

```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```

### Reproduction

I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening. The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.

- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176

``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```

### Research and reasoning

I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.

After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption.

I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and used Valgrind+vgdb to identify the original write that appeared to cause the corruption.

I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.

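The indexing problem described in the Summary can be sketched in isolation. The following C fragment is only an illustration under stated assumptions, not the code in `thread.c`: `line_coverage` and `record_line_hit` are hypothetical names, and a plain `long` array stands in for the Ruby Array of per-line counts. It shows why a 1-based line number of 0 has to be rejected before one is subtracted from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-file line coverage table:
 * counts[i] holds the hit count for source line i + 1. */
typedef struct {
    long *counts;
    size_t len;
} line_coverage;

/* Record one hit for the 1-based source line `line`.
 * Returns 1 if the hit was recorded, 0 if `line` does not map to a
 * valid slot. A line of 0 (or below) is rejected up front, because
 * `line - 1` would otherwise become index -1 and the increment would
 * land in the memory just before the array. */
int record_line_hit(line_coverage *cov, long line)
{
    assert(cov != NULL && cov->counts != NULL);

    if (line < 1) {
        return 0; /* no valid source line: nothing to record */
    }

    size_t index = (size_t)(line - 1);
    if (index >= cov->len) {
        return 0; /* line is past the end of the table */
    }

    cov->counts[index]++;
    return 1;
}
```

With a three-slot table, `record_line_hit(&cov, 0)` returns 0 and leaves the counts untouched, whereas an unguarded `counts[line - 1]++` would write before the start of the array. Whether a guard like this belongs in `update_line_coverage()`, or whether `rb_sourceline()` should never return 0 at that point, is the open question of this report.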
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36

---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)

-- 
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/