From: merch-redmine@...
Date: 2019-05-25T01:06:33+00:00
Subject: [ruby-core:92836] [Ruby trunk Bug#9606] Occasional SIGSEGV in TestException#test_machine_stackoverflow on OpenBSD

Issue #9606 has been updated by jeremyevans0 (Jeremy Evans).

Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN, 2.1: UNKNOWN)

I haven't seen `TestException#test_machine_stackoverflow` SIGSEGV in a long time on OpenBSD. I'm guessing the numerous improvements in the last 5 years mean this is no longer an issue.

Is anyone else seeing `TestException#test_machine_stackoverflow` SIGSEGV in their environment with the master branch or ruby 2.6? If nobody responds confirming this issue is still present, I'll close this in a few weeks.

----------------------------------------
Bug #9606: Occasional SIGSEGV in TestException#test_machine_stackoverflow on OpenBSD
https://bugs.ruby-lang.org/issues/9606#change-78215

* Author: jeremyevans0 (Jeremy Evans)
* Status: Feedback
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-openbsd]
* Backport:
----------------------------------------
ruby 2.1.1 on OpenBSD seems to occasionally suffer from a stack overflow when running TestException#test_machine_stackoverflow (about once every 3-4 runs):

    $ make test-all TESTOPTS="-q test/ruby/test_exception.rb"
    Reading specs from /usr/lib/gcc-lib/amd64-unknown-openbsd5.5/4.2.1/specs
    Target: amd64-unknown-openbsd5.5
    Configured with: OpenBSD/amd64 system compiler
    Thread model: posix
    gcc version 4.2.1 20070719
            CC = cc
            LD = ld
            LDSHARED = cc -shared
            CFLAGS = -O0 -g -fPIC
            XCFLAGS = -D_FORTIFY_SOURCE=2 -fstack-protector -fno-strict-overflow -fvisibility=hidden -DRUBY_EXPORT
            CPPFLAGS = -DOPENSSL_NO_STATIC_ENGINE -I/usr/local/include -I. -I.ext/include/x86_64-openbsd -I./include -I.
            DLDFLAGS = -L/usr/local/lib -fstack-protector
            SOLIBS = -pthread -lgmp -lm
    ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems "./test/runner.rb" --ruby="./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems" -q test/ruby/test_exception.rb
    Run options: "--ruby=./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems" -q

    # Running tests:

    .........................F...........

    Finished tests in 2.089776s, 17.7053 tests/s, 88.0477 assertions/s.

      1) Failure:
    TestException#test_machine_stackoverflow [/usr/obj/ports/ruby-2.1.1/ruby-2.1.1/test/ruby/test_exception.rb:482]:
    -:7: [BUG] Segmentation fault at 0x007f7fff7fbfe8
    ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-openbsd]

Looking at the core file in gdb:

    (gdb) bt
    #0  0x00001bb73a57a19a in kill () at :2
    #1  0x00001bb73a5da52a in abort () at /usr/src/lib/libc/stdlib/abort.c:70
    #2  0x00001bb741a3ca04 in rb_bug (fmt=Could not find the frame base for "rb_bug".
    ) at error.c:341
    #3  0x00001bb741b19178 in sigsegv (sig=Could not find the frame base for "sigsegv".
    ) at signal.c:704
    #4

Here is the interesting part: the key passed to st_lookup should be exactly the same key as the one passed to rb_hash_aref, but the SIGSEGV happens when st_lookup tries to access it:

    #5  0x00001bb741b238fe in st_lookup (table=0x1bb73e7f7480, key=Cannot access memory at address 0x7f7fff7fbfe8
    ) at st.c:410
    #6  0x00001bb741a65353 in rb_hash_aref (hash=30473858635240, key=3864588) at hash.c:701
    #7  0x00001bb741b94df8 in vm_exec_core (th=0x1bb73f782000, initial=0) at insns.def:1857
    #8  0x00001bb741ba1f5e in vm_exec (th=0x1bb73f782000) at vm.c:1304
    #9  0x00001bb741ba09a7 in invoke_block_from_c (th=0x1bb73f782000, block=0x1bb73aa32280, self=30473875219360, argc=0, argv=0x1bb7480167b0, blockptr=0x0, cref=0x0, defined_class=8) at vm.c:732
    #10 0x00001bb741ba0be6 in vm_invoke_proc (th=0x1bb73f782000, proc=0x1bb73aa32280, self=30473875219360, defined_class=8, argc=0, argv=0x1bb7480167b0, blockptr=0x0) at vm.c:788
    #11 0x00001bb741ba0c85 in rb_vm_invoke_proc (th=0x1bb73f782000, proc=0x1bb73aa32280, argc=0, argv=0x1bb7480167b0, blockptr=0x0) at vm.c:807
    #12 0x00001bb741a48ce7 in proc_call (argc=0, argv=0x1bb7480167b0, procval=30473858635280) at proc.c:734
    #13 0x00001bb741b8bd0c in call_cfunc_m1 (func=0x1bb741a48c45 , recv=30473858635280, argc=0, argv=0x1bb7480167b0) at vm_insnhelper.c:1298
    #14 0x00001bb741b8c8f5 in vm_call_cfunc_with_frame (th=0x1bb73f782000, reg_cfp=0x1bb7480bf230, ci=0x1bb744054470) at vm_insnhelper.c:1470
    #15 0x00001bb741b8ca6b in vm_call_cfunc (th=0x1bb73f782000, reg_cfp=0x1bb7480bf230, ci=0x1bb744054470) at vm_insnhelper.c:1560
    #16 0x00001bb741b917a9 in vm_exec_core (th=0x1bb73f782000, initial=0) at insns.def:1028
    #17 0x00001bb741ba1f5e in vm_exec (th=0x1bb73f782000) at vm.c:1304

Let's look at the st_lookup frame:

    (gdb) up 5
    #5  0x00001bb741b238fe in st_lookup (table=0x1bb73e7f7480, key=Cannot access memory at address 0x7f7fff7fbfe8
    ) at st.c:410
    410     {
    Current language:  auto; currently c
    (gdb) print &table
    $1 = (st_table **) 0x7f7fff7fc008
    (gdb) print &key
    $2 = (st_data_t *) 0x7f7fff7fbfe8
    (gdb) print *(&table - 1)
    $3 = (st_table *) 0x0
    (gdb) print *(&table - 2)
    Cannot access memory at address 0x7f7fff7fbff8
    (gdb) print *(&table - 3)
    Cannot access memory at address 0x7f7fff7fbff0
    (gdb) print *(&table - 4)
    Cannot access memory at address 0x7f7fff7fbfe8

What is happening here is that when the stack overflows, the location of key in memory is not accessible. The lowest accessible stack address is 0x7f7fff7fc000, and anything below that (the stack grows downward) is not accessible.

Let's look at the registers; we are mostly interested in the stack pointer (rsp):

    (gdb) info reg
    rax            0x1bb73f84d5e8   30473858635240
    rbx            0x1bb7438d1200   30473926283776
    rcx            0x1bb74081e3a0   30473875219360
    rdx            0x7f7fff7fc058   140187724136536
    rsi            0x3af80c         3864588
    rdi            0x1bb73e7f7480   30473841505408
    rbp            0x7f7fff7fc020   0x7f7fff7fc020
    rsp            0x7f7fff7fbfe0   0x7f7fff7fbfe0
    r8             0x8              8
    r9             0x1bb742d55431   30473914242097
    r10            0x1bb742d55431   30473914242097
    r11            0x1bb741ba1f0e   30473895681806
    r12            0x1bb73f782000   30473857802240
    r13            0x11             17
    r14            0x1bb746a7bd50   30473978363216
    r15            0x1bb7480bf190   30474001707408
    rip            0x1bb741b238fe   0x1bb741b238fe
    eflags         0x10202          66050
    cs             0x2b             43
    ss             0x23             35
    ds             0x23             35
    es             0x23             35
    fs             0x23             35
    gs             0x23             35

Let's go to the outermost frame (main) and look at the stack pointer:

    (gdb) up 16100
    #16100 0x00001bb5386010df in main (argc=17, argv=0x7f7fffffa790) at main.c:36
    36          return ruby_run_node(ruby_options(argc, argv));
    (gdb) info reg
    rax            0x1bb73f84d5e8   30473858635240
    rbx            0x7f7fffffa820   140187732518944
    rcx            0x1bb74081e3a0   30473875219360
    rdx            0x7f7fff7fc058   140187724136536
    rsi            0x3af80c         3864588
    rdi            0x1bb73e7f7480   30473841505408
    rbp            0x7f7fffffa750   0x7f7fffffa750
    rsp            0x7f7fffffa730   0x7f7fffffa730
    r8             0x8              8
    r9             0x1bb742d55431   30473914242097
    r10            0x1bb742d55431   30473914242097
    r11            0x1bb741ba1f0e   30473895681806
    r12            0x7f7fffffa790   140187732518800
    r13            0x11             17
    r14            0x0              0
    r15            0x0              0
    rip            0x1bb5386010df   0x1bb5386010df
    eflags         0x10202          66050
    cs             0x2b             43
    ss             0x23             35
    ds             0x23             35
    es             0x23             35
    fs             0x23             35
    gs             0x23             35

The difference between the two stack pointers is:

    (gdb) print 0x7f7fffffa730 - 0x7f7fff7fbfe0
    $4 = 8382288

That's pretty close to 8 MB (8388608 bytes). Sure enough, that's what the stack limit for the user is set to:

    $ ulimit -a
    time(cpu-seconds)    unlimited
    file(blocks)         unlimited
    coredump(blocks)     unlimited
    data(kbytes)         3145728
    stack(kbytes)        8192
    lockedmem(kbytes)    1267356
    memory(kbytes)       3800076
    nofiles(descriptors) 1024
    processes            1024

So the operating system is behaving correctly, allocating only about 8 MB of stack. The above example is from OpenBSD/amd64; similar errors occur on OpenBSD/i386.

It appears that ruby's stack overflow handling is not working correctly in this case. Any pointers for how to fix this issue?
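
For anyone who wants to re-check the arithmetic from the gdb session, here is a minimal Ruby sketch. The numeric values are copied from the output above; the constant and variable names are made up for illustration and do not come from the Ruby source tree. It only redoes the subtraction and compares it with the 8192 kbyte stack limit. It does not reproduce the crash.

```ruby
# Values copied from the gdb session and ulimit output in this report.
# All names below are hypothetical, chosen only for this illustration.
RSP_MAIN        = 0x7f7fffffa730 # rsp in main (frame #16100)
RSP_FAULT       = 0x7f7fff7fbfe0 # rsp in st_lookup (frame #5)
KEY_ADDR        = 0x7f7fff7fbfe8 # &key, the address that could not be read
LOWEST_READABLE = 0x7f7fff7fc000 # lowest stack address gdb could still read
STACK_LIMIT     = 8192 * 1024    # ulimit stack(kbytes), converted to bytes

used      = RSP_MAIN - RSP_FAULT       # stack consumed between main and st_lookup
headroom  = STACK_LIMIT - used         # what is left of the 8 MB limit below main's rsp
overshoot = LOWEST_READABLE - KEY_ADDR # how far &key lies below readable memory

puts "stack used between main and st_lookup: #{used} bytes"      # 8382288
puts "left under the 8 MB limit:             #{headroom} bytes"  # 6320
puts "&key is below the last readable address by #{overshoot} bytes" # 24
```

The few kilobytes of headroom are presumably taken up by whatever sits on the stack above main's frame, but that part is an assumption and is not shown by the gdb output.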