From: "mame (Yusuke Endoh) via ruby-core"
Date: 2025-11-27T09:25:49+00:00
Subject: [ruby-core:123922] [Ruby Bug#21715] Miscompilation on x86-64-v2 due to undefined behavior in search_nonascii in string.c

Issue #21715 has been updated by mame (Yusuke Endoh).

I wonder if the premise that "unaligned word access is feasible on x86" no longer holds in modern contexts?

We are of course aware that unaligned word access is undefined behavior in C. However, it is slightly faster, which is why we introduced this optimization specifically for x86.

I evaluated the performance on an AMD Ryzen 9 6900HX with gcc version 15.2.0 (Ubuntu 15.2.0-4ubuntu4) using the benchmark below. (I ran each test 10 times and picked the best result.)

```ruby
s = ([65] * 10).pack("C*")
t = Process.clock_gettime(Process::CLOCK_MONOTONIC)
20000000.times { s.dup.force_encoding("UTF-8").scrub }
p Process.clock_gettime(Process::CLOCK_MONOTONIC) - t
```

It appears that `-march=x86-64 -DUNALIGNED_WORD_ACCESS=1` remains the fastest.

* `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1"`: 2.918 s.
* `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1"` with Alan's patch: 2.941 s.
* `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=0"`: 3.020 s.
* `cflags="-march=x86-64-v2 -DUNALIGNED_WORD_ACCESS=0"`: 3.175 s.
* `cflags="-march=x86-64-v3 -DUNALIGNED_WORD_ACCESS=0"`: 3.017 s.
* `cflags="-march=x86-64-v4 -DUNALIGNED_WORD_ACCESS=0"`: Illegal instruction

It is worth noting that `x86-64-v3` performs extremely well for long strings. On the other hand, `x86-64-v2` is clearly slower than `x86-64`, which is unfortunate.

```ruby
s = ([65] * 1000000).pack("C*")
t = Process.clock_gettime(Process::CLOCK_MONOTONIC)
200000.times { s.dup.force_encoding("UTF-8").scrub }
p Process.clock_gettime(Process::CLOCK_MONOTONIC) - t
```

* `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1"`: 5.229 s.
* `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1"` with Alan's patch: 5.232 s.
* `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=0"`: 5.230 s.
* `cflags="-march=x86-64-v2 -DUNALIGNED_WORD_ACCESS=0"`: 6.127 s.
* `cflags="-march=x86-64-v3 -DUNALIGNED_WORD_ACCESS=0"`: 2.728 s.
* `cflags="-march=x86-64-v4 -DUNALIGNED_WORD_ACCESS=0"`: Illegal instruction

However, since most strings handled in Ruby are not that long, it is likely more critical to ensure speed for short strings.

Regarding Alan's patch, it only supports `search_nonascii`. Since the optimization under `UNALIGNED_WORD_ACCESS` is applied in other places as well, the patch may be incomplete.

Looking at these benchmarks, it seems fair to say the difference is not drastic. If the performance degradation is only around 3.3%, I think it is fine to abandon the optimization and set `UNALIGNED_WORD_ACCESS=0` unconditionally.

I would appreciate it if others could verify this on different environments as well.

----------------------------------------
Bug #21715: Miscompilation on x86-64-v2 due to undefined behavior in search_nonascii in string.c
https://bugs.ruby-lang.org/issues/21715#change-115323

* Author: mjacob (Manuel Jacob)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Building the following Dockerfile fails on an x86-64 machine in the last step (running the `make` command):

```
FROM opensuse/leap:16.0
RUN zypper --non-interactive install wget make gcc
RUN wget 'https://cache.ruby-lang.org/pub/ruby/3.4/ruby-3.4.7.tar.gz'
RUN tar xaf ruby-3.4.7.tar.gz
WORKDIR ruby-3.4.7/build
RUN ../configure
RUN make
```

The failing command (during `make`) is: `./miniruby -I../lib -I. -I.ext/common ../tool/mkconfig.rb -arch=x86_64-linux -version=3.4.7 -install_name=ruby -so_name=ruby -unicode_version=15.0.0 -unicode_emoji_version=15.0 > rbconfig.tmp`

Excerpt from the crash report:

```
../tool/mkconfig.rb: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0001 p:0000 s:0003 E:000ec0 DUMMY [FINISH]

-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1

-- Machine register context ------------------------------------------------
 RIP: 0x0000556c2da74760 RBP: 0x0000000000000027 RSP: 0x00007ffd24a195f0
 RAX: 0x0000000000000028 RBX: 0x0000556c64acc420 RCX: 0x0000000000000000
 RDX: 0x0000000000000000 RDI: 0x0000000000000014 RSI: 0x00007f49f7d6c123
  R8: 0x46ea57707c6b1df2  R9: 0x00007f49f7d6c123 R10: 0x2afb945fcb545f01
 R11: 0x0000556c2dc3fe50 R12: 0x00007f49f7d6c263 R13: 0x00007f49f7d6c11b
 R14: 0x0000556c64bdaa48 R15: 0x00007f49f7d6c25c EFL: 0x0000000000010256

-- C level backtrace information -------------------------------------------
/ruby-3.4.7/build/miniruby(rb_print_backtrace+0x5) [0x556c2db2c1b6] ../vm_dump.c:823
/ruby-3.4.7/build/miniruby(rb_vm_bugreport) ../vm_dump.c:1155
/ruby-3.4.7/build/miniruby(rb_bug_for_fatal_signal+0xf7) [0x556c2d8cdc47] ../error.c:1130
/ruby-3.4.7/build/miniruby(sigsegv+0x42) [0x556c2da58482] ../signal.c:934
/lib64/libc.so.6(__restore_rt+0x0) [0x7f49f7eb2090]
/ruby-3.4.7/build/miniruby(search_nonascii+0xcb) [0x556c2da74760] ../string.c:729
/ruby-3.4.7/build/miniruby(coderange_scan) ../string.c:767
/ruby-3.4.7/build/miniruby(rbimpl_fl_unset_raw_raw+0x0) [0x556c2da76874] ../string.c:895
/ruby-3.4.7/build/miniruby(RB_FL_UNSET_RAW) ../include/ruby/internal/fl_type.h:669
/ruby-3.4.7/build/miniruby(RB_ENC_CODERANGE_SET) ../include/ruby/internal/encoding/coderange.h:131
/ruby-3.4.7/build/miniruby(enc_coderange_scan) ../string.c:911
/ruby-3.4.7/build/miniruby(rb_enc_str_coderange) ../string.c:910
/ruby-3.4.7/build/miniruby(is_ascii_string+0x8) [0x556c2da7697e] ../internal/string.h:151
/ruby-3.4.7/build/miniruby(str_do_hash) ../string.c:393
/ruby-3.4.7/build/miniruby(register_fstring) ../string.c:554
/ruby-3.4.7/build/miniruby(rb_enc_literal_str+0x87) [0x556c2da94bb7] ../string.c:12546
/ruby-3.4.7/build/miniruby(parse_static_literal_string+0x38) [0x556c2d875991] ../prism_compile.c:312
/ruby-3.4.7/build/miniruby(pm_compile_node) ../prism_compile.c:10321
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e65) [0x556c2d875aa5] ../prism_compile.c:10309
/ruby-3.4.7/build/miniruby(pm_compile_conditional+0x18c) [0x556c2d88cfcc] ../prism_compile.c:1053
/ruby-3.4.7/build/miniruby(pm_compile_node+0x42e1) [0x556c2d876f21] ../prism_compile.c:9355
/ruby-3.4.7/build/miniruby(pm_setup_args_core+0xe4) [0x556c2d884304] ../prism_compile.c:1792
/ruby-3.4.7/build/miniruby(pm_setup_args+0x98) [0x556c2d884e98] ../prism_compile.c:1979
/ruby-3.4.7/build/miniruby(pm_compile_call+0x307) [0x556c2d885cf7] ../prism_compile.c:3673
/ruby-3.4.7/build/miniruby(pm_compile_call_node+0x2c6) [0x556c2d872326] ../prism_compile.c:7403
/ruby-3.4.7/build/miniruby(pm_compile_node+0x39dc) [0x556c2d87661c] ../prism_compile.c:8775
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e65) [0x556c2d875aa5] ../prism_compile.c:10309
/ruby-3.4.7/build/miniruby(pm_compile_conditional+0x18c) [0x556c2d88cfcc] ../prism_compile.c:1053
/ruby-3.4.7/build/miniruby(pm_compile_node+0x42e1) [0x556c2d876f21] ../prism_compile.c:9355
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e3a) [0x556c2d875a7a] ../prism_compile.c:10307
/ruby-3.4.7/build/miniruby(pm_compile_scope_node+0x104a) [0x556c2d88f5da] ../prism_compile.c:6991
/ruby-3.4.7/build/miniruby(pm_compile_node+0x35c9) [0x556c2d876209] ../prism_compile.c:10180
/ruby-3.4.7/build/miniruby(APPEND_LIST+0x0) [0x556c2d891e60] ../prism_compile.c:10481
/ruby-3.4.7/build/miniruby(pm_iseq_compile_node) ../prism_compile.c:10485
/ruby-3.4.7/build/miniruby(pm_iseq_new_with_opt_try+0x10) [0x556c2d94c790] ../iseq.c:1042
/ruby-3.4.7/build/miniruby(rb_protect+0xd6) [0x556c2d8db9c6] ../eval.c:1054
/ruby-3.4.7/build/miniruby(pm_iseq_new_with_opt+0x177) [0x556c2d9525c7] ../iseq.c:1095
/ruby-3.4.7/build/miniruby(pm_iseq_new_main+0x85) [0x556c2d952895] ../iseq.c:943
/ruby-3.4.7/build/miniruby(process_options+0x12fd) [0x556c2da519cd] ../ruby.c:2616
/ruby-3.4.7/build/miniruby(ruby_process_options+0x157) [0x556c2da52657] ../ruby.c:3174
/ruby-3.4.7/build/miniruby(ruby_options+0x97) [0x556c2d8da977] ../eval.c:117
/ruby-3.4.7/build/miniruby(rb_main+0x19) [0x556c2d7eb578] ../prism/prism.c:21769
/ruby-3.4.7/build/miniruby(main) ../main.c:68
/lib64/libc.so.6(__libc_start_call_main+0x82) [0x7f49f7e9b340]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f49f7e9b409]
/ruby-3.4.7/build/miniruby(_start+0x25) [0x556c2d7eb5c5] ../main.c:69
```

The failing instruction at 0x556c2da74760 is: `movdqa xmm0, XMMWORD PTR [rsi+rcx*1]`. At this point, register `rsi` contains 0x7f49f7d6c123, which is the value of parameter `p` of the function `search_nonascii` (0x7f49f7d6c11b) plus 8, and register `rcx` contains 0. So the whole instruction means “[move aligned packed integer values](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) from memory at 0x7f49f7d6c123 to register `xmm0`”. The segmentation fault happened because the instruction requires the address to be aligned on a 16-byte boundary, but it is not.

The instruction is part of a loop at https://github.com/ruby/ruby/blob/v3_4_7/string.c#L728 that gets auto-vectorized by GCC.
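
The register values quoted from the crash report can be cross-checked with a few lines of Ruby. This is only an arithmetic sanity check on the reported addresses; the constant names are made up for illustration and do not appear in Ruby's source.

```ruby
# Addresses copied from the crash report; the constant names are
# hypothetical and exist only for this check.
P_ARG = 0x7f49f7d6c11b  # value of parameter `p` of search_nonascii
RSI   = 0x7f49f7d6c123  # base register of the faulting movdqa
RCX   = 0x0             # index register of the faulting movdqa

# Effective address of `XMMWORD PTR [rsi+rcx*1]`.
EFFECTIVE_ADDRESS = RSI + RCX * 1

puts format("rsi - p      = %d", RSI - P_ARG)             # 8
puts format("address %% 16 = %d", EFFECTIVE_ADDRESS % 16) # 3, so not 16-byte aligned
puts format("p %% 8        = %d", P_ARG % 8)              # 3, so not even 8-byte aligned
```

A remainder of 0 in the second line would be needed for `movdqa` to succeed; the third line shows that `p` was not 8-byte aligned in the first place.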

On x86-64,

* `UNALIGNED_WORD_ACCESS` is `1`
* `p` doesn’t get aligned to anything because of `#if !UNALIGNED_WORD_ACCESS` in line 700
* `aligned_ptr(value)` is expanded to `(uintptr_t *)(value)` according to line 723
* `p` is therefore cast to type `uintptr_t *` in line 725
* `uintptr_t` is typedefed to `unsigned long int`, which has an alignment of 8 bytes

As a result, a pointer `p` to potentially unaligned memory is cast to a pointer to a type with an alignment of 8 bytes. That is undefined behavior according to C99 6.3.2.3p7: “A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.”

Compilers can utilize this rule to assume that the pointed-to memory is aligned to 8 bytes. In this case, the GCC loop auto-vectorizer adds code that brings the presumably 8-byte-aligned address to 16-byte alignment. A subsequent instruction that requires 16-byte alignment can therefore fail when the original address was not actually 8-byte aligned.

I could reproduce this crash only on openSUSE Leap 16.0, but not on openSUSE Leap 15.6, openSUSE Tumbleweed or Arch Linux, because only the first of these configures GCC to emit code requiring x86-64-v2 by default. When passing `-march=x86-64-v2` in CFLAGS, the crash happens on all of these distributions.
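
The effect of the alignment assumption can be modeled in Ruby. This is a simplified sketch and not GCC's actual generated code: `peel_to_16` and `vector_load_ok?` are hypothetical helpers that mimic a vectorizer which trusts the pointer to be 8-byte aligned and therefore handles at most one 8-byte word before entering the 16-byte-aligned vector loop.

```ruby
WORD = 8   # alignment of uintptr_t on x86-64, as stated in the report
VEC  = 16  # alignment required by movdqa

# Hypothetical model: if addr % WORD == 0 is taken for granted, only the
# bit with value 8 decides whether one scalar word is processed first.
def peel_to_16(addr)
  (addr & WORD).zero? ? addr : addr + WORD
end

# True if the first vector load would hit a 16-byte boundary.
def vector_load_ok?(addr)
  (peel_to_16(addr) % VEC).zero?
end

aligned   = 0x7f49f7d6c118  # genuinely 8-byte aligned (made-up address)
unaligned = 0x7f49f7d6c11b  # value of `p` from the crash report

puts vector_load_ok?(aligned)               # true
puts vector_load_ok?(unaligned)             # false
puts format("0x%x", peel_to_16(unaligned))  # 0x7f49f7d6c123, the value seen in rsi
```

Under this model the adjustment is correct for every address that really is a multiple of 8, and lands on a misaligned address as soon as that premise is violated.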