From: "alanwu (Alan Wu) via ruby-core"
Date: 2025-11-28T01:27:24+00:00
Subject: [ruby-core:123927] [Ruby Bug#21715] Miscompilation on x86-64-v2 due to undefined behavior in search_nonascii in string.c

Issue #21715 has been updated by alanwu (Alan Wu).

I repeated Mame's experiment on a Xeon Platinum 8124M with gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04). The chip is from 2017 and supports x86-64-v4. I'm using slightly different scripts since I'm running with frequency scaling disabled. Also, I'm using hyperfine to get some basic stats on the results.

```ruby
# short-str.rb
s = ([65] * 10).pack("C*")
4000000.times { s.dup.force_encoding("UTF-8").scrub }
```

$ hyperfine -L ruby x86-64-uwa-1,x86-64-uwa-1-sans-ub,x86-64-uwa-0,x86-64-v2-uwa-0,x86-64-v3-uwa-0,x86-64-v4-uwa-0 '~/.rubies/{ruby}/bin/ruby --disable-all short-str.rb'
Benchmark 1: ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.165 s ±  0.001 s    [User: 1.157 s, System: 0.007 s]
  Range (min … max):    1.164 s …  1.166 s    10 runs
 
Benchmark 2: ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.179 s ±  0.001 s    [User: 1.172 s, System: 0.007 s]
  Range (min … max):    1.177 s …  1.181 s    10 runs
 
Benchmark 3: ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.142 s ±  0.001 s    [User: 1.135 s, System: 0.007 s]
  Range (min … max):    1.141 s …  1.144 s    10 runs
 
Benchmark 4: ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.165 s ±  0.001 s    [User: 1.157 s, System: 0.007 s]
  Range (min … max):    1.162 s …  1.167 s    10 runs
 
Benchmark 5: ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.150 s ±  0.001 s    [User: 1.140 s, System: 0.009 s]
  Range (min … max):    1.148 s …  1.153 s    10 runs
 
Benchmark 6: ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.181 s ±  0.001 s    [User: 1.172 s, System: 0.008 s]
  Range (min … max):    1.179 s …  1.184 s    10 runs

```
Summary
  ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all short-str.rb ran
    1.01 ± 0.00 times faster than ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all short-str.rb
    1.02 ± 0.00 times faster than ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all short-str.rb
    1.02 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all short-str.rb
    1.03 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all short-str.rb
    1.03 ± 0.00 times faster than ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all short-str.rb
```

I'm seeing the same 3% difference, but `cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=0"` wins.

Side note, it's pretty tricky to measure the speed on short inputs. The loop overhead seems too large compared to the string operations.

```ruby
# long-str.rb
s = ([65] * 100000).pack("C*")
200000.times { s.dup.force_encoding("UTF-8").scrub }
```

$ hyperfine -L ruby x86-64-uwa-1,x86-64-uwa-1-sans-ub,x86-64-uwa-0,x86-64-v2-uwa-0,x86-64-v3-uwa-0,x86-64-v4-uwa-0 '~/.rubies/{ruby}/bin/ruby --disable-all long-str.rb' --warmup 3
Benchmark 1: ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):      1.531 s ±  0.002 s    [User: 1.527 s, System: 0.004 s]
  Range (min … max):    1.529 s …  1.534 s    10 runs
 
Benchmark 2: ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):     830.5 ms ±   1.0 ms    [User: 826.5 ms, System: 3.7 ms]
  Range (min … max):   829.1 ms … 831.9 ms    10 runs
 
Benchmark 3: ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):     831.3 ms ±   2.1 ms    [User: 827.4 ms, System: 3.6 ms]
  Range (min … max):   828.9 ms … 834.8 ms    10 runs
 
Benchmark 4: ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):      2.248 s ±  0.002 s    [User: 2.244 s, System: 0.003 s]
  Range (min … max):    2.246 s …  2.253 s    10 runs
 
Benchmark 5: ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):     830.1 ms ±   1.7 ms    [User: 827.2 ms, System: 2.6 ms]
  Range (min … max):   827.6 ms … 832.9 ms    10 runs
 
Benchmark 6: ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):      2.254 s ±  0.004 s    [User: 2.249 s, System: 0.004 s]
  Range (min … max):    2.249 s …  2.259 s    10 runs
```
Summary
  ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all long-str.rb ran
    1.00 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all long-str.rb
    1.00 ± 0.00 times faster than ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all long-str.rb
    1.84 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all long-str.rb
    2.71 ± 0.01 times faster than ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all long-str.rb
    2.71 ± 0.01 times faster than ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all long-str.rb
```

`x86-64-v3` wins.

> Regarding Alan's patch, it only supports search_nonascii. Since the optimization under UNALIGNED_WORD_ACCESS is applied in other places as well, the patch may be incomplete.

Right, it's incomplete. I just wanted to offer something quickly to see if it fixes the particular crash in OP.

> I think it is fine to abandon the optimization and set UNALIGNED_WORD_ACCESS=0 unconditionally.

I agree. If we do that, I hope we can delete the code for UNALIGNED_WORD_ACCESS=1. I think it's a mistake to keep around code that intentionally triggers UB, especially after learning that it causes crashes.

After removing the dead code, further simplification is possible by doing unaligned reads with memcpy unconditionally, on all platforms. That gets rid of the code for manually aligning pointers. It's a good balance between speed, C compliance, and complexity. This is optional, though, since we already simplify a lot by keeping just one side of UNALIGNED_WORD_ACCESS.

UNALIGNED_WORD_ACCESS=1 is kind of funny. Once vectorized, most of the loads in the loop are, in fact, aligned loads such as MOVDQA.
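To make the memcpy idea above concrete, here is a minimal sketch of a word-at-a-time non-ASCII scan that never casts the byte pointer to a word pointer. It is not the code in string.c and not the patch discussed in this thread: `load_word`, `find_nonascii`, and the way `NONASCII_MASK` is derived here are hypothetical names and choices made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* High bit of every byte set: 0x80 repeated across one machine word. */
#define NONASCII_MASK (UINTPTR_MAX / 0xFF * 0x80)

/* Load one word from a possibly unaligned address. memcpy has no alignment
 * requirement, so this is well defined for any p with sizeof(uintptr_t)
 * readable bytes behind it. Compilers commonly lower it to one plain load. */
static inline uintptr_t
load_word(const char *p)
{
    uintptr_t w;
    memcpy(&w, p, sizeof(w));
    return w;
}

/* Hypothetical stand-in for search_nonascii: returns a pointer to the first
 * byte >= 0x80 in [p, e), or NULL when every byte is ASCII. */
const char *
find_nonascii(const char *p, const char *e)
{
    assert(p <= e);
    /* Word-at-a-time scan; no manual pointer alignment is needed. */
    while ((size_t)(e - p) >= sizeof(uintptr_t)) {
        if (load_word(p) & NONASCII_MASK) break; /* some byte here has bit 7 set */
        p += sizeof(uintptr_t);
    }
    /* Byte-wise tail; also pinpoints the exact byte inside a flagged word. */
    for (; p < e; p++) {
        if ((unsigned char)*p & 0x80) return p;
    }
    return NULL;
}
```

Because the loop only ever sees `const char *`, the compiler has no alignment promise to exploit, so a vectorized version has to use unaligned loads or establish alignment itself.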
----------------------------------------
Bug #21715: Miscompilation on x86-64-v2 due to undefined behavior in search_nonascii in string.c
https://bugs.ruby-lang.org/issues/21715#change-115328

* Author: mjacob (Manuel Jacob)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Building the following Dockerfile fails on an x86-64 machine in the last step (running the `make` command):

```
FROM opensuse/leap:16.0
RUN zypper --non-interactive install wget make gcc
RUN wget 'https://cache.ruby-lang.org/pub/ruby/3.4/ruby-3.4.7.tar.gz'
RUN tar xaf ruby-3.4.7.tar.gz
WORKDIR ruby-3.4.7/build
RUN ../configure
RUN make
```

The failing command (during `make`) is:

`./miniruby -I../lib -I. -I.ext/common ../tool/mkconfig.rb -arch=x86_64-linux -version=3.4.7 -install_name=ruby -so_name=ruby -unicode_version=15.0.0 -unicode_emoji_version=15.0 > rbconfig.tmp`

Excerpt from the crash report:

```
../tool/mkconfig.rb: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0001 p:0000 s:0003 E:000ec0 DUMMY [FINISH]

-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1

-- Machine register context ------------------------------------------------
 RIP: 0x0000556c2da74760 RBP: 0x0000000000000027 RSP: 0x00007ffd24a195f0
 RAX: 0x0000000000000028 RBX: 0x0000556c64acc420 RCX: 0x0000000000000000
 RDX: 0x0000000000000000 RDI: 0x0000000000000014 RSI: 0x00007f49f7d6c123
  R8: 0x46ea57707c6b1df2  R9: 0x00007f49f7d6c123 R10: 0x2afb945fcb545f01
 R11: 0x0000556c2dc3fe50 R12: 0x00007f49f7d6c263 R13: 0x00007f49f7d6c11b
 R14: 0x0000556c64bdaa48 R15: 0x00007f49f7d6c25c EFL: 0x0000000000010256

-- C level backtrace information -------------------------------------------
/ruby-3.4.7/build/miniruby(rb_print_backtrace+0x5) [0x556c2db2c1b6] ../vm_dump.c:823
/ruby-3.4.7/build/miniruby(rb_vm_bugreport) ../vm_dump.c:1155
/ruby-3.4.7/build/miniruby(rb_bug_for_fatal_signal+0xf7) [0x556c2d8cdc47] ../error.c:1130
/ruby-3.4.7/build/miniruby(sigsegv+0x42) [0x556c2da58482] ../signal.c:934
/lib64/libc.so.6(__restore_rt+0x0) [0x7f49f7eb2090]
/ruby-3.4.7/build/miniruby(search_nonascii+0xcb) [0x556c2da74760] ../string.c:729
/ruby-3.4.7/build/miniruby(coderange_scan) ../string.c:767
/ruby-3.4.7/build/miniruby(rbimpl_fl_unset_raw_raw+0x0) [0x556c2da76874] ../string.c:895
/ruby-3.4.7/build/miniruby(RB_FL_UNSET_RAW) ../include/ruby/internal/fl_type.h:669
/ruby-3.4.7/build/miniruby(RB_ENC_CODERANGE_SET) ../include/ruby/internal/encoding/coderange.h:131
/ruby-3.4.7/build/miniruby(enc_coderange_scan) ../string.c:911
/ruby-3.4.7/build/miniruby(rb_enc_str_coderange) ../string.c:910
/ruby-3.4.7/build/miniruby(is_ascii_string+0x8) [0x556c2da7697e] ../internal/string.h:151
/ruby-3.4.7/build/miniruby(str_do_hash) ../string.c:393
/ruby-3.4.7/build/miniruby(register_fstring) ../string.c:554
/ruby-3.4.7/build/miniruby(rb_enc_literal_str+0x87) [0x556c2da94bb7] ../string.c:12546
/ruby-3.4.7/build/miniruby(parse_static_literal_string+0x38) [0x556c2d875991] ../prism_compile.c:312
/ruby-3.4.7/build/miniruby(pm_compile_node) ../prism_compile.c:10321
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e65) [0x556c2d875aa5] ../prism_compile.c:10309
/ruby-3.4.7/build/miniruby(pm_compile_conditional+0x18c) [0x556c2d88cfcc] ../prism_compile.c:1053
/ruby-3.4.7/build/miniruby(pm_compile_node+0x42e1) [0x556c2d876f21] ../prism_compile.c:9355
/ruby-3.4.7/build/miniruby(pm_setup_args_core+0xe4) [0x556c2d884304] ../prism_compile.c:1792
/ruby-3.4.7/build/miniruby(pm_setup_args+0x98) [0x556c2d884e98] ../prism_compile.c:1979
/ruby-3.4.7/build/miniruby(pm_compile_call+0x307) [0x556c2d885cf7] ../prism_compile.c:3673
/ruby-3.4.7/build/miniruby(pm_compile_call_node+0x2c6) [0x556c2d872326] ../prism_compile.c:7403
/ruby-3.4.7/build/miniruby(pm_compile_node+0x39dc) [0x556c2d87661c] ../prism_compile.c:8775
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e65) [0x556c2d875aa5] ../prism_compile.c:10309
/ruby-3.4.7/build/miniruby(pm_compile_conditional+0x18c) [0x556c2d88cfcc] ../prism_compile.c:1053
/ruby-3.4.7/build/miniruby(pm_compile_node+0x42e1) [0x556c2d876f21] ../prism_compile.c:9355
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e3a) [0x556c2d875a7a] ../prism_compile.c:10307
/ruby-3.4.7/build/miniruby(pm_compile_scope_node+0x104a) [0x556c2d88f5da] ../prism_compile.c:6991
/ruby-3.4.7/build/miniruby(pm_compile_node+0x35c9) [0x556c2d876209] ../prism_compile.c:10180
/ruby-3.4.7/build/miniruby(APPEND_LIST+0x0) [0x556c2d891e60] ../prism_compile.c:10481
/ruby-3.4.7/build/miniruby(pm_iseq_compile_node) ../prism_compile.c:10485
/ruby-3.4.7/build/miniruby(pm_iseq_new_with_opt_try+0x10) [0x556c2d94c790] ../iseq.c:1042
/ruby-3.4.7/build/miniruby(rb_protect+0xd6) [0x556c2d8db9c6] ../eval.c:1054
/ruby-3.4.7/build/miniruby(pm_iseq_new_with_opt+0x177) [0x556c2d9525c7] ../iseq.c:1095
/ruby-3.4.7/build/miniruby(pm_iseq_new_main+0x85) [0x556c2d952895] ../iseq.c:943
/ruby-3.4.7/build/miniruby(process_options+0x12fd) [0x556c2da519cd] ../ruby.c:2616
/ruby-3.4.7/build/miniruby(ruby_process_options+0x157) [0x556c2da52657] ../ruby.c:3174
/ruby-3.4.7/build/miniruby(ruby_options+0x97) [0x556c2d8da977] ../eval.c:117
/ruby-3.4.7/build/miniruby(rb_main+0x19) [0x556c2d7eb578] ../prism/prism.c:21769
/ruby-3.4.7/build/miniruby(main) ../main.c:68
/lib64/libc.so.6(__libc_start_call_main+0x82) [0x7f49f7e9b340]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f49f7e9b409]
/ruby-3.4.7/build/miniruby(_start+0x25) [0x556c2d7eb5c5] ../main.c:69
```

The failing instruction at 0x556c2da74760 is: `movdqa xmm0, XMMWORD PTR [rsi+rcx*1]`. At this point, register `rsi` contains 0x7f49f7d6c123, which is the value of parameter `p` of the function `search_nonascii` (0x7f49f7d6c11b) plus 8, and register `rcx` contains 0.
So, the whole instruction means “[move aligned packed integer values](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) from memory at 0x7f49f7d6c123 to register `xmm0`”. The segmentation fault happened because the address is expected to be aligned on a 16-byte boundary, but it is not.

The instruction is part of a loop at https://github.com/ruby/ruby/blob/v3_4_7/string.c#L728 that gets auto-vectorized by GCC.

On x86-64,

* `UNALIGNED_WORD_ACCESS` is `1`
* `p` doesn’t get aligned to anything because of `#if !UNALIGNED_WORD_ACCESS` in line 700
* `aligned_ptr(value)` is expanded to `(uintptr_t *)(value)` according to line 723
* `p` is therefore cast to type `uintptr_t *` in line 725
* `uintptr_t` is a typedef for `unsigned long int`, which has an alignment of 8 bytes

As a result, a pointer `p` to potentially unaligned memory is cast to a pointer to a type with an alignment of 8 bytes. That is undefined behavior according to C99 6.3.2.3p7: “A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.”

Compilers can use this rule to assume that the pointed-to memory is 8-byte aligned. In this case, the GCC loop auto-vectorizer adds code that brings the supposedly 8-byte-aligned address up to 16-byte alignment. A subsequent instruction that assumes 16-byte alignment can therefore fail.

I could reproduce this crash only on openSUSE Leap 16.0, but not on openSUSE Leap 15.6, openSUSE Tumbleweed or Arch Linux, because only the former configures GCC to emit code requiring x86-64-v2 by default. When passing `-march=x86-64-v2` in CFLAGS, the crash happens on all these distributions.
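The address arithmetic in the analysis above can be checked with a small model. This is only a sketch of what an alignment prologue might do if it trusts the 8-byte promise; it is not GCC's actual generated code, and `is_aligned` and `peel_to_16_assuming_8` are hypothetical helpers. Addresses are modeled as plain `uint64_t` values taken from the register dump, so nothing is dereferenced.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when addr is a multiple of align (a power of two). */
static int
is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Hypothetical model of a prologue that trusts 8-byte alignment: an 8-aligned
 * address is either already 16-aligned or exactly one 8-byte word short of
 * it, so testing bit 3 is enough to decide whether to peel one scalar
 * iteration. For an address that is not 8-aligned the result is wrong. */
static uint64_t
peel_to_16_assuming_8(uint64_t addr)
{
    return (addr & 8) ? addr + 8 : addr;
}
```

With `p = 0x7f49f7d6c11b` from the report, this model peels one word and lands on 0x7f49f7d6c123, the value seen in `rsi`, which is not a multiple of 16. That is consistent with the faulting `movdqa`, though the real prologue may differ in detail.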