From: "ioquatix (Samuel Williams)" Date: 2022-05-14T23:06:53+00:00 Subject: [ruby-core:108553] [Ruby master Bug#18782] Race conditions in autoload when loading the same feature with multiple threads. Issue #18782 has been updated by ioquatix (Samuel Williams). File 0005-Revert-removal-of-non-conditional-xfree.patch added File 0004-Increase-timeout-for-debug-tests.patch added File 0003-Protect-race-on-autoload-state.patch added File 0002-Prevent-race-between-GC-mark-and-autoload-setup.patch added File 0001-Add-RUBY_VM_CRITICAL_SECTION-for-detecting-unexpecte.patch added @normalperson I've attached patch files. If you have time, I'd really appreciate your feedback on this. Specifically, patch 5 - freeing this unconditionally causes the Ruby VM to hang in `test-bundler-parallel` and I'm not sure why. Here is a backtrace of a hung process: ``` * thread #1, name = 'ruby', stop reason = signal SIGSTOP * frame #0: 0x00007f91a51972cb libc.so.6`__lll_lock_wait_private + 43 frame #1: 0x00007f91a51ab6e0 libc.so.6`__libc_calloc + 816 frame #2: 0x00007f91a5542823 ld-linux-x86-64.so.2`_dl_new_object + 115 frame #3: 0x00007f91a553ded3 ld-linux-x86-64.so.2`_dl_map_object_from_fd + 179 frame #4: 0x00007f91a553f7ac ld-linux-x86-64.so.2`_dl_map_object + 508 frame #5: 0x00007f91a55435b9 ld-linux-x86-64.so.2`dl_open_worker_begin + 169 frame #6: 0x00007f91a5267e18 libc.so.6`_dl_catch_exception + 136 frame #7: 0x00007f91a5542d7b ld-linux-x86-64.so.2`dl_open_worker + 59 frame #8: 0x00007f91a5267e18 libc.so.6`_dl_catch_exception + 136 frame #9: 0x00007f91a554315d ld-linux-x86-64.so.2`_dl_open + 173 frame #10: 0x00007f91a5268211 libc.so.6`do_dlopen + 65 frame #11: 0x00007f91a5267e18 libc.so.6`_dl_catch_exception + 136 frame #12: 0x00007f91a5267ee3 libc.so.6`_dl_catch_error + 51 frame #13: 0x00007f91a5268170 libc.so.6`dlerror_run + 64 frame #14: 0x00007f91a526832a libc.so.6`__libc_dlopen_mode + 74 frame #15: 0x00007f91a521f191 libc.so.6`__libc_unwind_link_get + 81 frame #16: 0x00007f91a522c867 libc.so.6`__backtrace + 39 frame #17: 0x000056542d70de17 ruby`rb_vm_bugreport at vm_dump.c:762:13 frame #18: 0x000056542d7a454a ruby`rb_bug_for_fatal_signal at error.c:822:5 frame #19: 0x000056542d663c49 ruby`sigsegv(sig=, info=, ctx=) at signal.c:964:5 frame #20: 0x00007f91a514f560 libc.so.6`__restore_rt frame #21: 0x00007f91a51a6ef1 libc.so.6`malloc_consolidate + 257 frame #22: 0x00007f91a51a8450 libc.so.6`_int_free + 2032 frame #23: 0x00007f91a51aabe3 libc.so.6`free + 115 frame #24: 0x00007f91a19482bd libyaml-0.so.2`yaml_parser_delete + 77 frame #25: 0x00007f91a4c49bb9 psych.so`dealloc(ptr=0x000056542efbf0e0) at psych_parser.c:48:5 frame #26: 0x000056542d525a2c ruby`finalize_deferred_heap_pages at gc.c:4241:2 frame #27: 0x000056542d5307f6 ruby`rb_objspace_call_finalizer at gc.c:4427:5 frame #28: 0x000056542d509de0 ruby`rb_ec_cleanup at eval.c:164:5 frame #29: 0x000056542d50a65c ruby`ruby_run_node at eval.c:321:12 frame #30: 0x000056542d5041fa ruby`rb_main(argc=7, argv=0x00007ffeffea7e48) at main.c:47:9 frame #31: 0x000056542d504230 ruby`main(argc=7, argv=0x00007ffeffea7e48) at main.c:56:10 frame #32: 0x00007f91a513a310 libc.so.6`__libc_start_call_main + 128 frame #33: 0x00007f91a513a3c1 libc.so.6`__libc_start_main@@GLIBC_2.34 + 129 frame #34: 0x000056542d504095 ruby`_start + 37 ``` ---------------------------------------- Bug #18782: Race conditions in autoload when loading the same feature with multiple threads. https://bugs.ruby-lang.org/issues/18782#change-97593 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN ---------------------------------------- I have identified several race conditions in the autoload code. 1. It's possible to race on adding and then deleting items in `autoload_featuremap`. When this happens, two threads will try to load the same file with different autoload data and deadlock. 2. When finishing autoload, it's necessary to clear `ele->state` before setting constants. If this is not synchronised, a thread can see the cleared `ele->state` before seeing the constants and assume the constant is not being autoloaded and then fail with `NameError`. This test case can reproduce both cases: ``` # test.rb autoload_path = File.join(__dir__, "foobar.rb") File.write(autoload_path, 'module Foo; end; module Bar; end') 100_000.times do $stderr.puts "--------------------" autoload :Foo, autoload_path autoload :Bar, autoload_path t1 = Thread.new {Foo} t2 = Thread.new {Bar} t1.join t2.join Object.send(:remove_const, :Foo) Object.send(:remove_const, :Bar) $LOADED_FEATURES.delete(autoload_path) end ``` Example failure of case (1): ``` -------------------- (success) autoload_by_someone_else ele=0x55f33b806a30 ele->state=(nil) autoload_by_someone_else ele=0x55f33b806a30 ele->state=(nil) check_autoload_required 2 autoload_by_someone_else ele=0x55f33b806a30 ele->state=0x7fdd678be780 check_autoload_required 4 autoload_by_someone_else ele=0x55f33b806a30 ele->state=0x7fdd678be780 check_autoload_required 4 ele=0x55f33b806a30 ele->state=0x7fdd678be780 = NULL check_autoload_required 4 -------------------- (failure) autoload_by_someone_else ele=0x55f33b806a30 ele->state=(nil) autoload_by_someone_else ele=0x55f33b6e8f40 ele->state=(nil) check_autoload_required 2 check_autoload_required 3 autoload_by_someone_else ele=0x55f33b806a30 ele->state=0x7fdd6779d780 check_autoload_required 1 autoload_by_someone_else ele=0x55f33b806a30 ele->state=0x7fdd6779d780 check_autoload_required 1 ele=0x55f33b806a30 ele->state=0x7fdd6779d780 = NULL ele=0x55f33b6e8f40 ele->state=0x7fdd678be780 = NULL ../test.rb:12:in `join': No live threads left. Deadlock? (fatal) 3 threads, 3 sleeps current:0x000055f33b771250 main thread:0x000055f33b66e090 * # rb_thread_t:0x000055f33b66e090 native:0x00007fdd6a71c3c0 int:0 * # rb_thread_t:0x000055f33b770ff0 native:0x00007fdd6789e640 int:1 mutex:0x000055f33b7c5100 cond:1 depended by: tb_thread_id:0x000055f33b66e090 * # rb_thread_t:0x000055f33b771250 native:0x00007fdd679bf640 int:0 from ../test.rb:12:in `block in
' from ../test.rb:4:in `times' from ../test.rb:4:in `
' make: *** [uncommon.mk:1250: runruby] Error 1 ``` Example failure of case (2): ``` [0x7f175fe5b0c8] rb_autoload_str mod=Object id=Foo file="/home/samuel/Projects/ioquatix/ruby/foobar.rb" [0x7f175fe5b0c8] rb_autoload_str const_set mod=Object id=Foo file="/home/samuel/Projects/ioquatix/ruby/foobar.rb" [0x7f175fe5b0c8] rb_autoload_str mod=Object id=Bar file="/home/samuel/Projects/ioquatix/ruby/foobar.rb" [0x7f175fe5b0c8] rb_autoload_str const_set mod=Object id=Bar file="/home/samuel/Projects/ioquatix/ruby/foobar.rb" [0x7f175fe61d88] rb_const_search_from value == Qundef -> autoloading [0x7f175fe61e78] rb_const_search_from value == Qundef -> autoloading [0x7f175fe61e78] Assigning constants... [0x7f175fe61d88] rb_const_search_from value == Qundef -> autoloading [0x7f175fe61e78] autoload_const_set name=:Foo value=Foo [0x7f175fe61e78] autoload_const_set name=:Bar value=Bar # terminated with exception (report_on_exception is true): ../test.rb:11:in `block (2 levels) in
': uninitialized constant Bar (NameError) ../test.rb:11:in `block (2 levels) in
': uninitialized constant Bar (NameError) make: *** [uncommon.mk:1250: runruby] Error 1 ``` These failures are very uncommon but it does impact Ruby as far back as 2.7, and probably earlier. ---Files-------------------------------- 0001-Add-RUBY_VM_CRITICAL_SECTION-for-detecting-unexpecte.patch (2.07 KB) 0002-Prevent-race-between-GC-mark-and-autoload-setup.patch (1.01 KB) 0003-Protect-race-on-autoload-state.patch (10.6 KB) 0004-Increase-timeout-for-debug-tests.patch (910 Bytes) 0005-Revert-removal-of-non-conditional-xfree.patch (691 Bytes) -- https://bugs.ruby-lang.org/ Unsubscribe: