From: ko1@... Date: 2015-12-10T00:26:22+00:00 Subject: [ruby-core:72014] [Ruby trunk - Bug #11799] Object allocation during garbage collection phase terminates the Ruby process Issue #11799 has been updated by Koichi Sasada. Thank you for your log. This is a reason. ``` (gdb) thread 12 [Switching to thread 12 (Thread 0x7fffe5fc4700 (LWP 27808))]#0 0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007ffff7b0375b in native_cond_wait (limits_us=) at thread_pthread.c:338 #2 gvl_acquire_common (limits_us=) at thread_pthread.c:87 #3 gvl_yield (limits_us=) at thread_pthread.c:155 #4 rb_thread_schedule_limits (limits_us=) at thread.c:1209 #5 0x00007ffff7b03b7c in rb_threadptr_execute_interrupts (th=, blocking_timing=0) at thread.c:2019 #6 0x00007ffff7ae236d in vm_call0_body (th=0x1988bb0, ci=, argv=0x7fffe5fc10b0) at vm_eval.c:252 #7 0x00007ffff7ae331e in vm_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:59 #8 rb_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:349 #9 rb_call (recv=18167000, mid=152, n=1) at vm_eval.c:616 #10 rb_funcall (recv=18167000, mid=152, n=1) at vm_eval.c:818 #11 0x00007fffef6659ed in eqrb::operator()(unsigned long, unsigned long) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #12 0x00007fffef6f6b6f in google::sparse_hashtable, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map > >::SetKey, eqrb, google::libc_allocator_with_realloc > >::KeyInfo::equals(unsigned long const&, unsigned long const&) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #13 0x00007fffef6f5739 in google::sparse_hashtable, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map > >::SetKey, eqrb, google::libc_allocator_with_realloc > >::equals(unsigned long const&, unsigned long const&) const () ---Type to continue, or q to quit--- from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #14 0x00007fffef6f65ab in google::sparse_hashtable, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map > >::SetKey, eqrb, google::libc_allocator_with_realloc > >::test_deleted_key(unsigned long const&) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #15 0x00007fffef6f5440 in google::sparse_hashtable, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map > >::SetKey, eqrb, google::libc_allocator_with_realloc > >::test_deleted(google::sparse_hashtable_iterator, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map > >::SetKey, e qrb, google::libc_allocator_with_realloc > > const&) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #16 0x00007fffef6f4a49 in google::sparse_hashtable_iterator, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map > >::SetKey, eqrb, google::libc_allocator_with_realloc > >::advance_past_deleted() () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #17 0x00007fffef6f45e4 in google::sparse_hashtable_iterator, unsigned long, hashrb, google::sparse_hash_map > >::SelectKey, google::sparse_hash_map to continue, or q to quit--- gned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc > >::SetKey, eqrb, google::libc_allocator_with_realloc > >::operator++() () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #18 0x00007fffef6f38e8 in mark_hash_map_values () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so #19 0x00007ffff79c91cb in gc_mark_children (objspace=0x6029f0, obj=18167000) at gc.c:4190 #20 0x00007ffff79ca5f4 in rgengc_rememberset_mark (objspace=0x6029f0, full_mark=, immediate_mark=, immediate_sweep=, reason=) at gc.c:5376 #21 gc_marks_start (objspace=0x6029f0, full_mark=, immediate_mark=, immediate_sweep=, reason=) at gc.c:4951 #22 gc_marks (objspace=0x6029f0, full_mark=, immediate_mark=, immediate_sweep=, reason=) at gc.c:5206 #23 gc_start (objspace=0x6029f0, full_mark=, immediate_mark=, immediate_sweep=, reason=) at gc.c:5957 #24 0x00007ffff79cc65c in heap_prepare (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1564 #25 heap_get_freeobj_from_next_freepage (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1576 #26 heap_get_freeobj (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1600 #27 newobj_of (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1645 #28 0x00007ffff796ae69 in ary_alloc (capa=3) at array.c:445 ``` (1) Thread 12 runs GC (2) mark function in /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so is called. #17 (how to read...?) (3) #13 calls Ruby code from mark function. <- BUG (4) Ruby code switches to Thread 6. (5) Thread 6 runs own Ruby code and cause GC (6) get [BUG] message This is 3rd party issue. ---------------------------------------- Bug #11799: Object allocation during garbage collection phase terminates the Ruby process https://bugs.ruby-lang.org/issues/11799#change-55423 * Author: Charles Leu * Status: Open * Priority: Normal * Assignee: * ruby -v: ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-linux] * Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN ---------------------------------------- Resurrection of Bug #11549 that was apparently inadvertently deleted. Also note the similarity to Bug #10868. **Summary:** Multi-Threaded Ruby apps are often problematic; especially so when utilizing thread pools, and scheduling work to worker threads. * While the problem isn't easily recreated, it can be reliably recreated given sufficient time and conditions (for garbage collection in conjunction with sibling thread behavior). * Initially, three trace files have been attached. Each shows the problem in different conditions, and using different client infrastructure. * If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid. RE: attached file ruby_2.2.3_obj_alloc_gc_bug.txt > Following is the section of sap_consumer_control.rb that is presented by the Ruby interpreter as being the current execution context when the problem occurs. > > 408: loop do > 409: @worker_threads.schedule(@work_queue.pop, &@consumer) > 410: @sap_packets_consumed += 1 > 411: end > > Notes: > * @work_queue is a Ruby Queue (allocated within the main thread) into which a producer thread places work requests. > * @worker_threads is a thread pool (allocated within the main thread). > * @worker_threads schedule method simply puts a work request into the thread pool's internal work queue. One of the worker threads within the thread pool will consume/effect the work request, by executing the specified consumer Proc. > * The main program thread simply loops forever scheduling work to thread pool threads. > * It appears that an object is being allocated by virtue of the @work_queue.pop RE: attached file ruby_bug_redis_client.txt > * The observed behavior also occurs readily when employing the Redis client gem. > * This is the use case in which the bug is most easily reproduced. RE: attached file ruby_bug-gdb.txt > * The observed behavior has also been observed when using stretcher, faraday, net-http-persistent gems. > * Connections via stretcher to net-http-persistent are on a per thread basis (maintained within thread local storage). > * In this particular case, the Ruby process was run under gdb, and back traces for each of the process threads are provided. ---Files-------------------------------- ruby_bug_redis_client.txt (53.3 KB) ruby_2.2.3_obj_alloc_gc_bug.txt (49.1 KB) ruby_bug-gdb.txt (151 KB) -- https://bugs.ruby-lang.org/