From: Eric Wong <normalperson@...>
Date: 2018-02-06T10:00:00+00:00
Subject: [ruby-core:85442] Re: [Ruby trunk Bug#14357] thread_safe tests suite segfaults

Eric Wong <normalperson@yhbt.net> wrote:
> v.ondruch@tiscali.cz wrote:
> > https://bugs.ruby-lang.org/issues/14357
> > 
> > The thread_safe gem is not maintained anymore, but I don't see
> > any reason why its test suite should segfault with Ruby 2.5.
> 
> Right, no 3rd-party C exts loaded and I hit this in trunk, too.
> Using -fsanitize=address reveals use-after-free in st.c
> Investigating, but maybe Vladimir can find it sooner.

Maybe my initial investigation was correct, after all.

valgrind takes forever, but it indicates the free comes from
rebuild_table, so it doesn't look like we missed GC marking
during rebuild.  Disabling the free(tab->entries) at
st.c:792 (patch below) seems to make the thread_safe test
suite pass (I let it loop overnight).

Looks like the new_tab != tab case of rebuild is leaving a
dangling reference somewhere.
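
To illustrate the kind of stale-pointer hazard suspected here, a
minimal sketch in C.  toy_table and the toy_* functions are
hypothetical, heavily simplified stand-ins, not the real st_table
or rebuild_table; the sketch only mirrors the allocate/copy/free/swap
shape of a rebuild.  It records the old address as an integer so it
never reads freed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for st_table; NOT the real layout. */
typedef struct {
    size_t capacity;       /* allocated slots in entries */
    size_t num_entries;    /* slots in use */
    unsigned rebuilds_num; /* bumped on every rebuild */
    long *entries;
} toy_table;

static int toy_init(toy_table *tab, size_t capacity)
{
    tab->capacity = capacity;
    tab->num_entries = 0;
    tab->rebuilds_num = 0;
    tab->entries = malloc(capacity * sizeof *tab->entries);
    return tab->entries ? 0 : -1;
}

/* Same shape as a rebuild into a fresh array:
 * allocate, copy, free the old array, swap the pointer. */
static int toy_rebuild(toy_table *tab)
{
    size_t new_cap = tab->capacity * 2;
    long *new_entries = malloc(new_cap * sizeof *new_entries);

    if (!new_entries)
        return -1;
    assert(tab->num_entries <= tab->capacity);
    memcpy(new_entries, tab->entries,
           tab->num_entries * sizeof *new_entries);
    free(tab->entries); /* pointers cached before this are now dangling */
    tab->entries = new_entries;
    tab->capacity = new_cap;
    tab->rebuilds_num++;
    return 0;
}

static int toy_add(toy_table *tab, long value)
{
    if (tab->num_entries == tab->capacity && toy_rebuild(tab) != 0)
        return -1;
    tab->entries[tab->num_entries++] = value;
    return 0;
}

/* Returns 1 if an address cached before an insert no longer matches
 * tab->entries afterwards, 0 if it still matches, -1 on allocation
 * failure. */
int toy_cached_pointer_went_stale(void)
{
    toy_table tab;
    uintptr_t cached;
    int stale;

    if (toy_init(&tab, 2) != 0)
        return -1;
    if (toy_add(&tab, 10) != 0 || toy_add(&tab, 20) != 0) {
        free(tab.entries);
        return -1;
    }
    /* A reader caches the entries address, as a lookup might. */
    cached = (uintptr_t)tab.entries;
    /* A writer inserts; the table is full, so this rebuilds. */
    if (toy_add(&tab, 30) != 0) {
        free(tab.entries);
        return -1;
    }
    stale = (cached != (uintptr_t)tab.entries) && tab.rebuilds_num == 1;
    free(tab.entries);
    return stale;
}
```

Barring allocation failure, toy_cached_pointer_went_stale() returns 1:
the third insert forces a rebuild, so the address a reader cached
beforehand points into the freed block.  Dereferencing it at that
point would be the same class of invalid read valgrind reports.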

==9885== Thread 32 cache_loops_sp*:
==9885== Invalid read of size 8
==9885==    at 0x235622: find_table_entry_ind (st.c:873)
==9885==    by 0x236C95: st_lookup (st.c:1049)
==9885==    by 0x1520CE: rb_hash_aref (hash.c:853)
==9885==    by 0x2A95E0: vm_opt_aref (vm_insnhelper.c:3650)
==9885==    by 0x2A95E0: vm_exec_core (insns.def:1175)
==9885==    by 0x2ACA83: vm_exec (vm.c:1790)
==9885==    by 0x2AD875: invoke_block (vm.c:993)
==9885==    by 0x2AD875: invoke_iseq_block_from_c (vm.c:1045)
==9885==    by 0x2B64A8: invoke_block_from_c_bh (vm.c:1063)
==9885==    by 0x2B64A8: vm_yield (vm.c:1108)
==9885==    by 0x2B64A8: rb_yield_0 (vm_eval.c:970)
==9885==    by 0x2B64A8: rb_yield_1 (vm_eval.c:976)
==9885==    by 0x19238D: int_dotimes (numeric.c:4984)
==9885==    by 0x29F816: vm_call_cfunc_with_frame (vm_insnhelper.c:1921)
==9885==    by 0x29F816: vm_call_cfunc (vm_insnhelper.c:1937)
==9885==    by 0x2A83D9: vm_exec_core (insns.def:719)
==9885==    by 0x2ACA83: vm_exec (vm.c:1790)
==9885==    by 0x2AD875: invoke_block (vm.c:993)
==9885==    by 0x2AD875: invoke_iseq_block_from_c (vm.c:1045)
==9885==  Address 0xbeafe88 is 43,080 bytes inside a block of size 49,152 free'd
==9885==    at 0x4C29E90: free (vg_replace_malloc.c:473)
==9885==    by 0x14C3EC: objspace_xfree (gc.c:7987)
==9885==    by 0x14C3EC: ruby_sized_xfree (gc.c:8082)
==9885==    by 0x14C3EC: ruby_xfree (gc.c:8089)
==9885==    by 0x236472: rebuild_table (st.c:792)
==9885==    by 0x237E85: rebuild_table_if_necessary (st.c:1090)
==9885==    by 0x237E85: st_add_direct_with_hash (st.c:1153)
==9885==    by 0x237E85: st_update (st.c:1431)
==9885==    by 0x150A4E: tbl_update (hash.c:561)
==9885==    by 0x150A4E: rb_hash_aset (hash.c:1654)
==9885==    by 0x2A9687: vm_opt_aset (vm_insnhelper.c:3671)
==9885==    by 0x2A9687: vm_exec_core (insns.def:1189)
==9885==    by 0x2ACA83: vm_exec (vm.c:1790)
==9885==    by 0x2AD875: invoke_block (vm.c:993)
==9885==    by 0x2AD875: invoke_iseq_block_from_c (vm.c:1045)
==9885==    by 0x2B674F: invoke_block_from_c_bh (vm.c:1063)
==9885==    by 0x2B674F: vm_yield (vm.c:1108)
==9885==    by 0x2B674F: rb_yield_0 (vm_eval.c:970)
==9885==    by 0x2B674F: rb_yield (vm_eval.c:983)
==9885==    by 0x131C86: rb_ensure (eval.c:1035)
==9885==    by 0x29F816: vm_call_cfunc_with_frame (vm_insnhelper.c:1921)
==9885==    by 0x29F816: vm_call_cfunc (vm_insnhelper.c:1937)
==9885==    by 0x2A83D9: vm_exec_core (insns.def:719)

Line numbers based on r62184
(git commit 05c18139a1545a61caaaf33d888c8427d346b571).

The following patch hides the problem by introducing a leak:
```
--- a/st.c
+++ b/st.c
@@ -789,7 +789,7 @@ rebuild_table(st_table *tab)
 	if (tab->bins != NULL)
 	    free(tab->bins);
 	tab->bins = new_tab->bins;
-	free(tab->entries);
+	/* free(tab->entries); */ /* NOT FOR PRODUCTION USE */
 	tab->entries = new_tab->entries;
 	free(new_tab);
     }
```

(gdb) up
#17 0x00005604a6dd173d in find_table_entry_ind (tab=tab@entry=0x7f13e4444ac0, hash_value=hash_value@entry=0,
    key=key@entry=94578030726560) at ../st.c:874
874                 && PTR_EQUAL(tab, &entries[bin - ENTRY_BASE], hash_value, key))
(gdb) up
#18 0x00005604a6dd2d26 in st_lookup (tab=0x7f13e4444ac0, key=key@entry=94578030726560, value=value@entry=0x7f132fdfc2f8)
    at ../st.c:1050
1050            bin = find_table_entry_ind(tab, hash, key);
(gdb) p *tab
$1 = {entry_power = 7 '\a', bin_power = 8 '\b', size_ind = 0 '\000', rebuilds_num = 213, type = 0x5604a71ce210 <objhash>,
  num_entries = 121, bins = 0x7f13e445a340, entries_start = 0, entries_bound = 121, entries = 0x7f13e445c6b0}

Looks like it's a freshly rebuilt table.  The problem is pretty
easy to reproduce on 2.5; I remember it took more tries on 2.4
(I didn't run valgrind there).  An extra pair of eyes more
experienced with this code than I am would be appreciated.  Thanks.
