[#23657] [Bug #1550] String#lstrip! raises RuntimeError on Frozen String Despite Making No Changes — Run Paint Run Run <redmine@...>

Bug #1550: String#lstrip! raises RuntimeError on Frozen String Despite Making No Changes

13 messages 2009/06/01

[#23729] [Bug #1583] Time + String no Longer Raises TypeError? — Run Paint Run Run <redmine@...>

Bug #1583: Time + String no Longer Raises TypeError?

14 messages 2009/06/05

[#23770] [Bug #1595] rake unusable on windows install — Robert Gonzalez <redmine@...>

Bug #1595: rake unusable on windows install

10 messages 2009/06/09

[#23869] [Bug #1640] [PATCH] Documentation for the Rational Class — Run Paint Run Run <redmine@...>

Bug #1640: [PATCH] Documentation for the Rational Class

12 messages 2009/06/16

[#23903] [Bug #1648] Rational#div Raises NoMethodError for Invalid Argument — Run Paint Run Run <redmine@...>

Bug #1648: Rational#div Raises NoMethodError for Invalid Argument

9 messages 2009/06/17

[#23977] [ANN] meeting log of RubyDeveloperKaigi20090622 — "Yugui (Yuki Sonoda)" <yugui@...>

Hi,

41 messages 2009/06/23
[#23979] Re: [ANN] meeting log of RubyDeveloperKaigi20090622 — Run Paint Run Run <runrun@...> 2009/06/23

Thanks for the update. :-)

[#24173] Re: [ANN] meeting log of RubyDeveloperKaigi20090622 — "NARUSE, Yui" <naruse@...> 2009/07/07

Sorry for late response,

[#24174] Re: [ANN] meeting log of RubyDeveloperKaigi20090622 — Luis Lavena <luislavena@...> 2009/07/07

On Tue, Jul 7, 2009 at 12:12 AM, NARUSE, Yui<naruse@airemix.jp> wrote:

[#24242] Re: [ANN] meeting log of RubyDeveloperKaigi20090622 — Charles Oliver Nutter <headius@...> 2009/07/09

On Mon, Jul 6, 2009 at 10:18 PM, Luis Lavena<luislavena@gmail.com> wrote:

[#24010] [Bug #1685] Some windows unicode path issues remain — B Kelly <redmine@...>

Bug #1685: Some windows unicode path issues remain

26 messages 2009/06/24
[#29189] [Bug #1685] Some windows unicode path issues remain — Yuki Sonoda <redmine@...> 2010/04/01

Issue #1685 has been updated by Yuki Sonoda.

[#29200] Re: [Bug #1685] Some windows unicode path issues remain — Bill Kelly <billk@...> 2010/04/01

Yuki Sonoda wrote:

[#29892] Re: [Bug #1685] Some windows unicode path issues remain — Bill Kelly <billk@...> 2010/04/29

Hi,

[#24058] [Bug #1696] http downloads are unuseably slow — Steven Hartland <redmine@...>

Bug #1696: http downloads are unuseably slow

19 messages 2009/06/27

[#24063] [Feature #1697] Object#<=> — Marc-Andre Lafortune <redmine@...>

Feature #1697: Object#<=>

15 messages 2009/06/28

[ruby-core:24002] [Bug #1682] Low probability error in gc

From: John Carter <redmine@...>
Date: 2009-06-24 01:37:56 UTC
List: ruby-core #24002
Bug #1682: Low probability error in gc
http://redmine.ruby-lang.org/issues/show/1682

Author: John Carter
Status: Open, Priority: Normal
ruby -v: ruby 1.8.8dev (2009-06-23) [i686-linux]

I am trying to track a very hard to reproduce one in a million
segfault in ruby that seems to be a corrupted heap.

So I'm am using valgrind on the latest (24th Jun 2009) snapshot from
ftp://ftp.ruby-lang.org/pub/ruby/stable-snapshot.tar.gz

Yup, you've seen this one before and declared it innocuous, but
patience, follow my chain of logic and you will realise in certain
corner cases it may be fatal.

Valgrind reports the... and the standard answer is yes, yes, we know
it's Rubies conservative garbage collection it doesn't matter, but 
keep reading and I will show you how it may on occasions matter.

==5709== Conditional jump or move depends on uninitialised value(s)
==5709==    at 0x8075383: gc_mark (ruby.h:731)
==5709==    by 0x8074F98: mark_locations_array (gc.c:708)
==5709==    by 0x8075701: garbage_collect (gc.c:1497)
==5709==    by 0x8075FFC: rb_newobj (gc.c:453)

==5709==  Uninitialised value was created by a stack allocation
==5709==    at 0x8095D84: rb_node_newnode (parse.y:4591)

If we have a look at that location in parse.y we have this code.
----------------------------------------------------------------------
NODE*
rb_node_newnode(type, a0, a1, a2)
    enum node_type type;
    VALUE a0, a1, a2;
{
    NODE *n = (NODE*)rb_newobj();
----------------------------------------------------------------------

ie. The uninitialized value is the "n". Nothing special about this
line, we get to this state from many other locations.

So what is happening here...

..when a new object is allocated an nothing is available in on the free list,

The gc runs around marking everything that is still "live" and then
sweeps up the leavings.

But what is still "live"? Anything currently on the stack for example.

But what about things like "n" which are on the stack, but not initialized yet?

No problem, if it is a number that, if viewed as a pointer, points
into a chunk of heap. We regard it as "live" anyway. If we're wrong
and it's just a number... or even a random string of uninitialized
bits left on the stack (like "n" above), no problem.. we just don't
garbage collect it and waste a bit of ram.

    FLUSH_REGISTER_WINDOWS;
    /* This assumes that all registers are saved into the jmp_buf (and stack) */
    rb_setjmp(save_regs_gc_mark);
    mark_locations_array((VALUE*)save_regs_gc_mark, sizeof(save_regs_gc_mark) / sizeof(VALUE *));
#if STACK_GROW_DIRECTION < 0
    rb_gc_mark_locations((VALUE*)STACK_END, rb_gc_stack_start);

Inicidentally rb_gc_mark_locations is implemented in terms of
mark_locations_array so lets have a closer look at
mark_locations_array

static inline int
is_pointer_to_heap(ptr)
    void *ptr;
{
    register RVALUE *p = RANY(ptr);
    register RVALUE *heap_org;
    register long i;

    if (p < lomem || p > himem) return Qfalse;
    if ((VALUE)p % sizeof(RVALUE) != 0) return Qfalse;

    /* check if p looks like a pointer */
    for (i=0; i < heaps_used; i++) {
	heap_org = heaps[i].slot;
	if (heap_org <= p && p < heap_org + heaps[i].limit)
	    return Qtrue;
    }
    return Qfalse;
}

static void
mark_locations_array(x, n)
    register VALUE *x;
    register long n;
{
    VALUE v;
    while (n--) {
        v = *x;
	if (is_pointer_to_heap((void *)v)) {
	    gc_mark(v, 0);
	}
	x++;
    }
}

So mark_locations_array just scans along an arbitary chunk of
ram.. eg. the stack. Treating every four bytes as a pointer (whether
it is or not).

The is_pointer_to_heap predicate checks to see if that number points
to a location in the heap...
    if (p < lomem || p > himem) return Qfalse;
and that is typically the line that valgrind whinges about.

Occasionally an uninitialized value just happens to pass that first
crude check and then valgrind whinges on this line.

	if (heap_org <= p && p < heap_org + heaps[i].limit)
	    return Qtrue;

But then we can be fairly confident that p _is_ pointing somewhere
within a valid ruby object that we should mark as live. Perhaps it
shouldn't be, perhaps it was just an accident that some number just
happen to be in that range, so we on rare occassions waste some ram,
so what.

So far, perfect. 

Alas, instead of returning the start of that ruby object, we just
return true.

Actually the thing living between heaps[i].slot and
heaps[i].slot+heaps[i].limit is typically an array of roughly
10000 RValue's.

And then instead of marking the start of that heap block as live, _or_
the start of the RValue within which v is pointing to , we mark what
thing v is pointing to as live.

        v = *x;
	if (is_pointer_to_heap((void *)v)) {
	    gc_mark(v, 0);

That would be OK... except gc_mark (sometimes) modifies what v is
pointing to by flipping the FL_MARK bit. And then hunting down the
children of that object and marking them.


static void
gc_mark(ptr, lev)
    VALUE ptr;
    int lev;
{
    register RVALUE *obj;

    obj = RANY(ptr);
    if (rb_special_const_p(ptr)) return; /* special const not marked */
    if (obj->as.basic.flags == 0) return;       /* free cell */
    if (obj->as.basic.flags & FL_MARK) return;  /* already marked */
    obj->as.basic.flags |= FL_MARK;


All Good. EXCEPT if v is pointing into the middle of a 20 byte RValue
union instead of the start!

Ok, so 99.999% of random bitstrings like "n" picked up this way got
there because they were at some stage valid pointers to RValues BUT it
is NOT gauranteed.

ie. We have a very low probability bug that gc will treat the inner
part of a RValue as an RValue with possible bad results. (segfault,
corruption...)

Solution? Change the line...
	    gc_mark(v, 0);
to convert v to the nearest sizeof(RValue) aligned address at or below.
           gc_mark(v - (v % sizeof(RVALUE)), 0);

Well, at least in this case of invoking gc_mark. I haven't reviewed
all the other invocations.


----------------------------------------
http://redmine.ruby-lang.org

In This Thread

Prev Next