From: ko1@...
Date: 2014-08-17T05:55:49+00:00
Subject: [ruby-core:64434] [ruby-trunk - Feature #10137] Introducing	Incremental GC algorithm

Issue #10137 has been updated by Koichi Sasada.


FYI: here is a top slow tests with data.
http://www.atdot.net/fp_store/f.zdsfan/file.copipa-temp-image.png

Typically, GC.start slows down.


----------------------------------------
Feature #10137: Introducing Incremental GC algorithm
https://bugs.ruby-lang.org/issues/10137#change-48387

* Author: Koichi Sasada
* Status: Open
* Priority: Normal
* Assignee: Koichi Sasada
* Category: core
* Target version: current: 2.2.0
----------------------------------------
# Abstract

Introduce *incremental GC* algorithm to reduce pause time of major/full 
GC. This ticket includes design and implementation note and a working 
patch.

# Background and problem

Ruby 2.1 uses generational GC algorithm (named RGenGC) to improve GC 
throughput. Genrational GC algorithm separates existing objects into 
young generation objects and old generation objects. At the most of GC 
timing, GC only marks young generation objects (called minor GC). If 
there are no enough memory, then mark all of objects (called major GC or 
full GC). Minor GC is dramatically fast than major GC. So that total 
throughput of application improves (10% improvement in my RDoc benchmark, 
[ruby-list:49896] reported that GC intensive application is 6 times 
faster!).

However, major GC is needed periodically and it pauses same time as GC 
on Ruby 2.0 and before. This problem hits response time intensive 
application such as web application.

# Proposal

Introduce *Incremental GC* algorithm for major GC.

Incremental GC algorithm is well-known GC algorithm to reduce GC pause 
time. Scatter GC (marking) process in several phases and run processes 
interleaving with Ruby's execution process. This is similar to current 
lazy sweep algorithm (in fact, lazy sweep is a half part of incremental 
GC algorithm).

Running ruby process with marking phase, it is possible to introduce 
critical bug because marked objects can points un-marked objects (on the 
incremental GC terminology, marked objects are called "Black" objects 
and un-marked objects are called "White" objects). Such un-marked 
objects can be left in un-marking and be swept. 

```
# if `ary' is already marked, and obj is not marked
ary[0] = obj
obj = nil # no-body except `ary' refers `obj'
```

To prevent such destructive bug, we can use write barriers to detect 
such "marked objects" to "un-marked objects". We can care about such 
case.

Yes, MRI/CRuby has "WB-unprotected" objects such objects does not have 
write barriers because of compatibility or implementation issues. To 
care about such WB-unprotected objects, we need to traverse all of 
living WB-unprotected objects at a time in the last of incremental 
marking. This is new extending idea against traditional incremental GC 
algorithm (at least I surveyed).

Deisgn and implementation details are here:
http://www.atdot.net/fp_store/f.p61can/file.data-incremental-gc.pdf

Maybe a diagram at page 10 will help you to understand the flow of all 
GC process.

Code is here:
https://github.com/ko1/ruby/tree/rincgc

Compare with trunk:
https://github.com/ko1/ruby/compare/rincgc

# Implementation note

## WB-unprotected bitmap

As I said, we need to check all of living WB-unprotected objects at the 
last of incremental marking phase. To do it lightweight, introduce 
WB-unprotected bitmap intead of specific bit in RBasic::flags.

We can get all living WB-unprotected objects with the following pseudo 
code:

```
bits = mark_bits[i] & wb_unprotected_bits[i];
while (bits) {
  if (bits & 1) do something.
  bits >>= 1;
}
```

## 4 age promotion

Because we don't need to use WB-protected bit in RBasic::flags, we have 
another 1 bit in RBasic::flags. To utilize this bit, we add *age* of an 
object with exsiting promoted bit. Rename FL_WB_PROTECTED to 
FL_PROMOTED0 and FL_PROMOTED to FL_PROMOTED1.

These two bits represent object's age (0 to 3) and 3 means OLD objects.

## Write barriers

We extend write barriers to detect such [marked objects -> un-marked 
objects] reference in incremental GC. It introduces some overhead.

# Evaluation

Benchmark results on real Linux machine (*1)
http://www.atdot.net/sp/raw/yekban
*1: Intel(R) Xeon(R) CPU E5335 @ 2.00GHz, 4GB memory, 2.6.32-5-amd64 (Debian squeeze)

In most of case, there are only a few (~5%) performance down. 
Incremental GC introduces some overhead.  But I think they are 
acceptable speed-down.

Discouse benchmark (only on Virtualbox VM, so accuracy is not good)
http://www.atdot.net/sp/raw/g9uban

We can recognize reducing worst case seconds.

# TODO

(1) Clean up codes

Now, we can not disable incremental GC codes and generational GC codes. 
We need to add ability to enable/disable features with macros.

(2) Tuning parameters

Now the parameters are fixed in codes. mruby already have tuning 
parameters for incremental GC (matz said they are from Lua), 
"GC.interval_ratio" and "GC.step_ratio". We can import these functions 
(or making another interface to tell).

(3) Enter GC/ Exit GC internal events

This patch also includes function "gc_enter()" and "gc_exit()" which set 
and reset a "doing GC" flag.

If we introduce internal event to hook these functions, we can measure 
exact GC pause time (and mutators time).

# Summary

This feature proposal is to introduce incremental GC algorithm with working code.
Incremental GC algorithm reduce application's pause time of major GC.

Any feedback are welcome!

Thanks,
Koichi




-- 
https://bugs.ruby-lang.org/