Garbage Collection
Garbage collection in theory and practice.
Garbage Collection in the VM
Contents
- Allocation
- Mark and sweep
- Embedability
- Incremental
- Optimizations
Theory reminders
Tri-colour marking
In the core of concurrent and incremental garbage collectors
- white - not used, dead object
- black - used, alive object, whose references have been marked
- gray - used, alive object, whose references have yet to be marked
references are the objects that are referenced by the current object
Invariants
Weak:
All white objects pointed to by a black object are reachable from some grey object through a chain of white objects.
Strong:
There are no pointers from black objects to white objects.
Escapes
In the incremental or concurrent GC, while the marking phase is running, the mutator can change the references of a black object and break the above invariants.
Barriers
Barriers are code that gets executed by the mutator every time it mutates the heap. They will enforce that the invariants are being kept.
- read
- write
- deletion
Read Barrier
Ensure the invariants by not allowing getting a white object out of a gray one
def read(object, field):
if is_grey(object):
shade(object)
return object[field]
Not used, since read operations are more than write operations in a program and having a slower read is worse than having a slower write
Write Barrier
Ensure the invariants by not allowing to set a white object as a reference inside a black object
def write(object, field, value):
object[field] = value
if is_black(object):
shade(value)
Disclaimer
This is how GC can be implemented. Everything is subject to change.
Allocation
Having a custom allocator is a must have for most VM
- decent performance
- optimizations
Allocators and C++
Simplicity matters
For simplicity, we are going to not use a special allocator.
Mark and sweep
- mark reacheable objects starting from variables (roots)
- sweep
- unmarked objects are returned to the free heap
- marked objects are unmarked (for the next cycle)
Mark
- every object starts as dead
- reacheable objects are marked as alive
Reachable objects
Roots
- registers
- stack
- global object / environment
Where are these stored?
Roots
- The
data_stack
- everything upto
m_SP
- everything upto
- The global object
Marking
void Spasm::Mark() {
for (auto cell = &data_stack[0]; cell != m_SP; ++cell) {
Mark(cell); // ?
}
Mark(m_Global);
}
Visit every object in the heap?
- Visitor pattern again
struct MarkVisitor
{
typedef void ResultType;
void Visit(double d) const {}
void VisitUndefined() const {}
void VisitNull() const {}
void Visit(StringValue& value) const;
void Visit(ArrayValue& value) const;
void Visit(ObjectValue& value) const;
void Visit(FunctionValue& value) const;
};
void Visit(StringValue& value) const {
if (Dead(value)) {
SetAlive(value);
}
}
void Visit(ArrayValue& value) const {
if (Dead(value)) {
SetAlive(value);
for (auto i = 0u; i < value.length(); ++i) {
value.item(i).Visit(*this);
}
}
}
void Visit(ObjectValue& value) const {
if (Dead(value)) {
SetAlive(value);
for (auto* p : value.m_Properties) {
p->first.Visit(*this);
p->second.Visit(*this);
}
value.m_Prototype->Visit(*this);
}
}
State
Where to store whether an object is dead or alive
bool
per object, string, array, functionbitmap
that holds one bit for a range of objectsbit
per pointer