Garbage Collection Algorithm — Mark & Sweep

Background

A software application generally needs to allocate resources (i.e memory, database connection, socket, file handle, etc) when it runs in the memory. It requests these resources from Operating System, which is responsible for managing these resources. Once the application is done with an allocated resource, its essential that the same is released back to the Operating System.

For instance if there is a program which uses a Linked List to store data should release the memory allocated dynamically when a node is deleted from it. If the memory is not released it results in a memory leak and thereby resource wastage. We’ll see with a real world example how it results in resource wastage & can lead to performance issues when the application keeps on allocating memory but doesn’t release it once its done with it.

The above code shows a simple LinkedList with a memory leak. When a node is removed from it, it only rearranges the pointers to get rid of the target node but doesn’t explicitly call delete on it. Following is fix for the above code :-

Memory Leak fixed

Let’s take an instance of below code snippet which continuously keeps allocating memory in an infinite loop without releasing it.

The above program was compiled using g++ and executed. The below snapshots displays list of processes sorted by % memory consumption within a span of 30 sec post launching the binary(a.out) containing above code.

Initially the memory consumption which was 5% rises to 9.2%, & subsequently to 13.7 % (last pic).As you can see, the percentage memory consumption keeps increasing if the objects which are no longer in use are not deleted and this eventually results in a crash or slowing down of other applications in the system due to lack of memory.

Garbage Collection

As the name proposes, its a technique used to clean up the garbage, which in case of application software is collection of objects which are not used or needed. In C programming language, it is programmer’s responsibility to allocate and deallocate objects. The language doesn’t provide automatic garbage collection when the program runs in the memory. Hence, for every malloc() or calloc(), there should be a free() called when the object is deleted. This is highly error-prone as the programmer has to consistently keep track of objects which are allocated and deallocated.

Just like C, C++ also doesn’t support automatic garbage collection but it eases the workload on programmer to release resources automatically by providing semantics like RAII, Smart Pointers and referencing counting. The basic concept in RAII and Smart Pointers is that the resources are acquired in the constructor and released in the destructor. The language guarantees that destructors are called when the object goes out of scope and memory is released as soon as the object gets destroyed.

In the above code snippet, autoPtr goes out of scope when allocateMemory() function finishes execution. The object referenced by autoPtr gets deleted in it’s destructor. So in this case, the memory leak is avoided. Techniques like Reference Counting keep track of the number of references to a particular object. The reference count is incremented for every allocation or assignment and as soon as the reference count becomes zero the object is deleted.

Java provides automatic Garbage Collection mechanism so that the programmer doesn’t have to keep track of allocated/deallocated objects like C programming or use wrappers (like autoPtr) to manage objects at runtime in C++. In Java, Garbage Collector is a part of the java run time environment and runs as a separate thread, thereby keeping track of objects which are reachable and deleting objects which are unreachable.

Mark & Sweep Algorithm

This algorithm runs in two phases :-

  • Mark phase- In this phase, objects which are reachable from the program are marked as reachable. The garbage collector will start traversal from all references(on stack, registers, static variables) in the program and visit all the inner references in a Depth First Search (DFS) manner and mark objects as reachable. Every allocated object on the heap has a flag let’s call marked set to false when it is allocated. During the mark phase, this flag is turned to true if the object is reachable.
  • Sweep phase- This phase is used to clean up all the objects which weren’t marked in the Mark phase. The unreachable objects are deleted thereby allowing the program to allocate more objects subsequently.

Let’s quickly walk through a simple example of Mark & Sweep algorithm.

In this example, we are defining two classes Employee & City, further we have a single reference emp in the main function which is used to reference. We create 3 different Employee objects and update the reference every time we create new one.

Mark Phase

As described previously, in the mark phase, the Garbage Collector checks for references in stack, register or static variables in the memory. In the above case, it will find a reference to Employee object with firstName “Tom” after executing line 5. It will further check for all objects reachable from the Employee object. In this case, there is a City object with name “New York” and String (FirstName)object named “Tom” which is accessible from the Employee object. The Garbage Collector traverses each reference recursively. Each time the garbage collector comes across a reference reachable object, the marked flag is set to true. The other two objects will have marked flag set to false by default during initialisation.

After Completion of Mark Phase

Sweep phase

The Garbage Collector will iterate through all the entries in the heap linearly, and find all the objects are unreachable or the one which have marked flag set to false. It will delete all such unreachable objects and the program will continue execution. In this example, the two Employees with name “Jerry” and “Abraham” will get deleted from the heap along with their corresponding cities & FirstName.

After completion of Sweep Phase
// Pseudo Code for Mark 
Mark(root)
If root.marked = false then
root.marked = true
For each v root.references()
Mark(v)

Above code snippet contains pseudo code for the Mark Phase. It performs a DFS traversal on all the inner references and flips the marked flag subsequently.

Sweep()
For each object in heap
If object.marked = true then
object.marked = false
else
heap.release(object)

The sweep phase linearly scans the heap and releases the objects which have marked flag set to false.

Advantages of Mark & Sweep

  • No additional overheads incurred during the execution of algorithm
  • The algorithm handles cyclic references well and never results in an infinite loop

Disadvantages of Mark & Sweep

  • During the execution of GC algorithm, normal program execution is suspended
  • After multiple cycles of Mark & Sweep, reachable objects end up being separated by small unused segments of memory thus leading to fragmentation

In the next post, I’ll describe the memory layout of a Java Heap during execution, and different types of Garbage Collectors.

Senior Software Engineer @Microsoft. Writes about Distributed Systems, Programming Languages & Tech Interviews

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store