Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

IoT Platform Design Doc: Compacting GC

DZone's Guide to

IoT Platform Design Doc: Compacting GC

See how the Cesanta crew handles fragmentation for their Mongoose IoT platform — with a new twist on an old compaction trick

· IoT Zone
Free Resource

Discover why Bluetooth mesh is the next evolution of IoT solutions. Download the mesh overview.

One of the things that make programming today so much more approachable is the the automatic memory management feature of many high-level programming languages. However, exactly this piece is often missing from embedded platforms, and there is a reason for that: the well-established techniques used by modern Garbage Collection (GC) systems are designed with radically different trade-offs than what embedded systems with a few KB of RAM dictate.

A good GC for embedded platforms is the key for enabling high-level languages, such as JavaScript, and to make embedded programming more approachable to the masses.

The following document shows the details behind the design of one of the parts of the V7 Garbage Collector. Here's a bit of context first.

A key requirement that we share with other, non-embedded, implementations of a Garbage Collector, is reducing memory fragmentation. This is particularly a problem when managing objects of variable length, such as strings.

One of the most powerful approaches that solves the fragmentation problem in the context of fully managed heaps is a Compacting Garbage Collector. The idea is that instead of just automatically reclaiming the unused memory ranges we also move the used memory in such a way that it ends up being grouped all together without any holes. But because we're moving blocks of memory that are actively used by the program, we need to be able to update all references to their new locations.

It turns out that updating all the references requires us to store some bookkeeping information in memory, and on embedded platforms, literally every byte counts.

This is really nothing new. The chosen collector algorithm is Morris78. The funny story is that I found out about the prior work only after reinventing it. Once I knew what to look for, I found this obscure paper from '78, a year older than yours truly.

As always, our design docs are not product documentation, but rather a view into our decision making. You can check out Mongoose IoT Platform to see how the doc was implemented.

Compacting GC

Objective

To choose and describe a low space overhead and low time complexity Garbage Collector for variable sized objects such as strings.

Background

We need a space-efficient memory layout and garbage collection technique for strings and other variable length items.

Overview

Mark-compact collector, initially described in Morris78.

Let’s have a heap containing variable-length chunks. Each chunk will contain a length field at the beginning (the length field can be implemented with variable length encoding).

Chunks are allocated by bumping a pointer, which starts at the beginning of the buffer, and recording the length at the beginning of the newly allocated block.

Thus, all allocated chunks are at the beginning of the buffer, while the rest of the buffer contains unallocated space. The mbuf structure can be used to represent such a buffer.

The key choice for reducing code complexity and space usage is to free chunks by compacting instead of maintaining a free list (subject to external fragmentation).

We choose to make the length field part of the chunk, because we need it for our string representation anyway (see more here). A variation of this scheme could keep the length field before the allocated region as other allocators do.

This type of heap is subject to a minimum chunk size equal to the size of a pointer. Given that we can store six char strings inline, the minimum size of a string kept in this kind of heap would be 8 bytes anyway (one for length and seven for data), matching the size of the largest supported pointer size. In contrast, table-based compaction requires making the minimum chunk size at least two pointer sizes.

For simplicity, let’s show what happens when the heap contains only raw data and thus all incoming pointers are held elsewhere (e.g. in the fixed-width cell pools):

IoT Platform Design Doc GC

Each chunk can be pointed to by several values, and each of them thus contains the same payload, the pointer to the chunk. When we’ll need to move the chunk, all those values have to be updated. There is an easy way to keep track of all the values to be updated, which exploits the fact that there is one value that has to be preserved: the first sizeof(void *) bytes of the chunk. Let’s call this first byte the chunk head. (It contains the string size plus some initial bytes of the string payload in case of variable length len encoding)

While traversing the object graph, each time a pointer value is tagged to be a pointer to this kind of heap (e.g. a string pointer), we put the value of the chunk head inside the val_t payload, and put a pointer to the val_t location into the chunk head:

IoT Platform Design Doc GC

The next time a pointer to that chunk is encountered during graph traversal, we do the same thing again and again:

IoT Platform Design Doc GC

Until, at the end of the traversal, the chunk heads for all reachable chunks point to a linked list of val_t locations to be updated with the new address of the chunk once the chunk is moved.

We also need to distinguish reachable chunks from dead chunks somehow. We can distinguish the length field in the chunk head from a val_t pointer by using a tag bit. Details on how to do that with varint length encoding are in the Detailed Design section below.

The compaction phase starts at the first chunk. As all chunks until the end of the allocated area were valid before the collection, each of them contains a valid length field which can be used to skip to the next chunk. The mbuf len is set to 0 and the old mbuf len is recorded.

Once a marked chunk (one that has a val_t pointer in it’s chunk head) is encountered, it gets memmoved to the end of the mbuf (now starting from 0) and the linked list traversed until the len value sentinel is reached, and values are updated with the address of the move destination. Finally, the sizeof(void*) bytes are restored in the chunk head and the scan proceeds to the next chunk until the old mbuf length is reached.

Detailed Design

Varint len

If we constrain the chunk length to never be 0 (which is okay for strings because small strings are inlined in values), we can use the first 0 byte to mark fixup addresses.

We just need to store the pointer after the leading 0 byte.

64-bit pointers will be stored as 48-bit values and sign extended as we do in v7_value_to_pointer:

struct {
    uint64_t s:48;
} h;

Encode:

h.s = p;
memcpy(chunk +1,&h.s,6);

Decode:

memcpy(&h.s, chunk +1,6);
p = h.s

Pointers Inside Chunks

If this type of heap is used to store chunks which contain pointers… TBD.

Caveats

  • Chunk size is explicit and part of the “user” accessible payload.
  • the allocator requires a minimum chunk size and it’s visible to the “user.”

Both sides of this issue are fine because the user is the VM, and we like to be able to use the chunk content as a `v7_str` value.

Bloat Analysis

It turned out to be 65 lines of code! It's not been actively obfuscated or anything, it's just ... dense; the densest and most difficult to read part of V7.

References

http://cs.ucsb.edu/~ckrintz/racelab/gc/papers/morris-compaction.pdf

Take a deep dive into Bluetooth mesh. Read the tech overview and discover new IoT innovations.

Topics:
garbage collector ,fragmentation ,iot ,compaction

Published at DZone with permission of Marko Mikulicic, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}