Golang: Is It a Memory Leak?
Figuring out where memory went in Go is tricky. This article shares lessons learned and shows how to pinpoint the exact allocations in a third-party library.
This is an illustrated walkthrough of how to find the cause of high memory usage in Go programs with standard tools, based on our recent experience. There's plenty of information online, but it still took us considerable time and effort to find what works and what doesn't. Hopefully, this post will save time for everyone else.
Context
At Adiom, we're building an open-source tool for online database migration and real-time replication, called dsync. Our primary focus is on ease of use, robustness, and speed. One of the areas that we're focusing on heavily is data integrity, as Enterprise customers want to ensure data consistency between the source and the destination and know the differences, if there are any.
To that end, we recently came across Merkle Search Trees (MSTs), which have some interesting properties. They may be worth a whole separate post, but for those interested now, here's the link to the original paper. In theory, MSTs should allow us to efficiently and cheaply diff the source and destination datasets. Naturally, we wanted to experiment with that, so we added a dsync verify <source> <destination> command as a POC leveraging an open-source reference implementation from GitHub: jrhy/mast. It worked like a charm on "toy" datasets of 100 to 1,000 records. But in one of our larger tests, to our surprise, the binary consumed 3.5GB of RAM for a million records on both sides. While not a whole lot in absolute terms, this was orders of magnitude higher than what we were expecting (maybe tens or hundreds of megabytes), because we only stored the record ID (8 bytes) and a hash (8 bytes) for each record.
Our first thought was a memory "leak." Like Java, Go manages memory for you and has a garbage collector that frees objects once nothing references them. There's no concept of a leak as such, but rather an unexpected or undesirable accumulation of objects that the code still holds references to. Unlike Java, though, Go doesn't run as bytecode on top of a JVM, so out of the box it won't tell you which specific objects are accumulating in memory.
Part 1: Run Garbage Collector
In Go, you can forcefully trigger garbage collection hoping that the memory is consumed by objects that are no longer needed:
package main

import (
    "log"
    "runtime"
)

func main() {
    // App code
    log.Println("Triggering garbage collection...")
    runtime.GC()
    printMemUsage()
    // More app code
}

// printMemUsage logs a snapshot of the runtime's memory statistics.
func printMemUsage() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    log.Printf("Alloc = %v MiB", bToMb(m.Alloc))
    log.Printf("TotalAlloc = %v MiB", bToMb(m.TotalAlloc))
    log.Printf("Sys = %v MiB", bToMb(m.Sys))
    log.Printf("NumGC = %v\n", m.NumGC)
}

// bToMb converts bytes to mebibytes.
func bToMb(b uint64) uint64 {
    return b / 1024 / 1024
}
Unfortunately, that only freed up a few hundred MBs for us, so it wasn't very effective.
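As an aside (this wasn't part of our original investigation), there's a related lever worth knowing about: runtime/debug.FreeOSMemory forces a garbage collection and then asks the runtime to return as much freed memory to the operating system as possible, which can matter when Sys stays high even after runtime.GC(). A minimal sketch:
package main

import (
    "log"
    "runtime"
    "runtime/debug"
)

func main() {
    // FreeOSMemory runs a garbage collection and then attempts to return
    // as much freed memory to the operating system as possible.
    debug.FreeOSMemory()

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    log.Printf("Alloc = %v MiB, Sys = %v MiB", m.Alloc/1024/1024, m.Sys/1024/1024)
}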
Part 2: Examine the Memory
In theory, we should be able to take a coredump of the process (dump the raw memory) and then examine it directly. Here are some relevant resources for that:
- Obtaining the core dump from a running process: Go Wiki: CoreDumpDebugging
- Specific instructions for macOS, where gcore doesn't work right away: "How to Dump a Core File on MacOS"
- The viewcore tool to parse the coredump: GitHub: golang/debug
Unfortunately, viewcore didn't work with our coredump, presumably because we're doing development on a Mac laptop with Darwin architecture:
user@MBP viewcore % ./viewcore /cores/dsync-52417-20241114T234221Z html
failed to parse core: bad magic number '[207 250 237 254]' in record at byte 0x0
If you're using Linux, you should give it a shot.
Part 3: Profiling and Debugging
We had to resort to investigative debugging instead. The most common sources of memory leaks in Go are the following (two of them are illustrated in a short sketch after the list):
- Accumulating and not closing resources such as network connections, files, etc.
- Global variables
- Slices with large backing arrays
- Lingering goroutines that never finish
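Here's a minimal, self-contained illustration of two of these patterns (toy code written for this post, not anything from dsync or mast):
package main

import (
    "fmt"
    "time"
)

func main() {
    // Slice with a large backing array: the tiny re-slice keeps the entire
    // 1 GiB array reachable for as long as small is referenced.
    big := make([]byte, 1<<30)
    small := big[:3]

    // Lingering goroutine: nothing ever sends on ch, so the goroutine and
    // everything it captures stay alive for the life of the process.
    ch := make(chan int)
    go func() {
        fmt.Println(<-ch)
    }()

    time.Sleep(time.Second)
    fmt.Println(len(small)) // keeps small (and therefore big) referenced
}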
After a brief examination of the codebase, we didn't see any smoking guns. Our next step was to use pprof, which lets us sample a running process and obtain a heap dump that can later be examined.
Pprof Setup
Pprof runs as a web server inside your application. Adding it is easy:
import (
    "log"
    "net/http"
    _ "net/http/pprof" // Blank import registers the pprof handlers on the default mux
)

func main() {
    go func() {
        log.Println(http.ListenAndServe("localhost:8081", nil)) // Start pprof server
    }()

    // Your application logic here
}
In our case, it's accessible on port 8081.
When the application is running and shows excessive memory consumption, we can collect heap dumps:
curl -s http://localhost:8081/debug/pprof/heap > /tmp/heap1.out
I recommend collecting several so you have a few different samples to compare. Note that pprof samples allocations rather than recording every one, so it's not a 100% complete representation. By default, it takes one sample per 512 KiB allocated.
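If that default granularity ever hides smaller allocations, the sampling rate can be tuned through runtime.MemProfileRate. We didn't need to change it, but here's a minimal sketch of how that would look (it should be set as early in the program as possible):
package main

import "runtime"

func init() {
    // The default is one sample per 512 KiB allocated; a rate of 1 records
    // every allocation at the cost of extra profiling overhead.
    runtime.MemProfileRate = 1
}

func main() {
    // Application logic here
}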
From here on out, it's really smooth sailing.
Memory Profiling
We examined inuse_space and inuse_objects with pprof, which track the in-use space and the number of live objects, respectively:
go tool pprof --inuse_objects /tmp/heap1.out
go tool pprof --inuse_space /tmp/heap1.out
The in-use space was particularly fruitful. First, we tried to visualize the memory allocations using the web command, which opens a new web browser window.
user@MBP viewcore % go tool pprof --inuse_space /tmp/heap1.out
File: dsync
Type: inuse_space
Time: Nov 14, 2024 at 5:34pm (PST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web
(pprof)
The thicker the link on the graph, the more memory went to that node (and that node is drawn bigger, too). An alternative way to do this is to use the top5 command:
user@MBP viewcore % go tool pprof --inuse_space /tmp/heap1.out
File: dsync
Type: inuse_space
Time: Nov 14, 2024 at 5:34pm (PST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top5
Showing nodes accounting for 1670.40MB, 99.37% of 1680.96MB total
Dropped 43 nodes (cum <= 8.40MB)
Showing top 5 nodes out of 7
flat flat% sum% cum cum%
1471.89MB 87.56% 87.56% 1471.89MB 87.56% github.com/jrhy/mast.split
78MB 4.64% 92.20% 851.79MB 50.67% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func1.processMast.1
77MB 4.58% 96.78% 826.13MB 49.15% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func2.processMast.1
43.50MB 2.59% 99.37% 43.50MB 2.59% bytes.Clone
0 0% 99.37% 1677.92MB 99.82% github.com/adiom-data/dsync/internal/app.(*source).ProcessSource.func2
(pprof)
These told us that the excessive memory was allocated in the mast.split function and that it consists of a large number of ~4.75kB objects.
While we can't see what those objects are, we can see where in the code they were allocated using the list command:
user@MBP viewcore % go tool pprof --inuse_space /tmp/heap1.out
File: dsync
Type: inuse_space
Time: Nov 14, 2024 at 5:34pm (PST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top5
Showing nodes accounting for 1670.40MB, 99.37% of 1680.96MB total
Dropped 43 nodes (cum <= 8.40MB)
Showing top 5 nodes out of 7
flat flat% sum% cum cum%
1471.89MB 87.56% 87.56% 1471.89MB 87.56% github.com/jrhy/mast.split
78MB 4.64% 92.20% 851.79MB 50.67% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func1.processMast.1
77MB 4.58% 96.78% 826.13MB 49.15% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func2.processMast.1
43.50MB 2.59% 99.37% 43.50MB 2.59% bytes.Clone
0 0% 99.37% 1677.92MB 99.82% github.com/adiom-data/dsync/internal/app.(*source).ProcessSource.func2
(pprof) list github.com/jrhy/mast.split
Total: 1.64GB
ROUTINE ======================== github.com/jrhy/mast.split in /Users/alexander/go/pkg/mod/github.com/jrhy/mast@v1.2.32/lib.go
1.44GB 1.54GB (flat, cum) 93.57% of Total
. . 82:func split(ctx context.Context, node *mastNode, key interface{}, mast *Mast) (leftLink, rightLink interface{}, err error) {
. . 83: var splitIndex int
. . 84: for splitIndex = 0; splitIndex < len(node.Key); splitIndex++ {
. . 85: var cmp int
. . 86: cmp, err = mast.keyOrder(node.Key[splitIndex], key)
. . 87: if err != nil {
. . 88: return nil, nil, fmt.Errorf("keyCompare: %w", err)
. . 89: }
. . 90: if cmp == 0 {
. . 91: panic("split shouldn't need to handle preservation of already-present key")
. . 92: }
. . 93: if cmp > 0 {
. . 94: break
. . 95: }
. . 96: }
. . 97: var tooBigLink interface{} = nil
8MB 8MB 98: left := mastNode{
. . 99: Node{
416.30MB 416.30MB 100: make([]interface{}, 0, cap(node.Key)),
458.51MB 458.51MB 101: make([]interface{}, 0, cap(node.Value)),
427.87MB 427.87MB 102: make([]interface{}, 0, cap(node.Link)),
. . 103: },
. . 104: true, false, nil, nil,
. . 105: }
. . 106: left.Key = append(left.Key, node.Key[:splitIndex]...)
. . 107: left.Value = append(left.Value, node.Value[:splitIndex]...)
. . 108: left.Link = append(left.Link, node.Link[:splitIndex+1]...)
. . 109:
. . 110: // repartition the left and right subtrees based on the new key
. . 111: leftMaxLink := left.Link[len(left.Link)-1]
. . 112: if leftMaxLink != nil {
. . 113: var leftMax *mastNode
. . 114: leftMax, err = mast.load(ctx, leftMaxLink)
. . 115: if mast.debug {
. . 116: fmt.Printf(" splitting leftMax, node with keys: %v\n", leftMax.Key)
. . 117: }
. . 118: if err != nil {
. . 119: return nil, nil, fmt.Errorf("loading leftMax: %w", err)
. . 120: }
. 91.39MB 121: leftMaxLink, tooBigLink, err = split(ctx, leftMax, key, mast)
. . 122: if err != nil {
. . 123: return nil, nil, fmt.Errorf("splitting leftMax: %w", err)
. . 124: }
. . 125: if mast.debug {
. . 126: fmt.Printf(" splitting leftMax, node with keys: %v is done: leftMaxLink=%v, tooBigLink=%v\n", leftMax.Key, leftMaxLink, tooBigLink)
. . 127: }
. . 128: left.Link[len(left.Link)-1] = leftMaxLink
. . 129: }
. . 130: if !left.isEmpty() {
. . 131: leftLink, err = mast.store(&left)
. . 132: if err != nil {
. . 133: return nil, nil, fmt.Errorf("store left: %w", err)
. . 134: }
. . 135: }
1MB 1MB 136: right := mastNode{
. . 137: Node{
54.24MB 54.24MB 138: make([]interface{}, 0, cap(node.Key)),
56.75MB 56.75MB 139: make([]interface{}, 0, cap(node.Value)),
49.22MB 49.22MB 140: make([]interface{}, 0, cap(node.Link)),
. . 141: },
. . 142: true, false, nil, nil,
. . 143: }
. . 144: right.Key = append(right.Key, node.Key[splitIndex:]...)
. . 145: right.Value = append(right.Value, node.Value[splitIndex:]...)
. . 146: right.Link = append(right.Link, node.Link[splitIndex:]...)
. . 147: right.Link[0] = tooBigLink
. . 148:
. . 149: rightMinLink := right.Link[0]
. . 150: if rightMinLink != nil {
. . 151: var rightMin *mastNode
. . 152: rightMin, err = mast.load(ctx, rightMinLink)
. . 153: if err != nil {
. . 154: return nil, nil, fmt.Errorf("load rightMin: %w", err)
. . 155: }
. . 156: var tooSmallLink interface{}
. 9.54MB 157: tooSmallLink, rightMinLink, err = split(ctx, rightMin, key, mast)
. . 158: if err != nil {
. . 159: return nil, nil, fmt.Errorf("split rightMin: %w", err)
. . 160: }
. . 161: if mast.debug {
. . 162: fmt.Printf(" splitting rightMin, node with keys %v, is done: tooSmallLink=%v, rightMinLink=%v",
(pprof)
Now we could clearly see what those objects were: the backing arrays of slices with preallocated capacity. These slices hold interface{} values, each of which is two words in memory, or 16 bytes on our 64-bit machine. The preallocated capacity must therefore be large, at least on average (4.75kB / 16 bytes ≈ 300 elements). The mystery was half-solved: we weren't sure whether that capacity was actually being used or not. So we used delve, which is a debugger for Go.
Debugging
A simple code inspection showed that those objects were part of tree nodes. To find out what they actually looked like, we used VS Code debug mode to attach to the running process and inspect these objects:
1. Attach to the running process with the following launch.json configuration:
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to Process",
      "type": "go",
      "request": "attach",
      "mode": "local",
      "processId": <PID of the running process>
    }
  ]
}
2. Add a breakpoint in our verification loop where the tree objects are accessed.
This immediately let us see that the array length (len) was often much smaller than the preallocated capacity (cap) that was set.
As a probe, I created a forked implementation with no capacities set in node allocation, and the memory usage immediately dropped to 600MB, roughly a 6x win.
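To make the impact concrete, here's a small self-contained demo of the difference (toy code written for this post, using a made-up node count and the ~300-element inherited capacity estimated above; it is not the library's code):
package main

import (
    "fmt"
    "runtime"
)

// heapMiB returns the current heap allocation in mebibytes.
func heapMiB() uint64 {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    return m.HeapAlloc / 1024 / 1024
}

func main() {
    const nodes = 100000
    const inheritedCap = 300 // ~4.75kB per slice at 16 bytes per interface{} element

    // Pattern pprof pointed at: every node's slice reserves the parent's
    // capacity, even though most nodes hold only a few elements.
    preallocated := make([][]interface{}, nodes)
    for i := range preallocated {
        preallocated[i] = append(make([]interface{}, 0, inheritedCap), i)
    }
    fmt.Println("with inherited capacity:", heapMiB(), "MiB")

    // The probe: no explicit capacity, append sizes the backing array as needed.
    preallocated = nil
    runtime.GC()
    lean := make([][]interface{}, nodes)
    for i := range lean {
        lean[i] = append(lean[i], i)
    }
    fmt.Println("without preallocation:  ", heapMiB(), "MiB")
    runtime.KeepAlive(lean)
}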
Mystery solved!