Yet Another Servers in Go: Understanding epoll, kqueue, and netpoll
How Go’s standard `net` package handles thousands of connections under high load by leveraging non-blocking I/O through `epoll` (on Linux) or `kqueue` (on BSD/macOS).
Hi there!
This article demystifies how Go’s standard net package handles thousands of connections under high load by leveraging non-blocking I/O through epoll (on Linux) and kqueue (on BSD/macOS).
I’ll explain how Go’s netpoll system efficiently parks and unparks goroutines, and how it compares to other concurrency models.
Part 1. Problems in High-Load Server Environments
When talking about “high load,” we usually refer to situations where a server must handle:
- A large number of simultaneous connections (tens or even hundreds of thousands).
- A high volume of inbound/outbound traffic.
- A high request rate per unit time.
And the key challenges that arise under these conditions include:
1. Number of simultaneously open connections: with heavy traffic, it’s easy to reach tens or hundreds of thousands of active connections. We must efficiently “multiplex” them without performance bottlenecks.
2. Resource constraints: RAM, file descriptors, context-switch time (between threads or processes), CPU time – these can all become bottlenecks.
3. Blocking I/O operations: if each read/write operation (or its equivalent in higher-level libraries) blocks the thread until completion, under high load you can end up with a huge number of “stuck” threads and critical overhead.
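One of the resource constraints above, the file descriptor limit, is easy to inspect from Go. Here is a minimal sketch for Unix-like systems; the helper name fdLimit is my own and not part of any library:

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

// fdLimit returns the soft and hard limits on open file descriptors
// for the current process. fdLimit is a hypothetical helper name.
func fdLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	// The field types differ between platforms, so convert explicitly.
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := fdLimit()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open file descriptors: soft=%d hard=%d\n", soft, hard)
}
```

Every accepted connection consumes one descriptor, so the soft limit is a rough upper bound on how many connections a single process can hold open.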
Typical Golang Service With Net Server
Go’s standard net package makes writing network applications straightforward. Basically, we start a server like this:
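package main

import (
	"fmt"
	"io"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	fmt.Println("Server started on port 8080...")

	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Println("Accept error:", err)
			continue
		}
		// Launch a goroutine to handle the connection
		go handleConnection(conn)
	}
}

// handleConnection is a placeholder handler (not in the original listing):
// it echoes received bytes back until the client disconnects.
func handleConnection(conn net.Conn) {
	defer conn.Close()
	if _, err := io.Copy(conn, conn); err != nil {
		log.Println("Connection error:", err)
	}
}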
The code illustrates the general idea:
1. net.Listen opens a listening socket on the specified address/port.
2. In a loop, Accept waits for incoming connections.
3. Each connection spawns a new goroutine handleConnection.
Go automatically schedules these goroutines across system threads. At first glance, such a server can handle thousands of requests because:
- Goroutines in Go are “lightweight.”
- The standard library already handles non-blocking I/O (under the hood).
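To put a rough number on “lightweight,” here is a sketch that parks a batch of goroutines and measures how much stack memory they hold. The function name and the goroutine count are my own choices for illustration, and the result varies between Go versions and platforms:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine starts n goroutines that immediately park and
// returns the average growth of in-use stack memory per goroutine.
func stackBytesPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var ready sync.WaitGroup
	ready.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			ready.Done()
			<-stop // park, like a goroutine waiting on an idle connection
		}()
	}
	ready.Wait()
	runtime.ReadMemStats(&after)
	close(stop) // release all parked goroutines

	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	const n = 10000 // illustrative value
	fmt.Printf("about %d bytes of stack per parked goroutine\n",
		stackBytesPerGoroutine(n))
}
```

The figure only covers stack memory. Per-connection buffers allocated by a handler come on top of it.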
However, if the load becomes extremely high, you can end up with a huge number of goroutines, some of them leaked. The standard net package has potential bottlenecks:
1. Excessive goroutine creation: if there are many incoming connections, each goroutine (despite being “lightweight”) still consumes memory and CPU when context switching. For hundreds of thousands of connections, this can become problematic.
2. System limits:
- The maximum number of open file descriptors.
- Kernel limits on TCP buffers, queues, etc.
3. Possible blocking on Accept: if Accept is not called quickly enough, a backlog can build up.
4. Timeout configurations: by default, a net.Conn has no read or write deadline. With many “idle” connections, the server can become cluttered with unused resources.
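For the timeout point in the list above, the usual tool is a deadline set on the connection before each read. A minimal sketch, assuming a hypothetical idleTimeout value and a loopback connection whose client never sends anything:

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"os"
	"time"
)

// idleTimeout is a hypothetical value chosen for illustration.
const idleTimeout = 200 * time.Millisecond

// readWithDeadline performs one read that gives up after idleTimeout.
func readWithDeadline(conn net.Conn, buf []byte) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(idleTimeout)); err != nil {
		return 0, err
	}
	return conn.Read(buf)
}

// demoIdleTimeout opens a loopback connection with a silent client and
// reports whether the server-side read ended with a deadline error.
func demoIdleTimeout() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer client.Close()

	server, err := ln.Accept()
	if err != nil {
		return false, err
	}
	defer server.Close()

	_, err = readWithDeadline(server, make([]byte, 64))
	return errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	timedOut, err := demoIdleTimeout()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("idle read timed out:", timedOut)
}
```

In a real handler you would close the connection when the deadline error shows up, which frees both the descriptor and the goroutine.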
Still, in most cases, the net package is not the main problem in high-load scenarios—it’s quite optimized.
Part 2. epoll and kqueue
So what else can we try?
Before answering that, let's go one level deeper.
Unix systems provide mechanisms for asynchronous or non-blocking I/O.
- On Linux, epoll is commonly used.
- BSD-like systems (including macOS) use kqueue.
Both are designed to handle large numbers of file descriptors with minimal overhead.
Quick look at epoll (Linux)
- epoll is an event notification mechanism for read/write/error events on file descriptors.
- A server registers its file descriptors via epoll_ctl and periodically calls epoll_wait.
- When data is available on a socket (or the socket is ready to write), epoll_wait returns a list of “ready” descriptors.
- This lets one thread watch tens or hundreds of thousands of open sockets through a single system call, instead of polling each one.
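The register-then-wait cycle described above can be sketched directly with Go’s syscall package. This is a Linux-only illustration that uses a pipe instead of a socket; the helper names readyFDs and demoEpoll are mine, and it is not how the Go runtime itself is written:

```go
//go:build linux

package main

import (
	"fmt"
	"log"
	"syscall"
)

// readyFDs waits up to timeoutMs and returns the descriptors epoll reports as ready.
func readyFDs(epfd int, timeoutMs int) ([]int, error) {
	events := make([]syscall.EpollEvent, 16)
	for {
		n, err := syscall.EpollWait(epfd, events, timeoutMs)
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		if err != nil {
			return nil, err
		}
		fds := make([]int, 0, n)
		for _, ev := range events[:n] {
			fds = append(fds, int(ev.Fd))
		}
		return fds, nil
	}
}

// demoEpoll registers the read end of a pipe and counts ready descriptors
// before and after one byte is written to the other end.
func demoEpoll() (int, int, error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Close(epfd)

	// Ask to be notified when the read end becomes readable.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return 0, 0, err
	}

	idle, err := readyFDs(epfd, 50) // nothing written yet
	if err != nil {
		return 0, 0, err
	}
	if _, err := syscall.Write(p[1], []byte("x")); err != nil {
		return 0, 0, err
	}
	ready, err := readyFDs(epfd, 50) // the read end is now readable
	if err != nil {
		return 0, 0, err
	}
	return len(idle), len(ready), nil
}

func main() {
	before, after, err := demoEpoll()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("ready before write: %d, after write: %d\n", before, after)
}
```

A real server would register thousands of sockets on the same epoll descriptor and loop over whatever each wait returns.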
And a quick look at kqueue (BSD, macOS, FreeBSD)
- kqueue is a similar mechanism, where registered events (read/write, timers, signals, files) are handled via kevent.
- It also makes it efficient to multiplex many descriptors.
How This Works in netpoll
Golang has an internal abstraction called “netpoller”. When you call net.Listen, it creates a corresponding file descriptor. This descriptor is registered with the Go netpoller to monitor socket events:
- When a new connection arrives, the Go runtime “wakes up” the goroutine that is blocked on Accept.
- For established connections, the same idea applies: when the socket is ready for reading or writing, the goroutine is unparked and resumes execution.
Thus, from the developer’s perspective, you just write code using conn.Read and conn.Write, while Go automatically parks the goroutine when there’s no data and unparks it when data arrives—all thanks to the netpoll mechanism.
- The netpoller allows thousands (or tens of thousands) of goroutines to efficiently wait on thousands of connections without extra overhead.
- It uses only a few OS threads (typically about as many as there are logical CPUs, plus some spare) that wait in epoll_wait/kevent.
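You can observe both points with a small experiment: open a batch of loopback connections, block one goroutine in Read on each, and ask the runtime how many OS threads it has created. This is a sketch under my own assumptions (the parkReaders name and the connection count are illustrative), and the exact thread count depends on the machine:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"runtime/pprof"
	"sync"
	"time"
)

// parkReaders opens n loopback TCP connections, starts one goroutine per
// connection blocked in Read, and returns the number of OS threads the
// runtime has created so far.
func parkReaders(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var conns []net.Conn
	defer func() {
		for _, c := range conns {
			c.Close() // closing also wakes the parked readers
		}
	}()

	var started sync.WaitGroup
	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		conns = append(conns, client)

		server, err := ln.Accept()
		if err != nil {
			return 0, err
		}
		conns = append(conns, server)

		started.Add(1)
		go func(c net.Conn) {
			started.Done()
			buf := make([]byte, 1)
			c.Read(buf) // parks until data arrives or the connection closes
		}(server)
	}
	started.Wait()
	time.Sleep(50 * time.Millisecond) // give the readers time to park

	return pprof.Lookup("threadcreate").Count(), nil
}

func main() {
	const n = 100 // illustrative value
	threads, err := parkReaders(n)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d goroutines blocked in Read, %d OS threads created\n", n, threads)
}
```

If every blocked Read pinned an OS thread, the thread count would track the connection count. With the netpoller it should stay far lower.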
Let’s examine a slightly more “advanced” example built on the third-party github.com/cloudwego/netpoll library, which exposes an explicit event loop instead of one goroutine per connection. The core principle remains:
package main

import (
	"context"
	"log"

	"github.com/cloudwego/netpoll"
)

func main() {
	listener, err := netpoll.CreateListener("tcp", ":8081")
	if err != nil {
		log.Fatalf("Failed to create listener on :8081: %v", err)
	}
	defer listener.Close()
	log.Println("[netpoll] Server is running on :8081")

	// onRequest is invoked by the event loop whenever a connection has data to read.
	eventLoop, err := netpoll.NewEventLoop(onRequest)
	if err != nil {
		log.Fatalf("Failed to create netpoll event loop: %v", err)
	}
	if err := eventLoop.Serve(listener); err != nil {
		log.Fatalf("Event loop Serve error: %v", err)
	}
}

// onRequest handles incoming data from a connection, echoes it back, and manages read/write errors.
// It reads all available data from the connection, then writes the same data back to the sender.
// Returns an error if there is a failure during reading or writing.
func onRequest(ctx context.Context, conn netpoll.Connection) error {
	reader := conn.Reader()
	n := reader.Len()
	if n == 0 {
		return nil
	}
	data, err := reader.Next(n)
	if err != nil {
		log.Printf("[netpoll] Read error from %s: %v", conn.RemoteAddr(), err)
		return err
	}
	_, err = conn.Write(data)
	if err != nil {
		log.Printf("[netpoll] Write error to %s: %v", conn.RemoteAddr(), err)
		return err
	}
	return nil
}
What’s Under the Hood?
When we call net.Listen, the Go runtime puts the socket into non-blocking mode and registers it with epoll/kqueue. A call to ln.Accept() then parks the calling goroutine if no connection is ready yet.
When a new connection arrives, the netpoller inside Go “wakes up” and returns control to the goroutine that was blocked on Accept. Similarly, when calling conn.Read(): if there is no data, the goroutine is parked and the netpoller monitors the socket. When data arrives, that goroutine is unparked and continues.
On my test server (Go 1.24, a 2-CPU i5 machine), pprof gave the following results:
| test | syscall |
|------|---------|
| net | 56% |
| epoll | 83% |
And then?
When there are only a small number of connections, you won’t see a significant difference. The Go runtime with the net package is highly optimized and also uses epoll/kqueue and a “net-poller” under the hood (just in a slightly different way). When there are few requests/connections, the standard net package almost doesn’t differ in speed and can even be faster (due to overhead from the additional library).
netpoll (or a similar event-loop solution) lets you avoid creating a goroutine for each connection, instead using one (or a few) threads/“loops” for all connections.
In theory, this reduces memory usage and can lower scheduling overhead.
The idea “if a task takes >100ms, offload it to a worker pool” is typically implemented either in advance (knowing it’s an expensive operation) or via a “cooperative” mechanism inside onRequest, which is more complex and somewhat “reinventing the wheel”.
It’s architecturally simpler to offload all heavy work to a pool from the start.
- In netpoll, if you “pause” in onRequest (for example, time.Sleep(300 * time.Millisecond) or heavy computations), you block all other connections for the duration of that operation.
- In the standard net, each connection runs in its own goroutine, so sleeping or computing in one goroutine does not stop processing for others.
Therefore, for high-load scenarios with netpoll, it is usually recommended to offload any lengthy or blocking tasks to a worker pool. This frees up the event loop to continue serving other connections.
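A worker pool of the kind recommended above can be sketched with nothing but the standard library. The Pool type, its method names, and the sizes below are hypothetical and chosen for illustration. The important detail is that TrySubmit never blocks, so an event-loop callback can hand work off and return right away:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Pool is a hypothetical fixed-size worker pool; it is not part of any library.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewPool starts `workers` goroutines that run queued tasks.
func NewPool(workers, queueSize int) *Pool {
	p := &Pool{tasks: make(chan func(), queueSize)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// TrySubmit queues a task without blocking the caller. It returns false
// when the queue is full, so the caller can apply backpressure.
func (p *Pool) TrySubmit(task func()) bool {
	select {
	case p.tasks <- task:
		return true
	default:
		return false
	}
}

// Close stops accepting tasks and waits for queued work to finish.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

// demoOffload imitates an event-loop callback that hands slow work to the
// pool. It reports how many tasks were accepted and how many completed.
func demoOffload(n int) (accepted, completed int64) {
	pool := NewPool(4, n)
	var done int64
	for i := 0; i < n; i++ {
		ok := pool.TrySubmit(func() {
			time.Sleep(20 * time.Millisecond) // stand-in for heavy work
			atomic.AddInt64(&done, 1)
		})
		if ok {
			accepted++
		}
	}
	pool.Close()
	return accepted, atomic.LoadInt64(&done)
}

func main() {
	accepted, completed := demoOffload(8)
	fmt.Printf("accepted=%d completed=%d\n", accepted, completed)
}
```

When TrySubmit returns false, the handler has to decide what to do: reject the request, close the connection, or retry later. Blocking inside the callback would bring back the original problem.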
If you need maximum performance, dive deeper into:
- OS-level TCP settings.
- Tuning the Go runtime (GC parameters, number of threads, stack sizes, etc.).
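For the runtime side, the standard library exposes a handful of knobs. The sketch below shows where they live; the values are placeholders I picked for illustration, not recommendations, and the right numbers depend on your workload:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// previous holds the settings that were in effect before tuning.
type previous struct {
	gcPercent  int
	maxThreads int
	maxStack   int
	procs      int
}

// tune applies illustrative runtime settings and returns the old values.
// All numbers here are placeholders.
func tune() previous {
	return previous{
		// Let the heap grow further between collections (same knob as GOGC).
		gcPercent: debug.SetGCPercent(200),
		// Raise the cap on OS threads the runtime may create.
		maxThreads: debug.SetMaxThreads(20000),
		// Cap the size a single goroutine stack may grow to, in bytes.
		maxStack: debug.SetMaxStack(512 << 20),
		// Passing 0 only queries GOMAXPROCS without changing it.
		procs: runtime.GOMAXPROCS(0),
	}
}

func main() {
	old := tune()
	fmt.Printf("previous GC percent=%d, max threads=%d, max stack=%d bytes, GOMAXPROCS=%d\n",
		old.gcPercent, old.maxThreads, old.maxStack, old.procs)
}
```

Measure before and after any change with pprof. These settings trade memory for CPU time and can easily make things worse.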
Take care!