Yet Another Servers in Go: Understanding epoll, kqueue, and netpoll

How Go’s standard `net` package handles thousands of connections under high load by leveraging non-blocking I/O through `epoll` (on Linux) or `kqueue` (on BSD/macOS).

By Ilia Ivankin · Aug. 21, 25 · Tutorial

Hi there!

This article demystifies how Go’s standard net package handles thousands of connections under high load by leveraging non-blocking I/O through:

  • epoll (on Linux)
  • kqueue (on BSD/macOS)

I’ll explain how Go’s netpoll system efficiently parks and unparks goroutines, and how it compares to other concurrency models.

Part 1. Problems in High-Load Server Environments

When talking about “high load,” we usually refer to situations where a server must handle:

  1. A large number of simultaneous connections (tens or even hundreds of thousands).
  2. A high volume of inbound/outbound traffic.
  3. A high request rate per unit time.

And the key challenges that arise under these conditions include:

1. Number of simultaneously open connections: with heavy traffic, it’s easy to reach tens or hundreds of thousands of active connections. We must efficiently “multiplex” them without performance bottlenecks.

2. Resource constraints: RAM, file descriptors, context-switch time (between threads or processes), CPU time – these can all become bottlenecks.

3. Blocking I/O operations: if each read/write operation (or its equivalent in higher-level libraries) blocks the thread until completion, under high load you can end up with a huge number of “stuck” threads and critical overhead.

Typical Golang Service With Net Server

Go’s standard net package makes writing network applications straightforward. Basically, we start a server like this:

Go
 
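package main

import (
    "fmt"
    "io"
    "log"
    "net"
)

func main() {
    // Open a listening TCP socket on port 8080.
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    defer ln.Close()

    fmt.Println("Server started on port 8080...")

    for {
        // Accept parks this goroutine until a new connection arrives.
        conn, err := ln.Accept()
        if err != nil {
            log.Println("Accept error:", err)
            continue
        }

        // Launch a goroutine to handle the connection
        go handleConnection(conn)
    }
}

// handleConnection is a placeholder handler (an assumption, added so the
// example compiles): it echoes input back until the client disconnects.
func handleConnection(conn net.Conn) {
    defer conn.Close()
    if _, err := io.Copy(conn, conn); err != nil {
        log.Println("Connection error:", err)
    }
}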


The code illustrates the general idea:

1. net.Listen opens a listening socket on the specified address/port.

2. In a loop, Accept waits for incoming connections.

3. Each connection spawns a new goroutine handleConnection.

Go automatically schedules goroutines across system threads. At first glance, such a server can handle thousands of requests because:

  1. Goroutines in Go are “lightweight.”
  2. The standard library already handles non-blocking I/O (under the hood).

However, under extremely high load you can end up with a huge number of goroutines, and with goroutine leaks if handlers never return. The standard net package has potential bottlenecks:

1. Excessive goroutine creation: if there are many incoming connections, each goroutine (despite being “lightweight”) still consumes memory and CPU when context switching. For hundreds of thousands of connections, this can become problematic.

2. System limits:

    - The maximum number of open file descriptors.

    - Kernel limits on TCP buffers, queues, etc.

3. Possible blocking on Accept: if Accept is not called quickly enough, a backlog can build up.

4. Timeout configurations: by default, a net.Conn has no read or write deadline. With many idle connections, the server can fill up with sockets and goroutines that do no useful work.
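The timeout point can be illustrated with a short sketch. It assumes an echo-style handler; the names echoWithIdleTimeout and idleClientIsDropped are hypothetical, and an in-memory net.Pipe stands in for a real TCP connection so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// echoWithIdleTimeout echoes data back to the peer and gives up once the
// peer has been silent for longer than idle. The name and the echo
// behaviour are illustrative assumptions, not part of the article's server.
func echoWithIdleTimeout(conn net.Conn, idle time.Duration) error {
	defer conn.Close()
	buf := make([]byte, 4096)
	for {
		// Push the deadline forward before every read; without this call
		// the connection has no deadline and Read may block forever.
		if err := conn.SetReadDeadline(time.Now().Add(idle)); err != nil {
			return err
		}
		n, err := conn.Read(buf)
		if err != nil {
			return err
		}
		if _, err := conn.Write(buf[:n]); err != nil {
			return err
		}
	}
}

// idleClientIsDropped connects a silent client over an in-memory pipe and
// reports whether the handler returned a deadline error.
func idleClientIsDropped() bool {
	client, server := net.Pipe()
	defer client.Close()
	err := echoWithIdleTimeout(server, 50*time.Millisecond)
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("idle client dropped:", idleClientIsDropped())
}
```

In a real server the same call would sit inside handleConnection, and the idle duration would be chosen from the protocol's expected keep-alive interval.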

Still, in most cases, the net package is not the main problem in high-load scenarios—it’s quite optimized. 

Part 2. epoll and kqueue

So what else can we try?

Before answering, let's go one level deeper.

Unix systems provide mechanisms for asynchronous or non-blocking I/O.

  1. On Linux, epoll is commonly used.
  2. BSD-like systems (including macOS) use kqueue.

Both are designed to handle large numbers of file descriptors with minimal overhead.

Quick Look at epoll (Linux)

  1. epoll is an event notification mechanism for read/write/error events on file descriptors.
  2. A server registers its file descriptors via epoll_ctl and periodically calls epoll_wait.
  3. When data is available on a socket (or the socket is ready to write), epoll_wait returns a list of “ready” descriptors.
  4. Because only ready descriptors are returned, a single epoll_wait call can serve tens or hundreds of thousands of open sockets without scanning each one.
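The register-then-wait cycle above can be sketched directly with Go's syscall package. This is a Linux-only illustration, not how the Go runtime itself is written; readyDescriptors and demoEpoll are hypothetical helper names, and pipes stand in for sockets.

```go
package main

import (
	"fmt"
	"syscall"
)

// readyDescriptors registers every fd for read readiness on a new epoll
// instance and returns the descriptors reported ready within timeoutMs.
// Linux only: the Epoll* functions do not exist on other platforms.
func readyDescriptors(fds []int, timeoutMs int) ([]int, error) {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return nil, err
	}
	defer syscall.Close(epfd)

	for _, fd := range fds {
		ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
		if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
			return nil, err
		}
	}

	events := make([]syscall.EpollEvent, len(fds))
	for {
		n, err := syscall.EpollWait(epfd, events, timeoutMs)
		if err == syscall.EINTR {
			continue // interrupted by a signal, retry
		}
		if err != nil {
			return nil, err
		}
		ready := make([]int, 0, n)
		for _, ev := range events[:n] {
			ready = append(ready, int(ev.Fd))
		}
		return ready, nil
	}
}

// demoEpoll creates two pipes, writes to only one of them, and asks epoll
// which read ends are ready. It returns the number of ready descriptors and
// whether the single ready one is the pipe that was written to.
func demoEpoll() (int, bool, error) {
	var quiet, busy [2]int // index 0 is the read end, index 1 the write end
	if err := syscall.Pipe(quiet[:]); err != nil {
		return 0, false, err
	}
	defer syscall.Close(quiet[0])
	defer syscall.Close(quiet[1])
	if err := syscall.Pipe(busy[:]); err != nil {
		return 0, false, err
	}
	defer syscall.Close(busy[0])
	defer syscall.Close(busy[1])

	if _, err := syscall.Write(busy[1], []byte("x")); err != nil {
		return 0, false, err
	}
	ready, err := readyDescriptors([]int{quiet[0], busy[0]}, 1000)
	if err != nil {
		return 0, false, err
	}
	return len(ready), len(ready) == 1 && ready[0] == busy[0], nil
}

func main() {
	n, match, err := demoEpoll()
	fmt.Println("ready descriptors:", n, "is the written pipe:", match, "err:", err)
}
```

Only the pipe that received data is reported, which is the property that lets one waiting thread watch a very large set of mostly idle descriptors.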

A Quick Look at kqueue (BSD, macOS, FreeBSD)

  1. kqueue is a similar mechanism, where registered events (read/write, timers, signals, files) are handled via kevent.
  2. It also makes it efficient to multiplex many descriptors.

How This Works in netpoll

Go has an internal abstraction called the “netpoller”. When you call net.Listen, it creates a corresponding file descriptor. This descriptor is registered with the Go netpoller to monitor socket events:

  1. When a new connection arrives, the Go runtime “wakes up” the goroutine that is blocked on Accept.
  2. For established connections, the same idea applies: when the socket is ready for reading or writing, the goroutine is unparked and resumes execution.

Thus, from the developer’s perspective, you just write code using conn.Read and conn.Write, while Go automatically parks the goroutine when there’s no data and unparks it when data arrives—all thanks to the netpoll mechanism.

- The netpoller allows thousands (or tens of thousands) of goroutines to wait efficiently on thousands of connections: a parked goroutine does not occupy an OS thread.

- Only a few OS threads are needed (the number running Go code is capped by GOMAXPROCS, which defaults to the number of logical CPUs), and the runtime polls epoll_wait/kevent on behalf of all parked goroutines.
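A small experiment can make the parking behaviour visible. The sketch below is only an illustration: parkAndEcho is a hypothetical helper, the connection count is arbitrary, and the printed goroutine and thread counts will vary by machine.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"runtime"
	"runtime/pprof"
	"sync"
	"sync/atomic"
)

// parkAndEcho opens n loopback connections. Each server-side goroutine
// blocks in Read (parked by the netpoller) until its client finally writes.
// It returns how many connections echoed the payload correctly.
func parkAndEcho(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				io.Copy(c, c) // parks in Read while the socket is idle
			}(conn)
		}
	}()

	conns := make([]net.Conn, 0, n)
	defer func() {
		for _, c := range conns {
			c.Close()
		}
	}()
	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		conns = append(conns, c)
	}

	// All connections are idle here: many goroutines, few OS threads.
	fmt.Println("goroutines while idle:", runtime.NumGoroutine())
	fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())

	var wg sync.WaitGroup
	var ok int32
	for _, c := range conns {
		wg.Add(1)
		go func(c net.Conn) {
			defer wg.Done()
			buf := make([]byte, 4)
			if _, err := c.Write([]byte("ping")); err != nil {
				return
			}
			if _, err := io.ReadFull(c, buf); err != nil {
				return
			}
			if string(buf) == "ping" {
				atomic.AddInt32(&ok, 1)
			}
		}(c)
	}
	wg.Wait()
	return int(atomic.LoadInt32(&ok)), nil
}

func main() {
	ok, err := parkAndEcho(100)
	fmt.Println("echoed:", ok, "err:", err)
}
```

On a typical machine the idle goroutine count is roughly the number of connections while the thread count stays small, which is the effect the two bullets describe.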

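Let’s examine a slightly more “advanced” example that handles multiple connections through an explicit event loop. It uses the third-party github.com/cloudwego/netpoll library, which builds its own poller on top of epoll/kqueue rather than relying on the runtime’s internal netpoller. The core principle remains: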

Go
 
package main

import (
	"context"
	"log"

	"github.com/cloudwego/netpoll"
)

func main() {
	listener, err := netpoll.CreateListener("tcp", ":8081")
	if err != nil {
		log.Fatalf("Failed to create listener on :8081: %v", err)
	}
	defer listener.Close()

	log.Println("[netpoll] Server is running on :8081")

	// onRequest is called by the event loop when a connection has readable data.
	eventLoop, err := netpoll.NewEventLoop(onRequest)
	if err != nil {
		log.Fatalf("Failed to create netpoll event loop: %v", err)
	}

	// Serve blocks until the event loop stops or fails.
	if err := eventLoop.Serve(listener); err != nil {
		log.Fatalf("Event loop Serve error: %v", err)
	}
}

// onRequest handles incoming data from a connection, echoes it back, and manages read/write errors.
// It reads all available data from the connection, then writes the same data back to the sender.
// Returns an error if there is a failure during reading or writing.
func onRequest(ctx context.Context, conn netpoll.Connection) error {
	reader := conn.Reader()
	// Release the read buffers once the data has been written back.
	defer reader.Release()

	n := reader.Len()
	if n == 0 {
		return nil
	}

	data, err := reader.Next(n)
	if err != nil {
		log.Printf("[netpoll] Read error from %s: %v", conn.RemoteAddr(), err)

		return err
	}

	_, err = conn.Write(data)
	if err != nil {
		log.Printf("[netpoll] Write error to %s: %v", conn.RemoteAddr(), err)

		return err
	}

	return nil
}


What’s Under the Hood?

In the standard net package, the listening socket is registered with epoll/kqueue when it is created. When we call ln.Accept() and no connection is pending, the goroutine is parked.

When a new connection arrives, the netpoller inside Go “wakes up” and returns control to the goroutine that was blocked on Accept. The same happens when calling conn.Read(): if there is no data, the goroutine is parked and the netpoller monitors the socket. When data arrives, that goroutine is unparked and continues.

On my test server (Go 1.24, 2 CPU i5), pprof reported the following share of time spent in system calls:

Test     Time in syscalls
net      56%
epoll    83%

A high proportion of time spent in syscalls (for example, 40–80%) does not by itself indicate slowness for netpoll. It reflects the concentrated event-polling model (epoll_wait/kqueue), where the thread blocks in a system call until I/O events occur. If anything, it suggests that the process is spending its time on I/O multiplexing rather than on goroutine overhead.

And then?

With only a small number of connections, you won’t see a significant difference. The Go runtime with the net package is highly optimized and also uses epoll/kqueue and a netpoller under the hood, just in a slightly different way. Under light load, the standard net package performs about the same and can even be faster, because the additional library adds its own overhead.

netpoll (and similar event-loop solutions) lets you avoid creating a goroutine for each connection and instead use one or a few threads (“loops”) for all connections.

Theoretically, this reduces memory usage and can lower scheduling overhead.

The rule “if a task takes >100ms, offload it to a worker pool” is typically implemented either in advance (when you already know the operation is expensive) or via a “cooperative” mechanism inside onRequest, which is more complex and amounts to reinventing the wheel.

It’s architecturally simpler to offload all heavy work to a pool from the start.

Blocking operations within the event loop:
  1. In netpoll, if you “pause” in onRequest (for example, with time.Sleep(300 * time.Millisecond) or a heavy computation), you block the other connections served by that event loop for the duration of the operation.
  2. In the standard net package, each connection runs in its own goroutine, so sleeping or computing in one goroutine does not stop processing for the others.
  3. Therefore, for high-load scenarios with netpoll, it is usually recommended to offload any lengthy or blocking tasks to a worker pool. This frees the event loop to continue serving other connections.
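A worker pool of the kind recommended above might look like the following sketch. Pool, NewPool, TrySubmit, and runSlowTasks are hypothetical names, the worker count and queue size are arbitrary, and time.Sleep stands in for real heavy work.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Pool runs submitted tasks on a fixed number of worker goroutines.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewPool starts workers goroutines that drain a queue of the given size.
func NewPool(workers, queue int) *Pool {
	p := &Pool{tasks: make(chan func(), queue)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// TrySubmit queues task without blocking and reports false when the queue
// is full, so an event-loop callback that calls it never stalls.
func (p *Pool) TrySubmit(task func()) bool {
	select {
	case p.tasks <- task:
		return true
	default:
		return false
	}
}

// Close stops accepting tasks and waits for the queued ones to finish.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

// runSlowTasks simulates n slow requests handed off by an event-loop
// callback. It returns how many finished and how long submission took.
func runSlowTasks(n int) (int, time.Duration) {
	pool := NewPool(4, n)
	var mu sync.Mutex
	done := 0

	start := time.Now()
	for i := 0; i < n; i++ {
		pool.TrySubmit(func() {
			time.Sleep(20 * time.Millisecond) // stand-in for heavy work
			mu.Lock()
			done++
			mu.Unlock()
		})
	}
	submitTime := time.Since(start)

	pool.Close() // wait for all workers before reading done
	return done, submitTime
}

func main() {
	done, submitTime := runSlowTasks(8)
	fmt.Println("tasks completed:", done, "time spent submitting:", submitTime)
}
```

Submission returns almost immediately even though each task sleeps, which is the behaviour an onRequest callback needs. When TrySubmit returns false, the caller has to decide whether to reject the request or apply backpressure.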

If you need maximum performance, dive deeper into:

  1. OS-level TCP settings.
  2. Tuning the Go runtime (GC parameters, number of threads, stack sizes, etc.).
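As a starting point for the runtime side, here is a minimal sketch of two knobs exposed by runtime/debug. The helper name tuneRuntime and the values 200 and 20000 are purely illustrative assumptions, not recommendations; appropriate settings depend on measurements from your own workload.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// tuneRuntime applies a GC target percentage and an OS thread limit and
// returns the previous values so they can be restored or logged.
func tuneRuntime(gcPercent, maxThreads int) (prevGC, prevThreads int) {
	// A higher GC percentage trades memory for fewer collection cycles.
	prevGC = debug.SetGCPercent(gcPercent)
	// The program crashes if it tries to use more OS threads than this limit.
	prevThreads = debug.SetMaxThreads(maxThreads)
	return prevGC, prevThreads
}

func main() {
	prevGC, prevThreads := tuneRuntime(200, 20000)
	fmt.Println("previous GC percent:", prevGC)
	fmt.Println("previous max threads:", prevThreads)
	// GOMAXPROCS(0) only reads the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```

The GC percentage can also be set without code changes through the GOGC environment variable.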

Take care!
