Kubernetes CNI Drivers
Learn how Kubernetes assigns pod IPs using CNI. This guide breaks down how container networking works and walks through building a simple custom CNI plugin.
Ever wondered how a pod you create in Kubernetes magically gets an IP address and can communicate with other pods and the host node without issues? Networking is not that simple, so how does all this magic work? In this article, I attempt to unwrap the mystery and explain the inner workings of CNI (Container Network Interface).
First, let’s start with the why.
The Problem
Before CNI, networking for containers was inconsistent. Different container runtimes had their own custom networking solutions (Docker, rkt, LXC, etc). Each runtime had its own way to assign IP addresses to containers, connect containers to networks, configure routes, DNS, and firewalls. There wasn’t any standard way to perform the aforementioned tasks.
As Kubernetes grew, this became a problem of scale, so Kubernetes needed a standardized way to set up networking. Without a standard:
- You would need to write custom network code for each runtime.
- Plugins couldn’t be reused across different platforms.
- Multi-vendor, multi-cloud deployments would be fragile.
What Is CNI?
CNI is a combination of a specification and libraries, adopted by the Cloud Native Computing Foundation (CNCF), that attempts to decouple container runtimes from network implementations. It is the standard in Kubernetes, and as a standard it solves most of the problems stated earlier. Although Kubernetes is its best-known user, it works with any container runtime (containerd, CRI-O, rkt, LXC, etc.).
Every CNI plugin implements the same contract. The two operations this article focuses on are listed below (the specification also defines others, such as CHECK and VERSION):
- ADD – Called when a container/pod starts.
- DEL – Called when a container/pod stops.
Any container runtime or orchestrator, such as Kubernetes, can use a CNI plugin without knowing its internals. Plugin authors only need to abide by the contract to make their plugins usable.
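As a rough illustration of the contract, the sketch below shows one way a plugin binary might choose between its operations. The CNI specification passes the operation name in the CNI_COMMAND environment variable; the handler type, the dispatch function, and the fallback to ADD are assumptions made for this example and are not taken from the demo repository.

```go
package main

import (
	"fmt"
	"os"
)

// handler is the signature assumed here for each CNI operation.
type handler func() error

// dispatch runs the handler registered for the given CNI_COMMAND value.
// Unknown commands are reported as errors rather than silently ignored.
func dispatch(command string, handlers map[string]handler) error {
	h, ok := handlers[command]
	if !ok {
		return fmt.Errorf("unsupported CNI_COMMAND %q", command)
	}
	return h()
}

func main() {
	// Placeholder handlers: a real plugin would configure networking here.
	handlers := map[string]handler{
		"ADD": func() error { fmt.Fprintln(os.Stderr, "[ADD] set up pod networking"); return nil },
		"DEL": func() error { fmt.Fprintln(os.Stderr, "[DEL] tear down pod networking"); return nil },
	}

	// The runtime sets CNI_COMMAND before executing the plugin binary.
	command := os.Getenv("CNI_COMMAND")
	if command == "" {
		command = "ADD" // fallback so the sketch can be run by hand
	}

	if err := dispatch(command, handlers); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the handlers in a map means an operation that has not been implemented fails loudly instead of being treated as a no-op.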
To list things out, CNI provides the following:
- Standardized interface
- Modular network plugins (Calico, Cilium)
- IPAM – IP address management
I believe in learning by example, so with the fundamentals out of the way, let’s write a simple network plugin and see how things work. As part of this demo, we are going to:
- Create a simple custom network plugin.
- Install that plugin in the KIND (Kubernetes in Docker) cluster.
- Check logs to see things (network interface creation, IP assignment) in action.
Prerequisites
- Docker Desktop installed
- Go installed (go version ≥ 1.18):
brew install go
- kubectl installed (kubectl version --client):
brew install kubectl
- kind installed (kind version)
Let the Coding Begin!
Set up the Go project.
mkdir cni && cd cni
The full code is present at https://github.com/justramesh2000/cni-demo/tree/main. In this article, I am going to describe the plugin's code, i.e., the add and del function implementations.
Under the hood, CNI plugins just run standard Linux networking commands like ip link, ip addr, bridge, etc., and leverage the kernel netns feature. So the plugin is really just an orchestrator or a command wrapper that executes Linux commands when pods are created or deleted. The main.go file has the majority of the code; let's focus on some of the important aspects.
The Add Function
At a high level, the add function performs the following tasks:
1. Fetch the environment variables CNI_NETNS and CNI_IFNAME. These are set by the container runtime, on behalf of Kubernetes, when it invokes the plugin. CNI_NETNS specifies the path to the pod’s network namespace (a Linux namespace, not a Kubernetes one; think of it as a separate network stack, so each pod gets its own stack, isolated from the host). CNI_IFNAME specifies the desired interface name inside the pod, usually eth0.
netns := os.Getenv("CNI_NETNS")
ifname := os.Getenv("CNI_IFNAME")
if netns == "" || ifname == "" {
return fmt.Errorf("CNI_NETNS or CNI_IFNAME missing")
}
2. Generate temporary interface names. tempIf is the temporary name of the interface that ends up inside the container’s netns, and hostVeth is the interface that stays on the host. These need to be unique, so we derive them from a UUID.
tempIf := "temp0" + uuid.NewString()[:6]
hostVeth := "veth" + uuid.NewString()[:8]
3. Create the veth pair. A veth pair is like a cable connecting two network namespaces (again, a Linux namespace). Packets sent into one end come out of the other. Linux commands are used to create these. ip link add adds a new network interface. Type veth creates a virtual ethernet pair.
cmd := exec.Command("ip", "link", "add", tempIf, "type", "veth", "peer", "name", hostVeth)
if out, err := cmd.CombinedOutput(); err != nil {
return fmt.Errorf("ip link add failed: %v output: %s", err, string(out))
}
4. Move the pod side into the container netns. A veth pair has two ends, so the pod-side end has to be moved into the container’s netns (network namespace); from that point on, it lives inside the pod. The host talks to hostVeth, which connects to the pod through the veth pair we created.
if out, err := exec.Command("ip", "link", "set", tempIf, "netns", netns).CombinedOutput(); err != nil {
return fmt.Errorf("ip link set netns failed: %v output: %s", err, string(out))
}
5. Rename the interface inside the pod. Kubernetes expects eth0 as the interface name, and this renaming satisfies that.
if out, err := exec.Command("nsenter", "--net="+netns, "ip", "link", "set", tempIf, "name", ifname).CombinedOutput(); err != nil {
return fmt.Errorf("rename inside netns failed: %v output: %s", err, string(out))
}
6. Bring the host interface up. The interface must be up to send/receive packets.
if out, err := exec.Command("ip", "link", "set", hostVeth, "up").CombinedOutput(); err != nil {
return fmt.Errorf("host link up failed: %v output: %s", err, string(out))
}
7. Bring the container interface up.
if out, err := exec.Command("nsenter", "--net="+netns, "ip", "link", "set", ifname, "up").CombinedOutput(); err != nil {
return fmt.Errorf("container link up failed: %v output: %s", err, string(out))
}
8. Assign an IP to the pod. Each pod must have an IP address inside its network namespace.
if out, err := exec.Command("nsenter", "--net="+netns, "ip", "addr", "add", n.IP, "dev", ifname).CombinedOutput(); err != nil {
return fmt.Errorf("assign IP failed: %v output: %s", err, string(out))
}
You might wonder why we use a temporary interface name instead of creating eth0 directly, if that is what Kubernetes needs. The veth pair is initially created in the host namespace (Linux namespace) and only then moved to the pod/container namespace. More than likely, the host namespace already has an eth0, so a temporary name avoids that conflict and the failures it would cause.
The Del Function
The del function deletes the host-side veth interface. The runtime cleans up the container side of the interface when you delete a container or pod, but the host side has to be deleted explicitly to avoid leaving stale interfaces behind.
Inside the container’s netns, every interface has an iflink value in /sys/class/net/<ifname>/iflink, which holds the index of the host-side veth connected to the container interface. In other words, iflink identifies the other end of the veth pair. On the host, we list all interfaces with ip -o link and match their index to the iflink value. Finally, we delete the host-side veth.
First, read iflink:
out, err := exec.Command("nsenter", "--net="+netns, "cat", fmt.Sprintf("/sys/class/net/%s/iflink", ifname)).CombinedOutput()
if err != nil {
return fmt.Errorf("failed to read iflink for %s: %v output: %s", ifname, err, string(out))
}
iflink := strings.TrimSpace(string(out))
Then, get all network interfaces on the host:
out, err = exec.Command("bash", "-c", "ip -o link | awk -F': ' '{print $2,$1}'").CombinedOutput()
if err != nil {
return fmt.Errorf("failed to list host interfaces: %v output: %s", err, string(out))
}
Finally, match each index against iflink and delete the matching interface:
lines := strings.Split(strings.TrimSpace(string(out)), "\n")
var hostVeth string
for _, l := range lines {
parts := strings.Fields(l)
if len(parts) < 2 {
continue
}
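// ip prints veth names as "name@ifN"; keep only the device name before "@".
name := strings.SplitN(parts[0], "@", 2)[0]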
idx := parts[1]
if idx == iflink {
hostVeth = name
break
}
}
if hostVeth == "" {
fmt.Fprintf(os.Stderr, "[DEL] No host veth found for container interface %s, skipping\n", ifname)
return nil
}
fmt.Fprintf(os.Stderr, "[DEL] Removing host veth %s (container if %s)\n", hostVeth, ifname)
if err := exec.Command("ip", "link", "del", hostVeth).Run(); err != nil {
return fmt.Errorf("failed to delete host veth %s: %v", hostVeth, err)
}
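The matching step can also be done without bash and awk. The sketch below parses ip -o link output directly in Go; findHostVeth is a hypothetical helper, and the sample output in main is made up for illustration. Note that ip prints veth devices as name@ifN, and the part after @ is not part of the device name, so the helper strips it before returning.

```go
package main

import (
	"fmt"
	"strings"
)

// findHostVeth scans `ip -o link` output for the interface whose index
// equals iflink and returns its name without the "@..." peer suffix.
func findHostVeth(ipOutput, iflink string) (string, bool) {
	for _, line := range strings.Split(ipOutput, "\n") {
		// Each line starts with "<index>: <name>[@peer]: <flags> ...".
		fields := strings.SplitN(line, ": ", 3)
		if len(fields) < 2 {
			continue
		}
		if strings.TrimSpace(fields[0]) != iflink {
			continue
		}
		name, _, _ := strings.Cut(fields[1], "@")
		return name, true
	}
	return "", false
}

func main() {
	// Made-up sample output; real output has more fields per line.
	sample := "1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536\n" +
		"15: veth1a2b3c4d@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500"

	if name, ok := findHostVeth(sample, "15"); ok {
		fmt.Println("host veth:", name)
	}
}
```

Because the helper takes the command output as a string, it can be exercised with canned input and does not need a live network namespace.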
Create Plugin Binary
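In the source code directory, run the following commands. The binary must target Linux and match the CPU architecture of the KIND node: amd64 here, or arm64 if Docker runs on an ARM machine such as Apple Silicon.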
go mod tidy
GOOS=linux GOARCH=amd64 go build -o demo-cni
How It Plays Out
Create a KIND (Kubernetes in Docker) cluster.
kind create cluster --name cni-demo
During KIND cluster creation, you will notice that a CNI plugin is installed as part of the process. KIND is self-sufficient and doesn’t need additional steps to set up a CNI plugin, but the default plugin is going to conflict with our custom one, so we will remove it as part of our demo. I will get to that step in a little bit.
The following is what you see when KIND creates a cluster.
✓ Ensuring node image (kindest/node:v1.34.0)
✓ Preparing nodes
✓ Writing configuration
✓ Starting control-plane
✓ Installing CNI // this is what I am talking about
✓ Installing StorageClass
At this point, your cluster should be running.
KIND creates Docker containers that act as Kubernetes nodes. By default, it creates a single control-plane node, which runs everything needed for the Kubernetes control plane. If you want, you could add more nodes, but a single control-plane node is good enough for this demo. The name of the control-plane node is going to be cni-demo-control-plane.
To confirm, you could run:
docker ps --filter "name=cni-demo-control-plane"
Kubernetes needs CNI plugin binaries to exist in a specific directory, /opt/cni/bin. It also needs a network configuration file to live in the directory /etc/cni/net.d. Let’s first create those directories by running the following command:
docker exec -it cni-demo-control-plane mkdir -p /opt/cni/bin /etc/cni/net.d
We have already built a binary for our plugin. Let's copy the binary into the directory by running the following command.
docker cp demo-cni cni-demo-control-plane:/opt/cni/bin/demo-cni
You should see a message like “Successfully copied xMB to cni-demo-control-plane:/opt/cni/bin/demo-cni.”
Let’s also copy the network configuration file to its appropriate location by running the following command.
docker cp 10-demo-cni-config.json cni-demo-control-plane:/etc/cni/net.d/10-demo-cni-config.json
You might be curious about the name of the config file: why does it have 10 in its name? You may have several network plugins installed, and their configurations are considered in lexicographical order, so the numeric prefix is a hack to ensure our custom config gets priority.
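To make the ordering concrete, here is a small sketch that sorts a list of configuration file names the way a lexicographical comparison would and returns the first one. The firstConfig function is hypothetical and only models the ordering described above; it is not the runtime's actual selection code.

```go
package main

import (
	"fmt"
	"sort"
)

// firstConfig returns the file name that sorts first lexicographically,
// or "" when the list is empty. The input slice is left unmodified.
func firstConfig(files []string) string {
	if len(files) == 0 {
		return ""
	}
	sorted := append([]string(nil), files...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	files := []string{"10-kindnet.conflist", "10-demo-cni-config.json"}
	fmt.Println(firstConfig(files))
}
```

Both names share the 10- prefix, so the comparison is decided by the next character: "d" sorts before "k", which is why the demo file comes first here.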
Confirm the files exist by running the following command.
docker exec -it cni-demo-control-plane ls -l /etc/cni/net.d/
total 8
-rw-r--r-- 1 501 dialout 98 Nov 8 16:55 10-demo-cni-config.json
-rw-r--r-- 1 root root 409 Nov 8 20:17 10-kindnet.conflist
Wait a sec, our config file is there, but what is that additional file, 10-kindnet.conflist? That is the configuration for the default KIND CNI plugin, called kindnet. You can view its content by running the following command.
docker exec -it cni-demo-control-plane cat /etc/cni/net.d/10-kindnet.conflist
You will notice that it follows the same general shape as our own configuration file, which looks like this:
{
"cniVersion": "0.4.0",
"name": "demo-net",
"type": "demo-cni",
"ip": "10.120.12.10/24"
}
Let's unwrap the configuration file:
- cniVersion field – specifies which CNI spec version the plugin follows.
- name field – is the name of the network that Kubernetes is going to use when calling the plugin.
- type field – tells the runtime which binary to invoke. Based on this, the runtime will look for /opt/cni/bin/demo-cni.
- ip field – is the IP address that the plugin will assign to the pod. In a more advanced plugin, this will be dynamically allocated from a range.
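For a sense of what dynamic allocation from a range could look like, the sketch below hands out addresses sequentially from a CIDR block. It is a deliberately naive illustration, not how the demo plugin or any production IPAM plugin works: the allocator type is hypothetical, state is kept in memory only, released addresses are never reused, and the IPv4 broadcast address is not excluded.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allocator hands out addresses from a prefix, one after another.
// It keeps state in memory only; a real IPAM plugin would persist it.
type allocator struct {
	prefix netip.Prefix
	next   netip.Addr
}

// newAllocator parses cidr and positions the allocator at the first
// address after the network address.
func newAllocator(cidr string) (*allocator, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	p = p.Masked()
	return &allocator{prefix: p, next: p.Addr().Next()}, nil
}

// allocate returns the next unused address in CIDR notation.
func (a *allocator) allocate() (string, error) {
	if !a.next.IsValid() || !a.prefix.Contains(a.next) {
		return "", fmt.Errorf("no addresses left in %s", a.prefix)
	}
	ip := a.next
	a.next = a.next.Next()
	return netip.PrefixFrom(ip, a.prefix.Bits()).String(), nil
}

func main() {
	a, err := newAllocator("10.120.12.0/24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := 0; i < 3; i++ {
		ip, err := a.allocate()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(ip)
	}
}
```

With a static ip field, every pod would be given the same address, so some form of per-pod allocation like this is what makes a plugin usable for more than one pod.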
Remember, I mentioned earlier that an existing plugin config could lead to a conflict. To avoid any conflict with the kindnet plugin, let’s remove its configuration (note, this is just for demo purposes; please don’t attempt this in a real environment).
docker exec -it cni-demo-control-plane rm -f /etc/cni/net.d/10-kindnet.conflist
Make the custom plugin binary executable by setting the permission. Run the following command.
docker exec -it cni-demo-control-plane chmod +x /opt/cni/bin/demo-cni
At this point, we are ready to create pods and confirm that our plugin assigns an IP address to each pod. We can use the following YAML file.
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
labels:
app: nginx
spec:
containers:
- name: nginx-container
image: nginx:latest
ports:
- containerPort: 80
Apply the YAML using the following command.
kubectl apply -f demo-cni-pod.yaml
If the pod reaches the Running state, that is a good sign that our plugin performed its Add action successfully.
Check the pod IP using the following command; it should be the IP we mentioned in the configuration JSON file.
kubectl get pod nginx-pod -o wide
Check the pod’s logs. The pod was created in the default namespace, so no namespace flag is needed.
kubectl logs nginx-pod
Let’s perform some more cool checks to see the interface. First, get the container information for the pod.
kubectl get pod nginx-pod -o jsonpath='{.status.containerStatuses[0].containerID}'
Let’s exec into the KIND Docker container.
docker exec -it cni-demo-control-plane bash
We need the process ID of our pod’s container. To find it, run the following command inside the KIND Docker container. (Yes, we are dealing with nested containers: KIND uses Docker containers as nodes, which gives us the ability to create multiple nodes, and our real containers/pods run nested inside them.)
ctr -n k8s.io tasks list
Match the container ID obtained earlier against the task list, find its PID, and run the following command.
nsenter -t <correlated PID> -n ip addr
Look for the eth0 interface; that is the one created by our custom plugin. Also note the IP address assigned, which is the same as in our configuration JSON. On my terminal, it is interface 16:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
3: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
4: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
5: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
6: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
7: ip6_vti0@NONE: <NOARP> mtu 1428 qdisc noop state DOWN group default qlen 1000
link/tunnel6 :: brd :: permaddr a613:aa95:9024::
8: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
9: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
link/tunnel6 :: brd :: permaddr aeae:4f13:2153::
10: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN group default qlen 1000
link/gre6 :: brd :: permaddr be1a:ed07:53bd::
16: (this is the eth0) eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ce:54:6e:91:c9:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.120.12.10/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::cc54:6eff:fe91:c9b2/64 scope link
valid_lft forever preferred_lft forever
If, for some reason, the pod doesn’t move to running status, you can check logs from within the KIND Docker container using the following command.
journalctl -u kubelet -n 10
Conclusion
What appears as "magic" when a pod gets an IP and starts communicating is actually a set of well-defined, standardized steps driven by CNI. Understanding CNI at this level demystifies Kubernetes networking and gives a foundation to debug issues, write custom plugins, or even contribute to existing open-source CNI projects.