Container Checkpointing in Kubernetes With a Custom API

This article discusses using a Kubernetes sidecar for container checkpointing: build, push, deploy to K8s, and trigger checkpoints via API for state management.

By Varunreddy Devireddy · Feb. 18, 25 · Tutorial

Problem Statement

Challenge

Organizations running containerized applications in Kubernetes often need to capture and preserve the state of running containers for:

  • Disaster recovery
  • Application migration
  • Debug/troubleshooting
  • State preservation
  • Environment reproduction

However, there's no straightforward, automated way to:

  1. Create container checkpoints on-demand
  2. Store these checkpoints in a standardized format
  3. Make them easily accessible across clusters
  4. Trigger checkpointing through a standard interface

Current Limitations

  • Manual checkpoint creation requires direct cluster access
  • No standardized storage format for checkpoints
  • Limited integration with container registries
  • Lack of programmatic access for automation
  • Complex coordination between containerd and storage systems

Solution

A Kubernetes sidecar service that:

  1. Exposes checkpoint functionality via REST API
  2. Automatically converts checkpoints to OCI-compliant images
  3. Stores images in ECR for easy distribution
  4. Integrates with existing Kubernetes infrastructure
  5. Provides a standardized interface for automation

This solves the core problems by:

  • Automating the checkpoint process
  • Standardizing checkpoint storage
  • Making checkpoints portable
  • Enabling programmatic access
  • Simplifying integration with existing workflows

Target users:

  • DevOps teams
  • Platform engineers
  • Application developers
  • Site Reliability Engineers (SREs)

Forensic container checkpointing is based on Checkpoint/Restore In Userspace (CRIU) and allows the creation of stateful copies of a running container without the container knowing that it is being checkpointed. The copy of the container can be analyzed and restored in a sandbox environment multiple times without the original container being aware of it. Forensic container checkpointing was introduced as an alpha feature in Kubernetes v1.25.
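In upstream Kubernetes, this alpha feature is exposed through the kubelet's checkpoint endpoint (POST /checkpoint/{namespace}/{pod}/{container}, gated behind the ContainerCheckpoint feature gate). The sidecar in this article triggers checkpoints through ctr instead, but the native route can be sketched as follows; all names and addresses here are placeholders:

```go
package main

import "fmt"

// buildKubeletCheckpointURL forms the kubelet's checkpoint endpoint
// (POST /checkpoint/{namespace}/{pod}/{container} on the kubelet port),
// available when the ContainerCheckpoint feature gate is enabled.
// All argument values used below are placeholders.
func buildKubeletCheckpointURL(nodeAddr, namespace, pod, container string) string {
    return fmt.Sprintf("https://%s:10250/checkpoint/%s/%s/%s", nodeAddr, namespace, pod, container)
}

func main() {
    // An authenticated POST to this URL asks the kubelet to checkpoint the
    // container with CRIU and write the archive to its checkpoint directory
    // on the node.
    fmt.Println(buildKubeletCheckpointURL("10.0.0.12", "default", "my-pod", "my-container"))
}
```
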

This article will guide you through deploying Go code that takes a container checkpoint via an API.

The code accepts a pod identifier as input, retrieves the corresponding container ID from containerd, and then uses the ctr command to checkpoint that container in containerd's k8s.io namespace.

Prerequisites

  • Kubernetes cluster
  • The ctr command-line tool. Verify that you can run ctr commands on the kubelet/worker node; if not, install it or adjust the node AMI to include it.
  • kubectl configured to communicate with your cluster
  • Docker installed locally
  • Access to a container registry (e.g., Docker Hub, ECR)
  • Helm (for installing Nginx Ingress Controller)

Step 0: Code to Create Container Checkpoint Using Go

Create a file named checkpoint_container.go with the following content:

Go
 
package main

import (
    "context"
    "encoding/base64"
    "fmt"
    "log"
    "os"
    "os/exec"
    "strings"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ecr"
    "github.com/containerd/containerd"
    "github.com/containerd/containerd/namespaces"
)

func init() {
    log.SetOutput(os.Stdout)
    log.SetFlags(log.Ldate | log.Ltime | log.Lmicroseconds | log.Lshortfile)
}

func main() {
    if len(os.Args) < 4 {
        log.Fatal("Usage: checkpoint_container <pod_identifier> <ecr_repo> <aws_region>")
    }

    podID := os.Args[1]
    ecrRepo := os.Args[2]
    awsRegion := os.Args[3]

    log.Printf("Starting checkpoint process for pod %s", podID)

    containerID, err := getContainerIDFromPod(podID)
    if err != nil {
        log.Fatalf("Error getting container ID: %v", err)
    }

    err = processContainerCheckpoint(containerID, ecrRepo, awsRegion)
    if err != nil {
        log.Fatalf("Error processing container checkpoint: %v", err)
    }

    log.Printf("Successfully checkpointed container %s and pushed to ECR", containerID)
}

func getContainerIDFromPod(podID string) (string, error) {
    log.Printf("Searching for container ID for pod %s", podID)
    client, err := containerd.New("/run/containerd/containerd.sock")
    if err != nil {
        return "", fmt.Errorf("failed to connect to containerd: %v", err)
    }
    defer client.Close()

    // Kubernetes-managed containers live in containerd's "k8s.io" namespace.
    ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

    containers, err := client.Containers(ctx)
    if err != nil {
        return "", fmt.Errorf("failed to list containers: %v", err)
    }

    for _, container := range containers {
        info, err := container.Info(ctx)
        if err != nil {
            continue
        }
        if strings.Contains(info.Labels["io.kubernetes.pod.uid"], podID) {
            log.Printf("Found container ID %s for pod %s", container.ID(), podID)
            return container.ID(), nil
        }
    }

    return "", fmt.Errorf("container not found for pod %s", podID)
}

func processContainerCheckpoint(containerID, ecrRepo, region string) error {
    log.Printf("Processing checkpoint for container %s", containerID)
    checkpointPath, err := createCheckpoint(containerID)
    if err != nil {
        return err
    }
    defer os.RemoveAll(checkpointPath)

    imageName, err := convertCheckpointToImage(checkpointPath, ecrRepo, containerID)
    if err != nil {
        return err
    }

    return pushImageToECR(imageName, region)
}

func createCheckpoint(containerID string) (string, error) {
    log.Printf("Creating checkpoint for container %s", containerID)
    checkpointPath := "/tmp/checkpoint-" + containerID
    cmd := exec.Command("ctr", "-n", "k8s.io", "tasks", "checkpoint", containerID, "--checkpoint-path", checkpointPath)
    output, err := cmd.CombinedOutput()
    if err != nil {
        return "", fmt.Errorf("checkpoint command failed: %v, output: %s", err, output)
    }
    log.Printf("Checkpoint created at: %s", checkpointPath)
    return checkpointPath, nil
}

func convertCheckpointToImage(checkpointPath, ecrRepo, containerID string) (string, error) {
    log.Printf("Converting checkpoint to image for container %s", containerID)
    imageName := ecrRepo + ":checkpoint-" + containerID

    // buildah prints the new working container's name followed by a newline;
    // trim it before reusing the name in later commands.
    output, err := exec.Command("buildah", "from", "scratch").Output()
    if err != nil {
        return "", fmt.Errorf("failed to create container: %v", err)
    }
    workingContainer := strings.TrimSpace(string(output))

    if err := exec.Command("buildah", "copy", workingContainer, checkpointPath, "/").Run(); err != nil {
        return "", fmt.Errorf("failed to copy checkpoint: %v", err)
    }

    if err := exec.Command("buildah", "commit", workingContainer, imageName).Run(); err != nil {
        return "", fmt.Errorf("failed to commit image: %v", err)
    }

    log.Printf("Created image: %s", imageName)
    return imageName, nil
}

func pushImageToECR(imageName, region string) error {
    log.Printf("Pushing image %s to ECR in region %s", imageName, region)
    sess, err := session.NewSession(&aws.Config{
        Region: aws.String(region),
    })
    if err != nil {
        return fmt.Errorf("failed to create AWS session: %v", err)
    }

    svc := ecr.New(sess)

    authToken, registryURL, err := getECRAuthorizationToken(svc)
    if err != nil {
        return err
    }

    if err := loginToECR(authToken, registryURL); err != nil {
        return err
    }

    if err := exec.Command("podman", "push", imageName).Run(); err != nil {
        return fmt.Errorf("failed to push image to ECR: %v", err)
    }

    log.Printf("Successfully pushed checkpoint image to ECR: %s", imageName)
    return nil
}

func getECRAuthorizationToken(svc *ecr.ECR) (string, string, error) {
    log.Print("Getting ECR authorization token")
    output, err := svc.GetAuthorizationToken(&ecr.GetAuthorizationTokenInput{})
    if err != nil {
        return "", "", fmt.Errorf("failed to get ECR authorization token: %v", err)
    }

    authData := output.AuthorizationData[0]
    log.Print("Successfully retrieved ECR authorization token")
    return *authData.AuthorizationToken, *authData.ProxyEndpoint, nil
}

func loginToECR(authToken, registryURL string) error {
    log.Printf("Logging in to ECR at %s", registryURL)
    // The ECR authorization token is the base64 encoding of "AWS:<password>";
    // decode it and strip the username prefix before logging in.
    decoded, err := base64.StdEncoding.DecodeString(authToken)
    if err != nil {
        return fmt.Errorf("failed to decode ECR authorization token: %v", err)
    }
    password := strings.TrimPrefix(string(decoded), "AWS:")
    cmd := exec.Command("podman", "login", "--username", "AWS", "--password", password, registryURL)
    if err := cmd.Run(); err != nil {
        return fmt.Errorf("failed to login to ECR: %v", err)
    }
    log.Print("Successfully logged in to ECR")
    return nil
}


Step 1: Initialize the Go Module

Shell
 
go mod init checkpoint_container


Modify the go.mod file:

Go
 
module checkpoint_container

go 1.23

require (
    github.com/aws/aws-sdk-go v1.44.298
    github.com/containerd/containerd v1.7.2
)
require (
    github.com/jmespath/go-jmespath v0.4.0 // indirect
    github.com/opencontainers/go-digest v1.0.0 // indirect
    github.com/opencontainers/image-spec v1.1.0-rc2.0.20221005185240-3a7f492d3f1b // indirect
    github.com/pkg/errors v0.9.1 // indirect
    google.golang.org/genproto v0.0.0-20230306155012-7f2fa6fef1f4 // indirect
    google.golang.org/grpc v1.53.0 // indirect
    google.golang.org/protobuf v1.30.0 // indirect
)


Run the following command:

Shell
 
go mod tidy


Step 2: Build and Publish Docker Image

Create a Dockerfile in the same directory:

Dockerfile
 
# Build stage
FROM golang:1.23 as builder

WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o checkpoint_container

# Final stage
FROM amazonlinux:2

# Install necessary tools; the Go binary shells out to ctr, buildah, and
# podman, so make sure your base image's repositories provide them
RUN yum update -y && \
    amazon-linux-extras install -y docker && \
    yum install -y awscli containerd buildah podman && \
    yum clean all

# Copy the built Go binary
COPY --from=builder /app/checkpoint_container /usr/local/bin/checkpoint_container

EXPOSE 8080

ENTRYPOINT ["checkpoint_container"]


This Dockerfile does the following:

  1. Uses golang:1.23 as the build stage to compile your Go application (matching the go directive in go.mod).
  2. Uses amazonlinux:2 as the final base image.
  3. Installs the AWS CLI, containerd (which provides the ctr tool), and buildah and podman, which the Go binary shells out to.
  4. Copies the compiled Go binary from the build stage.
Shell
 
docker build -t <your-docker-repo>/checkpoint-container:v1 .
docker push <your-docker-repo>/checkpoint-container:v1


Replace <your-docker-repo> with your actual Docker repository.

Step 3: Apply the RBAC Resources

Create a file named rbac.yaml:

YAML
 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkpoint-sa
  namespace: default

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: checkpoint-role
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: checkpoint-rolebinding
  namespace: default
subjects:
- kind: ServiceAccount
  name: checkpoint-sa
  namespace: default
roleRef:
  kind: Role
  name: checkpoint-role
  apiGroup: rbac.authorization.k8s.io


Apply the RBAC resources:

Shell
 
kubectl apply -f rbac.yaml


Step 4: Create a Kubernetes Deployment

Create a file named deployment.yaml:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: main-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: main-app
  template:
    metadata:
      labels:
        app: main-app
    spec:
      serviceAccountName: checkpoint-sa
      containers:
      - name: main-app
        image: nginx:latest  # Replace with your main application image
      - name: checkpoint-sidecar
        image: <your-docker-repo>/checkpoint-container:v1
        ports:
        - containerPort: 8080
        securityContext:
          privileged: true
        volumeMounts:
        - name: containerd-socket
          mountPath: /run/containerd/containerd.sock
      volumes:
      - name: containerd-socket
        hostPath:
          path: /run/containerd/containerd.sock
          type: Socket


In deployment.yaml, replace the sidecar image with the one you pushed in Step 2:

YAML
 
image: <your-docker-repo>/checkpoint-container:v1


Then apply the deployment:

Shell
 
kubectl apply -f deployment.yaml


Step 5: Kubernetes Service

Create a file named service.yaml:

YAML
 
apiVersion: v1
kind: Service
metadata:
  name: checkpoint-service
  namespace: default
spec:
  selector:
    app: main-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080


Apply the service:

Shell
 
kubectl apply -f service.yaml


Step 6: Install Nginx Ingress Controller

Shell
 
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx


Step 7: Create Ingress Resource

Create a file named ingress.yaml:

YAML
 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkpoint-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - http:
      paths:
      - path: /checkpoint
        pathType: Prefix
        backend:
          service:
            name: checkpoint-service
            port: 
              number: 80


Apply the Ingress:

Shell
 
kubectl apply -f ingress.yaml


Step 8: Test the API

Get the external IP of the Nginx Ingress Controller:

Shell
 
kubectl get service ingress-nginx-controller -n ingress-nginx


Then send a test request:

Shell
 
curl -X POST http://<EXTERNAL-IP>/checkpoint \
 -H "Content-Type: application/json" \
 -d '{"podId": "your-pod-id", "ecrRepo": "your-ecr-repo", "awsRegion": "your-aws-region"}'


Replace <EXTERNAL-IP> with the actual external IP.

Additional Considerations

  1. Security.
    • Implement HTTPS by setting up TLS certificates
    • Add authentication to the API
  2. Monitoring. Set up logging and monitoring for the API and checkpoint process.
  3. Resource management. Configure resource requests and limits for the sidecar container.
  4. Error handling. Implement robust error handling in the Go application.
  5. Testing. Thoroughly test the setup in a non-production environment before deploying it to production.
  6. Documentation. Maintain clear documentation on how to use the checkpoint API.

Conclusion

This setup deploys the checkpoint container as a sidecar in Kubernetes and exposes its functionality through an API accessible from outside the cluster. It provides a flexible solution for managing container checkpoints in a Kubernetes environment.

AWS/EKS Specific

Step 7: Install the AWS Load Balancer Controller

Instead of using the Nginx Ingress Controller, we'll use the AWS Load Balancer Controller. This controller will create and manage ALBs for our Ingress resources.

1. Add the EKS chart repo to Helm:

Shell
 
helm repo add eks https://aws.github.io/eks-charts


2. Install the AWS Load Balancer Controller:

Shell
 
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<your-cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller


Replace <your-cluster-name> with your EKS cluster name.

Note: Ensure that you have the necessary IAM permissions set up for the AWS Load Balancer Controller. You can find the detailed IAM policy in the AWS documentation.

Step 8: Create Ingress Resource

Create a file named ingress.yaml:

YAML
 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkpoint-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - http:
      paths:
      - path: /checkpoint
        pathType: Prefix
        backend:
          service:
            name: checkpoint-service
            port: 
              number: 80


Apply the Ingress:

Shell
 
kubectl apply -f ingress.yaml


Step 9: Test the API

1. Get the ALB DNS name:

Shell
 
kubectl get ingress checkpoint-ingress


Look for the ADDRESS field, which will be the ALB's DNS name.

2. Send a test request:

Shell
 
curl -X POST http://<ALB-DNS-NAME>/checkpoint \
     -H "Content-Type: application/json" \
     -d '{"podId": "your-pod-id", "ecrRepo": "your-ecr-repo", "awsRegion": "your-aws-region"}'


Replace <ALB-DNS-NAME> with the actual DNS name of your ALB from step 1.

Additional Considerations for AWS ALB

1. Security groups. The ALB will have a security group automatically created. Ensure it allows inbound traffic on port 80 (and 443 if you set up HTTPS).

2. SSL/TLS. To enable HTTPS, you can add the following annotations to your Ingress:

YAML
 
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account-id:certificate/certificate-id


3. Access logs. Enable access logs for your ALB by adding the following:

YAML
 
alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=your-log-bucket,access_logs.s3.prefix=your-log-prefix


4. WAF integration. If you want to use AWS WAF with your ALB, you can add:

YAML
 
alb.ingress.kubernetes.io/waf-acl-id: your-waf-web-acl-id


5. Authentication. You can set up authentication using Amazon Cognito or OIDC by using the appropriate ALB Ingress Controller annotations.

These changes will set up your Ingress using an AWS Application Load Balancer instead of Nginx. The ALB Ingress Controller will automatically provision and configure the ALB based on your Ingress resource.

Conclusion

Remember to ensure that your EKS cluster has the necessary IAM permissions to create and manage ALBs. This typically involves creating an IAM policy and a service account with the appropriate permissions.

This setup will now use AWS's native load-balancing solution, which integrates well with other AWS services and can be more cost-effective in an AWS environment.
