DZone
Containers

Containers allow applications to run quicker across many different development environments, and a single container encapsulates everything needed to run an application. Container technologies have exploded in popularity in recent years, leading to diverse use cases as well as new and unexpected challenges. This Zone offers insights into how teams can solve these challenges through its coverage of container performance, Kubernetes, testing, container orchestration, microservices usage to build and deploy containers, and more.

Latest Premium Content
Trend Report
Kubernetes in the Enterprise
Trend Report
Cloud Native
Refcard #400
Java Application Containerization and Deployment

DZone's Featured Containers Resources

Zero-Downtime Deployments for Java Apps on Kubernetes

By Ramya vani Rayala
This article provides a comprehensive guide to achieving zero-downtime deployments for Java-based applications on Kubernetes. We cover deployment strategies, Kubernetes primitives, Java-specific considerations, session state handling, database migrations, traffic shifting techniques, CI/CD pipelines, GitHub Actions, Jenkins with automated rollbacks, observability (Prometheus, Grafana, Jaeger), Helm/ArgoCD examples, testing strategies (canary analysis, chaos, smoke tests), and troubleshooting. Deployment Strategies Kubernetes offers several strategies for deploying new versions without downtime: Rolling Update Incrementally replace old pods with new ones, maintaining availability. Kubernetes Deployment object uses rolling updates by default. You can control maxUnavailable and maxSurge to tune the rollout. Blue-Green Deployment Run two separate environments: Blue = current, green = new. Only one serves live traffic at a time. Once the Green version is verified, switch the Service or Ingress to point at Green, then scale down Blue. This allows instant rollback by redirecting traffic back to Blue. Argo Rollouts defines a blue/green strategy with an active and preview Service. Traffic flows only to the active version until promotion. Canary Deployment Gradually shift a small percentage of traffic to the new version. Start with a few pods of v2, monitor, then incrementally increase. Tools like Istio or Argo Rollouts can control traffic weights. For instance, sending 10% of traffic to v2 can be done by running 9 v1 pods and 1 v2 pod (10%). Argo defines a canary rollout with setWeight steps and pauses for analysis. Shadow/Mirroring The new version receives a copy of live requests for testing under real load, but its responses are not returned to users. This is low risk but does not assist in rollback decisions since users don’t see the new behavior. Kubernetes Primitives for Zero Downtime Deployment A Deployment naturally performs rolling updates. 
By default, it creates a new ReplicaSet and scales it up while scaling down the old one, at a pace controlled by maxUnavailable and maxSurge. This ensures some pods always serve traffic. To use blue/green, you would deploy two separate Deployments (e.g., app-blue and app-green) and switch the Service between them.

Service and Ingress

A Service fronts pods. For blue/green, you can point a single Service at either the blue or the green pods. Ingress can also switch between backend Services. For example, the Service's label selector can be adjusted to redirect traffic from the blue pods to the green pods.

PodDisruptionBudget

Ensures a minimum number of pods stay running during voluntary disruptions such as node drains. For instance, setting minAvailable: 1 ensures at least one pod remains while nodes are drained for maintenance, which avoids complete downtime. A PodDisruptionBudget governs evictions; the pace of a Deployment's own rolling update is controlled by maxUnavailable and maxSurge.

Horizontal Pod Autoscaler (HPA)

Scales pods based on CPU/memory or custom metrics. It automatically updates a workload to match demand. An HPA can be attached to the Deployment so that if traffic spikes during a rollout, new pods are created to handle the load. Example:

YAML
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 50

Liveness and Readiness Probes

Critical for zero downtime. A liveness probe checks whether the app is alive; if it fails, the kubelet restarts the container. A readiness probe tells Kubernetes whether the app is ready to serve traffic. During startup or shutdown, the readiness probe should fail, which removes the pod from the Service's endpoints. Spring Boot Actuator provides /actuator/health for this.
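In Kubernetes YAML:

YAML
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5

Spring Boot exposes the health/liveness and health/readiness groups automatically when it detects a Kubernetes environment; elsewhere they can be enabled with management.endpoint.health.probes.enabled=true. Quarkus and Micronaut have similar health endpoints.

Spring Boot supports graceful shutdown by setting server.shutdown=graceful and tuning spring.lifecycle.timeout-per-shutdown-phase. This causes the embedded server (Tomcat, Jetty, or Undertow) to stop accepting new requests and wait up to the timeout for active requests to complete. Custom cleanup can hook into the same lifecycle:

Java
    import org.springframework.context.SmartLifecycle;
    import org.springframework.stereotype.Component;

    @Component
    public class ShutdownListener implements SmartLifecycle {
        // volatile: start() and stop() may be called from different threads
        private volatile boolean running = false;

        @Override
        public void start() { running = true; }

        @Override
        public void stop() {
            // release resources here before the context closes
            running = false;
        }

        @Override
        public boolean isRunning() { return running; }
    }

Quarkus provides graceful shutdown configuration. With quarkus.shutdown.timeout=10s, Quarkus waits up to 10 seconds for current requests to finish before exiting. You can annotate a bean method with @Shutdown to run cleanup code. Micronaut has @EventListener for ShutdownEvent:

Java
    import io.micronaut.context.event.ShutdownEvent;
    import io.micronaut.runtime.event.annotation.EventListener;
    import jakarta.inject.Singleton;

    @Singleton
    public class ShutdownBean {
        @EventListener
        void onShutdown(ShutdownEvent event) {
            // cleanup code runs here when the application context shuts down
        }
    }

Kubernetes Hooks

You can use a preStop hook in the Deployment spec to run a command before the container receives SIGTERM. The lifecycle block belongs to the container, and terminationGracePeriodSeconds belongs to the pod spec:

YAML
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]
    terminationGracePeriodSeconds: 30

The grace period (default 30s) should be tuned to let the app finish. The Kubernetes documentation describes the sequence: the pod enters Terminating, the preStop hook runs, the container receives SIGTERM, and if it is still running when terminationGracePeriodSeconds elapses, it receives SIGKILL. The grace period countdown includes the time spent in the preStop hook.

JVM Tuning

Set -XX:+ExitOnOutOfMemoryError so the JVM exits instead of hanging after an OutOfMemoryError. Tune thread pools so they drain quickly. Monitor GC pause times, and consider a low-latency garbage collector to minimize pauses before shutdown.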
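The termination sequence described under Kubernetes Hooks implies a simple timing budget: the preStop hook and the application's own drain timeout both have to fit inside terminationGracePeriodSeconds. The following is a minimal Python sketch of that check, not part of any Kubernetes or Spring API. The class name, function names, and the 2-second safety margin are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminationBudget:
    """Timing inputs for one pod shutdown, all in seconds (hypothetical helper)."""
    grace_period: float       # terminationGracePeriodSeconds
    pre_stop: float           # duration of the preStop hook, e.g. "sleep 5"
    app_drain_timeout: float  # e.g. spring.lifecycle.timeout-per-shutdown-phase


def seconds_before_sigkill(budget: TerminationBudget) -> float:
    """Time the app has after SIGTERM; the grace period also covers preStop."""
    return budget.grace_period - budget.pre_stop


def drains_in_time(budget: TerminationBudget, safety_margin: float = 2.0) -> bool:
    """True if the app's drain timeout plus a margin fits before SIGKILL.

    The safety margin is an assumption of this sketch, not a Kubernetes setting.
    """
    return budget.app_drain_timeout + safety_margin <= seconds_before_sigkill(budget)


if __name__ == "__main__":
    ok = TerminationBudget(grace_period=30, pre_stop=5, app_drain_timeout=20)
    tight = TerminationBudget(grace_period=30, pre_stop=5, app_drain_timeout=30)
    print(seconds_before_sigkill(ok), drains_in_time(ok))
    print(seconds_before_sigkill(tight), drains_in_time(tight))
```

If the check fails, either shorten the application's drain timeout or raise terminationGracePeriodSeconds; otherwise in-flight requests can be cut off by SIGKILL.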
Session and State Handling

To maintain zero downtime when pods switch:

  • Stateless services: Best practice is to keep services stateless. Store session state or user data in an external store, such as Redis or a database. This way, any pod can handle any request, and pods can be replaced without losing the user session.
  • Sticky sessions: If an app uses in-memory sessions, you can enforce sticky sessions in one of two ways. Service affinity: set sessionAffinity: ClientIP on the Service, and Kubernetes routes requests from the same client IP to the same pod. Ingress affinity: use Ingress annotations to bind a user's requests to one pod. However, sticky sessions introduce risk (a session is lost when its pod is replaced) and are a poor fit for autoscaling.
  • StatefulSets: For truly stateful workloads, use a StatefulSet with stable identities. StatefulSets pair pods with PersistentVolumes, but they do not provide zero downtime by themselves.

GitHub Actions CI/CD Pipeline

A workflow for zero-downtime deployment:

YAML
    name: Deploy
    on:
      push:
        branches: [ main ]
    jobs:
      build:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write   # required to push to ghcr.io with GITHUB_TOKEN
        outputs:
          image: ${{ steps.tag.outputs.image }}
        steps:
          - uses: actions/checkout@v2
          - uses: actions/setup-java@v3
            with: { distribution: 'temurin', java-version: '17' }
          - name: Build
            run: mvn clean package -DskipTests
          - name: Docker Build & Push
            run: |
              docker build -t ghcr.io/myorg/myapp:${{ github.sha }} .
              echo ${{ secrets.GITHUB_TOKEN }} | docker login ghcr.io -u ${{ github.actor }} --password-stdin
              docker push ghcr.io/myorg/myapp:${{ github.sha }}
          - name: Set image tag
            id: tag
            run: echo "image=ghcr.io/myorg/myapp:${{ github.sha }}" >> "$GITHUB_OUTPUT"
      deploy:
        needs: build
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
            with: { path: manifests }
          - name: Install kubectl
            uses: azure/setup-kubectl@v3
          # Cluster credentials (kubeconfig) must be configured before this step.
          - name: Deploy to Kubernetes
            run: |
              kubectl set image deployment/myapp-deployment myapp=${{ needs.build.outputs.image }}
              kubectl rollout status deployment/myapp-deployment

This workflow builds the image, pushes it, and updates the deployment. The rollout status command waits for all new pods to become ready.
If the new pods fail their health checks, the rollout stalls and the command exits with an error once the Deployment's progress deadline is exceeded, while the old pods keep serving traffic.

Conclusion

Zero-downtime deployment on Kubernetes combines careful architecture with automation: using rolling updates and progressive strategies, ensuring graceful shutdown and health checks in your Java apps, externalizing state, managing database changes, and orchestrating releases with CI/CD pipelines. Kubernetes primitives like Deployments, Services, Probes, and HPA, along with tools like Istio or Argo Rollouts, provide the building blocks.
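To make the maxUnavailable and maxSurge tuning from the rolling update discussion concrete, here is a small Python sketch of the arithmetic. It assumes the documented Kubernetes rounding (percentages of maxSurge round up, percentages of maxUnavailable round down) and ignores edge cases such as both values resolving to zero. The function names are illustrative, not part of any Kubernetes client library.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Resolve an absolute count or an 'NN%' string into a number of pods."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        scaled = replicas * percent
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> dict:
    """Pod-count limits a rolling update stays within (sketch, edge cases ignored)."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "min_available": replicas - unavailable,  # pods guaranteed to serve traffic
        "max_total": replicas + surge,            # old + new pods at any moment
    }


if __name__ == "__main__":
    print(rollout_bounds(10))                                   # default 25% / 25%
    print(rollout_bounds(4, max_surge=1, max_unavailable=0))    # never dip below 4
```

Setting maxUnavailable to 0 with a positive maxSurge is the conservative choice for zero downtime: capacity never drops below the desired replica count, at the cost of temporarily running extra pods.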
Pragmatica Aether: Let Java Be Java

By Sergiy Yevtushenko
The Aberration We build Java applications like Go or Rust programs. Fat JARs. Docker images. Kubernetes deployments. Everyone does it, so it looks normal. It contradicts Java’s design DNA. Java has always been a language for managed environments. Applets ran inside browsers. Servlets ran inside application servers. EJBs ran inside containers like JBoss and WebLogic. OSGi bundles ran inside runtime containers like Eclipse Equinox. In every generation, the pattern was the same: a managed runtime hosts the application. The application handles business logic. The runtime handles infrastructure. The fat-jar era threw that away. We stopped letting Java be Java. We started bundling web servers, serialization frameworks, service discovery clients, configuration management, health checks, metrics libraries, and logging frameworks into every application. Then we wrapped the result in a Docker container and deployed it to an orchestration platform that reimplements — poorly — the infrastructure management that Java runtimes used to provide natively. This article introduces Pragmatica Aether: a distributed runtime that returns Java to its natural habitat. The application handles business logic. Runtime handles infrastructure. This isn’t radical — it's returning to what Java was designed for. The Problem: Infrastructure Wearing a Business Logic Mask Think of what a typical Java microservice carries. A web server (Tomcat, Netty, Undertow). A serialization framework (Jackson, Gson). A dependency injection container (Spring, Guice). A service discovery client (Eureka, Consul). Health check endpoints. Configuration management (Spring Cloud Config, Consul KV). A metrics library (Micrometer, Dropwizard). A logging framework (Logback, Log4j2). Retry logic (Resilience4j). Circuit breakers. HTTP client configuration. The application is wearing a heavy winter coat of infrastructure, armed to the teeth to survive in a hostile environment. Now consider the coupling this creates. 
Update the Java version — rebuild and test every service. Change your message broker from RabbitMQ to Kafka — modify, rebuild, and redeploy every application that touches messaging. Add a new observability tool and update dependencies in every microservice. Switch cloud providers — rewrite configuration, SDK calls, and deployment manifests across the entire fleet. Each change ripples through dozens or hundreds of services because infrastructure is entangled with business logic at the dependency level. This is the coupling trap. Your application’s pom.xml doesn't distinguish between business dependencies and infrastructure dependencies. They compile together, deploy together, and break together. A security patch in Netty requires a new build of every service that embeds a web server, which is all of them. Framework lock-in worsens this. It isn’t a vendor problem — it's an architecture problem. Spring’s dependency injection fights with Kubernetes service mesh for control over service routing and circuit breaking. The framework’s configuration system overlaps with Consul KV and Kubernetes ConfigMaps. Your cloud SDK’s retry logic conflicts with Resilience4j. Every layer claims authority over the same cross-cutting concerns, and the conflicts surface as subtle bugs in production — not during development. This is an architecture problem. Architectural problems have architectural solutions. Aether: The Core Idea What you write: an interface annotated with @Slice, plus business logic implementation. Java @Slice public interface OrderService { Promise<OrderResult> placeOrder(PlaceOrderRequest request); static OrderService orderService(InventoryService inventory, PricingEngine pricing) { return request -> inventory.check(request.items()) .flatMap(available -> pricing.calculate(available)) .map(priced -> OrderResult.placed(priced)); } } What you don’t write: everything else. No HTTP clients — inter-slice calls are direct method invocations via generated proxies. 
No service discovery: the runtime tracks where every slice instance lives. No retry logic: retry with exponential backoff and node failover is built in. No circuit breakers: the reliability fabric handles failure automatically. No serialization code: request/response types are serialized transparently. A method call via an imported interface is the only visible contract.

The only hint that the actual call might be remote is a design requirement: slice methods should be idempotent. This isn’t a limitation; it's what enables retry, scaling, and fault tolerance to work transparently. The same request, processed by any available instance, produces the same result. Most read operations are naturally idempotent. For writes, standard patterns like idempotency keys and conditional writes handle it cleanly.

Everything else is the environment’s job: resource provisioning, scaling, transport, discovery, retries, circuit breakers, configuration, observability, logging, tracing, monitoring, and security. None of these are application concerns, and none should be handled at the business logic level.

The JBCT Leaf pattern serves two purposes here: it documents the design (“what we expect from an external implementation”) and encourages exactly one interface per dependency. Different implementations may have different technical properties (performance, latency, memory consumption), but as long as they’re compatible with the interface, business logic works unchanged. You write essentially pure business logic that scales transparently from your local computer to a global multi-zone distributed deployment.

Under The Hood: What Makes It Work

Five architectural decisions make this possible.

Consensus KV Store. A single source of truth for all configuration, deployment state, and service discovery. It is based on the Rabia protocol, a crash-fault-tolerant, leaderless consensus algorithm published in 2021.
Any node can propose; agreement is reached through a two-round voting protocol with a fast path when a supermajority agrees in round one. No external config servers. No etcd. No Consul. Configuration changes propagate through consensus and take effect cluster-wide. Built-in Artifact Repository. DHT-based storage with configurable replication — 3 replicas with quorum reads/writes in production, full replication in development. Artifacts are chunked into 64KB pieces, distributed across nodes via consistent hashing, and integrity-verified with MD5 and SHA-1 on every resolve. No external Nexus or Artifactory is needed. During development, slices resolve from your local Maven repository. In production, the cluster is self-contained. ClassLoader Isolation. Each slice runs inside its own SliceClassLoader with child-first delegation. Two slices can use different versions of the same library without conflict. Shared dependencies like Pragmatica Lite core are loaded once in a parent classloader. No dependency conflicts. No classpath hell between slices. Declarative Deployment. Blueprints — TOML files — describe the desired state: which slices, how many instances. TOML id = "org.example:commerce:1.0.0" [[slices]] artifact = "org.example:inventory-service:1.0.0" instances = 3 [[slices]] artifact = "org.example:order-processor:1.0.0" instances = 5 Apply with one command: aether blueprint apply commerce.toml. The cluster resolves artifacts, loads slices, distributes instances across nodes, registers routes, and starts serving traffic. The cluster converges to the desired state automatically. Infrastructure Independence. Aether nodes are identical — there's only one deployment artifact to manage at the infrastructure level. Node updates and application deployments run on completely independent schedules. Update Java — roll it out across nodes without touching applications. Update the Aether runtime — same. 
Update business logic: deploy new slice versions without touching infrastructure. Each happens independently, and each without downtime. This is the fundamental benefit of proper separation: when layers don’t share a deployment unit, they don’t share a deployment schedule.

Fault Tolerance: The 50% Rule

The system survives the failure of less than half the nodes. Performance may degrade until replacements spin up, but functionality remains intact: actual redundancy, not just graceful degradation. A 5-node cluster tolerates 2 simultaneous failures. A 7-node cluster tolerates 3. The same request, processed by any available node, produces the same result. Quorum requires floor(N/2) + 1 nodes; as long as a majority is alive, the cluster operates normally. Leader failover is consensus-based and near-instant. Node replacement happens automatically: the Cluster Deployment Manager detects the deficit and provisions a replacement through the NodeProvider interface. The entire recovery sequence, from failure detection through state restoration to serving traffic, completes without human intervention.

When a node fails, the recovery is automatic. Requests to slices on the failed node are immediately retried on healthy nodes. A replacement node is provisioned. It connects to peers, restores consensus state from a cluster snapshot, re-resolves artifacts from the DHT, and reactivates assigned slices. Dead nodes are automatically removed from routing tables. The new leader reconciles the stale state. No human intervention required.

Rolling updates leverage this fault tolerance for zero-downtime deployments with weighted traffic routing:

Shell
    aether update start org.example:order-processor 2.0.0 -n 3
    aether update routing <id> -r 1:3   # 25% to v2, 75% to v1
    aether update routing <id> -r 1:1   # 50/50
    aether update complete <id>         # 100% to v2, drain v1

Deploy during business hours. Shift traffic gradually: 10% canary, then 25%, 50%, 75%, 100%. Monitor health metrics at each step.
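The -r ratios in the routing commands can be read as new-to-old weights, which is how the comments arrive at 25% for 1:3 and 50% for 1:1. The Python sketch below shows that arithmetic and one simple way to apply it. It is an illustration only: the function names are hypothetical, the "new:old" ordering is inferred from the command comments, and nothing here reflects how Aether actually routes requests internally.

```python
def parse_routing_ratio(ratio: str) -> tuple[float, float]:
    """Convert a 'new:old' ratio such as '1:3' into (new %, old %)."""
    new_weight, old_weight = (int(part) for part in ratio.split(":"))
    total = new_weight + old_weight
    if new_weight < 0 or old_weight < 0 or total == 0:
        raise ValueError(f"invalid routing ratio: {ratio!r}")
    return 100 * new_weight / total, 100 * old_weight / total


def route(request_number: int, ratio: str) -> str:
    """Deterministic weighted round-robin: the first `new` slots of each cycle go to v2."""
    new_weight, old_weight = (int(part) for part in ratio.split(":"))
    return "v2" if request_number % (new_weight + old_weight) < new_weight else "v1"


if __name__ == "__main__":
    for ratio in ("1:9", "1:3", "1:1", "3:1"):
        print(ratio, parse_routing_ratio(ratio))
    print([route(n, "1:3") for n in range(8)])
```

Under this reading, the 10% canary step mentioned in the text corresponds to a ratio of 1:9 and the 75% step to 3:1.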
If health degrades — error rate exceeds thresholds, latency spikes — instant rollback with one command: aether update rollback <id>. Traffic immediately shifts back to the old version. The 3 AM pager alert becomes an audit log entry. For Every Project: Legacy, Greenfield, And Everything Between Legacy Migration Your legacy Java system doesn’t need a complete rewrite. It needs a path forward. Pick a relatively independent part of your system — something hitting limits, something with clear boundaries. Extract an interface. Annotate it with @Slice. Wrap the legacy implementation: Java private Promise<Report> generateReport(ReportRequest request) { return Promise.lift(() -> legacyReportService.generate(request)); } One line to enter the Aether world. Promise.lift() wraps the legacy call, catches exceptions, and returns a proper Result inside a Promise. Your legacy code keeps running. Call sites don't change. You haven't added risk — the initial deployment in Ember runs in the same JVM as your existing application, which means it's no worse than what you have today. You've laid the foundation for removing risk, not adding it. Moving from Ember to a full Aether cluster is a configuration change, not a code change — and that's when the 50% rule starts to apply. From there, it’s the strangler fig pattern. Extract a hot path, deploy it as a slice, route traffic, repeat. Each extracted slice can be gradually refactored using the peeling pattern: first wrap everything in Promise.lift(), then decompose into a Sequencer with each step still wrapped, then peel individual steps into clean JBCT patterns. Tests pass at every step. The lift() calls mark exactly where legacy code remains, making progress visible and remaining work obvious. No rewrite is required. No big bang migration. One sprint to the first slice in production. The migration article covers the full path in detail — from initial wrapping through gradual peeling to clean JBCT code. 
Greenfield Development

For new projects, slices enable a granularity that’s impossible with traditional microservices. Each slice can be as lean as a single method, and that’s the recommended approach. There are no operational or complexity tradeoffs for small slices because Aether handles all the infrastructure overhead. No container to configure, no load balancer to provision, no monitoring to set up per service.

You get per-use-case scaling: one slice serving 50 instances during peak load while another idles at minimum. That kind of granularity would be operationally insane with traditional microservices, each needing its own container, load balancer, monitoring, and deployment pipeline. With Aether, it’s the default.

JBCT patterns (Leaf, Sequencer, Fork-Join, Condition, Iteration, and Aspects) compose naturally within slices. Each slice method is a data transformation pipeline: parse input, gather data, process, respond. The patterns provide consistent structure within slices. Slices provide consistent boundaries between them.

The Spectrum

Same slice model, different granularity. A service slice wraps an entire legacy component. A lean slice implements a single method. Both coexist in the same cluster, deployed and scaled independently. A slice is the executable unit. It can be as big or as small as necessary and convenient. The architecture accommodates both monolith migration and greenfield development simultaneously. Your legacy system gains fault tolerance while new features get maximum deployment flexibility.

Scaling: Two Levels, Three Tiers of Intelligence

Two-Level Horizontal Scaling

Aether scales in two dimensions independently:

  • Slice scaling: Spin up more instances of a specific slice on existing nodes. Classes are already loaded, so scaling takes milliseconds, not seconds.
  • Node scaling: Add more machines to the cluster. The node connects, restores state, and begins accepting work.

Independent controls, combined effect.
Each node hosts at most one instance of a given slice, so scaling a slice beyond the current node count requires adding nodes first. Add 2 more nodes to a 3-node cluster, then scale a hot slice to 5 instances—one per node. No coordination between the two dimensions is required. Three-Tier Decision System Tier 1—Decision Tree (1-second intervals) Instant reactive decisions based on CPU utilization, request latency, queue depth, and error rate. CPU above 70%? Add an instance. Below 30% sustained? Remove one (if above minimum). Latency exceeding the P95 threshold? Scale up. Error rate above 1% due to timeouts? Scale up. Deterministic, predictable, fast. Handles routine load changes with configurable cooldown periods — 30 seconds for scale-up, 5 minutes for scale-down — to prevent oscillation. Tier 2—TTM Predictor (60-second intervals) An ONNX-based machine learning model (Tiny Time Mixers) analyzes a 60-minute sliding window of metrics — CPU usage, request rate, P95 latency, and active instances. Forecasts load and adjusts the Decision Tree’s thresholds preemptively. If TTM predicts a load increase, it lowers the scale-up CPU threshold by 20% so the reactive tier responds earlier. The cluster scales before the spike arrives, not after. The key design principle: the cluster always survives on Tier 1 alone. TTM enhances; it doesn’t replace. If TTM fails — model load error, insufficient data, inference failure — the Decision Tree continues with default thresholds. The error is logged and recorded in metrics. No scaling disruption. Tier 3—LLM-based (planned) Long-term capacity planning and cluster health monitoring. Seasonal pattern prediction, maintenance window planning, anomaly investigation. This tier is not yet implemented — the current system operates with Tiers 1 and 2. Fault tolerance makes preemptible instances viable for burst scaling. If a spot instance gets reclaimed, the cluster survives — it was designed for nodes to disappear. 
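The Tier 1 rules and the Tier 2 threshold adjustment described above can be sketched in a few lines of Python. This is an illustrative model, not Aether's implementation: it covers only the CPU rules (latency and error-rate triggers are omitted), it does not model the "sustained" condition for scale-down, and it reads "lowers the scale-up CPU threshold by 20%" as a relative reduction (70% becomes 56%), which is an assumption. All names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Thresholds:
    scale_up_cpu: float = 70.0            # add an instance above this CPU %
    scale_down_cpu: float = 30.0          # remove one below this CPU %
    scale_up_cooldown_s: float = 30.0     # 30 seconds, as stated in the text
    scale_down_cooldown_s: float = 300.0  # 5 minutes, as stated in the text


def decide(cpu_percent: float, instances: int, min_instances: int,
           seconds_since_last_change: float, t: Thresholds) -> int:
    """Return +1 (scale up), -1 (scale down) or 0 (hold) for one evaluation tick."""
    if cpu_percent > t.scale_up_cpu and seconds_since_last_change >= t.scale_up_cooldown_s:
        return 1
    if (cpu_percent < t.scale_down_cpu
            and instances > min_instances
            and seconds_since_last_change >= t.scale_down_cooldown_s):
        return -1
    return 0


def with_predicted_spike(t: Thresholds) -> Thresholds:
    """Tier 2 effect: lower the scale-up threshold (assumed relative, 20% lower)."""
    return replace(t, scale_up_cpu=t.scale_up_cpu * 0.8)


if __name__ == "__main__":
    defaults = Thresholds()
    print(decide(60.0, 3, 1, 120.0, defaults))                        # hold
    print(decide(60.0, 3, 1, 120.0, with_predicted_spike(defaults)))  # scale up early
```

Because the predictor only adjusts thresholds, falling back to the defaults when it fails leaves the reactive tier fully functional, which matches the design principle stated in the text.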
You don’t need a PhD in distributed systems or a dedicated platform team. The scaling system manages itself. Development Experience: From Laptop To Production Three Environments, Zero Code Changes Ember Single-process runtime with multiple cluster nodes running in the same JVM. Fast startup, simple debugging. Deploy your slices alongside your existing application — slices call each other directly in-process. No network overhead. Standard debugger breakpoints work as expected. Perfect for local development and unit testing. Forge A 5-node cluster simulator running on your laptop. Real consensus. Real routing. Real failure scenarios. Kill nodes, crash the leader, trigger rolling restarts — and watch the cluster recover in real time through a web dashboard with D3.js topology visualization, per-node metrics (CPU, heap, leader status), and event timeline. Configurable load generation with TOML-based multi-target configuration lets you stress-test realistic scenarios — set request rates, define body templates, and run duration-limited load tests. Chaos operations include node kill, leader kill, and rolling restart. Forge validates the entire dependency graph before starting anything. Aether Production cluster. Same slices, same code, different scale. Your code doesn’t know which environment it’s running in. Whether inter-slice calls are in-process or cross-network is transparent. Tooling 37 CLI commands cover deployment, scaling, updates, artifacts, observability, controller configuration, and alerts — in both single-command and interactive REPL modes. A web dashboard streams real-time metrics via WebSocket — no polling. 30+ REST management endpoints enable full programmatic control of everything the CLI can do. Prometheus-compatible metrics export (/metrics/prometheus) integrates with existing monitoring stacks. Metrics are push-based at 1-second intervals, with zero consensus overhead — they bypass the consensus protocol entirely. 
Per-method invocation tracking with P50/P95/P99 latency and configurable slow-invocation detection strategies (fixed threshold, adaptive, per-method, composite) surfaces performance issues before users notice. Dynamic aspects let you toggle LOG/METRICS/LOG_AND_METRICS modes per method at runtime via REST API, without redeployment. Test realistic failure scenarios on your laptop. Deploy to production with a config change, not a code change. Maturity Aether is a working system, not a concept paper. 81 end-to-end tests are run against real 5-node clusters in Podman containers, validating cluster formation, quorum establishment, slice deployment and scaling, blueprint application with topological ordering, multi-instance distribution, artifact upload, and cross-node resolution with integrity verification, leader failure and recovery, node restart with state restoration, and orphaned state cleanup after leader changes. The recovery and fault tolerance claims come from automated tests against real clusters, not marketing slides. Let Java Be Java Java’s lineage leads here. From applets managed by browsers, through servlets managed by application servers, through EJBs managed by enterprise containers, through OSGi managed by runtime frameworks, to Aether, managed by a distributed runtime. The fat-jar era was a detour. An understandable one — when Docker emerged, it offered a universal packaging format, and the industry standardized on it regardless of language. Java adopted the patterns of languages that were designed to produce standalone binaries. We started treating Java applications like Go programs with a heavier runtime. But it was never the destination. Java was designed for managed environments. The JVM makes it possible. The runtime manages the application. That’s the lineage. Aether continues it. Two entry points exist today. Wrap your legacy monolith behind a @Slice interface in one sprint and gain fault tolerance without rewriting anything. 
Or start fresh with maximum clarity: lean slices, explicit contracts, per-use-case scaling. Both paths converge on the same runtime, the same cluster, the same operational model. Both paths can coexist, with legacy service slices and new lean slices running side by side.

Fault tolerance is not an afterthought; it's the foundation. Scaling is not your problem; it's the environment’s. Infrastructure is not your code; it's the runtime’s. The heavy winter coat comes off. The application breathes.

Resources

  • Pragmatica Aether: project site
  • GitHub Repository: source code
Docker Hardened Images Are Free Now — Here's What You Still Need to Build
By Shamsher Khan (DZone Core)
Beyond Partitioning and Z-Order: A Deep Dive into Liquid Clustering for Unity Catalog Managed Tables
By Seshendranath Balla Venkata
One Query, Four GPUs: Tracing a Distributed Training Stall Across Nodes
By Ingero Team
Self-Hosted Inference Doesn’t Have to Be a Nightmare: How to Use GPUStack

The Problem Nobody Warned You About You bought the GPUs. Maybe you've got a couple of NVIDIA A100s in a rack, some RTX 4090s under desks, or a Kubernetes cluster with mixed hardware. You've got the compute. Congratulations! Now what? Here's the part that catches most teams off guard: having GPUs is the easy part. Managing them is where things go sideways. You need to figure out which models fit on which cards, how to balance load across machines, how to handle a node going down at 2 AM, and how to expose all of this as a clean API your application team can actually call. Most teams end up building a brittle collection of Python scripts and crontab entries that haven't been updated since 2022. It works until it doesn't, and then someone's paging you on a Saturday. This is the problem GPUStack was built to solve. What Is GPUStack, Exactly? GPUStack is an open-source tool for managing GPU clusters. Think of it as Kubernetes for your inference workloads, except you don't need to spend three days debugging a whitespace error in a Helm chart. At its core, GPUStack does three things well: It aggregates your GPUs. Whether your hardware is spread across bare-metal servers, Kubernetes pods, or cloud instances, GPUStack sees them all as a single pool of compute. One dashboard, full visibility. It orchestrates inference engines. GPUStack doesn't try to reinvent the inference wheel. It plugs into engines like vLLM, SGLang, and TensorRT-LLM, picks the right one for the job, configures it, and manages the lifecycle so you don't have to. It serves models through an OpenAI-compatible API. Once a model is deployed, your application team gets a familiar REST endpoint. No custom client libraries. No new protocols to learn. Swap out the base URL, and you're talking to your own infrastructure. Getting Started in Under 5 Minutes I'm not exaggerating on the timeline. Here's how you go from zero to a running GPUStack server. 
Step 1: Fire Up the Server
You need one machine to act as your control plane. It doesn't even need a GPU. A basic CPU-only box works fine for the server role.

Shell
sudo docker run -d --name gpustack \
    --restart unless-stopped \
    -p 80:80 \
    --volume gpustack-data:/var/lib/gpustack \
    gpustack/gpustack

That's it. Open your browser, navigate to http://<your-server-ip>, and you'll see the GPUStack dashboard. The first time you log in, you'll set up your admin credentials.

Step 2: Add Your GPU Workers
Now for the fun part. On each worker node, make sure you have the NVIDIA driver and NVIDIA Container Toolkit installed, then run:

Shell
sudo docker run -d --name gpustack-worker \
    --restart unless-stopped \
    --gpus all \
    -e GPUSTACK_SERVER_URL=http://<your-server-ip> \
    -e GPUSTACK_TOKEN=<your-token> \
    gpustack/gpustack

Replace the server URL and token (grab the token from the GPUStack dashboard). Within seconds, your worker appears in the cluster view with GPU model info, VRAM capacity, and health status. Rinse and repeat for every GPU machine you want to add. Got 3 machines? Three commands. Got 30? Thirty commands, or one Ansible playbook if you're smart about it. Running the worker command is actually the easiest part. The real final boss of GPU clusters is usually getting the drivers and toolkit installed correctly on the host.

Step 3: Deploy a Model
Head over to the model catalog in the web UI. GPUStack supports pulling models from Hugging Face and the Ollama Library. Pick a model and click deploy. Here's where the scheduler really excels. It reads the model's metadata, computes the resource requirements for VRAM, compute, and memory, then figures out which workers can handle it. If the model is too big for a single GPU, it can shard it across multiple cards. You don't have to manually calculate whether a 70B parameter model fits on your hardware. GPUStack does the math for you.

Step 4: Call the API
Once the model is running, you get an OpenAI-compatible endpoint.
Grab an API key from the dashboard and test it:

Shell
curl http://<your-server-ip>/v1/chat/completions \
    -H "Authorization: Bearer <your-api-key>" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3",
        "messages": [
            {"role": "user", "content": "Explain GPU cluster management in one paragraph."}
        ]
    }'

If you're already using the OpenAI Python SDK, switching to your GPUStack endpoint is a one-line change:

Python
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-server-ip>/v1",
    api_key="<your-api-key>"
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello from my own GPU cluster!"}]
)
print(response.choices[0].message.content)

Your application code stays the same. Your infrastructure is now fully under your control.

Why This Actually Matters
Let me break down the features that make GPUStack more than a nice-looking dashboard.

Multi-Backend Flexibility
GPUStack supports vLLM, SGLang, and TensorRT-LLM out of the box. This matters because no single engine is best for every workload. vLLM is great at high-throughput batch processing. TensorRT-LLM squeezes out every last drop of performance on NVIDIA hardware. SGLang shines with structured generation. GPUStack lets you pick the right tool for each deployment, or lets the scheduler pick for you.

Built-In Monitoring
GPUStack integrates with Grafana and Prometheus, giving you real-time dashboards for GPU utilization, VRAM usage, token throughput, and API request rates. No need to bolt on a separate monitoring stack (which usually ends up being three half-finished Grafana dashboards anyway). When something breaks at 2 AM, you'll know exactly which GPU on which machine is the problem.

Automated Failure Recovery
We’ve all been there: a node drops off the map because of a weird PCIe bus error or a driver mismatch that only appears under heavy load. Normally, that means your inference API just returns 500s until you manually intervene.
GPUStack handles the panic phase for you. When Should You Use GPUStack? GPUStack isn't the right fit for every scenario. Here's a quick way to think about it: Use GPUStack if: You have 2+ GPU machines and want to serve LLMs or other AI models behind a unified API. Especially if your team doesn't want to become full-time infrastructure engineers just to keep models running. You want to run inference on your own hardware instead of paying per-token to a cloud provider. The cost savings at scale are real, and GPUStack removes the operational overhead that usually makes self-hosting painful. Maybe skip GPUStack if: You have a single GPU and just want to run a model locally for personal use. Tools like Ollama are simpler for that use case. You're already deep into a custom Kubernetes-based ML platform with KubeFlow or similar. GPUStack can work alongside Kubernetes, but if you've already invested heavily in that ecosystem, the overlap might not be worth it. The Bigger Picture The AI infrastructure landscape is shifting. A year ago, most teams defaulted to API providers for inference. Today, with open-weight models getting better every month and GPU costs coming down, self-hosted inference is becoming a real option. Not just for Big Tech, but for startups and mid-size companies too. The bottleneck isn't hardware anymore. It's operations. It's the glue code between "we have GPUs" and "our application can reliably call a model." GPUStack is a serious attempt at solving that gap, and it's open source under the Apache 2.0 license, so you can inspect, modify, and deploy it without vendor lock-in. If you’re sitting on a pile of hardware that’s currently just acting as expensive space heaters, or if you’re tired of seeing cloud inference bills that look like mortgage payments, give this a shot. You might find that self-hosting is actually viable again!
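To get a feel for the arithmetic that the scheduler in Step 3 takes off your hands, here is a back-of-the-envelope sketch in Python. It is not GPUStack's actual algorithm: the function name, the 20% overhead factor for KV cache and activations, and the assumption that every GPU has the same VRAM are all illustrative choices.

```python
import math


def min_gpus_for_model(params_billion: float,
                       bytes_per_param: float,
                       gpu_vram_gb: float,
                       overhead: float = 1.2) -> int:
    """Rough lower bound on the GPUs needed to hold a model.

    Assumptions (not taken from GPUStack): weights dominate memory use,
    all GPUs are identical, and `overhead` is a guessed multiplier that
    covers KV cache, activations, and runtime buffers.
    """
    if min(params_billion, bytes_per_param, gpu_vram_gb, overhead) <= 0:
        raise ValueError("all arguments must be positive")
    # 1e9 parameters * N bytes each is N GB (decimal) per billion parameters.
    weights_gb = params_billion * bytes_per_param
    needed_gb = weights_gb * overhead
    return math.ceil(needed_gb / gpu_vram_gb)


if __name__ == "__main__":
    # 70B parameters in FP16 (2 bytes each) on 80 GB cards.
    print(min_gpus_for_model(70, 2, 80))
    # 7B parameters in FP16 on a 24 GB card.
    print(min_gpus_for_model(7, 2, 24))
```

Inference engines also restrict how a model can be split across cards, so treat the result as a floor rather than a deployment plan.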

By Sandeep Sadarangani
Smart Deployment Strategies for Modern Applications

Modern application development has moved toward distributed, cloud-based, and even microservices-based applications, requiring scalability, reliability, and performance under different conditions. Therefore, deployment has become a part of application development, not merely a final activity. Intelligent deployment patterns and practices are all about building applications that are not just easy to deploy, but also reliable, scalable, and efficient in production. This means moving away from traditional, manual deployment patterns and toward automated, container-based deployment practices. Docker and Kubernetes are two prominent technologies that play a vital role in this transformation and shift toward intelligent deployment patterns and practices. Docker helps developers build applications and deploy them along with their dependencies in lightweight, portable containers, overcoming environment consistency problems, while Kubernetes helps deploy, scale, and self-heal these containers. However, without an appropriate strategy, it is possible to introduce unnecessary complexity and even performance issues. Not every application needs Kubernetes, nor does every deployment issue call for a distributed solution. Knowing when to use Docker on its own, when to use Kubernetes, and when to balance performance, cost, and complexity is vital to deliver effective modern applications. This article provides smart deployment strategies using Docker and Kubernetes. It highlights the advantages, disadvantages, and performance of using Docker and Kubernetes. This gives an overview of the deployment strategy. What Docker Does Docker packages your application, all dependencies, and the run time into a small container. 
Issues Before Docker
• "It works on my machine": behavior is inconsistent across development, test, staging, and production environments.
• Dependency conflicts: mismatched language versions, missing libraries, and configuration differences.

Docker Benefits
• Same behavior everywhere: local development, staging, and production.
• Isolation between apps: each app runs in its own container.
• Fast startup: lightweight compared with a virtual machine.
• Easy deployment: just run the container.

Plain Text
docker start <containername>

How Docker Works
Plain Text
Application Code → Dockerfile → Docker Image → Docker Container → Run application

A container image can run on a developer laptop, on virtual machines, in a data center, or in cloud environments with the same packaged runtime and dependencies. Docker therefore solves the packaging problem. But what if the machine has 100 containers? What if one crashes? How do you scale during high traffic? How do you manage deployments? Docker by itself does not solve these problems. This is where a deployment strategy, and Kubernetes, comes in.

What Kubernetes Does
Kubernetes addresses the operational problem of running the image once it has been built. It automates the deployment, scaling, and management of containerized applications, and it maintains the desired state by replacing failed containers and rescheduling workloads as needed.

Kubernetes Benefits
• Auto scaling: adds containers (pods) when traffic increases and removes them when traffic decreases.
• Self-healing: restarts a container if it crashes.
• Load balancing: spreads the load across containers.
• Zero-downtime deployment: updates the system without stopping it.
• Service management: manages multiple microservices easily.

Docker builds and runs the container. Kubernetes runs the container reliably at scale.
For example, in a real-world scenario:
• Docker = packing lunch boxes
• Kubernetes = managing a large cafeteria serving thousands

Plain Text
build app → Docker container
    ↓
Deploy many containers → Kubernetes manages them

What a Kubernetes Deployment Actually Does
A Kubernetes Deployment is a cluster resource that manages a group of Pods and ReplicaSets for a workload, typically a stateless application. You define the desired state, and Kubernetes moves the actual state of the cluster toward it. Deployments also support rolling updates, where new Pods are created and marked as ready before the old ones are terminated.

The typical process for deploying a Spring Boot application to a Kubernetes cluster:
1. Develop a Spring Boot application.
2. Build and package the application as a Docker image.
3. Push the Docker image to a repository.
4. Reference the image in a Kubernetes Deployment.
5. Kubernetes creates Pods and exposes them via a Service.

Advantages
• Consistent deployments: Docker provides a standard unit for bundling the application and its runtime dependencies. This minimizes environment drift between development, testing, and production, which is one of the biggest advantages of containers for Java-based Spring Boot applications.
• Declarative operations: Kubernetes uses a declarative model to manage deployments, which makes it easy for organizations to automate application deployment.
• Self-healing: Kubernetes can automatically replace failing containers and reschedule the application when it becomes unavailable, so recovery does not depend on manual intervention.
• Inbuilt scaling options: Kubernetes provides built-in autoscaling features for the application.
This makes it easy for organizations to implement elastic and efficient scaling for the application.
• Improved service abstraction and traffic routing: A Kubernetes Service is an API object that exposes a set of Pods behind a single, consistent endpoint and distributes traffic to the matching Pods. If access from outside the cluster is required, Ingress or Gateway-based routing is an option.
• Safer upgrades: New versions can be rolled out gradually using rolling updates, which reduces deployment risk.

Disadvantages
1. More Operational Complexity
Docker on its own is simple for small applications. Kubernetes introduces additional concepts such as Pods, Deployments, Services, Ingress, ConfigMaps, Secrets, autoscaling, and network policies. These features can be justified in production environments, but each one has to be learned and operated. The Kubernetes documentation is divided into so many sections because the platform is multi-functional by design, covering orchestration, networking, scaling, storage, and more.
2. Higher Resource Overhead
Kubernetes runs control-plane and node-level components that a plain Docker host does not need. For very small applications, this overhead may outweigh the advantages. This is a general expectation based on the Kubernetes model compared to the Docker model, not a measured result.
3. Harder Debugging
Debugging a Docker application is relatively simple because the application runs on a single host. Debugging a distributed application is far more complex because multiple hosts, Pods, and Services are involved. This is likewise a general expectation based on the two models, not a measured result.
4. Misconfiguration Risk
Kubernetes is a powerful technology, but misconfiguration can lead to application failures.
Network Policies, for example, are complex by design and need careful, production-grade configuration.

Performance Considerations
Kubernetes doesn’t make your application run faster on its own. Performance still depends on application design, JVM tuning, container image quality, database performance, network latency, and resource allocation. What Kubernetes does provide is operational tooling, such as autoscaling and controlled rollouts, that helps maintain performance under varying load. In general terms, performance considerations fall into four categories:
• Startup performance. A Spring Boot container can be slow to start, depending on factors such as application size. Because a rollout only progresses as new Pods become ready, slow startup directly slows the rollout.
• Runtime efficiency. Containers are much more efficient than deployment models that rely on many virtual machines, which is one reason Docker is so popular. However, bloated images or oversized JVMs can still waste resources. The Docker documentation discusses factors such as the choice between glibc-based and musl-based base images.
• Scaling behavior. Horizontal pod autoscaling handles increased load by adding more Pods rather than giving more resources to existing Pods. This only helps if the application scales horizontally and has no single-node bottleneck.
• Networking overhead. Kubernetes Services add a layer of network abstraction. This helps manageability and load balancing, but latency-sensitive applications should account for every layer in the request path. The abstraction is useful operationally, but it is not free.

Limitations
One limitation to be aware of is that Kubernetes Deployments are designed for stateless workloads.
This means that if the application's state is tightly coupled to the identity of an instance, or if it needs ordered storage, it may not be the best candidate for a Kubernetes Deployment. The Kubernetes documentation itself describes Deployments as typically being used for workloads that “do not maintain state.” Other practical limitations are:
• Small teams may find Kubernetes too heavy for a simple internal app.
• Stateful systems still require careful storage, backup, and failover planning.
• Local development experience can become more complex than plain Docker Compose.
• Security and networking require active design, not default trust.

When/What to Use
Scenario | Need Docker | Need Kubernetes
Run single app | Yes | No
Microservices | Yes | Yes
Production scale | Yes | Yes (Mandatory)
Auto scaling needed | No | Yes
High availability | No | Yes

Conclusion
The modern deployment model is not just about shipping code; it’s about shipping it reliably and at scale. Docker provides consistency across environments, while Kubernetes provides scale, resilience, and automation. A smart deployment strategy means selecting the appropriate tool for the job. Docker might be enough for a simple application, but for a complex application with high availability requirements, Kubernetes becomes a must-have. By understanding the strengths and weaknesses of both tools, we can develop efficient, scalable, and sustainable deployment strategies.
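The horizontal pod autoscaling mentioned under Performance Considerations follows a proportional rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric). The Python sketch below only illustrates that arithmetic. The function name and default bounds are illustrative, and the sketch ignores the autoscaler's tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule used by the Horizontal Pod Autoscaler.

    Simplified on purpose: a single metric, no tolerance band, and no
    stabilization window. The bounds play the role of minReplicas and
    maxReplicas in an HPA spec.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 Pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

Because the rule only adds or removes Pods, it helps only when the application really does scale horizontally, which is the caveat raised in the scaling discussion above.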

By Manju George
Solving the Mystery: Why Java RSS Grows in Docker on M1 Macs

The Problem
You're running a Java application in a Docker container on your M1 Mac. Everything works fine, but you notice something strange: the resident set size (RSS) keeps growing, even though your heap usage is stable. After hours of investigation, you find mysterious rwxp memory regions, each exactly 128 MB, accumulating in your process memory map. What's causing this? Is it a memory leak? A JVM bug? Something else entirely?

The Investigation
Our journey began with monitoring RSS growth in a Java 17 application deployed on Docker-backed Minikube. Despite stable heap usage and no obvious memory leaks, RSS continued to grow by hundreds of megabytes over time.

Initial Observations
• RSS growth: ~500-700 MB over 11 hours
• Heap usage: Stable and within limits
• Thread count: Stable
• Native memory tracking: No obvious leaks

Deep Dive Into Memory Maps
Using /proc/PID/maps and /proc/PID/smaps, we discovered the growth was coming from anonymous executable memory regions:

Shell
$ cat /proc/1/maps | grep rwxp
efffd1d7c000-efffd9d7c000 rwxp 00000000 00:00 0
efffdb185000-efffe3185000 rwxp 00000000 00:00 0
efffe3d85000-efffebd85000 rwxp 00000000 00:00 0
...

Each region was exactly 128 MB, in the 0xefff* address range, with read-write-execute permissions. But what was in them?

The Discovery
Reading the memory content revealed something unexpected: ARM64 machine code instructions. But wait, the Java binary was x86-64, and the process reported x86_64 architecture. What was ARM64 code doing there?

The "Aha!" Moment
The answer: Rosetta 2 translation cache. When running x86-64 containers on ARM64 M1 Macs via Docker Desktop, Rosetta 2 translates x86-64 instructions to ARM64. The translated code is cached in executable memory regions: those mysterious RWXP regions we were seeing!
The Root Cause
Here's what was happening:
1. JIT compilation: Java's JIT compiler generates x86-64 native code for hot methods
2. Rosetta 2 intercepts: When x86-64 code executes, Rosetta 2 translates it to ARM64
3. Translation cache: Translated ARM64 code is stored in 128 MB RWXP memory regions
4. Growth: More JIT-compiled methods = more translations = more RWXP regions

Evidence
Observation | Explanation
RWXP regions contain ARM64 code | Rosetta 2's translated code
Exactly 128 MB per region | Rosetta 2 allocation granularity
Anonymous (no file backing) | Runtime translation cache
Growth correlates with JIT activity | More compiled methods = more translations

The Proof
To definitively prove JIT was the trigger, we disabled JIT compilation using the -Xint flag:

Java
-Xint # Run in interpreter-only mode

Results
Metric | Before (JIT Enabled) | After (JIT Disabled)
RWXP Regions | 5 -> 12 -> 15 (growing) | 1 (stable, no growth)
RWXP Memory | ~1.9 GB | ~128 MB
Growth Rate | Multiple regions/hour | 0 regions/hour
Compiled Methods | 25,606 nmethods | 0 nmethods

Result: With JIT disabled, RWXP growth completely stopped. Monitoring over 1+ hour confirmed zero growth.

Why This Happens
The Perfect Storm
• ARM64 host: M1 Mac (Apple Silicon)
• x86-64 container: Docker image built for AMD64
• Rosetta 2 enabled: Docker Desktop uses Rosetta 2 for emulation
• Dynamic code generation: Java JIT compiler

When all four conditions are met, Rosetta 2 must translate every JIT-compiled method from x86-64 to ARM64, storing the translations in executable memory regions that count toward process RSS.

The Solution
Option 1: Use Native ARM64 Images (Recommended)
The best solution is to use ARM64-native Docker images:

Shell
# Build for ARM64
docker build --platform linux/arm64 ...
# Or use multi-arch images
docker pull --platform linux/arm64 your-image:tag

Benefits:
• No Rosetta 2 translation needed
• No RWXP growth
• Better performance (native execution)
• Lower memory usage

Option 2: Deploy to x86-64 Infrastructure
If ARM64 images aren't available, deploy to x86-64 servers or cloud instances where Rosetta 2 isn't needed.

Option 3: Accept and Monitor
If you must use x86-64 containers on M1 Macs:
• Increase container memory limits
• Monitor RWXP growth
• Plan for periodic restarts if needed

Not Recommended
Don't disable JIT in production (-Xint). While it stops RWXP growth, it dramatically reduces performance. Use it only for testing/debugging.

Key Takeaways
• Rosetta 2 translation cache causes RWXP memory growth in x86-64 containers on ARM64 Macs
• JIT compilation is the primary trigger; each compiled method needs translation
• Native ARM64 images eliminate the problem entirely
• This is expected behavior, not a bug: it's the cost of emulation

Conclusion
What started as mysterious RSS growth turned out to be Rosetta 2's translation cache storing ARM64 translations of JIT-compiled Java code. By understanding the mechanism and testing with JIT disabled, we proved the root cause and identified the best solution: use native ARM64 images. If you're experiencing similar RSS growth in Java applications on M1 Macs, check for RWXP regions in your process memory map. If you see them, Rosetta 2 translation is likely the culprit.

How to Check
Shell
# Check for RWXP regions
cat /proc/PID/maps | grep rwxp

# Count RWXP regions
cat /proc/PID/maps | grep rwxp | wc -l

# Check if Rosetta 2 is active
cat /proc/PID/maps | grep rosetta

Have you encountered similar issues? Share your experience in the comments below!
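The shell checks under How to Check count RWXP regions but do not show how much memory they cover. The Python sketch below is one way to total them from the text of /proc/PID/maps. It assumes the standard Linux maps layout (address range, then permissions) and nothing more, and the function names are made up for this example.

```python
import sys
from typing import Iterable, List, Tuple


def rwxp_regions(maps_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (start_address, size_in_bytes) for every rwxp mapping."""
    regions = []
    for line in maps_lines:
        fields = line.split()
        # fields[0] is "start-end" in hex, fields[1] is the permission string.
        if len(fields) < 2 or fields[1] != "rwxp":
            continue
        start_hex, end_hex = fields[0].split("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        regions.append((start, end - start))
    return regions


def summarize(maps_lines: Iterable[str]) -> Tuple[int, float]:
    """Return (region_count, total_size_in_MiB) for rwxp mappings."""
    regions = rwxp_regions(maps_lines)
    total_mib = sum(size for _, size in regions) / (1024 * 1024)
    return len(regions), total_mib


if __name__ == "__main__":
    # Usage inside the container: python3 this_script.py <PID>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/maps") as fh:
        count, total = summarize(fh)
    print(f"{count} rwxp regions, {total:.0f} MiB")
```

Running it on a schedule and logging the two numbers gives a simple growth curve to compare against JIT activity.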

By Sumeet Sharma
How We Diagnosed a Hidden Scheduler Failure in a Docker Swarm Cluster Serving 2 Million Users

Context: 120 Nodes, Strict SLAs, and Legacy Infrastructure Our team is responsible for the mobile backend infrastructure serving over 2 million registered users. The Docker Swarm cluster consists of 120 nodes: 5 manager nodes, 40 worker nodes, and the rest are infrastructure servers. The cluster runs about 50 services, totaling hundreds of replicas. We inherited Swarm from the previous contractor. The client is not yet ready to migrate to Kubernetes, and Swarm is currently sufficient for the current scale. Services are distributed across nodes in groups and bound by labels: up to 4 worker nodes are allocated to heavier services, 2 to less loaded ones, and 1 to non-critical services. Nodes can host replicas of multiple services. Our SLAs are strict: If any part of the mobile app is completely unavailable, we have 30 minutes to resolve the issue, after which penalties begin to accrue. What Happened The issue was detected thanks to a monitoring alert regarding the unavailability of service replicas. While investigating the incident in the manager-node logs, we found the following warning: Plain Text Mar 03 07:46:32 swarm3 dockerd[875]: time="2025-03-03T07:46:32.123554337Z" level=warning msg="underweighting node nt98wn9he8my6tsuasgkhrrjp for service 86jgkc35ctasmu8ubpnilsrqo because it experienced 5 failures or rejections within 5m0s" module=scheduler node.id=gaip86ri06jyrdwxcogl9j2p5 This message indicates that Swarm's internal scheduler is lowering the priority (weight) of a specific worker node when scheduling service tasks. The reason is 5 failures or rejections in the last 5 minutes. Swarm effectively excludes this node from the pool of candidates for running replicas. There was no critical downtime: Several replicas of the problematic services were running, and traffic was routed to the live instances. However, some replicas could not start — meaning the cluster was operating with reduced fault tolerance. With this SLA, that's a ticking time bomb. 
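When this warning shows up repeatedly, it helps to extract the node and service IDs rather than read them by eye. The Python sketch below does that with a regular expression. The pattern is based only on the single log line quoted above, so other Docker versions may word the message differently, and the function names are illustrative.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Optional

# Pattern derived from the one warning shown in this article.
UNDERWEIGHT_RE = re.compile(
    r"underweighting node (?P<node>\S+) for service (?P<service>\S+) "
    r"because it experienced (?P<failures>\d+) failures or rejections "
    r"within (?P<window>[0-9a-z.]+)"
)


def parse_underweight(line: str) -> Optional[dict]:
    """Return node, service, failures, and window, or None if no match."""
    match = UNDERWEIGHT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["failures"] = int(info["failures"])
    return info


def count_by_pair(lines: Iterable[str]) -> Counter:
    """Count warnings per (node, service) pair."""
    counts: Counter = Counter()
    for line in lines:
        info = parse_underweight(line)
        if info:
            counts[(info["node"], info["service"])] += 1
    return counts


if __name__ == "__main__":
    # Reads daemon log lines from standard input.
    for (node, service), n in count_by_pair(sys.stdin).most_common():
        print(f"{n:5d}  node={node}  service={service}")
```

A service that appears against many different nodes points toward a service-level or configuration cause, while a single node that appears against many services points toward that node.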
Why Swarm Lowers a Node's Weight
Before describing our diagnosis, it's worth understanding the mechanics. Swarm lowers a node's weight for several reasons:
• Resource constraints. A container requires more CPU, memory, or disk space than is available on the node. Swarm cannot place the task and records a failure.
• Network issues. The node is unresponsive, or the connection is unstable. The manager loses contact with the worker and marks it as unreliable.
• Previous failed launches. If a container fails to start on a specific node several times in a row, Swarm temporarily excludes it from the list of candidates.
• Docker Daemon or hardware issues. Unstable Docker daemon operation or hardware failures lead to a cascade of failures when launching tasks.
• Mismatch between the number of replicas and the number of nodes with the required labels. This turned out to be our case. The service is bound to specific nodes via placement constraints with labels. If the number of replicas in the service configuration exceeds the number of nodes with the required label, the scheduler enters a cycle of failed placement attempts, even if there are enough free worker nodes in the cluster without that label.
• Service errors. The container starts but immediately terminates with an error or fails the health check. Swarm attempts to restart it, incrementing the failure count.

What We Tried First
The initial response to such errors is the standard set of steps:
• Rebuilding the service. We redeployed the service using docker service update --force. The replicas restarted, but the problem returned after a few minutes.
• Changing the number of replicas. We reduced and then increased the number of replicas again. It didn't help.
• Reading container logs. The container logs themselves didn't show anything meaningful: the service was fine when it managed to start.

None of this yielded a consistent result.
It became clear that the problem wasn't with the service, but at the infrastructure level: specifically, in how the scheduler makes placement decisions.

Troubleshooting: Identifying the Root Cause
Step 1: Checking Node Status
Shell
docker node ls

If any node has a status of Down or Unreachable, it is the first candidate. We look for the specific node mentioned in the error message:

Shell
docker node ls | grep nt98wn9he8my6tsuasgkhrrjp

In our case, all nodes were in the Ready state, so the issue wasn't related to availability.

Step 2: Identify the Problematic Service
Using the first 12 characters of the service ID from the log, we find its name:

Shell
docker service ls | grep 86jgkc35ctas

Next, check the status of the tasks:

Shell
docker service ps 86jgkc35ctasmu8ubpnilsrqo

Here you can see on which node the task failed to start and why: Rejected, Shutdown, No suitable node.

Step 3: Checking Placement Constraints
This is where we found the cause. Let's see what placement constraints are configured for the service:

Shell
docker service inspect 86jgkc35ctasmu8ubpnilsrqo \
    --format '{{json .Spec.TaskTemplate.Placement}}' | jq .

The service was bound to nodes with a specific label. Let's check how many nodes have this label:

Shell
docker node ls --filter "label=cli=1"

Note that the label filter matches engine labels. If the label was added with docker node update --label-add, use the node.label filter instead (node.label=cli=1).

And then it became clear: the number of replicas in the service configuration exceeded the number of nodes with the required label. Most likely, the mismatch occurred during a routine service update, when the number of replicas was set higher than the number of available labeled nodes during reconfiguration. Replicas for which suitable nodes were found started normally, while for the rest, the scheduler repeatedly attempted to find a suitable node, received a rejection, and logged a failure.
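The comparison in Step 3 can be automated across all services. The Python sketch below works on JSON already obtained from docker service inspect and docker node inspect, and flags a service whose replica count exceeds the number of nodes matching its label constraints. It assumes each matching node accepts at most one replica (per_node=1), which is what the incident described here implies but is not a Swarm default. It also only understands node.labels.<key>==<value> and != constraints, and all function names are illustrative.

```python
from typing import Dict, List


def matches(constraint: str, node_labels: Dict[str, str]) -> bool:
    """Evaluate one 'node.labels.<key>==<value>' or '!=' constraint.

    Simplification: constraints on anything other than node labels
    (node.role, node.hostname, engine.labels, ...) count as satisfied.
    """
    for op in ("==", "!="):
        if op in constraint:
            left, right = (part.strip() for part in constraint.split(op, 1))
            break
    else:
        return True  # Unrecognized syntax: do not exclude the node.
    prefix = "node.labels."
    if not left.startswith(prefix):
        return True
    equal = node_labels.get(left[len(prefix):]) == right
    return equal if op == "==" else not equal


def eligible_nodes(constraints: List[str], nodes: List[dict]) -> List[str]:
    """IDs of nodes whose Spec.Labels satisfy every constraint."""
    result = []
    for node in nodes:
        labels = node.get("Spec", {}).get("Labels") or {}
        if all(matches(c, labels) for c in constraints):
            result.append(node["ID"])
    return result


def overcommitted(service: dict, nodes: List[dict], per_node: int = 1) -> bool:
    """True if requested replicas exceed per_node * eligible node count.

    per_node=1 is an assumption taken from this incident, not from Swarm.
    """
    spec = service["Spec"]
    replicas = spec["Mode"]["Replicated"]["Replicas"]
    placement = spec["TaskTemplate"].get("Placement") or {}
    constraints = placement.get("Constraints") or []
    return replicas > per_node * len(eligible_nodes(constraints, nodes))
```

Feeding it the parsed output of the two inspect commands for every replicated service gives the same proactive review that is otherwise done by hand.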
Step 4: Checking Resources (for a Complete Picture)
Even after identifying the root cause, we checked the resources on the problematic nodes to rule out a combined issue:

Shell
docker node inspect nt98wn9he8my6tsuasgkhrrjp \
    --format '{{json .Description.Resources}}' | jq .

And also the load directly:

Shell
top -o %CPU
free -m
df -h

The resources were fine, which confirmed that the issue was indeed due to a configuration mismatch.

Solution
Main action: We adjusted the number of service replicas to match the number of available nodes with the required label by reducing the number of replicas in the .yml configuration file:

YAML
deploy:
  replicas: 2 # Match the number of nodes with the label

After applying the updated configuration, the error disappeared: the scheduler no longer attempted to place replicas on non-existent nodes. Additionally, we reviewed the configuration of the remaining services, verifying that the number of replicas did not exceed the number of nodes with the required labels. We found several more services with a similar potential issue and fixed them proactively.

If the Cause Is Different, Additional Solutions
Our specific case was related to a configuration error, but there are other scenarios that can cause the same error:

Resource shortage. Free up space and clean up unused images:

Shell
docker system prune -a

Or lower the limits for the service:

Shell
docker service update --limit-cpu 0.5 --limit-memory 512M <SERVICE_ID>

Issues with the Docker Daemon on the node. Restart the daemon:

Shell
systemctl restart docker

Temporarily excluding a problematic node. Switch it to drain mode so that all tasks migrate to other nodes:

Shell
docker node update --availability drain <NODE_ID>

Reconnecting the node to the cluster. If nothing else works, remove the node and add it again:

Shell
docker swarm leave --force
docker swarm join --token <TOKEN> <MANAGER_IP>:2377

Conclusion
This situation taught us a few things:
The underweighting node error is a symptom, not a diagnosis.
The same warning in the logs can stem from a wide variety of causes, ranging from a lack of resources to a configuration error. Configuration errors are the most insidious cause. In a cluster with dozens of services and labels, it's easy to introduce a mismatch between the number of replicas and available nodes during a routine update. The absence of downtime does not mean there is no problem. The cluster continued to operate thanks to live replicas, but it was running with reduced fault tolerance. One more failure, and the SLA would have been violated.

By Denis Tiumentsev
Mastering Kubernetes to Maximize Your Cloud Potential

Kubernetes is often introduced as a container orchestrator. That’s like calling a modern city “a collection of buildings.” Technically correct, but wildly incomplete. In reality, Kubernetes is a layered ecosystem where storage, compute, networking, security, and developer workflows interlock like gears in a precision machine. If one gear slips, everything grinds. If all align, you unlock a platform that scales, heals, and evolves with your applications. After working through complex deployments, production outages, and cost optimization journeys, one truth stands out: Kubernetes mastery is not about knowing objects. It’s about understanding layers. Let’s break down the seven critical layers of Kubernetes and the tools that make them powerful.

1. Storage Layer: Where State Meets Reality
Stateless is easy. Real-world systems aren’t. The storage layer ensures your applications don’t forget who they are every time a pod restarts.

Key Components
• Persistent Volumes (PV) & Persistent Volume Claims (PVC): Abstract storage from workloads. Your app asks, Kubernetes provides.
• StorageClass & CSI (Container Storage Interface): Enable dynamic provisioning and seamless integration with cloud providers like AWS EBS, GCP PD, or Azure Disk.

Why It Matters
Without a well-designed storage strategy:
• Databases become fragile
• Stateful apps become unreliable
• Recovery becomes painful

This layer is the difference between ephemeral experiments and production-grade systems.

2. Compute / Runtime Layer: The Engine Room
This is the layer most engineers start with, but ironically, it’s not where mastery ends.
Core Primitives

  • Pods: The smallest deployable unit
  • Deployments: Declarative app management
  • ReplicaSets: Ensure desired state
  • DaemonSets: One pod per node (great for agents)

What It Solves

  • Auto-healing (failed pods restart automatically)
  • Horizontal scaling
  • Declarative infrastructure

Hidden Complexity

Misconfigured probes, resource limits, or rollout strategies can silently degrade performance or cause cascading failures. Compute is powerful, but blind compute is dangerous.

3. Observability Layer: Seeing the Invisible

If Kubernetes is a living organism, observability is its nervous system. Without it, you’re operating blind.

Essential Stack

  • Prometheus + Grafana: Metrics collection and visualization
  • Loki: Log aggregation without heavy indexing
  • OpenTelemetry: Standardized tracing across distributed systems

Why It Matters

  • Detect anomalies before users do
  • Debug distributed failures
  • Understand system behavior under load

A cluster without observability is like flying a plane without instruments. You may stay airborne… until you don’t.

4. Networking Layer: The Silent Enabler

Kubernetes networking “just works”… until it doesn’t.

Core Components

  • Services: Stable internal communication (ClusterIP, NodePort, LoadBalancer)
  • CNI (Container Network Interface): Handles pod-to-pod communication
  • Ingress: Manages external access to services

Real Challenges

  • Debugging network policies
  • Handling cross-cluster communication
  • Managing latency and service mesh complexity

Networking is often underestimated because it’s invisible when functioning and painfully obvious when broken.

5. Security Layer: Guardrails, Not Afterthoughts

Security in Kubernetes is not a feature. It’s a discipline.
Key Tools

  • RBAC (Role-Based Access Control): Define who can do what
  • OPA (Open Policy Agent): Enforce admission policies
  • Kyverno: Kubernetes-native policy management
  • Pod Security Standards (PSS): Baseline security enforcement

Why It Matters

Without strong policies:

  • Privilege escalation becomes trivial
  • Misconfigurations slip into production
  • Compliance becomes reactive instead of proactive

Modern Kubernetes security is about policy-as-code, not manual reviews.

6. Developer & DevOps Tooling: Speed Without Chaos

Kubernetes can either accelerate developers… or slow them down dramatically. The difference lies in tooling.

Key Tools

  • Skaffold & Tilt: Rapid local development and feedback loops
  • Helm: Package management for Kubernetes
  • Kustomize: Environment-specific customization without templating

What This Layer Enables

  • Faster iteration cycles
  • Standardized deployments
  • Reduced cognitive load for developers

Without this layer, Kubernetes becomes an operational burden rather than a developer platform.

7. CI/CD & GitOps: The Control Plane for Change

This is where Kubernetes evolves from infrastructure to platform.

Core Tools

  • ArgoCD & Flux: GitOps-driven continuous delivery
  • Tekton: Kubernetes-native CI pipelines
  • Jenkins X: Cloud-native CI/CD automation

Why GitOps Wins

  • Git becomes the single source of truth
  • Changes are auditable and reversible
  • Drift detection is automatic

Instead of pushing changes to the cluster, the cluster pulls desired state from Git. That subtle shift changes everything.

The Bigger Picture: Kubernetes as a System of Systems

Each layer solves a specific problem: Individually, they’re powerful. Together, they form a self-healing, scalable, policy-driven platform.

Final Thought

Most teams struggle with Kubernetes not because it’s complex, but because they approach it as a tool instead of a system. You don’t “use Kubernetes.” You operate an ecosystem. And the moment you start thinking in layers instead of YAML files, everything begins to click.
Which Kubernetes layer challenges you the most today?

  • Observability gaps?
  • Security policy chaos?
  • GitOps adoption struggles?

If you’re facing these, it might be time for a Kubernetes maturity or reliability audit. The bottleneck is rarely where you think it is.

By Jaswinder Kumar
AI Agents for DevOps on Kubernetes Need Real Engineering, Not Magic

In a real Kubernetes cluster, incidents rarely appear as a single, clean alert. They arrive as waves of Kubernetes events, latency spikes, pod restarts, rollout failures, and unpredictable autoscaling behavior all at once. The hard part is usually not “Can we fix it?” but “Can we understand what’s happening fast enough to make a safe decision?”

AI agents for DevOps can help here, but only when they sit on solid engineering foundations. They should compress the early correlation and triage phase, not take opaque, unsafe control of production. Google’s 2024 DORA report underlines why this matters: more than 75 percent of respondents now rely on AI for at least one professional task each day, and over one-third report moderate to extreme productivity gains, yet 39 percent still have little to no trust in AI-generated code. That gap between use and trust is exactly where our architecture and guardrails matter.

Why Incident Triage Needs Help Now

Traditional AIOps pitches often promise full automation, but most SREs do not want a black-box system taking unilateral action in production. What they need is help with triage:

  • Grouping noisy alerts into a single incident view
  • Correlating Kubernetes events, metrics, and recent rollouts
  • Proposing safe, reversible next steps, not silently applying risky changes

The DORA research still centers on the same four key metrics: lead time, deployment frequency, change failure rate, and time to restore service. AI can absolutely improve developer productivity and documentation, but it can also undermine delivery stability when used on top of weak fundamentals such as oversized batch changes and poor test coverage. For a broader perspective on integrating DevOps services, see "Incorporating DevOps Services into Software Development."
  • Traceable: every recommendation is explainable from telemetry and cluster state
  • Auditable: logs and decisions reviewable after the fact
  • Reversible: actions easy to roll back
  • Least-privilege: permissions constrained by Kubernetes RBAC

Architecture Overview

Layer | Responsibility | Key Technologies
Telemetry capture | Collect traces, metrics, logs, and Kubernetes events | OpenTelemetry Collector
Event bus | Buffer and fan-out telemetry | Kafka
Lightweight consumer | Normalize/enrich data, build incident context | Custom service
AI agent layer | Triage, correlate, draft next actions | CrewAI, Llama via Ollama
Controlled execution | Safe, reversible scaling under RBAC | Kubernetes RBAC, scale subresource

Related: DZone's "10 Best Practices for Managing Kubernetes at Scale."

The pattern that consistently holds up under load uses simple, composable layers:

  • OpenTelemetry Collector: capture traces, metrics, logs, and Kubernetes events
  • Kafka event bus: buffer, fan-out, and replay telemetry
  • Lightweight consumer: normalize signals into “incident contexts”
  • AI agent layer: CrewAI agents backed by Llama 3.1 via Ollama
  • Slack approval: humans approve or reject remediation steps
  • RBAC-limited scaling: Kubernetes permissions restricted to the scale subresource

Each layer can be tested, inspected, and replaced without rewriting the entire system.

Why OpenTelemetry Fits Kubernetes

OpenTelemetry Collector gives you one place to capture multi-signal telemetry (traces, metrics, logs, and Kubernetes events) with pluggable receivers and exporters.
Key points for Kubernetes:

  • The Kubernetes events receiver (k8s_events, available in contrib distributions) captures events from the Kubernetes API server and converts them into logs.
  • Kubernetes events are short-lived in the cluster (often an hour or less) and are not persisted long term; exporting them via OpenTelemetry preserves them for incident analysis.
  • Events complement, but do not replace, application logs and traces; they describe what Kubernetes is doing to your workloads (e.g., scheduling failures, image pull errors, autoscaling decisions).

Why Kafka Belongs in the Middle

Dropping all telemetry straight into an AI model couples your reasoning to whatever the cluster happens to emit at that moment. Kafka gives you a much sturdier backbone:

  • Replayable telemetry: reproduce incident contexts for testing and post-mortems
  • Multiple consumers: feed different tools (dashboards, anomaly detectors, AI agents) from the same topics
  • Decoupled ingestion and analysis: collectors push at their own pace, consumers pull at theirs

Kafka does not fix bad metric names or broken alert rules, but it does give you a consistent, durable pipe to reason about.
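To make the replay and multiple-consumer properties concrete, here is a minimal in-memory sketch. It is not the Kafka API: `TopicLog`, its methods, and the group names are hypothetical stand-ins for a single-partition topic with per-group offsets.

```python
class TopicLog:
    """In-memory stand-in for a single-partition topic (not the Kafka API)."""

    def __init__(self):
        self._records = []   # append-only list of records
        self._offsets = {}   # consumer group name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group, max_records=100):
        """Return unread records for one group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind (or fast-forward) one group without affecting the others."""
        self._offsets[group] = max(0, min(offset, len(self._records)))


log = TopicLog()
for event in ["FailedScheduling", "BackOff", "OOMKilled"]:
    log.append(event)

dashboard_view = log.poll("dashboard")   # one consumer reads everything
agent_view = log.poll("ai-agents")       # a second consumer reads the same data
log.seek("ai-agents", 0)                 # rewind to replay the incident
replayed = log.poll("ai-agents")
```

Because records are never removed on read, each group sees the full history at its own pace, and rewinding one group reproduces an incident without disturbing the others.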
A typical OpenTelemetry Collector configuration for this pattern looks like this (simplified):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Kubernetes events receiver (contrib); events are emitted as log records
  k8s_events:
    namespaces: [production, staging]

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch:
    timeout: 10s
    send_batch_size: 1000
    send_batch_max_size: 1500

exporters:
  kafka:
    brokers:
      - kafka-1.example.com:9092
      - kafka-2.example.com:9092
      - kafka-3.example.com:9092
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
    traces:
      topic: otel-traces
      encoding: otlp_proto
    metrics:
      topic: otel-metrics
      encoding: otlp_proto
    logs:
      topic: otel-logs
      encoding: otlp_proto

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [kafka]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [kafka]
    logs:
      receivers: [otlp, k8s_events]
      processors: [memory_limiter, batch]
      exporters: [kafka]
```

This keeps the collector focused on one job: getting signals in and pushing them reliably to Kafka.

Why a Separate Consumer Layer Matters

It is tempting to point your AI agents directly at Kafka topics, but that couples fragile prompt engineering with noisy raw data. A thin consumer service in the middle gives you a deterministic place to:

  • De-duplicate repeated events and alerts
  • Join pod-level signals to the Deployment and Service metadata
  • Attach rollout information (who changed what, when, and via which pipeline)
  • Apply simple rules (“ignore known-benign events,” “group alerts by owner team”) before AI sees them

This consumer produces a single “incident context” document per active incident. AI agents then reason over this structured context instead of a firehose of raw logs.
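The consumer's core logic can be sketched in a few lines. This is an illustration under stated assumptions, not the article's implementation: the event fields (`namespace`, `pod`, `reason`), the `BENIGN_REASONS` ignore list, and the `pod_owners` lookup are all hypothetical, and Kafka I/O is left out so the grouping logic stays visible.

```python
from collections import defaultdict

# Hypothetical ignore list: routine lifecycle events that rarely matter in triage.
BENIGN_REASONS = {"Pulled", "Created", "Started"}


def build_incident_contexts(events, pod_owners):
    """Group raw event dicts into one context per (namespace, deployment)."""
    contexts = defaultdict(lambda: {"events": [], "pods": set()})
    seen = set()
    for ev in events:
        if ev["reason"] in BENIGN_REASONS:
            continue  # simple rule: drop known-benign events
        key = (ev["namespace"], ev["pod"], ev["reason"])
        if key in seen:
            continue  # de-duplicate repeated events
        seen.add(key)
        # Join the pod-level signal to its owning Deployment.
        deployment = pod_owners.get(ev["pod"], "unknown")
        ctx = contexts[(ev["namespace"], deployment)]
        ctx["events"].append(ev["reason"])
        ctx["pods"].add(ev["pod"])
    return dict(contexts)


events = [
    {"namespace": "production", "pod": "api-7d9f-a", "reason": "BackOff"},
    {"namespace": "production", "pod": "api-7d9f-a", "reason": "BackOff"},
    {"namespace": "production", "pod": "api-7d9f-b", "reason": "OOMKilled"},
    {"namespace": "production", "pod": "api-7d9f-b", "reason": "Started"},
]
pod_owners = {"api-7d9f-a": "api", "api-7d9f-b": "api"}
contexts = build_incident_contexts(events, pod_owners)
```

Four raw events collapse into one context for the `api` Deployment with two distinct reasons, which is the kind of compact input the agents are meant to reason over.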
A straightforward Kubernetes Deployment for the consumer might look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-context-consumer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: incident-context-consumer
  template:
    metadata:
      labels:
        app: incident-context-consumer
    spec:
      serviceAccountName: agent-runner
      containers:
        - name: consumer
          image: your-registry/incident-consumer:v1.0.0
          env:
            - name: KAFKA_BROKERS
              value: "kafka-1:9092,kafka-2:9092,kafka-3:9092"
            - name: INCIDENT_TOPIC
              value: "otel-logs"
            - name: OUTPUT_TOPIC
              value: "incident-contexts"
```

AI Agent Layer With CrewAI and Llama 3.1

On top of incident contexts, we can deploy a small CrewAI-based agent layer. Meta’s Llama 3.1 models are available in 8B, 70B, and 405B parameter sizes, and the llama3.1:8b variant runs comfortably on a single modern GPU or even a beefy workstation via Ollama.

We split responsibilities into three agents:

  • Triage Agent: groups related alerts, assigns severity, and identifies the likely owning team
  • Diagnosis Agent: correlates Kubernetes events, metrics, and rollout changes to propose the most likely root cause
  • Executor Agent: drafts safe, reversible next steps and requests human approval

A minimal CrewAI definition might look like this (illustrative):

```python
from crewai import Agent, Task, Crew

# llmclient and tools are assumed to be project-local modules; they are not part of CrewAI.
from llmclient import Llama31Client
from tools import K8sTool, SlackTool, PrometheusTool

llm = Llama31Client(
    endpoint="http://ollama-gateway:11434",
    model="llama3.1:8b",
)

# Note: recent CrewAI releases also expect a `backstory` on each Agent and an
# `expected_output` on each Task; both are omitted here to keep the sketch short.
triage_agent = Agent(
    role="Incident Triage Engineer",
    goal="Group related alerts and identify likely impact and owning team.",
    tools=[K8sTool, SlackTool],
    llm=llm,
)

diagnosis_agent = Agent(
    role="Correlation Analyst",
    goal="Correlate Kubernetes events with metrics and recent rollout data.",
    tools=[PrometheusTool, K8sTool],
    llm=llm,
)

executor_agent = Agent(
    role="Runbook Automator",
    goal="Draft safe, reversible next steps and send them for approval.",
    tools=[K8sTool, SlackTool],
    llm=llm,
)

# Tasks run in order: triage, then diagnosis, then a drafted remediation.
crew = Crew(
    agents=[triage_agent, diagnosis_agent, executor_agent],
    tasks=[
        Task(description="Triage incident context and assign severity.", agent=triage_agent),
        Task(description="Diagnose probable causes.", agent=diagnosis_agent),
        Task(description="Draft a safe remediation step and request approval.", agent=executor_agent),
    ],
)
```

The key is that only the Executor Agent proposes actions, and even then, those actions are routed through Slack for explicit human approval.

RBAC: Safe, Scale-Only Permissions

Kubernetes RBAC lets you grant fine-grained permissions to specific subresources, including deployments/scale. This is exactly what we want for an AI-assisted incident system: the ability to scale workloads up or down, without the power to change container images, environment variables, or security settings. Scaling is reversible and far safer than mutating Deployment specs. See the official Kubernetes RBAC docs for full details on subresource permissions.

A typical “scaling-only” role for agents looks like this:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: agent-runner
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-scaler
rules:
  # Read deployments and replica sets to understand current state
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
  # Scale deployments via the scale subresource
  - apiGroups: ["apps"]
    resources: ["deployments/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: agent-runner-deployment-scaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-scaler
subjects:
  - kind: ServiceAccount
    name: agent-runner
    namespace: default
```

By operating only on the `/scale` subresource, you give the agent layer exactly enough power to adjust replica counts and nothing else.
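To see why this role permits scaling but not spec changes, the following sketch models rule matching in a deliberately simplified way. It is not the Kubernetes authorizer: wildcards, `resourceNames`, and non-resource URLs are ignored, and `is_allowed` is a hypothetical helper. The rules mirror the ClusterRole above.

```python
# Rules copied from the deployment-scaler ClusterRole shown above.
RULES = [
    {"apiGroups": ["apps"], "resources": ["deployments", "replicasets"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments/scale"],
     "verbs": ["get", "update", "patch"]},
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if any single rule lists the group, the resource, and the verb.

    Simplified model: no wildcards, resourceNames, or non-resource URLs.
    """
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


can_scale = is_allowed(RULES, "apps", "deployments/scale", "patch")
can_edit_spec = is_allowed(RULES, "apps", "deployments", "patch")
can_read = is_allowed(RULES, "apps", "deployments", "get")
can_read_secrets = is_allowed(RULES, "", "secrets", "get")
```

A subresource is matched as its own resource name, so write verbs on `deployments/scale` do not extend to `deployments`. On a live cluster, `kubectl auth can-i patch deployments --subresource=scale --as=system:serviceaccount:default:agent-runner` should confirm the same answer against the real authorizer.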
See DZone's Implementing RBAC Configuration for Kubernetes Applications for more RBAC patterns.

How a Real Incident Flows

When a rollout goes wrong, or a dependency starts failing, a typical incident flows like this through the system:

1. Telemetry capture: The OpenTelemetry Collector gathers metrics, traces, logs, and Kubernetes events, and exports them to Kafka.
2. Context building: The consumer service reads relevant records from Kafka and builds an “incident context” (involving namespaces, Deployments, pods, events, SLOs, and recent changes).
3. AI-assisted triage: The Triage Agent classifies severity (e.g., SEV-1 vs SEV-3), identifies impacted services, and tags likely owner teams.
4. Correlation and diagnosis: The Diagnosis Agent matches restart reasons (ImagePullBackOff, OOMKilled, CrashLoopBackOff, etc.) with rollout timelines and metric anomalies to propose plausible root-cause hypotheses.
5. Drafting a reversible action: The Executor Agent proposes a small, clearly reversible change: for example, temporarily scaling a canary deployment from 10 replicas back to 2, or scaling a known-stable previous version up to absorb traffic.
6. Human approval: The proposed command and rationale are posted to a Slack incident channel. An on-call SRE or incident commander explicitly approves or rejects the action.
7. Execution under RBAC: If approved, the agent uses its deployments/scale permissions to apply the change. Every call is logged and auditable.

For deeper context on incident response, see DZone's Incident Response Guide.
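The last three steps of the flow (propose, approve, execute with an audit trail) can be sketched as a small approval gate. Everything here is hypothetical scaffolding: `ProposedAction`, `ApprovalGate`, and `fake_scale` stand in for the real Slack integration and the Kubernetes scale call, which are not shown in the article.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    deployment: str
    from_replicas: int   # recorded so the change can be reversed
    to_replicas: int
    rationale: str


@dataclass
class ApprovalGate:
    audit_log: list = field(default_factory=list)

    def execute(self, action, approved_by, scale_fn):
        """Run scale_fn only when a human approver is recorded."""
        if not approved_by:
            self.audit_log.append(("rejected", action.deployment, None))
            return False
        scale_fn(action.deployment, action.to_replicas)
        self.audit_log.append(("applied", action.deployment, approved_by))
        return True

    def rollback(self, action, approved_by, scale_fn):
        """Reverse the change by restoring the previous replica count."""
        scale_fn(action.deployment, action.from_replicas)
        self.audit_log.append(("rolled_back", action.deployment, approved_by))


replicas = {"checkout-canary": 10}  # stand-in for cluster state


def fake_scale(name, count):
    replicas[name] = count


gate = ApprovalGate()
action = ProposedAction("checkout-canary", 10, 2, "error rate spike after rollout")
rejected = gate.execute(action, approved_by=None, scale_fn=fake_scale)
applied = gate.execute(action, approved_by="oncall-sre", scale_fn=fake_scale)
```

Storing `from_replicas` with the proposal is what makes the action reversible by construction, and routing every outcome through `audit_log` keeps rejected proposals as visible as applied ones.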
Where This Pattern Works Best (and Where It Doesn’t)

This architecture is strongest when:

  • Telemetry is clean and labeled (good metric names, consistent labels, sane alerts)
  • Triage, not remediation, is the bottleneck
  • Runbooks already exist with reversible actions
  • Platform teams are comfortable owning Kafka and the consumer service

It is less effective when:

  • Every incident is truly novel and unstructured
  • Data is sparse or heavily delayed
  • Organizational trust in automation is low, and there is no appetite for experimental changes
  • The AI endpoint itself has no SLOs, rate limits, or clear failure modes

Final Thoughts

This pattern fits squarely within the 2024–2026 shift toward platform engineering and AI-augmented DevOps workflows, but it succeeds only when built on strict operational guardrails. The goal isn't to replace humans in the incident response loop; it's to dramatically compress the time between "something broke" and "we understand the blast radius and have safe, reversible recovery options on the table."

AI agents excel at grouping noisy Kubernetes signals into coherent incident contexts and proposing next steps grounded in telemetry and recent changes. Humans remain the final decision-makers for production actions, retaining full control through Slack approval gates and Kubernetes RBAC constrained to the safe scale subresource.

When telemetry is clean, runbooks exist, and platform teams can own the Kafka/consumer layers, this architecture delivers measurable wins in mean time to understanding. When incidents remain truly novel or organizational trust in automation is low, it gracefully falls back to human-led triage. Either way, the system stays transparent, auditable, and reversible, never expanding blast radius through opaque "magic."

By Abdul Majid Qureshi
Java Backend Development in the Era of Kubernetes and Docker

We moved our monolithic Java application to Kubernetes last year. The promise was scalability and resilience. The reality was a series of silent failures during deployments. Users reported dropped connections every time we pushed a new version. Our monitoring showed zero downtime, but the customer experience told a different story. Requests vanished into the void during rolling updates. We spent weeks chasing network ghosts before finding the root cause. The issue was not the network. It was how our Java application handled termination signals.

In this article, I will share how we adapted our Java backend for container orchestration. I will explain the specific lifecycle issues we encountered. I will detail the configuration changes that solved the dropout problem. This is not a guide on writing Dockerfiles. It is a record of the operational friction we faced when Java met Kubernetes. Building cloud-native Java apps requires more than just packaging a JAR. It requires understanding how the orchestration layer interacts with the JVM.

The Silent Dropout Problem

Our deployment strategy used standard Kubernetes rolling updates. The controller would start a new pod before killing the old one. This should ensure zero downtime. Our users still reported errors during these windows. We checked the service logs. The old pods stopped accepting traffic instantly upon receiving the kill signal. The Kubernetes service endpoint removed the pod IP immediately. There was a gap between traffic cessation and process termination. In-flight requests died mid-stream.

Java applications do not shut down instantly. They need time to finish processing current requests. They need to close database connections gracefully. Our Spring Boot app ignored the termination signal initially. It kept running until the kernel killed it. This hard kill interrupted active transactions. Data consistency was at risk. We needed to implement a graceful shutdown sequence.
Implementing Graceful Shutdowns

We started by configuring Spring Boot to handle shutdown signals. The framework provides a property for this (server.shutdown=graceful in Spring Boot 2.3 and later), and we enabled it in our application configuration. This told Spring to stop accepting new requests upon shutdown. It allowed existing requests to complete within thirty seconds. This was a good start, but it was not enough.

When Kubernetes terminates a pod, it first runs the container's preStop hook, if one is defined, and then sends SIGTERM to the container. The JVM catches this signal, and the application starts shutting down. If the whole sequence outlasts the termination grace period, Kubernetes sends SIGKILL. We added a preStop hook to our deployment manifest. The hook sleeps for five seconds before the container receives SIGTERM. This delay gives Kubernetes time to remove the pod IP from the Service endpoints while the application is still serving requests.

This five-second sleep bridged the gap. The service mesh updated its endpoints. Traffic stopped routing to the terminating pod. Then the application began its graceful shutdown. No in-flight requests were dropped. The error rate during deployments dropped to zero.

Configuration Management Challenges

Configuration management was another pain point. We used ConfigMaps to store environment settings. Kubernetes mounted these as files inside the container. Our Java app read these files at startup. Changing a ConfigMap triggered a rollout. Every config change restarted all pods. This was disruptive for minor tweaks.

We wanted hot reloading for certain properties. Spring Cloud Kubernetes supports this feature. It watches for ConfigMap changes and refreshes the context. We enabled the reload strategy. This allowed us to update logging levels without restarting pods. It reduced deployment frequency for operational changes. However, we learned to be careful. Reloading the entire context can be heavy. We restricted hot reload to specific beans. Critical infrastructure settings still required a restart. This balance reduced risk while improving agility.

Logging in a Distributed Environment

Legacy Java apps often write logs to local files. This pattern fails in Kubernetes. Containers are ephemeral. When a pod dies, the local disk disappears. Logs vanish with it. We needed to stream logs to stdout. Kubernetes captures stdout and sends it to the logging driver.

We reconfigured our Logback setup. We removed file appenders. We added a console appender with JSON formatting. Structured logs are easier for aggregation tools to parse. This change integrated us with our ELK stack seamlessly. We could trace requests across multiple pods. We could search logs without accessing individual containers. This visibility was crucial for debugging production issues. It also reduced disk IO within the container. The application ran lighter without file writes.

Security and User Context

Running Java as root in a container is a security risk. An attacker who compromises the application runs as root inside the container, which makes a breakout to the node far easier. We audited our Docker images. The base images ran as root by default. We created a non-root user in our Dockerfile. This simple change reduced our attack surface.

However, it introduced permission issues. The application could not write to certain directories. We had to adjust volume mounts. We ensured the tmp directory was writable by the new user. This step is often overlooked during migration. Testing security contexts in staging is essential.

Resource Limits and JVM Awareness

We faced memory issues early in the migration. The JVM did not know about container limits. It allocated a heap based on host memory. The container got OOMKilled repeatedly. We fixed this by using percentage-based flags such as -XX:MaxRAMPercentage. This ensured the JVM respected the cgroup limits. It left room for non-heap memory.

We also set requests and limits in Kubernetes. Requests guaranteed resources for scheduling. Limits prevented runaway processes from starving neighbors. This alignment between JVM and Kubernetes was critical for stability.

Health Checks and Startup Probes

Java applications can be slow to start. Loading classes and connecting to databases takes time. Kubernetes liveness probes might kill the pod before it is ready. We used startup probes to handle this. The startup probe disables liveness checks until it succeeds. This gave our app up to five minutes to start. Once ready, the liveness probe took over. This prevented premature restarts during cold starts. It also protected us during heavy garbage collection pauses. The app remained healthy even if response times spiked temporarily.

Lessons Learned and Best Practices

Our journey taught us several key lessons. We incorporated these into our development standards.

  • Handle SIGTERM. Always configure graceful shutdown. Do not rely on default behavior.
  • Use preStop hooks. Bridge the gap between service discovery and process termination.
  • Log to stdout. Never write to local files in containers. Use structured logging.
  • Run as non-root. Reduce security risks by dropping privileges.
  • Tune JVM for containers. Use percentage-based memory flags. Respect cgroup limits.
  • Configure probes. Use startup probes for slow-starting applications. Tune liveness thresholds.
  • Test failure modes. Simulate pod kills in staging. Verify no data loss occurs.

Conclusion

Moving Java to Kubernetes is more than just an infrastructure change; it is a fundamental shift in how we design, build, and operate software. Over time, we learned that the orchestration layer introduces new requirements. Graceful shutdowns, proper logging, and resource management are now fundamental for reliability. As a result, our application is resilient to both deployments and runtime failures. We can trust the platform to manage our workloads efficiently while we focus on delivering features. We continue to refine our patterns as the ecosystem evolves and best practices emerge. Java remains a powerful tool for backend development; it just requires a new mindset for the cloud-native era. Happy coding, and always keep your containers healthy.
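As a footnote to the graceful shutdown section, the timing can be checked with simple arithmetic. The sketch below assumes the five-second preStop sleep and thirty-second drain window from the article, and relies on two Kubernetes behaviors: the preStop hook counts against the termination grace period, and terminationGracePeriodSeconds defaults to 30. The function name and the 45-second value are illustrative assumptions.

```python
def shutdown_budget(pre_stop_sleep_s, app_drain_timeout_s, grace_period_s=30):
    """Return spare seconds before SIGKILL.

    A negative result means the drain window can be cut short, because the
    preStop sleep and the application's drain time share one grace period.
    """
    return grace_period_s - (pre_stop_sleep_s + app_drain_timeout_s)


# Article's values with the Kubernetes default grace period of 30 seconds.
default_margin = shutdown_budget(pre_stop_sleep_s=5, app_drain_timeout_s=30)

# Same values with a hypothetical terminationGracePeriodSeconds of 45.
raised_margin = shutdown_budget(5, 30, grace_period_s=45)
```

With the default grace period the margin is negative, so a request that needs the full drain window could still be killed. Raising terminationGracePeriodSeconds above the sum of the two timers, or shortening the drain window, restores a positive margin.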

By Ramya vani Rayala
Java in a Container: Efficient Development and Deployment With Docker

There is a specific kind of frustration reserved for Java developers who have just containerized their application. You spend hours optimizing your Spring Boot microservice, ensuring your logic is sound and that your tests pass. You wrap it in a Docker container, push it to the registry, and deploy. Then the reality sets in. Your image is 800 MB, your startup time is 40 seconds, and during load testing, the container is killed silently by the OS.

In my recent work, migrating a monolithic Java application to a microservices architecture, we faced this exact triad of issues. We were treating Docker containers like lightweight virtual machines and ignoring the nuances of how the JVM interacts with container boundaries. The result was bloated infrastructure costs, slow CI/CD pipelines, and unstable production pods.

In this article, I will walk through the inefficiencies we uncovered and the specific Docker and JVM configurations that resolved them. I will detail the best practices we adopted to ensure our Java containers are both lean and resilient. This is not just about writing a Dockerfile. It is about understanding the runtime environment.

The Problem: The Fat JAR Antipattern

Our initial Dockerfile was straightforward, perhaps too straightforward. We were using a single-stage build that copied our built fat JAR into a standard JDK image. On the surface, this looks fine. However, this approach bundles every dependency, every library, and the entire JDK into a single layer. Whenever we changed a single line of code, the entire JAR was rebuilt. This invalidated the Docker cache for that layer. This meant our CI pipeline had to push hundreds of megabytes of unchanged data for every commit.

Furthermore, we were using a full JDK image in production. For running a Java application, we do not need the compiler or development tools. This unnecessary bloat increased our attack surface and memory footprint. We realized that our build strategy was optimized for simplicity rather than efficiency. This is a common trap for teams moving to containers for the first time.

Diagnosis: Analyzing Image Layers and Memory

To understand the bottleneck, we used dive, a tool for exploring Docker images. It revealed that 90 percent of our image size consisted of dependencies that rarely changed. Only 10 percent was our actual application code.

Simultaneously, we noticed intermittent OOMKilled errors during peak traffic. Despite setting -Xmx512m, the container would crash when memory usage hit the limit. This mirrored the Kubernetes issues many face, but it originated in how we defined the Docker runtime limits. The JVM was not aware it was running in a constrained environment. This led it to allocate heap space based on the host memory rather than the container limit. We realized that the Linux kernel was killing the process because the total memory usage exceeded the cgroup limit. The heap was only part of the equation. Non-heap memory usage was the hidden variable causing the crashes.

The Solution: Multi-Stage Builds and Layering

The first fix was adopting a multi-stage build. This allows us to build the artifact in one container and run it in a much smaller optimized runtime container. Switching to a JRE instead of a JDK reduced the base image size significantly. Using Alpine Linux further shaved off megabytes.

However, we could go deeper. Spring Boot 2.3 introduced layered JARs. By default, a Spring Boot JAR is organized into layers. These include dependencies, spring-boot-loader, snapshot dependencies, and application code. Dependencies change infrequently while application code changes constantly. By exploiting this, we can cache dependency layers in Docker. With the layers copied into the image separately, changing a Java class only invalidates the top application layer. The heavy dependencies layer remains cached. In our CI pipeline, this reduced build times by 60 percent and image push times by 75 percent. This improvement allowed our developers to get feedback much faster. It also reduced the bandwidth costs associated with pushing images to the registry.

JVM Awareness: Configuring for Containers

Addressing the memory crashes required tuning the JVM. Modern Java versions are container-aware, but they still need guidance to operate efficiently within Docker cgroups. We stopped using fixed heap sizes, such as -Xmx512m. Instead, we switched to percentage-based flags. This ensures the JVM adapts when we later change the Docker memory limit without rebuilding the image.

Setting MaxRAMPercentage to 75 percent reserves the remaining 25 percent for non-heap memory. This includes threads, metaspace, and code cache. This prevents the Linux OOM killer from terminating the process when off-heap usage spikes. We also added -XX:+ExitOnOutOfMemoryError to ensure the container restarts cleanly rather than hanging in a degraded state.

We learned that older JVMs without container support (before Java 10 and 8u191) assume they have access to all host memory. This assumption is fatal in a containerized environment. The container limits are enforced by the kernel, and the JVM must respect them. Using percentage-based flags is the most robust way to ensure this respect.

Security and Best Practices

Efficiency is not just about speed and size. It is about security. Running Java as the root user inside a container is a significant risk. If an attacker exploits a vulnerability in the application, they gain root access to the container. We added a non-root user to our Dockerfile.

Additionally, we implemented health checks directly in the Dockerfile. This allows Docker and Docker Swarm to detect unresponsive applications quickly and replace unhealthy containers automatically. Kubernetes ignores the Dockerfile HEALTHCHECK instruction and relies on its own probes instead. Either way, fast detection reduces the mean time to recovery during incidents.

We also considered using distroless images for even greater security. These images contain only the application and its runtime dependencies. They do not include a shell or package manager. This reduces the attack surface significantly. However, debugging can be harder without shell access. We decided to stick with Alpine for now, but plan to migrate to distroless in the future.

Monitoring Container Health

Once the application was deployed, we needed to ensure it stayed healthy. We integrated Prometheus to scrape metrics from the Spring Boot Actuator endpoint. This gave us visibility into JVM memory, GC pauses, and thread counts. We set up alerts for high memory usage and high GC pause times. This allowed us to catch issues before they caused outages.

We also monitored the container restart count. A high restart count indicated instability. This metric helped us identify pods that were struggling to stay alive.

Lessons Learned and Best Practices

Our journey taught us several valuable lessons. We incorporated these into our development standards.

  • Always use multi-stage builds. Single-stage builds are convenient but inefficient. Multi-stage builds produce smaller and more secure images.
  • Leverage layer caching. Order your Dockerfile commands to maximize cache hits. Copy dependencies before copying source code.
  • Tune JVM for containers. Use percentage-based memory flags. Never assume the JVM knows the container limits.
  • Run as non-root. Reduce security risks by dropping privileges. Create a dedicated user for the application.
  • Implement health checks. Allow the orchestrator to detect failures quickly. Use actuator endpoints for health checks.
  • Monitor continuously. Use metrics to track container health. Set alerts for memory and GC issues.
  • Test under load. Simulate production traffic in staging. Verify that memory usage stays within limits.

Conclusion

Containerizing Java applications requires more than just wrapping a JAR in a Docker image. It demands an understanding of layer caching, JVM memory management, and security contexts. By moving to multi-stage builds, leveraging Spring Boot layers, and configuring JVM flags for container awareness, we transformed our deployment process. Our images shrank from 800 MB to under 200 MB. Build times dropped significantly and allowed for faster feedback loops. Most importantly, the silent crashes disappeared. They were replaced by stable and predictable memory usage.

If you are still using single-stage builds or fixed heap sizes, I encourage you to revisit your Docker configuration. The efficiency gains are not just incremental. They fundamentally change how resilient and cost-effective your Java infrastructure becomes. Docker got us thinking differently about deployment. Let us make sure we are using it to its full potential.

By Ramya vani Rayala
The Pod Prometheus Never Saw: Kubernetes' Sampling Blind Spot

The Fix That Doesn't Fix It

Reducing your Prometheus scrape interval from 15 seconds to 5 seconds does not fix the sampling blind spot. It moves it. Any pod whose entire lifetime falls within one 5-second scrape gap is still structurally invisible, not because of misconfiguration or missing rules, but because poll-based collection has an irreducible sampling gap that no interval setting eliminates. This article explains exactly why that is, what it costs in production, and what actually fixes it.

What Is the H5 Evidence Horizon?

Kubernetes evidence horizons are deterministic points after which specific diagnostic context becomes permanently unrecoverable. H5, the scrape-interval sampling blind spot, is the only horizon that prevents observability data from being created in the first place. Unlike H1 (LastTerminationState rotation at ~90 seconds) or H2 (scheduler event pruning at 1 hour), H5 has no timer and no API call. It fires silently for every pod whose entire lifetime falls within one Prometheus scrape gap. The full evidence horizon taxonomy is documented at opscart.com/kubernetes-evidence-horizons-h2-h3-h4-h5/.

Why Poll-Based Observability Has an Irreducible Blind Spot

Prometheus collects metrics by sending HTTP requests to targets at a fixed interval. The default scrape interval in kube-prometheus-stack is 15 seconds. Every 15 seconds, Prometheus asks the world: "What is your current state?"

This model works exceptionally well for persistent, long-running workloads. A deployment that has been running for hours will be scraped hundreds of times. Its CPU trends, memory patterns, and request rates are captured with high fidelity. It fails completely for ephemeral workloads, and Kubernetes generates ephemeral workloads by design.

The math is straightforward. Given a scrape interval S and a pod lifetime L:

  • If L > S: at least one scrape falls inside the pod's lifetime, so it generates at least one data point.
  • If L < S: the pod may generate zero data points. This is not a failure in Prometheus; the pod's whole lifetime can fall between two consecutive scrapes.

Once the pod's start time relative to the scrape schedule is fixed, this is not a probability statement. It is deterministic. A pod with a 6-second lifetime and a 15-second scrape interval will generate exactly zero Prometheus data points if its entire lifetime falls within one scrape gap. There is no configuration change that fixes this for that specific pod in that specific gap.

The only way to eliminate the blind spot entirely is to move from a poll-based model to an event-driven model. And this is precisely the architectural distinction that most observability discussions miss.

The Ghost Pod Experiment

To validate this claim empirically, I ran a controlled experiment on a 3-node Minikube cluster (Kubernetes 1.31, Apple M-series hardware).

Setup:

  • Pod memory limit: 64Mi
  • Pod memory allocation: 128Mi (guaranteed OOMKill)
  • Prometheus scrape interval: 15s (kube-prometheus-stack default)
  • Pod name: ghost-pod, namespace: oma-sampling

What happened: The pod started, allocated memory beyond its limit, and was OOMKilled by the kernel at T+5s. Total observed pod lifetime: 6 seconds.

Prometheus result:

```
# Query executed the morning after the experiment
$ promql: container_cpu_usage_seconds_total{pod="ghost-pod"}
{}  # empty: 0 data points
$ promql: kube_pod_container_status_last_terminated_reason{pod="ghost-pod"}
{}  # empty: 0 data points
$ kubectl get pod ghost-pod -n oma-sampling
Error from server (NotFound): pods "ghost-pod" not found
```

Zero data points. No alert. No record. From Prometheus's perspective, ghost-pod never existed.
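The relationship between scrape interval, pod lifetime, and pod start time can be sketched in a few lines. This is a simplified model, not Prometheus code: the function scrapesDuring is hypothetical, it assumes scrapes fire at exact multiples of the interval starting at time zero, and it assumes the pod is scrapeable for its whole lifetime, which real exporters do not guarantee.

```go
package main

import "fmt"

// scrapesDuring counts the scrape instants (0, interval, 2*interval, ...)
// that fall inside a pod's lifetime [startMs, startMs+lifetimeMs].
// All values are in milliseconds.
func scrapesDuring(startMs, lifetimeMs, intervalMs int64) int {
	if intervalMs <= 0 || lifetimeMs < 0 || startMs < 0 {
		return 0
	}
	endMs := startMs + lifetimeMs
	// First scrape instant at or after the pod's start.
	firstMs := ((startMs + intervalMs - 1) / intervalMs) * intervalMs
	if firstMs > endMs {
		return 0 // the whole lifetime sits inside one scrape gap
	}
	return int((endMs-firstMs)/intervalMs) + 1
}

func main() {
	// A 6-second pod against a 15-second interval, at different start offsets.
	for _, startMs := range []int64{2000, 8000, 12000} {
		fmt.Printf("start=%5dms lifetime=6000ms interval=15000ms scrapes=%d\n",
			startMs, scrapesDuring(startMs, 6000, 15000))
	}
	// The same idea at a 5-second interval: a 3-second pod can still be missed.
	fmt.Printf("start= 1000ms lifetime=3000ms interval= 5000ms scrapes=%d\n",
		scrapesDuring(1000, 3000, 5000))
}
```

In this model a 6-second pod is seen only when its lifetime happens to straddle a scrape instant, and shortening the interval changes which lifetimes can be missed without removing the gap.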
Event-driven result: An OMA (Operational Memory Architecture) collector subscribed to the Kubernetes watch API captured the following at the moment of occurrence:

```
OOMKill P001 captured at T+5s
  pod:          ghost-pod
  namespace:    oma-sampling
  exit_code:    137
  memory_limit: 64Mi
  node:         opscart-m03
  timestamp:    2026-04-18T23:38:06Z
```

The causal evidence (exit code, resource limits, node placement) was captured at occurrence. No scrape gap. No sampling window. The watch API delivers every pod state transition at the moment it fires, regardless of timing.

Figure: Poll-based vs event-driven architecture. A pod with a 6-second lifetime falls entirely within one 15-second Prometheus scrape gap, generating zero data points. An event-driven collector subscribed to the Kubernetes watch API captures the OOMKill at occurrence; no sampling gap exists by architecture.

"Just Reduce the Scrape Interval"

This is the most common response when engineers first encounter the H5 blind spot. It deserves a direct answer.

Reducing the scrape interval from 15s to 5s does not eliminate the blind spot. It shifts the threshold from 15 seconds to 5 seconds. Any pod whose lifetime falls within one 5-second scrape gap is still structurally invisible.

Consider the real-world distributions:

  • CrashLoopBackOff with OOMKill on startup: A pod that allocates memory before its first checkpoint can OOMKill in under 1 second. No scrape interval short of continuous polling catches this.
  • Init container failures: Init containers that fail immediately may have lifetimes measured in milliseconds. These are architecturally invisible to any poll-based system, regardless of scrape interval.
  • Batch job bursts: Short-lived Job pods in a batch processing cluster can complete their entire lifecycle (start, run, succeed, or fail) within a single scrape gap at any reasonable interval.

Reducing the scrape interval also has real costs:

  • Storage: Prometheus sample volume grows proportionally with scrape frequency. Moving from 15s to 5s roughly triples the number of samples stored per series.
  • Query cost: More frequent scrapes do not add new series, but they multiply the samples that must be stored and read for every high-cardinality metric (per-pod, per-container), which increases query latency.
  • Target load: Every scrape is an HTTP request to your metrics endpoints. High scrape frequencies create measurable load on instrumented services.

You are paying a real cost to shift the threshold, not to eliminate it. For workloads with sub-second or sub-5-second lifetimes, no scrape interval is fast enough.

Why the Watch API Is Structurally Different

The Kubernetes watch API is not a faster poll. It is a fundamentally different delivery mechanism.

When you run kubectl get pods --watch, you are not asking Kubernetes "what is the current pod state every N seconds." You are opening a long-lived HTTP connection to the API server and subscribing to a stream of state change events. Every time a pod transitions (from Pending to Running, from Running to Terminated, from any state to OOMKilled), the API server pushes that transition to every active watcher.

The delivery is at-occurrence. There is no polling interval. There is no sampling gap. If a pod OOMKills at T=17.3 seconds, the watch API delivers that event at T=17.3 seconds, not at the next scrape boundary.

This means the H5 blind spot does not exist for event-driven collectors by architecture. A pod with a 6-second lifetime generates exactly one OOMKill transition event. That event is delivered to every watcher at the moment it fires. The watcher captures it. Done.

The practical implication: event-driven collection provides complete coverage of pod lifecycle events regardless of pod lifetime, without any configuration tuning.

What the Sampling Blind Spot Costs in Production

The blind spot has three concrete operational consequences.

Undetected crash loops. A pod in CrashLoopBackOff with a very short failure cycle can OOMKill dozens of times per hour without generating a single Prometheus alert. The restart counter increments in kubectl get pods output, but if nobody is looking at that specific pod, the pattern goes undetected. By the time an engineer investigates, the pod may have crashed hundreds of times with no metric record of any individual failure.

Incomplete capacity planning. Short-lived batch pods that OOMKill during processing spikes are invisible to Prometheus-based capacity analysis. Your memory utilization reports show only long-running pods. The actual peak memory demand, which caused the batch pod OOMKills, never appears in your capacity data.

Silent compliance gaps. In pharmaceutical and financial production environments with audit requirements, unrecorded container failures are a compliance problem. An auditor asking "what failed in this namespace between 2 AM and 4 AM on this date" deserves a complete answer. A Prometheus query that returns empty results for pods that actually OOMKilled is not a complete answer.

The Structural Fix

The H5 blind spot cannot be patched within a poll-based architecture. The fix is additive: complement Prometheus with an event-driven collector that subscribes to the Kubernetes watch API.

This does not mean replacing Prometheus. Prometheus remains the right tool for what it does: metric aggregation, trend analysis, and alerting on long-running workloads. The event-driven collector handles what Prometheus cannot: discrete lifecycle events for pods of any duration.

The implementation I've validated uses a Go-based collector subscribing to CoreV1().Pods(namespace).Watch(). On each Modified event, the collector inspects ContainerStatus for OOMKill signals and captures the full forensic context synchronously, before the pod restarts and overwrites LastTerminationState.
```go
// Simplified watch loop. Fragment: clientset, ctx, namespace, and
// captureOOMKillEvidence are defined elsewhere in the collector, and the
// enclosing function is assumed to return an error.
watcher, err := clientset.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{})
if err != nil {
	return err
}
defer watcher.Stop()

for event := range watcher.ResultChan() {
	pod, ok := event.Object.(*corev1.Pod)
	if !ok {
		continue // skip non-Pod objects, such as a Status sent with an error event
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.LastTerminationState.Terminated != nil {
			reason := cs.LastTerminationState.Terminated.Reason
			if reason == "OOMKilled" {
				captureOOMKillEvidence(pod, cs)
			}
		}
	}
}
```

The watch API delivers the event at occurrence. The capture is synchronous. No polling gap. No sampling threshold. Ghost pods are no longer invisible.

Full implementation with reproducible Minikube scenarios is at github.com/opscart/k8s-causal-memory.

H5 in Context: The Evidence Horizon Taxonomy

H5 is one of five evidence destruction mechanisms I've identified and formalized as an evidence horizon taxonomy. The full taxonomy:

| Horizon | Trigger | What's lost |
|---------|---------|-------------|
| H1 | Pod restart (~90s) | OOMKill forensics, limits, ConfigMaps |
| H2 | Event TTL (1hr/1000) | Scheduler placement rationale |
| H3 | Debug session exit | kubectl debug exit code, duration |
| H4 | Kubelet restart | In-memory operational state |
| H5 | Scrape interval | Sub-interval pod lifetimes |

H5 is unique in the taxonomy: H1 through H4 destroy Kubernetes API state that previously existed. The scrape-interval blind spot prevents observability data from being created in the first place. It is the only horizon that requires no destruction event; the evidence simply never reaches any persistent store.

The full taxonomy with empirical validation across Minikube and AKS 1.32.10 is documented in the canonical OpsCart article, Beyond the 90-Second Gap, and in the research preprint at Zenodo DOI: 10.5281/zenodo.19685352.

Conclusion

The H5 blind spot is not a Prometheus bug. It is not a configuration problem. It is an irreducible consequence of poll-based collection applied to a platform that generates arbitrarily short-lived workloads.

Kubernetes is designed to self-heal faster than humans can observe. A pod that OOMKills in 6 seconds and restarts in 2 is working exactly as designed. Prometheus, also working exactly as designed, sees nothing.

The architectural answer is equally straightforward: subscribe to the Kubernetes watch API. Receive events at occurrence. No scrape interval. No sampling gap. No ghost pods.

Every pod that crashes in your cluster deserves a record. The watch API ensures it gets one.

Resources:

  • github.com/opscart/k8s-causal-memory: open-source implementation with reproducible H5 scenario
  • Beyond the 90-Second Gap: full evidence horizon taxonomy (OpsCart canonical)
  • Research preprint: 30-run statistical analysis, AKS 1.32.10 validation

By Shamsher Khan, DZone Core
The Invisible OOMKill: Why Your Java Pod Keeps Restarting in Kubernetes

Imagine deploying a robust Spring Boot microservice that passes every integration test in your local Docker environment, only to watch it crash loop endlessly shortly after launching to your Kubernetes production cluster. Everything ran fine on your laptop, but in the live environment, your pods start terminating en masse. Requests to your critical endpoints begin failing with 503 errors. Panic sets in as your service, the backbone of your transaction pipeline, is effectively brought down by an invisible foe.

In our recent migration to a cloud-native architecture, the culprit was a hidden memory configuration issue involving how the Java Virtual Machine interacts with Kubernetes container limits. A tiny mismatch in resource allocation, something that went unnoticed during development, led to a chain reaction of OOMKilled events in production.

In this article, we will walk through the scenario step by step, including how the problem manifested and how we diagnosed the root cause. We will discuss the configuration that was to blame and the fixes and best practices that emerged from the post-mortem. Along the way, we will highlight common Kubernetes pitfalls for Java developers that can similarly wreak havoc if left unchecked.

Symptoms: When Pods Turn Against You

The first sign of trouble was our monitoring dashboard lighting up with red alerts. Shortly after deploying our new payment service, we saw mass restarts: pods that had started successfully were suddenly restarting every few minutes. This was not a one-off fluke, since it was happening across all replicas simultaneously. Our ingress controller started returning 503 Service Unavailable responses. Essentially, the pods were being killed before they could serve traffic.

Digging into application logs revealed nothing unusual. There were no stack traces or Java exceptions. The logs simply stopped abruptly. However, checking the Kubernetes pod status revealed the cryptic message Reason: OOMKilled. This means the container exceeded its memory limit and was terminated by the Linux kernel.

At first glance, we were not sure why this would happen. We had set the JVM heap size to 512 MB, and our Kubernetes memory limit was set to 1 GB. Surely there was enough headroom. Why would the kernel kill the process when the heap was only half the limit?

The impact of this issue was severe. Since our app relied on steady uptime for processing transactions, widespread pod instability meant no requests could be completed. In effect, our service was down for all users until the issue was resolved.

Reproducing and Observing the Failure

In our staging environment, we tried to reproduce the sequence of events. We deployed the same Docker image and applied the same Kubernetes manifests. We watched the memory usage via kubectl top pods. Sure enough, as the load increased, the container memory usage climbed steadily until it hit the limit and the pod vanished.

Interestingly, the application worked fine under low load. The issue only surfaced during peak traffic when non-heap memory usage spiked. This was a crucial clue. It hinted that the JVM heap was not the only consumer of memory within the container. We realized that focusing solely on heap size was a mistake.

Understanding JVM vs. Container Memory

At this point, it is helpful to explain how the JVM accounts for memory within a container. Many Java developers assume that the max heap flag controls the total memory usage of the process. However, the JVM requires memory for more than just the heap:

  • Metaspace is used for class metadata.
  • Thread stacks require memory for each thread.
  • Code cache is used for JIT-compiled code.
  • Garbage collector structures hold the collector's internal data.
  • Direct buffers handle NIO direct memory.

In older Java versions, the JVM was not container-aware. It would calculate memory limits based on the host machine RAM, not the container limit. While modern Java versions have improved container awareness, they still require explicit configuration to ensure the non-heap memory fits within the Kubernetes cgroup limit.

In our case, the JVM heap was set to 512 MB, but the non-heap memory usage under load grew to approximately 600 MB. Total usage was about 1.1 GB. The Kubernetes limit was 1 GB. The result was OOMKilled.

The Misconfigured Manifest and How It Failed

Let us look at a simplified version of the Kubernetes deployment manifest that led to this issue. We set the Kubernetes memory limit to 1Gi. We set the JVM max heap to 512m. On paper, this looks safe. However, we failed to account for the JVM off-heap memory footprint. When the application loaded large libraries or processed high volumes of concurrent requests, the non-heap memory expanded, pushing the total process size over the 1Gi cgroup limit.

Unlike a failure in which a remote server rejects a request and returns an error we can log, here the Linux kernel simply killed the process without warning the application. There was no chance to log an error or gracefully shut down. This silent failure made debugging incredibly difficult since the application never got a chance to speak.

How We Fixed It: Correct Memory Alignment

The fix for this issue was twofold. We needed to adjust the Kubernetes limits and tune the JVM flags to respect those limits dynamically.

First, we increased the container limits. We raised the memory limit to provide sufficient headroom for non-heap usage. Second, we decided to use a percentage-based heap. Instead of a fixed max heap value, we configured the JVM to use a percentage of the container's available memory.

In the corrected configuration, we used the MaxRAMPercentage flag so the JVM automatically calculates the heap size based on the cgroup limit detected at runtime. This prevents the configuration from becoming stale if we change the Kubernetes limits later.
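The numbers from this incident can be checked with a small budget calculation. This is an illustrative sketch only: the function names are hypothetical, the article's MB and GB figures are treated as MiB and GiB for simplicity, and the computed minimum limit is arithmetic on the observed 600 MB non-heap peak, not the limit the team actually chose.

```go
package main

import "fmt"

// overLimitMiB returns how far heap plus non-heap usage exceeds the container
// limit, in MiB. A result of zero or less means the process fits.
func overLimitMiB(heapMiB, nonHeapMiB, limitMiB int64) int64 {
	return heapMiB + nonHeapMiB - limitMiB
}

// minLimitMiB returns the smallest limit (MiB, rounded up) at which the share
// left over after giving heapPercent to the heap still covers nonHeapMiB.
// It returns -1 when heapPercent leaves no room for non-heap memory.
func minLimitMiB(nonHeapMiB, heapPercent int64) int64 {
	rest := 100 - heapPercent
	if rest <= 0 {
		return -1
	}
	return (nonHeapMiB*100 + rest - 1) / rest
}

func main() {
	// Original setup: 512 heap + ~600 non-heap against a 1024 limit.
	fmt.Printf("over the limit by %d MiB\n", overLimitMiB(512, 600, 1024))
	// With 75 percent for the heap, the remaining 25 percent must cover ~600.
	fmt.Printf("minimum limit at 75%% heap: %d MiB\n", minLimitMiB(600, 75))
}
```

The sketch shows why a percentage flag alone is not enough: the remaining share has to be large enough in absolute terms for the non-heap peak seen under load, which is why the limit was raised as well.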
We also increased the total limit to ensure the remaining 25 percent was sufficient for metaspace and threads. This change allowed the JVM to adapt to the environment automatically. It removed the hard-coded assumption about available memory. This is critical in cloud environments where resource limits might change based on scaling policies.

Preventing Similar Issues: Best Practices for Java on Kubernetes

We learned several valuable lessons during this incident. We incorporated these into our development standards to prevent recurrence.

  • Always account for non-heap memory: Never set the max heap equal to the container memory limit. Always leave at least 20-25 percent of the container memory for off-heap usage. This buffer is essential for stability.
  • Use modern base images: Ensure you are using JDK versions that support container awareness. Java 8 update 191 or later is required. Java 11 or 17 is better. Consider using distroless images or Jib to reduce the attack surface and image size.
  • Configure liveness probes carefully: A common pitfall is setting liveness probes too aggressively. If your Java app pauses for garbage collection, it might miss a probe timeout and get killed unnecessarily. Add initial delay and failure thresholds to accommodate GC pauses.
  • Monitor memory trends: Implement monitoring using Prometheus and Grafana. Track both container_memory_usage_bytes and JVM-specific metrics like jvm_memory_used_bytes. Alert when usage approaches 80 percent of the limit. This gives you time to react before the kernel steps in.
  • Simulate load in staging: One reason this bug slipped by is that in development, we rarely simulated production-level concurrency. To prevent such surprises, we now use tools like k6 or JMeter in our staging cluster to validate memory stability under load.
  • Secure your secrets: Ensure you store sensitive configuration securely. In Kubernetes, use Secrets mounted as environment variables or files rather than hardcoding them in Docker images. This prevents accidental exposure during debugging.
  • Handle graceful shutdowns: Configure your Spring Boot app to handle SIGTERM signals properly. Kubernetes sends this signal before killing a pod. Ensure your application stops accepting new requests and finishes processing in-flight requests before shutting down.

The Human Element in Incident Response

Beyond the technical fixes, we also improved our response process. We established a blameless post-mortem culture. This encouraged team members to share mistakes without fear. We documented the incident in our internal knowledge base. This ensures new team members learn from our experience. We also added a checklist for production deployments. This checklist includes verifying JVM flags and memory limits. These process changes are just as important as the code changes.

Conclusion

Kubernetes is powerful, but with power comes complexity. Our Java service went down due to a tiny memory alignment bug, something easy to overlook but with catastrophic consequences in production. The hidden issue was simply that we were not accounting for the JVM total memory footprint versus the container cgroup limit. Once identified, the fix was a configuration change, yet it brought to light the importance of thoroughly understanding how your runtime interacts with the orchestration layer.

In the aftermath, we reinforced our processes. We now simulate real-world load in testing and have added robust monitoring around memory usage. We keep an eye on JVM flags for containerized environments. By sharing this story, we hope to spare others that moment of dread when you realize your service at the front door to your business logic has unexpectedly locked out your users due to a silent kernel kill.

In the end, our system is now stable and more resilient. We treat container resources with greater care. We always align JVM flags with Kubernetes limits. We guard them like the infrastructure keys to the kingdom that they are. We never assume something as critical as resource management will just work without thorough validation. Kubernetes got the best of us once, but with these lessons learned, we are determined not to let a sneaky configuration issue slip by again. Happy and safe coding.

By Ramya vani Rayala

Top Containers Experts

Yitaek Hwang

Software Engineer,
NYDIG

Marija Naumovska

Co-founder & Head of Growth,
Microtica

Naga Santhosh Reddy Vootukuri

Principal Software Engineer,
Microsoft

Naga Santhosh Reddy Vootukuri, a seasoned professional with over 16 years at Microsoft, reflects on his journey from India to the USA. Graduating from Sreenidhi Institute of Science and Technology in 2008, he now serves as a Principal Software Engineer for Azure SQL. His role involves leading his team through software development cycles, ensuring successful product launches. Currently, Naga focuses on a significant initiative in Azure SQL Deployment, emphasizing high availability for SQL customers during feature rollouts. Previously, he managed Master Data Services (MDS) within SQL Server, gaining community connections and contributing actively to Microsoft forums. His current focus is mainly on AI and LLMs, and he shares his knowledge through detailed articles. Aside from technical responsibilities, Naga engages in Microsoft hackathons and mentors junior engineers, finding fulfillment in guiding their career paths. He also champions diversity and inclusion, advocating for equality within the tech industry. Naga sees himself not only as a technical leader but also as a catalyst for positive change at Microsoft. He is also a Docker Captain.

The Latest Containers Topics

Zero-Downtime Deployments for Java Apps on Kubernetes
Achieve zero-downtime deployments for Java applications on Kubernetes using rolling updates, readiness/liveness probes, and graceful shutdown strategies.
May 29, 2026
by Ramya vani Rayala
· 3,586 Views
Pragmatica Aether: Let Java Be Java
A modern, distributed, fault-tolerant runtime environment for the language that was intentionally designed for managed environments.
May 29, 2026
by Sergiy Yevtushenko
· 3,760 Views · 1 Like
Docker Hardened Images Are Free Now — Here's What You Still Need to Build
Docker Hardened Images solve the CVE problem. But CVEs aren't why containers fail in production — governance gaps are. Here's the trust architecture that closes them.
May 27, 2026
by Shamsher Khan, DZone Core
· 3,878 Views
Beyond Partitioning and Z-Order: A Deep Dive into Liquid Clustering for Unity Catalog Managed Tables
Liquid Clustering replaces rigid partitioning and Z-Order with adaptive clustering in Unity Catalog, improving performance with less maintenance.
May 26, 2026
by Seshendranath Balla Venkata
· 2,452 Views · 1 Like
One Query, Four GPUs: Tracing a Distributed Training Stall Across Nodes
One SQL query across 4 GPU nodes found a straggler in under a second using eBPF fleet fan-out, no central collector needed.
May 25, 2026
by Ingero Team
· 3,467 Views
Self-Hosted Inference Doesn’t Have to Be a Nightmare: How to Use GPUStack
GPUStack is an open-source tool that turns a bunch of scattered GPU machines into one managed cluster for deploying AI models behind an OpenAI-compatible API.
May 21, 2026
by Sandeep Sadarangani
· 3,605 Views · 1 Like
Smart Deployment Strategies for Modern Applications
Docker packages applications to ensure consistent and portable deployments. Kubernetes manages them with scaling, reliability, and automation in production.
May 18, 2026
by Manju George
· 3,538 Views
Solving the Mystery: Why Java RSS Grows in Docker on M1 Macs
Java apps running in x86-64 Docker containers on ARM64 M1 Macs experience mysterious RSS memory growth due to Rosetta 2 translation cache. The culprit? JIT compilation.
May 12, 2026
by Sumeet Sharma
· 3,674 Views · 1 Like
How We Diagnosed a Hidden Scheduler Failure in a Docker Swarm Cluster Serving 2 Million Users
A real production incident in a Docker Swarm cluster — how a routine service update triggered a silent scheduler failure, and how we uncovered it.
May 5, 2026
by Denis Tiumentsev
· 1,738 Views · 1 Like
Mastering Kubernetes to Maximize Your Cloud Potential
Understanding Kubernetes architecture through seven critical layers: storage, compute, networking, observability, security, dev tools, and CI/CD.
May 4, 2026
by Jaswinder Kumar
· 1,823 Views · 2 Likes
AI Agents for DevOps on Kubernetes Need Real Engineering, Not Magic
Kubernetes incident triage: OpenTelemetry → Kafka → CrewAI → RBAC scale. DORA 2024: 75% AI use, 39% low trust. AI correlates, humans approve changes.
April 30, 2026
by Abdul Majid Qureshi
· 2,281 Views
Java Backend Development in the Era of Kubernetes and Docker
Containerization with Docker and orchestration through Kubernetes enable Java backends to be deployed, scaled, and managed efficiently in modern cloud-native environments.
April 28, 2026
by Ramya vani Rayala
· 4,256 Views · 5 Likes
Java in a Container: Efficient Development and Deployment With Docker
Docker containers make Java apps portable and consistent across development and deployment environments, improve scalability, and streamline CI/CD.
April 28, 2026
by Ramya vani Rayala
· 2,588 Views · 2 Likes
The Pod Prometheus Never Saw: Kubernetes' Sampling Blind Spot
Prometheus sampling gaps are irreducible — reducing the scrape interval just moves the threshold. The Kubernetes watch API eliminates it entirely.
April 23, 2026
by Shamsher Khan, DZone Core
· 2,202 Views · 1 Like
The Invisible OOMKill: Why Your Java Pod Keeps Restarting in Kubernetes
A Kubernetes pod may restart due to an OOMKill when the Java process exceeds the container’s memory limit. JVM memory tuning and correct resource limits prevent crashes.
April 22, 2026
by Ramya vani Rayala
· 5,422 Views · 5 Likes
When Kubernetes Breaks Session Consistency: Using Cosmos DB and Redis Together
Cosmos DB stores durable state; Redis acts as a coordination layer, enabling predictable, stateless scaling without sticky sessions, strong consistency, or high costs.
April 15, 2026
by Vikas Mittal
· 2,533 Views
NeMo Agent Toolkit With Docker Model Runner
Agent observability is often missing in the rush to build AI agents. NeMo adds observability to AI agents, helping trace, evaluate, and debug multi-agent workflows.
April 15, 2026
by Siri Varma Vegiraju, DZone Core
· 2,615 Views
Run AI Agents Safely With Docker Sandboxes: A Complete Walkthrough
A full walkthrough of how to set up Docker sandboxes on a local machine and how to run AI agents safely in YOLO mode without corrupting the host environment.
April 7, 2026
by Naga Santhosh Reddy Vootukuri, DZone Core
· 5,917 Views · 3 Likes
TOP-5 Lightweight Linux Distributions for Container Base Images
Choosing a base Linux image for containers is not just about the size. It is also about licensing, compatibility, update cadence, security posture, and support options.
April 7, 2026
by Catherine Edelveis
· 4,176 Views · 4 Likes
Docker Secrets Management: From Development to Production
Why environment variables leak, how Docker Swarm secrets work, when to use HashiCorp Vault, and building a layered approach to secrets in production containers.
April 7, 2026
by Shamsher Khan, DZone Core
· 3,083 Views · 1 Like