Why Your "Stateless" Services Are Lying to You

“Stateless” systems aren’t. Hidden state — caches, pools, SDK retries, kernel buffers — breaks deployments and scaling. Make it explicit, externalized, and observable.

Mar. 02, 26 · Opinion

The architecture diagram shows clean rectangles. "Stateless API tier," someone wrote in Lucidchart, then drew an arrow to a managed database. The presentation went well. Everyone nodded. Six months later, after the third incident where a rolling deployment dropped active uploads and the on-call engineer spent two hours discovering that session affinity was secretly enabled in the load balancer config — that's when you realize the diagram lied.

Not maliciously. But comprehensively.

I've written services that claimed statelessness while leaking it from a dozen seams. The HTTP handler held user preferences in a package-level map "just for this request." The container wrote 8GB of preprocessed model weights to /tmp during startup because downloading them on every invocation would obliterate our P95 latency. The connection pool maintained TCP state to the database — arguable, sure, but try bouncing pods under load and watch what happens to those half-open sockets. State metastasizes. It finds cracks.

The Honest Inventory

True statelessness might exist in purely functional languages executing atomic transforms over immutable data. Everywhere else, you're managing a spectrum. On one end: ephemeral computation that genuinely forgets everything between invocations. On the other: a PostgreSQL primary holding transactional truth. Your "stateless" web service? Probably somewhere in the middle, pretending harder than it should.

Here's what actually holds state in a typical deployment, whether you documented it or not:

In-memory sessions. The default in Express (express-session's MemoryStore) and in Spring Boot's embedded servlet container; Rails is the exception, defaulting to cookie-based sessions. Works beautifully on a single server. Scales to exactly one instance. The moment you add a second pod, you need sticky sessions (which skew load distribution) or you need to move that state out. I've seen production systems run for years with sessionAffinity: ClientIP quietly enabled in the Kubernetes Service definition. Nobody remembered why. Removing it caused mysterious logouts because twenty percent of requests were landing on a pod that had never seen the user's session.

Local filesystem writes. Your service accepts file uploads. It writes them to /var/uploads/staging before pushing to S3. Reasonable, even. Except now that directory is state. If the pod dies mid-upload, that file vanishes. If you run two replicas, the second one can't see the first one's staging area. Lambda's /tmp is even more treacherous: 512MB by default (configurable up to 10GB) that sometimes persists across invocations, giving you just enough consistency to convince you it's reliable, then disappears when the execution environment is recycled. I once debugged a payment-processing bug that boiled down to CSV generation writing to /tmp/report-{timestamp}.csv and assuming the file would still exist thirty seconds later. It usually did. Except during scale-down events.

Caches that rebuild inconsistently. You have an in-memory LRU cache of product metadata. "It's just a cache," the comment says, "we can always refetch." True. But if two pods start cold, they each fetch different subsets depending on which requests they serve first. Now you've got split-brain caching. Queries for product X hit pod A (which has it cached) in 8ms. The next request hits pod B (which doesn't) in 340ms. Your P99 is a lie composed of cache lottery results. Worse: if that cache holds computed aggregates — say, a materialized rollup of user permissions — and you don't version the computation logic, you get inconsistent authorization decisions across replicas until the next deployment forces a cold start.

Configuration that isn't really config. Environment variables feel stateless. They're injected at container start, right? Immutable. Except your app reads DATABASE_POOL_SIZE once at startup and allocates a fixed connection pool. Change the env var in the ConfigMap, redeploy... and now half your pods have 10 connections, half have 50, because the rolling update hasn't finished. For six minutes, you have two different connection behaviors running simultaneously. Not state, exactly. But not stateless either — it's behavioral divergence derived from externalized initialization state.

Kernel and OS-level buffering. Linux caches file reads in the page cache, which will happily grow into most of a node's free memory. If your service reads a 200MB reference file from an NFS mount during startup, the first pod takes twelve seconds; the second takes three because the NFS server now has the file in its own cache; the third takes one second because it landed on a node whose page cache already holds the file. You didn't write caching logic. The operating system did. That's state. It affects latency distributions, readiness probe timing, whether your HPA thrashes or stabilizes. I once tracked what looked like a memory leak to vm.vfs_cache_pressure being set too low on certain nodes: the kernel hoarded dentries until our JVM was OOMKilled.

Third-party SDK state. The AWS SDK for Go caches credentials. Client libraries such as Stripe's can retry failed requests with exponential backoff, and clients that add a circuit breaker or a retry budget go further and remember recent failures across calls. These are stateful behaviors embedded in ostensibly stateless HTTP clients. If pod A's client sees a burst of failures and trips its breaker or drains its retry budget, pod A behaves differently for the next thirty seconds: it fails fast or stops retrying. Pod B never saw those failures and retries immediately. Your latency now depends on which pod handled which prior failures, a hidden distribution of localized state.
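To make that hidden state concrete, here is a minimal Python sketch of a client-side retry budget: a token bucket that retries drain and successful calls slowly refill. RetryBudget and call_with_retries are hypothetical names, not the Stripe or AWS SDK API. The point is only that the bucket lives in process memory, so two pods that saw different failures end up behaving differently.

```python
import time


class RetryBudget:
    """Hypothetical per-process retry budget (a token bucket).

    Each retry spends tokens; each success refunds a little. Because the
    bucket lives in process memory, pods that saw different failures hold
    different budgets.
    """

    def __init__(self, capacity=10, retry_cost=5, success_refund=1):
        self.capacity = capacity
        self.tokens = capacity
        self.retry_cost = retry_cost
        self.success_refund = success_refund

    def try_acquire_retry(self):
        # Refuse the retry outright once the budget is exhausted.
        if self.tokens < self.retry_cost:
            return False
        self.tokens -= self.retry_cost
        return True

    def record_success(self):
        self.tokens = min(self.capacity, self.tokens + self.success_refund)


def call_with_retries(fn, budget, max_attempts=3, base_delay=0.1):
    """Call fn(); on ConnectionError, back off exponentially and retry
    while attempts remain and the budget allows it."""
    for attempt in range(max_attempts):
        try:
            result = fn()
            budget.record_success()
            return result
        except ConnectionError:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not budget.try_acquire_retry():
                raise  # re-raise the current ConnectionError
            time.sleep(base_delay * (2 ** attempt))
```

After one failing call, pod A's budget is empty and its next failure surfaces immediately, while a fresh pod B still retries twice. Production retry policies typically add jitter on top, but the divergence comes from the in-memory bucket alone.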

Thread-local and goroutine-local storage. Java's ThreadLocal, Go's context.Context if misused. I've seen authentication tokens stored in thread-local variables "for convenience," which works fine until the thread pool reuses a thread and suddenly a request inherits the previous user's identity. Or someone stashes a database transaction in context.Context and passes it down through twelve function calls, and now your "stateless" handler is actually managing transaction lifecycle state that must begin and commit in the same execution.
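The thread-reuse leak is easy to reproduce. This Python sketch uses threading.local as a stand-in for Java's ThreadLocal and a one-worker pool so both requests run on the same thread. handle_leaky, handle_safe, and run_two_requests are hypothetical. The fix mirrors calling ThreadLocal.remove() in a finally block in Java.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage, playing the role of Java's ThreadLocal.
_request_ctx = threading.local()


def handle_leaky(user):
    # Authenticated requests set the user; anonymous ones (user=None) don't...
    if user is not None:
        _request_ctx.user = user
    # ...but both read whatever the pooled thread still holds.
    return getattr(_request_ctx, "user", None)


def handle_safe(user):
    try:
        if user is not None:
            _request_ctx.user = user
        return getattr(_request_ctx, "user", None)
    finally:
        # Drop per-request state before the thread goes back to the pool.
        if hasattr(_request_ctx, "user"):
            del _request_ctx.user


def run_two_requests(handler):
    """Run an authenticated request, then an anonymous one, on a single
    reused worker thread; return the user each request saw."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        first = pool.submit(handler, "alice").result()
        second = pool.submit(handler, None).result()
    return first, second
```

With handle_leaky, the anonymous request comes back as "alice"; with handle_safe, it comes back as None.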

The database is state; everyone knows that. The message queue is state, sure. But also: DNS caches are state. TLS session resumption tickets are state. The Linux conntrack table holding NAT mappings is state. The load balancer's per-target health status is state. Even your container image is state: if its bottom three layers are already cached on one node but not on another, pull times differ, and therefore startup times, and therefore autoscaling behavior.

Where Systems Fracture

The fractures appear when you assume statelessness and encounter state.

Deployment becomes a minefield. You do a rolling update. Kubernetes replaces pods gradually, starting new ones before the old ones are gone. Each new pod opens its own connection pool to RDS (state) while the old pods still hold theirs (state), and any old pod that exits without closing its connections cleanly leaves MySQL holding those sessions until wait_timeout expires (state). Orphaned connections pile up, and suddenly you hit max_connections. The outage isn't caused by the new code; it's caused by the transition itself, by the brief doubling of connection state during the rollover.
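The arithmetic is worth doing before the rollout rather than during the incident. A small Python sketch, assuming a Deployment whose maxSurge is either a whole number of pods or a percentage (Kubernetes defaults to 25% and rounds a percentage surge up to whole pods) and a fixed per-pod pool size. The function names and the 90% headroom figure are illustrative choices, not a standard.

```python
import math


def peak_db_connections(replicas, max_surge, pool_size, lingering=0):
    """Worst-case database connections during a rolling update.

    max_surge: an int (extra pods) or a float fraction such as 0.25 (25%),
    rounded up to a whole pod as Kubernetes does for percentages.
    lingering: server-side sessions left by old pods that exited without
    closing cleanly and are still waiting out the server's idle timeout.
    """
    if isinstance(max_surge, float):
        surge = math.ceil(replicas * max_surge)
    else:
        surge = max_surge
    return (replicas + surge) * pool_size + lingering


def fits(replicas, max_surge, pool_size, max_connections, lingering=0, headroom=0.9):
    # Keep some of max_connections free for admin sessions and migrations.
    peak = peak_db_connections(replicas, max_surge, pool_size, lingering)
    return peak <= max_connections * headroom
```

With ten replicas and 50-connection pools, steady state is 500 connections, but the rollout peak is 650 before a single lingering session is counted.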

Autoscaling becomes adversarial. The HPA sees high CPU and spins up three new pods. They start cold. No caches. Their first requests are slow, which increases queue depth, which looks like higher load, which triggers more scaling. Positive feedback loop. Unless you've carefully tuned readiness probes to exclude cold pods from the load balancer until their caches warm (which requires accepting that, yes, your caches are state), you get scaling storms that make things worse.

Observability becomes archaeological guesswork. A request takes 800ms. Why? It could be the database query. Or it could be that this particular pod doesn't have the reference data cached yet. Or it could be that the pod landed on a node that had to pull the image layer containing the ML model from scratch, so it came up late and is still cold. Or it could be that the client's TCP connection landed on a replica that just restarted. Without structured logging of pod age, cache hit rates, and node-level resource state, you're reading tea leaves.
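One way to stop reading tea leaves is to stamp every request log line with the pod-level state that usually explains outliers. A minimal Python sketch; request_log_line and its field names are illustrative, and in Kubernetes the pod and node names would typically be injected as environment variables through the Downward API.

```python
import json
import time

# Captured once at import time, which approximates process start.
POD_START = time.monotonic()


def request_log_line(route, duration_ms, cache_hit, pod_name, node_name, now=None):
    """Build one JSON log record that carries pod-level context alongside
    the request's own timing."""
    now = time.monotonic() if now is None else now
    return json.dumps(
        {
            "route": route,
            "duration_ms": duration_ms,
            "cache_hit": cache_hit,
            "pod": pod_name,
            "node": node_name,
            "pod_age_seconds": round(now - POD_START, 1),
        },
        sort_keys=True,
    )
```

Grouping slow requests by pod_age_seconds and cache_hit separates "slow query" from "cold pod" in a single query against the log store.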

Failover becomes incomplete. Your service runs leader election through a sidecar (built on Consul sessions, say, or a Kubernetes Lease). Only one pod is the "leader" at any time; it runs background cron jobs. Stateless, right? Except the leader election is state. If the leader crashes, no successor can be elected until the old lease expires, so for twelve seconds no pod believes it's the leader. The cron jobs don't run. If those jobs are purging old records, you get silent data growth. If one of them renews a certificate and the run that fell into that twelve-second gap is simply skipped rather than retried, you get an expiration outage months later, and nothing ever alerted on the missed run.
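The twelve-second gap falls straight out of how TTL leases work. A minimal Python sketch of a lease with an explicit clock; Lease and leaderless_window are hypothetical, not the Consul or Kubernetes client API, and real implementations also have to deal with clock skew and fencing.

```python
class Lease:
    """Hypothetical TTL lease, the primitive most leader elections build on.
    Time is passed in explicitly so the leaderless gap is easy to see."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        # Succeeds if the lease is free, already ours (a renewal), or expired.
        if self.holder in (None, candidate) or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + self.ttl
            return True
        return False


def leaderless_window(last_renewal, crash_time, ttl):
    """Seconds between the leader crashing and the earliest moment a
    successor can acquire the lease."""
    return max(0.0, last_renewal + ttl - crash_time)
```

With a 15-second TTL, a leader that renewed at t=3 and crashed at t=6 blocks every successor until t=18: twelve seconds with no leader. Closing that gap is the job's responsibility, not the lease's; a scheduler that records its last successful run in shared storage can notice a missed slot and catch up once a new leader takes over.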

What Changes on Monday

When you accept that statelessness is aspirational, not descriptive, you design differently.

Externalize relentlessly. Sessions go to Redis or into a signed JWT, so the session cookie carries a token any pod can resolve or verify, not a pointer into one pod's memory. File uploads stream directly to S3 via presigned URLs, bypassing your service entirely. The application code touches the filesystem only for read-only artifacts baked into the container at build time. If you must write (logs, for instance), write to stdout and let the log shipper handle it. Every write to disk is a future debugging session waiting to happen.
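The property that matters is that any pod holding the shared key can validate a session without asking another pod. This Python sketch uses a bare HMAC-SHA256 signed token rather than a full JWT so it stays stdlib-only. issue_session, verify_session, and SECRET are hypothetical; a production service should use a vetted JWT library and load the key from a secret store.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: every pod receives the same key (e.g. from a mounted Secret).
SECRET = b"replace-with-a-shared-secret"


def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload):
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_session(user_id, ttl_seconds=1800, now=None):
    now = int(time.time() if now is None else now)
    claims = {"sub": user_id, "exp": now + ttl_seconds}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_sign(payload)}"


def verify_session(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None.
    No shared memory or cross-pod lookup is needed, only SECRET."""
    now = int(time.time() if now is None else now)
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(sig, _sign(payload)):
        return None
    claims = json.loads(_unb64(payload))
    return claims["sub"] if claims["exp"] > now else None
```

The trade-off: a self-contained token can't be revoked before it expires without reintroducing shared state (a denylist), which is one reason some teams keep sessions in Redis instead.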

Version your stateful artifacts. If you cache computed data, tag it with a schema version. When the computation changes, increment the version. Cold pods starting with the new code reject old cache entries and recompute. No silent inconsistency. We did this with permission checks: each cached decision had a v2 prefix. When we changed the permission logic, we bumped to v3. Old cache entries were ignored. Yes, we recomputed them. Yes, latency spiked for an hour. But it didn't cause incorrect authorization decisions across a mixed-version deployment.
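A minimal Python sketch of that versioning scheme, with a plain dict standing in for a shared cache such as Redis. PERMISSION_SCHEMA_VERSION, cache_key, and is_allowed are hypothetical names. During a mixed-version rollout, old pods keep reading and writing v2 keys while new pods use v3, so neither ever acts on the other's decisions.

```python
# Bump this whenever the permission-computation logic changes.
PERMISSION_SCHEMA_VERSION = 3


def cache_key(user_id, resource, version=PERMISSION_SCHEMA_VERSION):
    return f"perm:v{version}:{user_id}:{resource}"


def is_allowed(cache, user_id, resource, compute):
    """Read-through lookup. Entries written under an older schema version
    live at different keys, so this code never reads them."""
    key = cache_key(user_id, resource)
    if key not in cache:
        cache[key] = compute(user_id, resource)
    return cache[key]
```

In a real shared cache, give each entry a TTL so the abandoned v2 keys age out on their own.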

Measure state warming. Export metrics such as cache_entries_loaded, reference_data_age_seconds, and model_weight_loaded, and graph them per pod. If one pod's cache is 90% empty when others are 90% full, your latency depends on which pod a request lands on, and the distribution goes bimodal. Surface this in your readiness probe: don't serve traffic until the cache is 80% warm. Yes, this makes deployments slower. It also makes them successful.
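A minimal Python sketch of readiness gated on warmth. WarmupTracker is hypothetical, and it assumes you know roughly how many entries a warm cache holds (for example, from the previous instance's steady state). The handler logic is the part that matters; a Kubernetes readinessProbe using httpGet against a /ready path would call it, and the same fraction can be exported as a per-pod gauge.

```python
from http import HTTPStatus


class WarmupTracker:
    """Hypothetical tracker behind a readiness endpoint: the pod reports
    ready only once its cache holds enough of the expected working set."""

    def __init__(self, expected_entries, threshold=0.8):
        self.expected = expected_entries
        self.threshold = threshold
        self.loaded = 0

    def record_loaded(self, n=1):
        self.loaded += n

    def warm_fraction(self):
        if not self.expected:
            return 1.0
        return min(1.0, self.loaded / self.expected)

    def readiness(self):
        """Return (status_code, body) for a /ready handler."""
        frac = self.warm_fraction()
        ok = frac >= self.threshold
        status = HTTPStatus.OK if ok else HTTPStatus.SERVICE_UNAVAILABLE
        return int(status), f"cache_warm_fraction {frac:.2f}"
```

Returning 503 keeps the pod out of the Service's endpoints without failing a liveness check, so a slow warm-up delays traffic instead of triggering a restart.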

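Isolate connection state. Don't share database connections across logical boundaries. A request handler should acquire a connection, use it, and release it, not pass it to a background goroutine that might outlive the request. Tune the pool you already have rather than trusting its defaults: Go's database/sql is itself a pool (SetMaxOpenConns, SetConnMaxIdleTime, SetConnMaxLifetime), HikariCP exposes idleTimeout and maxLifetime, and pgbouncer can sit in front of Postgres to cap and multiplex server connections. Set the idle timeout shorter than whatever the database, NAT gateway, or load balancer in the path uses, so idle connections are retired during quiet periods instead of failing mid-request when load resumes.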
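The acquire-use-release discipline and idle retirement fit in a few lines. This Python sketch uses in-memory sqlite3 connections so it runs anywhere. TinyPool is a hypothetical teaching pool (not thread-safe, no size cap), standing in for what database/sql or HikariCP already do for you; the clock is injectable so idle retirement can be tested.

```python
import sqlite3
import time
from contextlib import contextmanager


class TinyPool:
    """Illustrative pool, not a production library. Connections idle for
    longer than max_idle_seconds are closed instead of reused."""

    def __init__(self, connect, max_idle_seconds=30.0, clock=time.monotonic):
        self._connect = connect
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._idle = []  # (connection, returned_at), most recent last
        self.opened = 0

    @contextmanager
    def connection(self):
        conn = self._checkout()
        try:
            yield conn
        finally:
            # Returned when the request's with-block exits; anything that
            # keeps a reference past here holds a connection it doesn't own.
            self._idle.append((conn, self._clock()))

    def _checkout(self):
        now = self._clock()
        while self._idle:
            conn, returned_at = self._idle.pop()  # most recently returned first
            if now - returned_at <= self._max_idle:
                return conn
            conn.close()  # idle too long: retire it rather than reuse it
        self.opened += 1
        return self._connect()
```

Scoping the connection to a with-block makes the request boundary visible in the code, which is exactly what handing a connection to a long-lived goroutine or thread quietly breaks.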

Design for partial state. A pod restarts mid-request. Can the user retry safely? If not, you need idempotency keys, client-side request IDs, at-least-once delivery semantics. If you do have idempotency, then losing local state is fine — the client retries, hits a different pod, and succeeds. This is harder than it sounds. It means every mutation must be conditional: UPDATE ... WHERE id = X AND version = Y. It means checking whether the side effect already happened before reissuing it. It means your API must distinguish between "this failed" and "I don't know if this succeeded."

Test cold starts obsessively. Your canary deployment is useless if the canary pod is warm and the cold pods behave differently. Chaos engineering here means: scale to zero, scale back up, immediately hit with production load. Does it fall over? Most services do. The first request after a cold start is often 10–50x slower than steady-state. If your SLO is P99 < 200ms and cold start P99 is 3000ms, you either need to keep instances warm (which costs money and isn't truly stateless) or you need to prewarm asynchronously during startup.
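Prewarming asynchronously means startup returns immediately while a background thread loads the hot set, and readiness follows a flag rather than process start. A minimal Python sketch; PrewarmedCache, its load callback, and the idea of deriving hot_keys from recent traffic are assumptions for illustration.

```python
import threading


class PrewarmedCache:
    """Hypothetical cache that loads its hot keys on a background thread.
    A readiness check should report ready only once `ready` is set."""

    def __init__(self, hot_keys, load):
        self._data = {}
        self._load = load
        self.ready = threading.Event()
        self._thread = threading.Thread(
            target=self._warm, args=(list(hot_keys),), daemon=True
        )

    def start(self):
        self._thread.start()  # returns immediately; startup is not blocked
        return self

    def _warm(self, keys):
        for key in keys:
            self._data[key] = self._load(key)
        # Only reached if every load succeeded; a failed warm-up leaves the
        # pod unready, which is the safe default.
        self.ready.set()

    def get(self, key):
        # Cold path for keys the warm-up didn't cover. Plain dict writes are
        # tolerable for a sketch; real code would guard shared state.
        if key not in self._data:
            self._data[key] = self._load(key)
        return self._data[key]
```

Pairing this with a readiness gate means cold pods still start quickly but only join the load balancer once the expensive part of warming is done.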

Document the implicit state map. Literally draw it. "This service has no persistent volumes, but it has: (1) a pool of 50 connections to RDS; (2) a 512MB in-memory cache of tenant metadata, rebuilt on startup via a 90-second SQL scan; (3) a dependency on a Kubernetes Secret that's mounted read-only but must be refreshed via a sidecar every 15 minutes; (4) thread-local storage of the current user's session object." That document is more valuable than the architecture diagram. When the next incident happens, you'll grep it for "cache" or "session" and immediately know where to look.

The Honest Trade-off

Pure statelessness is expensive. It means every request pays the full cost of context retrieval. No caching, no connection reuse, no amortization. Lambda's pay-per-invocation billing is friendly to this model, cold starts and all, but even Lambda reuses warm execution environments between invocations when it can, because the alternative is economically brutal.

The compromise is explicit, externalized, reconciled state. You have caches? Fine. They live in Redis, shared across all pods, with a TTL short enough that staleness is tolerable. You have sessions? Fine. They're in DynamoDB, keyed by an opaque token, expiring after thirty minutes. You have local filesystem writes? Fine, but they go to a PersistentVolumeClaim backed by EBS, and you run a DaemonSet that scrubs orphaned files older than one hour.

You have state. Admit it. Then manage it deliberately.

The systems that break are the ones that pretend they're stateless while quietly depending on stickiness, warm caches, and the assumption that pods live long enough for their accumulated state to matter. The systems that survive are the ones that say: "This component is stateful. Here's where the state is, how it replicates, what happens if it's lost, and how we test recovery."

I've run both kinds. I know which one I'm debugging at 2 AM.

Opinions expressed by DZone contributors are their own.
