The Bill You Didn't See Coming

Egress — not compute — drives surprise cloud costs. Fix it by designing for data locality, using compression/caching wisely, and actively monitoring data flows.

Apr. 28, 26 · Opinion

There's a moment, familiar to anyone who has run infrastructure at scale, when you open the cloud billing dashboard mid-month and feel the floor shift slightly beneath you. Not a catastrophic number — not yet — but a trend line that bends upward with an unsettling confidence. You start clicking through cost categories. Compute looks fine. Storage, manageable. Then you hit the networking section and something goes cold in your chest.

This is not a hypothetical.

A media company's CFO once found herself staring at a $2.4 million monthly bill, roughly 80% of which was data egress. Not servers. Not databases. Moving bytes from one place to another. A marketing firm traced 60% of its cloud spend to CDN traffic it had never consciously provisioned for growth. Another company, for weeks — weeks — was hemorrhaging $220,000 every seven days in cross-region replication fees that nobody on the team had thought to monitor. The code was doing exactly what it was told to do. That was the problem.

The foundational misconception that makes all of this possible is deceptively simple: engineers, trained on a mental model where CPU and memory are the scarce resources, build systems optimized around compute efficiency while treating network traffic as approximately free. It isn't. In cloud pricing structures, egress (data leaving a cloud provider's network, or crossing region or availability-zone boundaries inside it) is priced in a way that punishes architectural laziness with almost mechanical precision. AWS, GCP, Azure: they all do it. The meter runs whether you're paying attention or not.

To understand why, you have to think about what's actually happening physically. When your application in us-east-1 queries a database replica in us-west-2, that data isn't teleporting. It traverses backbone infrastructure that the provider has built and must maintain and amortize. Cross-AZ traffic within the same region is cheaper but not free: typically around $0.01/GB in each direction. Cross-region traffic climbs toward $0.02–0.09/GB depending on destination. Egress to the public internet can hit $0.08–0.09/GB in volume tiers, and even that underrepresents the damage when you're moving terabytes daily. Do the arithmetic: at $0.09/GB, 10 TB out of AWS costs roughly $900 in a single transfer. If that's a daily sync job (a backup, a replication pipeline, an analytics feed), you're looking at $27,000 a month for one data flow that someone scheduled and forgot about.
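The arithmetic is simple enough to keep in a scratch script. Here is a minimal sketch, assuming a flat $0.09/GB rate (real pricing is tiered and varies by provider and destination) and decimal terabytes; the function names are illustrative only.

```python
# Assumed flat rate taken from the upper end of the range quoted above.
RATE_PER_GB = 0.09  # USD per GB of internet egress (assumption)


def egress_cost(gb: float, rate_per_gb: float = RATE_PER_GB) -> float:
    """Cost in USD of moving `gb` gigabytes at a flat per-GB rate."""
    return gb * rate_per_gb


def monthly_cost(gb_per_day: float, days: int = 30,
                 rate_per_gb: float = RATE_PER_GB) -> float:
    """Cost in USD of repeating the same transfer once a day."""
    return egress_cost(gb_per_day * days, rate_per_gb)


daily_gb = 10 * 1000  # 10 TB expressed in decimal GB
print(f"single transfer: ${egress_cost(daily_gb):,.0f}")
print(f"daily job, 30 days: ${monthly_cost(daily_gb):,.0f}")
```

Swap in your own rate and volume; the point is that a forgotten daily job multiplies a tolerable number by thirty.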

Most teams have dozens of these.

The failure modes tend to cluster around a few specific architectural patterns, and they share a common ancestor: systems designed without any mental model of data gravity.

Multi-region database replication is the canonical trap. The logic feels sound at the time — you want your data close to your users globally, you want resilience, you stand up replicas across regions. What nobody draws on the whiteboard is the replication stream itself: every write to the primary propagates outward, continuously, to every replica. Without differential sync — without sending only the delta, the changed rows or blocks — you end up shipping entire state updates repeatedly. At modest write volumes this is invisible. At scale it becomes a river of billable bytes flowing in all directions simultaneously, and the scary part is that your application latency metrics look fine the whole time. The system is "working."
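To make the delta idea concrete, here is a toy sketch that compares the bytes needed to ship a full snapshot against the bytes needed to ship only changed rows. It is not how any particular database replicates; the row shape, `changed_rows`, and `wire_size` are hypothetical, and deletions are ignored for brevity.

```python
import json


def changed_rows(old: dict, new: dict) -> dict:
    """Return only the rows that were added or modified in `new`."""
    return {k: v for k, v in new.items() if k not in old or old[k] != v}


def wire_size(payload: dict) -> int:
    """Bytes needed to send the payload as compact JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


# A thousand rows, of which exactly one changes between syncs.
old = {i: {"id": i, "status": "active"} for i in range(1000)}
new = dict(old)
new[42] = {"id": 42, "status": "suspended"}

full_bytes = wire_size(new)
delta_bytes = wire_size(changed_rows(old, new))
print(f"full snapshot: {full_bytes} bytes, delta: {delta_bytes} bytes")
```

The gap between the two numbers is what you pay for, per replica, every time the sync runs.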

Verbose logging to external aggregators is the sneakier version of the same disease. Structured logging is good engineering — feeding every service log to a centralized ELK stack or Datadog or Splunk is how you actually debug distributed systems. But few engineers sit down and calculate the byte cost of logging. A single high-traffic API service emitting detailed request logs — user agent, full request body, response payload snippets, timing breakdowns for each internal step — can produce gigabytes per hour. Multiply that by a dozen services shipping logs cross-region to a centralized logging cluster and you have a nontrivial egress line item that is, functionally, the cost of knowing what your system is doing. You can't eliminate it. You have to be more surgical about it.
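One way to be surgical, sketched below with Python's standard `logging` module: a filter that truncates oversized fields before a record reaches the handler that ships it. The field names (`request_body`, `response_snippet`) and the 256-character cap are assumptions for illustration, not a recommendation for any specific aggregator.

```python
import logging

MAX_FIELD_CHARS = 256  # assumed cap; tune per service


class TrimPayloadFilter(logging.Filter):
    """Truncate oversized string fields passed via `extra=` before shipping."""

    def __init__(self, fields=("request_body", "response_snippet"),
                 limit=MAX_FIELD_CHARS):
        super().__init__()
        self.fields = fields
        self.limit = limit

    def filter(self, record: logging.LogRecord) -> bool:
        for name in self.fields:
            value = getattr(record, name, None)
            if isinstance(value, str) and len(value) > self.limit:
                setattr(record, name, value[: self.limit] + "...[truncated]")
        return True  # never drop the record, only shrink it


# Attach the filter to whichever handler forwards logs off the box.
logger = logging.getLogger("api")
handler = logging.StreamHandler()
handler.addFilter(TrimPayloadFilter())
logger.addHandler(handler)
logger.warning("request handled", extra={"request_body": "x" * 10_000})
```

The record still arrives; it just stops carrying ten kilobytes of request body across a region boundary.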

Chatty microservice architectures manufacture this problem at the application layer. When service A calls service B, which calls service C, which calls service D, and each hop is transmitting relatively large payloads — full object graphs, redundant metadata, entire records where you needed one field — you're paying for each of those traversals if they cross AZ boundaries. Which they often do, because load balancers distribute traffic across zones for redundancy, and a single user request can trace a path through four availability zones before it resolves. The application team sees nothing wrong; each individual service is performing correctly. The bill sees everything.
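The "entire records where you needed one field" problem has an equally simple shape in code. A hedged sketch, with a made-up record and a hypothetical `project` helper, showing the difference in bytes on the wire when a service returns only what the caller asked for:

```python
import json


def project(record: dict, fields: tuple) -> dict:
    """Keep only the requested top-level fields of a record."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical user record with a bulky nested history.
user = {
    "id": 7,
    "email": "user@example.com",
    "preferences": {"theme": "dark", "locale": "en-US"},
    "order_history": [{"order_id": n, "items": list(range(20))} for n in range(50)],
}

full = json.dumps(user).encode("utf-8")
slim = json.dumps(project(user, ("id", "email"))).encode("utf-8")
print(f"full record: {len(full)} bytes, projected: {len(slim)} bytes")
```

Multiply that difference by every hop in the call chain and by every request that crosses a zone.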

Here's what tends to happen in practice when these costs surface: there's a fire drill. An engineer is handed a spreadsheet of line items and asked to "find the quick wins." They add gzip compression to a couple of API endpoints. They maybe set up a CloudFront distribution in front of an S3 bucket that was previously serving directly. The bill drops 15%. Everyone exhales. The underlying architecture is unchanged.

This is the wrong frame. Compression and caching are tactical interventions that reduce the cost of a bad architecture. They're worth doing — gzip on a high-volume JSON API can halve your payload sizes, and binary serialization formats like Protocol Buffers or Avro can get you another 3–5x reduction over verbose JSON, particularly for structured domain objects with repetitive field names. A CloudFront distribution in front of S3 absolutely makes sense: you're paying CDN egress rates instead of origin egress rates, and cache hits cost almost nothing in comparison to origin fetches. These things matter. But they don't address why so much data is moving in the first place.
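You can check what gzip buys you on your own payloads in a few lines. The sketch below uses only the standard library and a synthetic, highly repetitive JSON payload, so the ratio it prints will be more flattering than what real traffic gives you; treat it as a measurement harness, not a benchmark.

```python
import gzip
import json

# Synthetic payload (assumption): repetitive field names and values.
records = [
    {"user_id": i, "status": "active", "plan": "standard", "region": "us-east-1"}
    for i in range(500)
]

raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

# Round-trip check: the receiver gets back exactly what was sent.
assert gzip.decompress(compressed) == raw

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio: {len(raw) / len(compressed):.1f}x")
```

Run it against a captured sample of a real response before promising anyone a specific saving.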

The more durable intervention is locality: designing computation to happen where the data already is, rather than pulling data to where the computation lives.

This sounds like a platitude. It isn't. Consider an analytics pipeline that runs nightly, pulling records from a production database in us-east-1 into an analytics cluster in us-west-2, transforming them, and writing results back. The instinct to "keep production and analytics separate" is correct. The instinct to separate them geographically when they're deeply coupled by data dependency is less considered. Running that transformation workload in us-east-1 — even using spot instances that spin up, do the work, and terminate — costs a fraction of the cross-region transfer, and it's faster, because the data never moves far. The compute is cheap. The bandwidth isn't.

Edge serving is where teams find their most reliable structural improvements, when they actually commit to it rather than doing it halfway. A CDN does more than cache static assets — or it should. A well-architected edge layer performs filtering, authentication, basic authorization, header normalization, and light transformation before a request ever reaches origin. Lambda@Edge and CloudFront Functions, Cloudflare Workers, Fastly Compute@Edge — these execution environments let you push logic toward the user. Not all logic. But the logic that deals with the highest-volume, most-repeated request patterns. If 40% of your requests are authenticated reads of the same resource, varying only by user preference metadata that could be embedded in a cache key, you should be serving those from edge. The origin should never see them.

The caveat — and this is worth sitting with — is that edge caching creates consistency problems that bite hard in specific contexts. Cache invalidation is, famously, one of the two hard problems in computer science. When your data changes and you have copies distributed across 200+ edge nodes globally, "purge and refetch" is not instantaneous. There are windows — typically seconds to tens of seconds for a propagated purge — during which some users see stale data. For most content this is fine. For financial data, live inventory, anything where two users seeing different values simultaneously is consequential, it is very much not fine. The architecture that saves you money on egress can introduce subtle correctness bugs that only manifest at the edge of your cache topology, in the users farthest from origin, after a write. These bugs are genuinely hard to reproduce in local development or staging.

Know which data you can afford to serve stale. Be explicit about TTLs. Use cache-control headers precisely, not aspirationally.
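Being explicit can be as plain as a lookup table that maps each class of data to a `Cache-Control` value and refuses to cache anything it doesn't recognize. The data classes and TTLs below are assumptions for illustration; the directives themselves (`max-age`, `no-store`, `immutable`, `stale-while-revalidate`) are standard HTTP.

```python
# Hypothetical data classes mapped to explicit caching policies.
CACHE_POLICIES = {
    "static_asset": "public, max-age=31536000, immutable",
    "catalog_page": "public, max-age=60, stale-while-revalidate=30",
    "live_inventory": "no-store",
}


def cache_control_for(data_class: str) -> str:
    """Return the Cache-Control value for a data class.

    Unknown classes fall back to `no-store`: serving stale data by
    accident is worse than paying for an origin fetch.
    """
    return CACHE_POLICIES.get(data_class, "no-store")


for name in ("static_asset", "catalog_page", "live_inventory", "account_balance"):
    print(f"{name}: {cache_control_for(name)}")
```

The useful part is the default. Anything nobody has classified stays uncached until someone decides it can tolerate staleness.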

Monitoring this class of cost requires different instrumentation than most teams have in place. Application performance monitoring tools — the ones that track request latency, error rates, throughput — don't surface network cost by default. You need to be instrumenting at a different level.

CloudWatch's NetworkOut metric is a starting point but only a starting point: it tells you bytes leaving an EC2 instance, not where they're going or why. The more useful construct is tagging your data flows and costing them individually — either through a FinOps platform (CloudZero, Cloudability, Vantage) that enriches cost allocation data, or through custom instrumentation where you record the destination of every significant data transfer alongside its size. In Kubernetes environments, service mesh telemetry (Istio, Linkerd) gives you per-service-pair bytes transferred, which is exactly the data you need to find the expensive relationships in your service graph.
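If you go the custom-instrumentation route, the core of it is small: record source, destination, and size for each significant transfer, then rank the pairs. A minimal in-memory sketch follows; `FlowLedger` and the service names are hypothetical, and a real version would emit these as tagged metrics rather than hold them in a process.

```python
from collections import Counter


class FlowLedger:
    """Accumulate bytes transferred per (source, destination) pair."""

    def __init__(self):
        self._bytes = Counter()

    def record(self, source: str, destination: str, nbytes: int) -> None:
        self._bytes[(source, destination)] += nbytes

    def top(self, n: int = 5):
        """Return the n heaviest flows as ((source, destination), bytes)."""
        return self._bytes.most_common(n)


ledger = FlowLedger()
ledger.record("orders-api", "orders-db", 4_000_000)
ledger.record("orders-api", "log-cluster", 90_000_000)
ledger.record("orders-api", "orders-db", 6_000_000)

for (src, dst), nbytes in ledger.top(2):
    print(f"{src} -> {dst}: {nbytes / 1e6:.0f} MB")
```

Once the pairs are ranked, the expensive relationships in the service graph tend to be obvious.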

The SLO framing is useful here, though unusual in practice. Almost no team has a defined SLO on inter-region traffic volume, but there's no reason not to. "Cross-region egress must not exceed X GB/hour" is a measurable, alertable condition. If you set it, you will discover violations almost immediately — probably from jobs that someone scheduled six months ago and hasn't thought about since.
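As a sketch of what that alert condition might look like, here is a sliding one-hour window over transfer samples. The class name, the 100 GB/hour limit, and the assumption that samples arrive in timestamp order are all illustrative; in practice this logic would live in your metrics or alerting system.

```python
from collections import deque


class EgressBudget:
    """Track egress over a sliding window and flag budget violations."""

    def __init__(self, limit_gb_per_hour: float, window_seconds: int = 3600):
        self.limit = limit_gb_per_hour
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp_seconds, gb), oldest first
        self._total = 0.0

    def add(self, timestamp: float, gb: float) -> bool:
        """Record a transfer; return True if the window now exceeds the limit.

        Assumes timestamps are non-decreasing.
        """
        self._samples.append((timestamp, gb))
        self._total += gb
        cutoff = timestamp - self.window_seconds
        # Drop samples that have aged out of the window.
        while self._samples and self._samples[0][0] <= cutoff:
            _, expired = self._samples.popleft()
            self._total -= expired
        return self._total > self.limit


budget = EgressBudget(limit_gb_per_hour=100)
for ts, gb in [(0, 60), (1800, 50), (3700, 10)]:
    if budget.add(ts, gb):
        print(f"t={ts}s: cross-region egress over budget")
```

The first violation it reports is usually the most informative one.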

The competitive landscape of these tools is worth understanding, not for product selection purposes but because it reveals something about where the industry thinks the problem lives. The CDN market is substantial and mature. The FinOps tooling market is growing fast specifically because these costs are opaque and large. What's slower to emerge is tooling that informs architectural decisions: tooling that looks at your service dependency graph, models the data flows, and tells you "this particular call pattern is generating $40K/month in egress that could be eliminated by moving this service." That's a hard problem, blending static analysis with cost modeling and deployment topology knowledge. Some platforms are approaching it. Nobody's solved it.

The dirty secret is that cloud providers don't have a strong incentive to make egress costs maximally visible or easy to optimize. Egress is enormously profitable for them. This isn't a conspiracy — it's a business structure that engineers need to understand and work against deliberately.

Monday morning, then. Practically.

Start with the audit. Map your data paths — not your service dependencies in the abstract, but the actual bytes: where does data originate, where does it get read, where does it get written, what crosses an AZ or region boundary. Most organizations haven't done this. The first time you do it, the map will surprise you. There will be a data flow generating significant cost that nobody owns, that's been running on autopilot, that exists because of a decision made by someone who left two years ago.
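A first pass at that map can be a flat list of flows, each classified by the boundary it crosses. The sketch below assumes AWS-style zone names where the region is the zone name minus its trailing letter (for example `us-east-1a` in `us-east-1`), which does not hold for every zone type; the flows and volumes are invented.

```python
from collections import defaultdict


def region_of(az: str) -> str:
    """Derive the region from a zone name like 'us-east-1a' (assumed format)."""
    return az[:-1]


def classify(src_az: str, dst_az: str) -> str:
    """Label a flow by the most expensive boundary it crosses."""
    if src_az == dst_az:
        return "same-az"
    if region_of(src_az) == region_of(dst_az):
        return "cross-az"
    return "cross-region"


# (source service, source AZ, destination service, destination AZ, GB/day)
flows = [
    ("orders-api", "us-east-1a", "orders-db", "us-east-1a", 120.0),
    ("orders-api", "us-east-1a", "cache", "us-east-1b", 300.0),
    ("orders-db", "us-east-1a", "analytics", "us-west-2a", 900.0),
]


def summarize(flows) -> dict:
    """Total GB/day per boundary class."""
    totals = defaultdict(float)
    for _src, src_az, _dst, dst_az, gb in flows:
        totals[classify(src_az, dst_az)] += gb
    return dict(totals)


print(summarize(flows))
```

Even a spreadsheet-grade version of this tends to surface the flow nobody owns.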

Then: be skeptical of replication. Multi-region is a legitimate reliability strategy. Multi-region with full, continuous, synchronous replication of everything is often an expensive approximation of a strategy. Think carefully about what actually needs to be multi-region versus what is multi-region because you didn't have time to think carefully about it.

Compress. Enable gzip on API responses if it isn't on. Switch high-volume internal APIs to Protobuf. These are days of work, not weeks, and the savings are immediate.

Cache where the access patterns support it. Not everywhere — be honest about where you can tolerate staleness and where you can't.

Put something in front of your egress. An alert, a metric, a weekly review. The bill will generate itself; the scrutiny is the one thing that won't.

The broader lesson in all of this is older than cloud computing. Computing resources that are cheap, fast, and invisible invite abuse. Memory used to be the expensive thing and developers were meticulous about it; now it's practically free and nobody thinks twice about a 2GB heap. Bandwidth used to be clearly expensive, then fiber made it feel infinite, and the muscle memory for treating it as precious atrophied. Cloud pricing re-introduces the scarcity, artificially or otherwise, and the engineers who build cheaply at scale are the ones who internalized that latency and bandwidth are not the same axis of cost — and behaved accordingly.

Opinions expressed by DZone contributors are their own.
