Latency Is Cheap, Bandwidth Is Not

Bandwidth — not compute — is what quietly blows up cloud bills. Measure bytes. Minimize payloads. Treat caching and topology as cost controls.

Mar. 19, 26 · Opinion

The first time I really understood this, I was staring at a billing dashboard at 11 p.m., trying to explain to a VP why our AWS bill had doubled in a single month. We hadn't added significant compute. We hadn't provisioned new databases. What we'd done, quietly, as part of a feature nobody thought twice about, was start returning full user objects from a search endpoint instead of IDs. Forty fields per record. Hundreds of records per page. Millions of requests per day. The math, once you actually run it, is brutal.

AWS charges roughly $0.09 per GB for the first 10 TB of outbound internet egress each month, with somewhat cheaper rates in the tiers above that. That sounds trivial until you realize that 500 TB of monthly egress, a number that a moderately successful video platform reaches without trying, still lands you somewhere around $29,000 every month after those volume discounts. For moving bytes. Not for compute, not for storage, not for the engineering talent that built the thing. Just for the physical act of electrons crossing a boundary Bezos drew on a map.
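To see where a figure like that comes from, here is a minimal sketch of tiered egress pricing. The tier sizes and rates are assumptions modeled on AWS's published us-east-1 internet egress tiers. They vary by region and change over time, so check the current price list before relying on them. The sketch treats 1 TB as 1,000 GB and ignores any free-tier allowance.

```python
# Assumed tiers: (size of tier in GB, price per GB in USD).
# Modeled on AWS us-east-1 internet egress; verify against current pricing.
EGRESS_TIERS = [
    (10_000, 0.09),        # first 10 TB
    (40_000, 0.085),       # next 40 TB
    (100_000, 0.07),       # next 100 TB
    (float("inf"), 0.05),  # everything above 150 TB
]


def monthly_egress_cost(gb: float, tiers=EGRESS_TIERS) -> float:
    """Price one month of egress by filling each tier in order."""
    remaining = gb
    total = 0.0
    for tier_size, price_per_gb in tiers:
        if remaining <= 0:
            break
        billed = min(remaining, tier_size)
        total += billed * price_per_gb
        remaining -= billed
    return total


if __name__ == "__main__":
    tiered = monthly_egress_cost(500_000)  # 500 TB
    flat = 500_000 * 0.09                  # if every GB were billed at the first-tier rate
    print(f"tiered: ${tiered:,.2f}  flat: ${flat:,.2f}")
```

Under these assumptions, 500 TB comes to about $28,800 against $45,000 at the flat first-tier rate. The volume tiers soften the bill, but they don't change its order of magnitude.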

Latency, by contrast, is almost always survivable. Users will tolerate an extra 200 ms; the business will not tolerate a bill 3x what you projected. And while the product instinct is always to optimize the thing people feel (the spinner, the time to first byte), the thing that quietly destroys the economics of a system is almost always the thing nobody's watching: data volume.

Cross-AZ transfer sits at about $0.01 per GB in each direction, which sounds cheap until your microservices start chatting across availability zones with the enthusiasm of a distributed systems tutorial that forgot to mention that prod costs money. Ten GB crossing between AZs runs you roughly $0.10 on each side, $0.20 total, while the same 10 GB kept within one AZ over private IPs is free. Region to region is typically at least as expensive, and out to the internet that 10 GB runs about $0.90, several times the cross-AZ cost. The architecture that looked clean on a whiteboard (services neatly decoupled, each owning its own data, communicating over well-defined APIs) starts bleeding money at the seams the moment someone maps actual byte volumes onto actual prices.

What breaks first is almost always the naive fan-out. You have a service that aggregates results from five downstream services. Each downstream returns its full response object because that's what the contract says, and nobody went back to renegotiate the contract once the objects got fat. Now you're moving five large payloads to aggregate them into one smaller response, discarding 80% of the fields at the aggregation layer. The data traveled. You paid for it. The user got a spinner that lasted 40 milliseconds longer, which nobody measured.
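One way to put a number on that waste is to measure it at the aggregation layer. The sketch below is a hypothetical helper, not tied to any framework. It compares the serialized size of what the downstream services sent with the size of the fields the aggregator actually keeps. The payload shapes (five responses of 40 string fields, 8 fields kept) are made up to mirror the example above.

```python
import json


def fan_out_waste(responses: list[dict], needed_fields: list[str]) -> tuple[int, int, float]:
    """Return (bytes received, bytes kept, fraction of received bytes discarded)."""
    received = sum(len(json.dumps(r).encode("utf-8")) for r in responses)
    projected = [{k: r[k] for k in needed_fields if k in r} for r in responses]
    kept = sum(len(json.dumps(p).encode("utf-8")) for p in projected)
    return received, kept, 1 - kept / received


def fake_downstream_response(num_fields: int = 40) -> dict:
    """Hypothetical downstream payload: num_fields short string fields."""
    return {f"field_{i}": f"value-{i:04d}" for i in range(num_fields)}


if __name__ == "__main__":
    responses = [fake_downstream_response() for _ in range(5)]  # five downstream calls
    needed = [f"field_{i}" for i in range(8)]  # the aggregator uses 8 of 40 fields
    received, kept, waste = fan_out_waste(responses, needed)
    print(f"received {received} B, kept {kept} B, discarded {waste:.0%}")
```

Against real traffic you would sample actual responses rather than synthetic ones. Keep in mind that compression on the wire changes the absolute byte counts.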

The fix isn't intellectually hard. It's socially hard. You have to go back to teams with established contracts and say: this field you're returning — does anyone downstream actually use it? That conversation, in practice, involves three Slack threads, two design docs, and someone's pet service that turns out to depend on a field you thought was dead.
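It helps to walk into that conversation with evidence. One hypothetical way to gather it in Python is to wrap decoded payloads in a dict subclass that records which keys consumer code actually reads. The recorder only sees access through `[]` and `.get()` in the process where it runs. Treat it as a sampling tool, not proof that a field is dead.

```python
class FieldUsageRecorder(dict):
    """A dict that remembers which keys were read via [] or .get().

    Iteration, .items(), .values(), and copying bypass the recorder,
    so it undercounts code that touches fields in bulk.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed: set[str] = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed.add(key)
        return super().get(key, default)

    def unused_fields(self) -> set[str]:
        """Keys present in the payload that nothing has read yet."""
        return set(self.keys()) - self.accessed


if __name__ == "__main__":
    user = FieldUsageRecorder(
        {"id": 7, "name": "Ada", "email": "ada@example.com", "last_login": "2024-01-01"}
    )
    greeting = f"Hello, {user['name']}"  # consumer code reads two fields
    contact = user.get("email")
    print(sorted(user.unused_fields()))  # ['id', 'last_login']
```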

CDNs are the most obvious weapon here and also the most commonly misused. CloudFront, Akamai, Fastly: the pitch is always the same. Push your content to the edge, serve it from 50 ms away instead of 200 ms, make your users happy. And that's real. But the more important story for most businesses is the egress bill. Every request answered from an edge cache is a request your origin never serves, and CDN delivery is usually priced at or below raw origin egress, often well below once you reach volume or negotiate a contract. On AWS, transfer from an AWS origin to CloudFront isn't charged at all. You're not just reducing latency. You're reclassifying expensive origin egress as cheaper CDN egress.

The practical trap is cache invalidation, which remains one of the genuinely hard problems in the field — not because the mechanism is complex, but because the organizational contract around it is. Who owns the decision to purge? What's the TTL on product images when a flash sale starts? What happens when customer service updates a record and the CDN serves stale data for the next four hours? These aren't engineering failures. They're design failures that happen upstream of the code, in conversations that engineers often aren't invited to.

Brotli compression over gzip is a legitimate gain — typically 15–25% better compression ratios at equivalent decompression speed for text payloads — and the fact that it's still not universally deployed is one of those small professional mysteries. The answer, usually, is that someone enabled gzip in 2014 and nobody went back. Configuration entropy is real. Systems don't degrade suddenly; they drift.
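The usual remedy for that drift is to negotiate the encoding per request from the client's Accept-Encoding header, preferring Brotli when the client advertises `br`. Below is a simplified sketch. It honors `q=0` exclusions but applies a fixed server preference order instead of full q-value ranking. It treats the third-party `brotli` package as optional and falls back to `gzip` from the standard library. In practice you would usually turn this on in your web server, CDN, or framework rather than hand-roll it.

```python
import gzip

try:
    import brotli  # optional third-party package ("pip install brotli")
except ImportError:
    brotli = None

# Server preference when the client accepts several codings: best ratio first.
SERVER_PREFERENCE = ("br", "gzip")


def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each content coding in an Accept-Encoding header to its q-value."""
    codings: dict[str, float] = {}
    for part in header.split(","):
        name, *params = [p.strip() for p in part.split(";")]
        if not name:
            continue
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        codings[name.lower()] = q
    return codings


def negotiate_encoding(header: str, brotli_available: bool) -> str:
    """Pick br or gzip if the client accepts it (q > 0), else identity."""
    accepted = parse_accept_encoding(header)
    for coding in SERVER_PREFERENCE:
        if coding == "br" and not brotli_available:
            continue
        if accepted.get(coding, accepted.get("*", 0.0)) > 0:
            return coding
    return "identity"  # send the body unencoded, without a Content-Encoding header


def encode_body(body: bytes, accept_encoding: str) -> tuple[str, bytes]:
    """Return (chosen coding, encoded body)."""
    coding = negotiate_encoding(accept_encoding, brotli is not None)
    if coding == "br":
        return coding, brotli.compress(body, quality=11)
    if coding == "gzip":
        return coding, gzip.compress(body)
    return coding, body
```

Whatever does the negotiation should also send `Vary: Accept-Encoding`, so shared caches and CDNs don't hand a Brotli body to a client that can't decode it.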

Columnar storage formats are worth understanding mechanistically because they illuminate the broader principle. In a row-oriented store (your standard PostgreSQL table, conceptually), a query that touches three columns out of twenty still reads the entire row off disk for every row that matches. The other seventeen columns ride along for free, in the sense that you're not writing code to fetch them. They are not free in the sense that they still traverse the I/O bus, still consume buffer cache, and still get shipped over the wire if you're not careful about projection. Parquet and ORC on disk, and Arrow in memory, store data column by column, so a query touching three columns reads roughly three columns' worth of bytes. For analytical workloads running over wide tables, the savings aren't incremental. They're structural.
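A toy model makes the byte accounting concrete. This sketch lays out the same 20 integer columns two ways in memory: one contiguous buffer of rows, and one buffer per column. It then counts the bytes each scan has to touch to sum three columns. It ignores pages, compression, and encodings, which real engines like PostgreSQL or Parquet readers care about a great deal. It only illustrates the shape of the difference.

```python
from array import array

NUM_COLUMNS = 20
NUM_ROWS = 10_000

# Row-oriented layout: row r occupies indices [r*NUM_COLUMNS, (r+1)*NUM_COLUMNS).
row_store = array("q", range(NUM_ROWS * NUM_COLUMNS))

# Column-oriented layout: one contiguous buffer per column.
column_store = [
    array("q", (row_store[r * NUM_COLUMNS + c] for r in range(NUM_ROWS)))
    for c in range(NUM_COLUMNS)
]


def sum_row_store(wanted: list[int]) -> tuple[list[int], int]:
    """Sum the wanted columns; every row is read in full."""
    totals = [0] * len(wanted)
    bytes_read = 0
    for r in range(NUM_ROWS):
        row = row_store[r * NUM_COLUMNS:(r + 1) * NUM_COLUMNS]  # whole row comes along
        bytes_read += len(row) * row.itemsize
        for i, c in enumerate(wanted):
            totals[i] += row[c]
    return totals, bytes_read


def sum_column_store(wanted: list[int]) -> tuple[list[int], int]:
    """Sum the wanted columns; only those columns' buffers are read."""
    totals, bytes_read = [], 0
    for c in wanted:
        column = column_store[c]
        totals.append(sum(column))
        bytes_read += len(column) * column.itemsize
    return totals, bytes_read


if __name__ == "__main__":
    wanted = [0, 5, 7]
    row_totals, row_bytes = sum_row_store(wanted)
    col_totals, col_bytes = sum_column_store(wanted)
    assert row_totals == col_totals  # same answer, different byte cost
    print(f"row store read {row_bytes:,} B, column store read {col_bytes:,} B")
```

Both scans return the same sums. The row scan touches all twenty columns' bytes, while the column scan touches three.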

The same logic applies to API design at a different layer of abstraction. REST, naively implemented, tends toward coarse-grained resources — GET /user/{id} returns the user. The user has grown because product has added fields, because the mobile app needed something, because someone thought it would be useful to embed the last ten transactions in the user object "since we're already fetching it." GraphQL was supposed to fix this by letting clients declare exactly what they need. In practice, GraphQL introduces its own failure modes — N+1 query problems if the resolver layer isn't careful, complexity limits that become a political battle, schema governance that nobody wanted to own. It's not magic. It's a different set of trade-offs, and whether it reduces your egress depends entirely on whether your clients actually request minimal fields or just request everything because it's easier.
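The REST-side counterpart to GraphQL's field selection is letting clients ask for a subset of fields explicitly. The sketch below assumes a hypothetical `fields` query parameter (similar in spirit to JSON:API sparse fieldsets) and a hypothetical user resource. Unknown field names are rejected rather than silently ignored, so typos surface early. It carries the same caveat as GraphQL: it only saves bytes if clients actually ask for less.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical full resource, as GET /user/{id} might return it today.
USER = {
    "id": 42,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "avatar_url": "https://example.com/a.png",
    "preferences": {"theme": "dark", "locale": "en-GB"},
    "recent_transactions": [{"id": i, "amount_cents": 1000 + i} for i in range(10)],
}


def project(resource: dict, url: str) -> dict:
    """Return only the fields named in ?fields=a,b,c (all fields if absent)."""
    query = parse_qs(urlsplit(url).query)
    if "fields" not in query:
        return resource
    requested = {f.strip() for f in query["fields"][0].split(",") if f.strip()}
    unknown = requested - set(resource)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {k: v for k, v in resource.items() if k in requested}


if __name__ == "__main__":
    full = json.dumps(project(USER, "/user/42"))
    slim = json.dumps(project(USER, "/user/42?fields=id,name"))
    print(len(full), "bytes vs", len(slim), "bytes")
```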

The anti-pattern I see most consistently in service-oriented architectures is what I think of as the enthusiasm problem: teams that are excited about events and streaming build a Kafka pipeline, but they publish full entity snapshots instead of diffs. Every time a user updates their email address, the event contains the entire user record — all forty fields — because that's what was easy to serialize. Every consumer of that topic receives and processes forty fields to get one. At low volume, this is invisible. At scale, it's a bandwidth tax on every consumer, paid continuously, forever, until someone gets annoyed enough to fix it.
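For contrast, here is a minimal sketch of the producer side of a diff-style event in plain Python, with no Kafka client involved. The event shape (`id`, `version`, `changed`, `removed`) is a hypothetical schema invented for illustration. The point is the size difference when one field out of forty changes.

```python
import json

_MISSING = object()  # sentinel: distinguishes "absent" from "present but None"


def make_change_event(entity_id: str, version: int, before: dict, after: dict) -> dict:
    """Describe what changed between two snapshots instead of resending one."""
    changed = {k: v for k, v in after.items() if before.get(k, _MISSING) != v}
    removed = sorted(k for k in before if k not in after)
    return {"id": entity_id, "version": version, "changed": changed, "removed": removed}


if __name__ == "__main__":
    before = {f"field_{i}": f"value-{i}" for i in range(38)}
    before.update({"id": "u-1", "email": "old@example.com"})  # forty fields total
    after = {**before, "email": "new@example.com"}

    snapshot_bytes = len(json.dumps(after).encode("utf-8"))
    event = make_change_event("u-1", version=2, before=before, after=after)
    event_bytes = len(json.dumps(event).encode("utf-8"))
    print(event["changed"], f"{event_bytes} B vs {snapshot_bytes} B snapshot")
```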

The fix — publishing diffs or change events that contain only the modified fields — sounds straightforward. It requires agreeing on a schema that represents "what changed" rather than "what is," which is a subtler modeling problem than it first appears, and it requires every consumer to handle partial updates, which means maintaining local state, which introduces its own consistency questions. There are no free moves here. The decision is always: which set of problems do you want to own?
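The consumer side is where the subtlety lives. This sketch keeps a local view built from diff events shaped as `{"id", "version", "changed", "removed"}` (a hypothetical schema). It also assumes that the producer assigns each entity a version that starts at 1 and increases by exactly one per change. That assumption is what lets the consumer detect duplicates and gaps. On a gap it refuses to apply the event, because layering a partial update on top of missing history would silently corrupt the view. The caller would then need to resync from a full snapshot.

```python
class LocalEntityView:
    """Local state rebuilt from change events shaped like:
    {"id": ..., "version": int, "changed": {field: value}, "removed": [field, ...]}
    (hypothetical schema; versions assumed to start at 1 and step by 1).
    """

    def __init__(self):
        self.records: dict[str, dict] = {}
        self.versions: dict[str, int] = {}

    def apply(self, event: dict) -> str:
        entity_id, version = event["id"], event["version"]
        last = self.versions.get(entity_id, 0)
        if version <= last:
            return "duplicate"  # already applied (e.g., redelivery); safe to skip
        if version != last + 1:
            return "gap"  # missed an event; caller should resync from a snapshot
        record = self.records.setdefault(entity_id, {})
        record.update(event["changed"])
        for field in event["removed"]:
            record.pop(field, None)
        self.versions[entity_id] = version
        return "applied"


if __name__ == "__main__":
    view = LocalEntityView()
    events = [
        {"id": "u-1", "version": 1, "changed": {"email": "a@example.com", "name": "Ada"}, "removed": []},
        {"id": "u-1", "version": 1, "changed": {"email": "a@example.com", "name": "Ada"}, "removed": []},
        {"id": "u-1", "version": 3, "changed": {"name": "Ada L."}, "removed": []},
        {"id": "u-1", "version": 2, "changed": {"email": "b@example.com"}, "removed": []},
    ]
    print([view.apply(e) for e in events])  # ['applied', 'duplicate', 'gap', 'applied']
    print(view.records["u-1"])
```

Note that the dropped version-3 event still has to arrive again, either by redelivery or as part of a snapshot resync. That is one of the consistency questions the diff approach hands to every consumer.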

Stream processing with something like Apache Flink or Kafka Streams, paired with a compact columnar serialization such as Arrow for intermediate data, can significantly reduce inter-node transfer in large pipeline jobs. The overhead is engineering complexity and operational surface area. Whether that trade-off is worth it depends on your volume. At 10 GB/day, it probably isn't. At 10 TB/day, you do the math.
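As a rough worked example, assume (hypothetically) that the pipeline's intermediate traffic crosses availability zones at $0.01/GB in each direction, so p = $0.02/GB end to end. Assume also that a more compact serialization halves the bytes moved (r = 0.5). The monthly savings S for a daily volume V over 30 days are then:

```latex
\begin{aligned}
S &= V \cdot 30\,\text{days} \cdot p \cdot r, \qquad p = \$0.02/\text{GB},\; r = 0.5 \\
V = 10\,\text{GB/day}: \quad S &= 10 \cdot 30 \cdot 0.02 \cdot 0.5 = \$3/\text{month} \\
V = 10{,}000\,\text{GB/day}: \quad S &= 10{,}000 \cdot 30 \cdot 0.02 \cdot 0.5 = \$3{,}000/\text{month}
\end{aligned}
```

At the low end, the savings won't cover an afternoon of engineering time. At the high end, they recur every month for as long as the pipeline runs.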

Kubernetes cross-zone egress is a less discussed cost sink that tends to surface during cost-optimization exercises with a kind of grim surprise. By default, kube-proxy doesn't keep traffic within an availability zone. A client pod in us-east-1a calling a Service can be routed to any healthy backend pod, including ones in us-east-1b and us-east-1c. At $0.01/GB in each direction, this seems negligible. For a high-throughput internal data service pushing 50 TB/month across zones, it's $500/month on each side of the connection, about $1,000 in total, that you didn't plan for, and it compounds as you add services. Topology-aware routing helps. Since Kubernetes 1.27 it is enabled per Service with the service.kubernetes.io/topology-mode: Auto annotation (replacing the earlier topology-aware hints), and newer releases add a trafficDistribution: PreferClose field to the Service spec. Both require explicit configuration, and both come with their own load-distribution caveats that you should read before enabling them in prod.
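Both opt-ins are set per Service: either the `trafficDistribution: PreferClose` field on the Service spec, or, on older clusters, the `service.kubernetes.io/topology-mode: Auto` annotation. As a sketch, this builds such a Service manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts on standard input. The service name, label, and ports are hypothetical. Which option your cluster honors depends on its Kubernetes version, so check the docs for your release before relying on either.

```python
import json


def zone_aware_service(name: str, app: str, port: int, target_port: int,
                       use_traffic_distribution: bool = True) -> dict:
    """Build a v1 Service that asks for same-zone endpoints to be preferred."""
    metadata: dict = {"name": name}
    spec: dict = {
        "selector": {"app": app},
        "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
    }
    if use_traffic_distribution:
        # Service spec field in newer releases; confirm your cluster supports it.
        spec["trafficDistribution"] = "PreferClose"
    else:
        # Topology Aware Routing opt-in (Kubernetes 1.27+).
        metadata["annotations"] = {"service.kubernetes.io/topology-mode": "Auto"}
    return {"apiVersion": "v1", "kind": "Service", "metadata": metadata, "spec": spec}


if __name__ == "__main__":
    # Hypothetical internal data service; pipe the output into: kubectl apply -f -
    print(json.dumps(zone_aware_service("inventory-data", "inventory-data", 80, 8080), indent=2))
```

Keeping traffic local only works well when each zone has enough backend pods to absorb its own clients' load. That is the load-distribution caveat worth reading up on before flipping either switch.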

Service meshes (Istio, Linkerd, Cilium) add observability into this layer that would otherwise require VPC flow log archaeology to reconstruct. The cost is the mesh overhead itself (sidecar or per-node proxies, control plane resources, the operational expertise to run them), and whether that overhead is worth it depends on how much you value visibility into your traffic topology versus how much you value simplicity.

The monitoring question is underrated. You can't optimize what you don't measure, and most teams measure compute, storage, and latency exhaustively while treating network egress as something you look at when the bill arrives. VPC flow logs exist. CloudWatch has network metrics. Setting up dashboards that attribute egress bytes to specific services and endpoints takes an afternoon. Not doing it means you find out about expensive data patterns when they show up as line items, which is always the wrong time.
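As a starting point for that attribution, here is a sketch that sums bytes per (source service, destination service) pair from VPC flow log records. It assumes the default version 2 text format: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. The IP-to-service mapping and the sample records are made up. In practice you would derive the mapping from service discovery or interface tags. If flow logs are enabled on the interfaces at both ends of a connection, the same bytes appear once per interface, so filter on interface-id before trusting the totals.

```python
from collections import Counter

# Hypothetical mapping from private IP to owning service.
IP_TO_SERVICE = {
    "10.0.1.10": "search-api",
    "10.0.2.20": "user-service",
    "10.0.3.30": "billing",
}

# Field positions in the default (version 2) flow log record format.
SRCADDR, DSTADDR, BYTES, LOG_STATUS = 3, 4, 9, 13


def bytes_by_service_pair(lines) -> Counter:
    """Sum flow-log bytes per (source service, destination service)."""
    totals: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 14 or fields[LOG_STATUS] != "OK":
            continue  # skip the header line, NODATA/SKIPDATA records, and junk
        src = IP_TO_SERVICE.get(fields[SRCADDR], fields[SRCADDR])
        dst = IP_TO_SERVICE.get(fields[DSTADDR], fields[DSTADDR])
        totals[(src, dst)] += int(fields[BYTES])
    return totals


if __name__ == "__main__":
    sample = [  # synthetic records, for illustration only
        "2 123456789012 eni-0abc 10.0.1.10 10.0.2.20 49152 443 6 120 180000 1700000000 1700000060 ACCEPT OK",
        "2 123456789012 eni-0abc 10.0.1.10 10.0.3.30 49160 443 6 15 9000 1700000000 1700000060 ACCEPT OK",
    ]
    for (src, dst), total in bytes_by_service_pair(sample).most_common():
        print(f"{src} -> {dst}: {total:,} bytes")
```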

The right time is the design review, which should include a section that asks: what data moves, where does it go, how often, and how large is each payload? This is the kind of question that feels tedious in a design doc and feels urgent six months later when someone is explaining to a VP why the bill doubled.
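Those four questions reduce to arithmetic that fits in a design doc. Here is a minimal sketch: each hypothetical flow is a payload size, a request rate, and the boundary it crosses. The flows are priced with assumed flat per-GB rates: free within an AZ, $0.02 for cross-AZ (counting both directions), and $0.09 to the internet. These rates ignore volume tiers and vary by provider and region.

```python
from dataclasses import dataclass

# Assumed flat prices in USD per GB; real prices vary by provider, region, and tier.
PRICE_PER_GB = {"same-az": 0.0, "cross-az": 0.02, "internet": 0.09}
SECONDS_PER_MONTH = 30 * 24 * 3600
BYTES_PER_GB = 1_000_000_000


@dataclass
class Flow:
    name: str
    payload_bytes: int
    requests_per_second: float
    boundary: str  # key into PRICE_PER_GB

    def gb_per_month(self) -> float:
        return self.payload_bytes * self.requests_per_second * SECONDS_PER_MONTH / BYTES_PER_GB

    def cost_per_month(self) -> float:
        return self.gb_per_month() * PRICE_PER_GB[self.boundary]


if __name__ == "__main__":
    flows = [  # hypothetical numbers for a design review
        Flow("search results to clients", 120_000, 200, "internet"),
        Flow("search -> user-service", 40_000, 1_000, "cross-az"),
    ]
    for flow in flows:
        print(f"{flow.name}: {flow.gb_per_month():,.0f} GB/month, ${flow.cost_per_month():,.0f}/month")
```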

There's a class of architectural decision that looks like a latency optimization but is actually a bandwidth optimization in disguise. Caching is the canonical example. When you serve a response from an in-memory cache rather than recomputing it from a database, you've reduced latency, yes. But if that computation involved reading 50 MB of data from a distributed store to produce a 200-byte response, you've also eliminated 50 MB of internal data transfer per cache hit. The latency benefit is visible. The bandwidth benefit is invisible and possibly more significant economically, depending on where in your infrastructure the data lives and what that transfer costs.
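To make the invisible benefit visible, a cache can count it. This sketch is a small LRU read-through cache built on `OrderedDict`. Its hypothetical loader reports how many backend bytes it had to read to build each response, and every hit credits those bytes as avoided transfer. The 50 MB-to-200-byte loader mirrors the example above and doesn't actually move any data.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ByteAccountingCache:
    """LRU read-through cache that tallies backend bytes read and avoided."""

    def __init__(self, loader: Callable[[Hashable], tuple[bytes, int]], capacity: int):
        self.loader = loader  # returns (response, backend_bytes_read_to_build_it)
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.backend_bytes_read = 0
        self.backend_bytes_avoided = 0

    def get(self, key: Hashable) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            response, cost = self.entries[key]
            self.hits += 1
            self.backend_bytes_avoided += cost
            return response
        response, cost = self.loader(key)
        self.misses += 1
        self.backend_bytes_read += cost
        self.entries[key] = (response, cost)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return response


def expensive_report(key: Hashable) -> tuple[bytes, int]:
    """Hypothetical loader: scans ~50 MB in the store to produce a 200-byte answer."""
    return b"x" * 200, 50 * 1024 * 1024


if __name__ == "__main__":
    cache = ByteAccountingCache(expensive_report, capacity=100)
    for _ in range(10):
        cache.get("daily-summary")
    print(cache.hits, cache.misses, cache.backend_bytes_avoided // (1024 * 1024), "MiB avoided")
    # 9 1 450 MiB avoided
```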

This is worth naming explicitly in architecture discussions because it changes the framing of the investment. Caching isn't just a performance feature. It's a cost-control mechanism. Its value is proportional to the expense of the operation it replaces, which includes the data movement that operation requires. A cache miss isn't just slower. It's more expensive. That changes how aggressively you should tune hit rates, how much memory you should allocate to the cache tier, and how carefully you should reason about eviction policy.
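The hit-rate point can be made precise with a simple model that assumes every miss costs the same B bytes of backend transfer. If h is the hit rate, expected backend transfer per request is (1 - h)B, so the last few points of hit rate matter disproportionately:

```latex
\begin{aligned}
\mathbb{E}[\text{backend bytes per request}] &= (1 - h)\,B \\
h = 0.90: \quad (1 - 0.90)\,B &= 0.10\,B \\
h = 0.99: \quad (1 - 0.99)\,B &= 0.01\,B
\end{aligned}
```

Going from a 90% to a 99% hit rate cuts backend transfer tenfold, even though it looks like a nine-point improvement on a dashboard. With B = 50 MB, that's 5 MB versus 0.5 MB per request on average.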

The teams that have internalized this tend to be the ones who've paid a large egress bill at least once. Experience is an expensive teacher, but it tends to stick.

Opinions expressed by DZone contributors are their own.