Remember TCP slow start? CIDR notation? For a long time, networking knowledge has been pushed to the back of most developers’ minds. But recently, a networking renaissance has kicked off, with developers leading the charge. Major services are flirting with HTTP/2, which multiplexes many requests over a single connection. Some mobile developers are even building their own transport stacks to control the experience delivered to end users on high-latency or low-bandwidth connections. Developers are jumping back into the networking fray.
Let’s talk about the Ops in DevOps for a moment. Brush up on networking, because as you re-architect your applications for IaaS and microservices, the network is becoming an increasingly critical enabler, or inhibitor, of your application experience. Your users are ever more dispersed across networks, just as your application services are spread over more IaaS regions, API endpoints and external services. You don’t have to look far to find name-brand cloud applications that have been knocked out by network failures, from Netflix to Craigslist and from Salesforce to PlayStation.
The Changing Structure of Apps and the Network
Three trends reshaping application development are also changing the relationship between applications and the network.
IaaS: As cloud-based infrastructure becomes more common, application flows increasingly traverse public Internet links. Hosting your site on AWS? Azure? SoftLayer? Google? Each provider has different peering policies, global POPs and traffic management, which can lead to dramatically different end user experiences. Google peers widely, meaning that it is rarely more than one or two networks away from your users. AWS and Microsoft, on the other hand, focus on carrying traffic across or between continents on their own backbones, which gives them greater control.
Microservices: Developers are increasingly slicing up application services into ephemeral, scale-out workloads. Often based on Linux containers using iptables-based NAT, these services are causing an explosion in intra- and inter-data center network traffic. Within the data center, this means more variable network I/O and a push toward smarter routing policies, reshaping the switching, routing and load balancing landscape. Outside the data center, it puts more strain on peering links and makes careful planning of key inter-DC routes essential.
Composable apps: Most importantly, applications are increasingly assembled from multiple parts. We’ve already seen a rise in APIs for everything from payments to chat-based support, and some of these APIs are full-fledged services; take Lumberyard, the new game engine from AWS. Developers will increasingly find themselves composing applications from services run by diverse publishers in disparate network locations. I’ve seen this happen even within the enterprise data center, where integrations with the Salesforce and ServiceNow ecosystems have become critical business services.
Delivering Your Bits
How do these changes manifest themselves? They directly impact variables that your users and your operations teams care about.
Increasing loss: With traffic traversing best-effort Internet links, pay special attention to network health. Be prepared to adapt routing or peering preferences based on observed packet loss, and regularly monitor the performance of your ISPs and CDNs.
Increasing latency: As traffic travels long distances between API endpoints, IaaS data centers and individual containers, your app needs to handle highly variable conditions. Test message queues, API calls and critical transactions across a range of latency tolerances.
Decreasing reliability: Design your application to gracefully handle outages of critical services caused by device failures, route leaks or DNS problems. Recent network and infrastructure outages, at everything from popular IaaS providers to large CDNs, have taken down entire services for hours.
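As a minimal sketch of acting on observed loss, the Python below turns probe counts per upstream into loss rates and picks the healthiest one. The `ProbeStats` type, the provider names and the 1% loss ceiling are hypothetical; in practice the counts would come from your monitoring, and "changing preference" might mean a BGP local-preference tweak or a DNS weight rather than a return value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProbeStats:
    """Probe counts for one upstream (ISP, CDN or transit path). Hypothetical type."""
    sent: int
    received: int

    @property
    def loss(self) -> float:
        # Fraction of probes that never came back; 0.0 if nothing was sent.
        if self.sent == 0:
            return 0.0
        return 1.0 - self.received / self.sent


def preferred_upstream(stats: dict[str, ProbeStats], max_loss: float = 0.01) -> str | None:
    """Return the lowest-loss upstream at or under max_loss, or None if none qualify."""
    candidates = {
        name: s.loss for name, s in stats.items() if s.sent > 0 and s.loss <= max_loss
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


observed = {
    "isp-a": ProbeStats(sent=1000, received=950),  # 5% loss
    "isp-b": ProbeStats(sent=1000, received=998),  # 0.2% loss
}
print(preferred_upstream(observed))  # isp-b
```

Requiring a minimum number of probes before trusting a loss figure would be a sensible refinement, since a handful of samples says little about a path.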
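One common way to tolerate variable latency is to retry transient failures with capped exponential backoff plus random ("full") jitter, so that many clients don’t retry in lockstep. The sketch below assumes the wrapped operation enforces its own timeout and raises `TimeoutError` or `ConnectionError` on transient failure; `call_with_retries` and its defaults are illustrative, not taken from any particular library. The injectable `sleep` makes it easy to test across a range of simulated conditions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying transient network errors with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    # Simulated dependency that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated reset")
    return "ok"


print(call_with_retries(flaky, sleep=lambda s: None))  # ok
```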
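A circuit breaker is one widely used pattern for degrading gracefully: after several consecutive failures it stops calling the dependency for a cool-down period and serves a fallback (a cached value, a default, a "try again later" page), then lets a trial request through. This is a minimal, single-threaded sketch; the `CircuitBreaker` class, its thresholds and the injected clock are illustrative assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures; half-open after `cooldown` s."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, op: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()  # open: don't touch the failing dependency
            # Cool-down elapsed: half-open, so let this one trial request through.
        try:
            result = op()
        except Exception:  # broad on purpose for the sketch; narrow this in real code
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # open (or re-open after a failed trial)
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


now = [0.0]
cb = CircuitBreaker(threshold=2, cooldown=10.0, clock=lambda: now[0])


def down() -> str:
    raise ConnectionError("dependency unavailable")


print(cb.call(down, lambda: "cached"))            # cached (1st failure)
print(cb.call(down, lambda: "cached"))            # cached (circuit opens)
print(cb.call(lambda: "live", lambda: "cached"))  # cached (still open)
now[0] = 11.0
print(cb.call(lambda: "live", lambda: "cached"))  # live (trial succeeds, closes)
```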
Developers Meeting the Networking Challenge
Cloud-native architectures get a lot of attention, but you can go deeper on the network side too, and many developers are beginning to do just that to optimize the delivery of their applications. Here are a few ways you might improve your own application experience; the right course of action, if any, will depend on your applications and users.
Get network savvy: You can get a surprisingly rich view of how your application is delivered over the network. This isn’t your grandfather’s ping. New monitoring techniques, such as detailed network path tracing and active probing, pinpoint the specific network segments and service providers where performance is degrading. Sophisticated services like ThousandEyes can provide a detailed view of both web and network performance across data centers, ISPs, CDNs, and IaaS providers. There’s no excuse for being in the dark about network performance.
Get automated: Visibility into both application and network performance is step one. Next, tie your monitoring into an on-call incident management tool like PagerDuty or xMatters, with carefully crafted escalation rules so that you aren’t woken from your beauty sleep unnecessarily. I’ve seen some teams go even further and use alerts to trigger load balancer changes or DNS record updates that take racks, pods or entire data centers out of rotation when performance degrades.
Try new content delivery strategies: Content delivery networks can host a wide variety of content and make it quickly accessible to a global audience. Of course, this means you have another service provider to manage. Some companies, such as Netflix, have built their own CDN to minimize the network distance to their users. For applications that rely on audio or video streaming, which is particularly sensitive to packet delay and loss, out-of-order arrival and jitter (variation in delay), a CDN, whether built or bought, can make a huge difference.
Experiment with network architectures: Some developers aren’t satisfied with the network services they can procure from major providers, so they’ve built their own. Riot Games unveiled an architecture of new POPs, dark fiber leases, tweaked routing preferences and peering with major ISPs across the U.S. The result? A higher percentage of League of Legends players experienced latency under 80ms, a desirable threshold for multiplayer gaming. With SDN tying this architecture together, Riot Games expects to push the boundaries even further.
If you dare, build new network stacks: For users on high-latency, low-bandwidth networks, the standard HTTP/TCP/IP stack sometimes doesn’t cut it. Facebook’s mobile app team shipped their latest Android app for developing countries with its own custom messaging protocol over TLS and TCP, rather than HTTPS. This reduces data use on 2G networks and works with image servers in the Facebook CDN to deliver images at exactly the size the device needs, instead of oversized images that would otherwise eat up bandwidth. Their work is a great case study in building a custom network stack.
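To make "active probing" concrete, here is a deliberately simple sketch that times TCP handshakes to an endpoint and reports the median connect time and the fraction of failed attempts. It only measures connection setup from wherever it runs, so it is no substitute for hop-by-hop path tracing; `probe_tcp_connect` is a hypothetical helper, and the host, port and probe count are placeholders.

```python
import socket
import statistics
import time
from typing import Optional, Tuple


def probe_tcp_connect(host: str, port: int, count: int = 5,
                      timeout: float = 2.0) -> Tuple[Optional[float], float]:
    """Time `count` TCP handshakes; return (median ms or None, failure fraction)."""
    if count < 1:
        raise ValueError("count must be >= 1")
    samples_ms = []
    failures = 0
    for _ in range(count):
        start = time.perf_counter()
        try:
            # create_connection returns once the TCP three-way handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                samples_ms.append((time.perf_counter() - start) * 1000)
        except OSError:
            failures += 1  # refused, unreachable or timed out
    median = statistics.median(samples_ms) if samples_ms else None
    return median, failures / count
```

Calling `probe_tcp_connect("example.com", 443)` on a schedule from several locations, and alerting when the median or failure fraction drifts, is the rough shape of what dedicated monitoring services do far more thoroughly.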
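If alerts are going to pull capacity out of rotation automatically, it helps to add hysteresis so a pod isn’t flapped in and out on every noisy sample. The sketch below is one way to express that rule; the `RotationGuard` class, the latency thresholds, the number of consecutive checks, and the idea that the caller then updates a load balancer pool or DNS record are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RotationGuard:
    """Decide whether a pod, rack or DC should be in rotation, with hysteresis."""
    drain_above_ms: float = 250.0    # drain after sustained latency above this
    restore_below_ms: float = 150.0  # restore only after sustained latency below this
    required_checks: int = 3         # consecutive checks needed to change state
    in_rotation: bool = True
    _streak: int = field(default=0, init=False, repr=False)

    def observe(self, p95_latency_ms: float) -> bool:
        """Feed one health-check result; return the (possibly updated) state."""
        if self.in_rotation:
            breached = p95_latency_ms > self.drain_above_ms
        else:
            breached = p95_latency_ms < self.restore_below_ms
        self._streak = self._streak + 1 if breached else 0
        if self._streak >= self.required_checks:
            # This is where the caller would update the LB pool or DNS record.
            self.in_rotation = not self.in_rotation
            self._streak = 0
        return self.in_rotation


guard = RotationGuard()
samples = [300, 320, 180, 300, 310, 290, 200, 120, 110, 100]
states = [guard.observe(s) for s in samples]
print(states)
# [True, True, True, True, True, False, False, False, False, True]
```

The gap between the drain and restore thresholds is the point of the design: a pod hovering around 200ms stays in whatever state it is already in.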
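A metric like "the percentage of players under 80ms" is simple to compute from latency samples, and it is worth tracking alongside a high percentile so that a good average doesn’t hide a bad tail. The helpers and sample values below are invented for illustration; the percentile uses the nearest-rank method.

```python
import math
from typing import Sequence


def share_below(samples_ms: Sequence[float], threshold_ms: float = 80.0) -> float:
    """Fraction of samples strictly below threshold_ms."""
    if not samples_ms:
        raise ValueError("no samples")
    return sum(1 for s in samples_ms if s < threshold_ms) / len(samples_ms)


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, for 0 < p <= 100."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


pings = [42, 55, 61, 67, 73, 78, 85, 96, 120, 210]
print(f"{share_below(pings):.0%} under 80ms, p90 = {percentile(pings, 90)}ms")
# 60% under 80ms, p90 = 120ms
```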
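Facebook’s actual protocol isn’t described here, but one job any custom message protocol over TCP (or TLS) must do is framing: TCP delivers a byte stream, not messages, so the application has to mark where each message ends. A common approach is a fixed-size length prefix. The sketch below assumes a 4-byte big-endian length header and an arbitrary 1 MiB `MAX_FRAME` limit; both are illustrative choices, and in production the socket would typically be wrapped for TLS with `ssl.SSLContext.wrap_socket`.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order
MAX_FRAME = 1 << 20           # reject frames over 1 MiB (arbitrary limit)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message."""
    if len(payload) > MAX_FRAME:
        raise ValueError("frame too large")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may hand them over in arbitrary chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full frame arrived")
        buf += chunk
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    if length > MAX_FRAME:
        raise ValueError("frame too large")
    return _recv_exact(sock, length)


a, b = socket.socketpair()
send_frame(a, b"hello")
send_frame(a, b"")
print(recv_frame(b), recv_frame(b))  # b'hello' b''
a.close()
b.close()
```

Once framing is in place, the savings come from what goes inside each frame: compact binary encodings and only the fields the client actually needs.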
Making Your Cloud App-Ready, and Vice Versa
Application development today has a rich set of services and infrastructure for constructing ever more powerful applications. But this has left many applications more exposed to the whims of the network. Using new network monitoring techniques and architectures, developers can better control the end user experience. So break out an O’Reilly book, chat with your favorite Ops team member, and refresh your networking knowledge. It’s time to get your cloud app-ready, and your app truly cloud-ready.