New Relic was all over AWS re:Invent this week, with a packed booth full of informative, well-attended presentations from New Relic experts and partners (more on those in this post). But the highlight of the show for us was no doubt a “sold-out” presentation by New Relic director of engineering Kevin McGuire on Application Monitoring in a Post-Server World: Why Data Context Is Critical.
Context Is King
In a standing-room-only session (people were being turned away at the door for lack of room), Kevin showed a rapt audience how merely monitoring EC2 instances in a cloud environment won’t reveal everything you really need to understand.
Kevin believes that the scale and lifecycle of containerized computation and EC2 are driving visibility problems around managing that dynamic scale. First, you need ways of organizing and drilling down to the right EC2 instances. Context powers this: you need to be able to understand what’s going on in context. Second, you need the application connection to really understand performance. You need to know what your application is doing, not just how the containers are working.
Building on the surprising data New Relic has generated on real-world use of Docker containers, which helps indicate where the industry is going, Kevin began by sharing info on the very short lifespans of a high percentage of Docker containers. In many cases, he said, “the lifecycle is getting very, very short… The container lives only as long as work needs to be done, and then goes away.” One result, he noted, is that we’re dealing with “a lot more containers.”
Why does this matter? The surprisingly large numbers of increasingly short-lived containers mean that “it’s a big data problem,” he said. To handle this kind of vast scale and brief lifecycle, you need monitoring tools that can measure not just on the container level, but according to the “computational intent.” What are you actually trying to do with those containers?
Monitoring Containers vs. Servers
Traditionally, monitoring by server made sense. “But for 60,000 short-lived containers,” Kevin noted, “it’s not useful at all, actually.”
That’s why New Relic’s Docker Monitoring product is designed so that you can roll up containers by image to see how many computing resources each image is using. “Docker images are the best proxy we have for ‘computational intent,’ the work in common they were all trying to do,” Kevin explained.
“So, how do you monitor computation as a service?” Kevin asked. “A list is not useful any more.” You need as much context as you can get to understand what is happening. The ID of the container is not enough; the more we know about that container, the more analytics we can do. And we need analytics, he noted, not just raw metrics, to make sense of this new world.
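To make the roll-up idea concrete, here is a minimal Python sketch of grouping per-container samples by image. It is only an illustration, not how New Relic’s product works internally: the sample data, the field names (`image`, `cpu_pct`, `mem_mb`), and the `roll_up_by_image` helper are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical per-container samples; IDs, images, and field names are
# invented for illustration and do not come from any real API.
containers = [
    {"id": "a1", "image": "web:1.4", "cpu_pct": 12.0, "mem_mb": 256},
    {"id": "b2", "image": "web:1.4", "cpu_pct": 8.0, "mem_mb": 240},
    {"id": "c3", "image": "worker:2.0", "cpu_pct": 40.0, "mem_mb": 512},
]


def roll_up_by_image(samples):
    """Sum resource usage across all containers that share an image."""
    totals = defaultdict(lambda: {"containers": 0, "cpu_pct": 0.0, "mem_mb": 0})
    for sample in samples:
        entry = totals[sample["image"]]
        entry["containers"] += 1
        entry["cpu_pct"] += sample["cpu_pct"]
        entry["mem_mb"] += sample["mem_mb"]
    return dict(totals)


rollup = roll_up_by_image(containers)
for image, usage in sorted(rollup.items()):
    print(image, usage)
```

Because the key is the image rather than the container ID, the summary stays the same size no matter how many short-lived containers come and go.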
To address those questions, earlier this week New Relic announced a private beta for a set of new monitoring capabilities for Amazon EC2, designed to help manage the scale and dynamic nature of AWS. (To stay up-to-date with the private beta and be notified when the Amazon EC2 monitoring public beta is available, sign up at newrelic.com/aws.)
For example, the beta pulls the metadata from AWS so you can sort servers by EC2 instance type, region, and availability zone. Though it is not in the beta, Kevin noted that one of the many possibilities he sees is the ability to help users differentiate under-provisioned instances from other performance problems, as shown in the screenshots below:
Kevin’s point is that you want to help developers provide the best customer experience for their dollars—delivering the customer experience you want, no more and no less. You don’t want to annoy your app users with poor performance caused by under-provisioning, for example, but you also don’t want to waste money by paying for over-provisioning that doesn’t deliver a noticeably improved user experience.
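As a rough sketch of the two ideas above (organizing instances by AWS metadata, then flagging likely under- or over-provisioning), the Python below groups hypothetical instance records and applies a simple CPU rule of thumb. The records, the `classify` helper, and the 85%/10% thresholds are assumptions for illustration only; they are not part of the beta, and a real analysis would look at much more than average CPU.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical instance records; IDs and utilization figures are invented.
instances = [
    {"id": "i-aaa", "type": "m3.large", "region": "us-east-1", "az": "us-east-1a", "cpu_pct": 93.0},
    {"id": "i-bbb", "type": "m3.large", "region": "us-east-1", "az": "us-east-1a", "cpu_pct": 55.0},
    {"id": "i-ccc", "type": "t2.micro", "region": "us-west-2", "az": "us-west-2b", "cpu_pct": 4.0},
]


def classify(cpu_pct, high=85.0, low=10.0):
    """Label an instance using assumed CPU thresholds (illustrative only)."""
    if cpu_pct >= high:
        return "possibly under-provisioned"
    if cpu_pct <= low:
        return "possibly over-provisioned"
    return "ok"


# Group by (instance type, region, availability zone); groupby needs sorted input.
key = itemgetter("type", "region", "az")
grouped = {k: list(g) for k, g in groupby(sorted(instances, key=key), key=key)}

for group_key, members in grouped.items():
    for inst in members:
        print(group_key, inst["id"], classify(inst["cpu_pct"]))
```

The labels are deliberately hedged ("possibly"), since high CPU on its own cannot tell an undersized instance apart from an application problem; that distinction is exactly where application context comes in.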
We want to be able to help you understand what is going on in your infrastructure, Kevin said, because “in the end, what you really care about is your application,” not just your containers. The message clearly resonated, as eager session attendees lined up to talk with Kevin after his presentation.