To gather insights on the state of performance optimization and monitoring today, we spoke to 12 executives from 11 companies that provide performance optimization and monitoring solutions for their clients.
Here's what they told us when we asked, "What technical solutions do you use beyond your own?"
We use standard tools, a mix of open-source and commercial. For APM, it's AppDynamics and New Relic. For synthetic monitoring, it's Catchpoint and Dynatrace. We don't see our clients converging on one overriding set of solutions.
There's a trend toward a more diverse set of vendors with more granular, specialized offerings. The market is fragmenting, with virtualized offerings at every layer of the stack, and monitoring tools are evolving quickly to keep up with the changes.
On the application performance management front, it's New Relic, Dynatrace, and AppDynamics. For infrastructure, it's Datadog and SignalFx. Honeycomb.io tracks everything, but then you have to dig through the data to figure out what's important. There are also a lot of open-source start-ups founded by former Facebook and Google employees.
APM solutions include AppDynamics and New Relic. Older enterprise companies use CA and HP on premises. For infrastructure monitoring, it's Nagios and Grafana, and Datadog handles data aggregation. For application delivery, the team uses synthetic monitoring and network infrastructure monitoring.
We've used tons over the years. Now, it's a combination of Statuspage.io and UptimeRobot. The world of status monitoring is very crowded.
Cloudera, Hortonworks, and MapR provide low-level service usage information for system admins. For operations, there are no solutions; teams rely on homegrown scripts or legacy solutions like Loggly or Splunk. There's a lack of expertise about what to look for and how to look for it. We see a lot of companies using broader monitoring tools like Datadog and Wavefront to provide time-series data.
The application consists of a network of interconnected, AWS-hosted microservices written in JavaScript (Node.js). Data storage is handled by a NoSQL solution (MongoDB), and Kafka is used for messaging. For monitoring, since we host the application in the AWS cloud, it would be folly not to use Amazon's bundled monitoring tool, CloudWatch. It collects hardware metrics such as CPU or RAM usage, derived metrics such as requests per second, and overall statistics like healthy and unhealthy host counts. From the broad variety of performance-testing tools, we found that an open-source framework called Gatling best suits our requirements (scalable, easy to learn, configurable scenario assertions, and detailed reports). The scenarios are written in Scala, or rather in Gatling's simplified Scala-based DSL. The framework is integrated with Jenkins, which gives us a variety of trend graphs (percentage of failed requests, mean response time, etc.).
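The two trend metrics mentioned above, percentage of failed requests and mean response time, are simple aggregates over a test run, and scenario assertions are thresholds on them. The sketch below illustrates that idea in plain Python. It is not Gatling's API (Gatling assertions are written in its Scala DSL), and the names `RequestResult`, `summarize`, and `check_assertions`, as well as the default thresholds of 1% failures and 500 ms mean response time, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestResult:
    # Hypothetical record of one simulated request (not a Gatling type).
    response_time_ms: float
    ok: bool


def summarize(results):
    """Return (failed_percent, mean_response_time_ms) for one test run."""
    if not results:
        raise ValueError("no results to summarize")
    failed = sum(1 for r in results if not r.ok)
    failed_percent = 100.0 * failed / len(results)
    mean_rt = mean(r.response_time_ms for r in results)
    return failed_percent, mean_rt


def check_assertions(results, max_failed_percent=1.0, max_mean_ms=500.0):
    """Mimic scenario assertions: pass only if both hypothetical thresholds hold."""
    failed_percent, mean_rt = summarize(results)
    return failed_percent <= max_failed_percent and mean_rt < max_mean_ms


# Usage: four requests, one of which failed.
run = [
    RequestResult(120.0, True),
    RequestResult(180.0, True),
    RequestResult(900.0, False),
    RequestResult(200.0, True),
]
print(summarize(run))         # (25.0, 350.0)
print(check_assertions(run))  # False, because 25% failures exceeds the 1% threshold
```

Plotting these two values per build over time is what produces a trend graph; failing the CI job when `check_assertions` returns `False` is one way such thresholds could gate a pipeline.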
In addition to our infrastructure performance management platform, our customers also use device-specific monitoring tools provided by their component vendors, such as EMC, HP, and IBM. Because these tools are vendor-specific, our solution offers significant value: it is vendor-agnostic and correlates data across all IT infrastructure silos, such as the hypervisor, server, network, and storage.
The solutions differ for each business unit. One uses a lot of default Amazon, Azure, and Microsoft tools for network monitoring and server management. They also use Google Chrome's developer tools to keep an eye on some application load times. QA automation tools like Ranorex help tie many of these and other measurements into our QA strategy. We use several analytics tools so that we can work with aggregated, anonymized data to help our team as we implement changes to the product. Data visualization is paramount for making decisions quickly; we have some home-grown tools but are using Power BI more and more. Then there is always the slightly lower-tech option of just asking people, such as running surveys and interviews to learn how people think we are doing (is the software more stable, faster, more responsive, etc.). No one solution is perfect; what has helped us grow is combining the strengths of many tools with our stakeholders' product knowledge.
By the way, here's who we spoke to!