Making APM a Company-Wide Effort

Take a closer look at how application performance management/monitoring (APM) can help manage expectations for performance, availability, and user experience.

Joana Carvalho

Jan. 28, 22 · Opinion

This is an article from DZone's 2021 Application Performance Management Trend Report.

For more:

Read the Report

Today, more than ever, users are unwilling to wait or to tolerate failure. Nearly 50 percent of users expect a load time of less than two seconds. Hyperconnectivity has become the new status quo, and with it comes higher pressure on the industry to provide the best service possible. This has also transformed the software application landscape into an intricate net of components, from APIs to CDNs, each of which can easily become the weak link when a problem occurs, leading to poor customer experiences and unhappy end users.

Teams of developers, product owners, test engineers, etc. must work more closely and seamlessly than ever before to solve these issues. This is where application performance management/monitoring (APM) can help manage expectations for performance, availability, and user experience. APM helps teams and companies understand these expectations by gathering software performance data and analyzing it to detect potential issues, alerting teams when baselines aren’t met, providing visibility into root causes, and taking action to resolve issues faster (even automatically) so that the impact on users and businesses is minimal.
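
As a rough illustration of "alerting teams when baselines aren’t met," the following is a minimal sketch, not how any particular APM product works. The function name check_baseline, the 10 percent tolerance, and the latency samples are all assumptions made up for this example.

```python
from statistics import mean

def check_baseline(samples_ms, baseline_ms, tolerance=0.10):
    """Return an alert message if the average latency exceeds the baseline
    by more than `tolerance` (a fraction); otherwise return None."""
    if not samples_ms:
        return None  # nothing measured, so nothing to alert on
    avg = mean(samples_ms)
    limit = baseline_ms * (1 + tolerance)
    if avg > limit:
        return f"ALERT: average latency {avg:.0f} ms exceeds limit {limit:.0f} ms"
    return None

# Hypothetical samples, in milliseconds, checked against a 200 ms baseline.
ok = check_baseline([180, 210, 195], baseline_ms=200)
slow = check_baseline([250, 300, 275], baseline_ms=200)
```

Here ok is None, while slow carries a message that a team could route to a chat channel or a pager. Real tools usually derive the baseline from historical data rather than from a fixed number.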

State-of-the-Art APM

When the first APM solutions were designed, the most common software architectures were different from what they are today — they were simpler and more predictable. Demand has driven applications to move from a monolithic architecture to a cloud-distributed one, which is often more complex and more challenging to manage and monitor without dedicated tools. This increased complexity has forced APM tools to seek new strategies and monitor a myriad of moving parts now present in the software stack.

In addition, over the last two years, social distancing due to the COVID-19 pandemic has forced a new shopping paradigm, and consumers have turned to this new digital experience for safety and convenience. In the US alone, 20-30 percent of the grocery business moved online in March 2020, reaching a 9-12 percent penetration by the end of 2020, and it is forecast to continue growing.

Countries like Germany and Switzerland, which are known for their preference for physical currency, have massively turned to wired or contactless payments, motivated either by government restrictions or by the wish to avoid unnecessary contact.

Not only has the pandemic changed the way we conduct ourselves in the world, but it also changed the way we interact with technology for our everyday needs. It is now imperative that software be more predictable and reliable than ever before, as it caters not only to our convenience but also to our safety.

APM will be crucial to fulfilling these requirements, giving DevOps teams insight into problem isolation and prioritization that shortens MTTR (mean time to repair) and thereby preserves service availability and experience. Business demand for shorter time to market, the acceleration of cloud and container migrations, and the evolution of technology stacks all contribute to organizations’ efforts to aggressively keep up with the new wave of users, which has increased the risk of service disruptions and delays.
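
To make MTTR concrete, here is a minimal sketch of how it could be computed from incident records. It assumes MTTR is measured from detection to resolution; the function name and the two incidents are invented for illustration.

```python
from datetime import datetime, timedelta

def mean_time_to_repair(incidents):
    """Average repair time over (detected_at, resolved_at) datetime pairs."""
    if not incidents:
        return timedelta(0)
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(0),
    )
    return total / len(incidents)

# Hypothetical incidents: one repaired in 30 minutes, one in 90 minutes.
incidents = [
    (datetime(2021, 6, 1, 10, 0), datetime(2021, 6, 1, 10, 30)),
    (datetime(2021, 6, 8, 14, 0), datetime(2021, 6, 8, 15, 30)),
]
mttr = mean_time_to_repair(incidents)
```

For these two incidents, mttr is one hour. Tracking this value per release or per quarter shows whether better problem isolation is actually shortening repairs.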

The future is to combine observability with artificial intelligence to create self-healing applications. With machine learning, real-time telemetry, and automation working together, it’s possible to foresee application issues based on the system outputs and resolve them before they can have a negative impact. Further, machine learning will help determine root causes, predict and detect anomalies, and reduce system noise.
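
Anomaly detection does not have to start with a sophisticated model. The sketch below uses a plain z-score check as a deliberately simple stand-in for the machine learning described above; the function name, the 2.5 threshold, and the latency values are assumptions for this example.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=3.0):
    """Return the indexes of values more than `threshold` sample
    standard deviations away from the mean."""
    if len(values) < 2:
        return []  # stdev needs at least two data points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
anomalies = find_anomalies(latencies_ms, threshold=2.5)
```

For this data, anomalies is [9], the index of the 500 ms spike. The threshold is lowered to 2.5 because, with only ten samples, a single outlier cannot score above roughly 2.85; production systems typically use rolling windows and seasonality-aware models instead.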

Successful APM Adoption: A Shift in Culture

Monitoring and observability are a crucial part of the software development lifecycle. Not only do they help ensure user satisfaction and prevent and detect anomalies and defects, but they also help inhibit the throw-over-the-wall mentality. This mentality is, in part, a result of observability and monitoring strategies often being thought of as the responsibility of the Ops teams; as a result, they typically don’t make it into the development cycle and become an afterthought.

Most companies only use observability and monitoring strategies in production environments, which makes it challenging for the development and quality teams to fully get to know their application and take advantage of features like tracing, error prediction, etc. to mitigate defects during the development phase. This is why APM selection, implementation, and configuration should happen side by side with feature development early in the design process to guarantee the robustness of the solution.

The question to be answered is: Who is responsible for APM, and who owns it? As seen in the image below, APM solutions provide insight for several stakeholders.

All stakeholders should be involved in defining requirements, setting expectations, and configuring the metrics and queries that will be responsible for creating the rules, budgets, and visualizations. It is not straightforward to identify which metrics should be examined more closely. Since all telemetry is stored and analyzed, and we are constantly bombarded with information, we must be frugal with the metrics chosen.

So what makes a good metric? Daniel Yankelovich summarizes some of the common errors that are made when measuring: "The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide." — Daniel Yankelovich, "Corporate Priorities: A continuing study of the new demands on business," 1972

Characteristics of a Robust Metric

Metrics are used on a daily basis to support decisions and guide successful outcomes. For a metric to be effective and reliable, it should have the following characteristics:

Understandable – When selecting a metric, anyone should be able to understand what each value means. If teams are not able to discuss a metric that they are tracking, it is meaningless.
Actionable – The purpose of tracking a metric is to aid the decisions being made. Tracking should always improve behaviors and motivate change. If a value goes over a threshold or a metric starts failing, it should be clear what the impact is and what action must follow.
Comparative/progressive – The ability to compare a metric to other time periods or release versions, for example, helps teams understand progress and spot spikes or long-term trends. It is clearer to read "the throughput is 12 percent higher than last week" than "the throughput is 50 requests per second." The first conveys an improvement, and the second, without context, merely states the value.
Easily measurable, stable, and responsive – If measuring the metric takes overwhelming effort and requires new, complex systems built for the single purpose of computing it, it’s probably not worth measuring in the first place. Basing decisions on a metric that is generated inside a black box is not ideal. The complexity of the solution needs to be balanced against the gain, and that trade-off may even lead to the selection of a more realistic metric. When implementing the metric, its future use should also be taken into account, so that it can accommodate the volume of data and be applied throughout the platform.
Relevant – It should align with team and business objectives. Out-of-the-box metrics should all be tracked, if possible, but custom metrics, or anything that is not collected out of the box, should earn their place by serving those objectives. Vanity metrics are good for team morale, but they shouldn’t be the totality.
Real-time – The time to compute and collect a metric should not be so long that the time to display renders it obsolete.
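
The comparative/progressive characteristic is easy to sketch in code. The example below turns a raw throughput value into the week-over-week statement used above; the function names and the numbers are assumptions chosen only to illustrate the idea.

```python
def percent_change(current, previous):
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100

def describe_throughput(current_rps, last_week_rps):
    """Phrase throughput relative to last week instead of as a bare number."""
    change = percent_change(current_rps, last_week_rps)
    direction = "higher" if change >= 0 else "lower"
    return (
        f"throughput is {abs(change):.0f} percent {direction} than last week "
        f"({current_rps} requests per second)"
    )

# Hypothetical values: 56 requests per second now, 50 a week ago.
message = describe_throughput(56, 50)
```

The resulting message reads "throughput is 12 percent higher than last week (56 requests per second)", which keeps the raw value but leads with the context that makes it meaningful.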

These metrics, when carefully selected, can become the standards to communicate the status of the system with teams or stakeholders. They can also serve as a motivator when sharing good outcomes, for example, from technical or marketing initiatives.

Keep in mind: It is paramount to have support at the executive level to guarantee the success of implementing APM systems as well as observability and monitoring techniques.

By embedding the power of application performance monitoring and AIOps, which are becoming intertwined concepts, into their processes, teams can work together and gain the ability to:

Detect why applications are slow
Enable end-to-end visibility in an automated way — with little intervention from teams
Access a topological service map visualization that can help spot issues and fix them faster, while also showing all dependencies between application and infrastructure components
Monitor and gather information for application usage from a user perspective — either proactively through synthetic monitoring or passively through RUM (real user monitoring)
Identify and alert on deviation trends and performance issues to help build better contingency plans and predict future occurrences that might affect the user
Profile user actions from front end to back end so that they can be traced to the code, database query, or third-party call
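
To give a feel for how a traced user action points back to code, a database query, or a third-party call, here is a minimal sketch. The Span class, its fields, and the sample trace are hypothetical and far simpler than what real tracing tools record.

```python
from dataclasses import dataclass

@dataclass
class Span:
    name: str           # what ran, e.g. a function or a query
    kind: str           # "code", "db", or "http" in this sketch
    duration_ms: float  # time spent in this step

def slowest_span(spans):
    """Return the single step that took the longest, or None if empty."""
    return max(spans, key=lambda s: s.duration_ms) if spans else None

def time_by_kind(spans):
    """Total time per kind of work, to show where a request spends its time."""
    totals = {}
    for span in spans:
        totals[span.kind] = totals.get(span.kind, 0.0) + span.duration_ms
    return totals

# Hypothetical trace of one "view cart" user action.
trace = [
    Span("render_cart", "code", 35.0),
    Span("SELECT items", "db", 420.0),
    Span("payment-provider call", "http", 120.0),
    Span("SELECT user", "db", 15.0),
]
culprit = slowest_span(trace)
totals = time_by_kind(trace)
```

In this made-up trace, culprit is the "SELECT items" query, and totals shows 435 ms spent in the database against 35 ms in application code, which is the kind of answer a team needs when asking why an application is slow.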

Conclusion

In the end, APM is about providing insights through a diagnostics view that exploits every element of the software so that the end-user experience can be understood and continuously improved. By making observability and monitoring first-class citizens in the development process, teams can focus more on the quality of the solutions and less on firefighting.

Opinions expressed by DZone contributors are their own.
