Most Common Issues Affecting Performance and Monitoring
Failure to understand the business implications of performance and monitoring.
To gather insights for DZone's Performance and Monitoring Research Guide, scheduled for release in June 2016, we spoke to 10 executives, from nine companies, who have created performance and monitoring solutions for their clients.
Here's who we talked to:
Dustin Whittle, Developer Evangelist, AppDynamics | Michael Sage, Chief DevOps Evangelist, Blazemeter | Rob Malnati, V.P. Marketing, and Pete Mastin, Product Evangelist, Cedexis | Charlie Baker, V.P. Product Management, Dyn | Andreas Grabner, Technology Strategist, Dynatrace | Dave Josephson, Developer Evangelist, and Michelle Urban, Director of Marketing, Librato | Bob Brodie, CTO, SUMOHeavy | Christian Beedgen, CTO and Co-Founder, Sumo Logic | Nick Kephart, Senior Director Product Marketing, ThousandEyes
We asked these executives, "What are the most common issues you see affecting performance and monitoring?"
Here's what they told us:
- The enterprise not scaling is common. There are ten common problems I see causing apps to crash: 1) database access: loading too much data inefficiently; 2) microservice access: inefficient access and badly designed service APIs; 3) bad frameworks: bottlenecks under load or misconfiguration; 4) bad coding: CPU, sync, and wait hotspots; 5) inefficient logging: too much even for Splunk and ELK; 6) invisible exceptions: frameworks gone wild; 7) exceptions: overhead through stack-trace generation; 8) pools and queues: bottlenecks through wrong sizing; 9) multi-threading: locks, syncs, and wait issues; and 10) memory: leaks and garbage-collection impact. Shift everything left into development to correct problems earlier, before the code reaches production. We can detect the patterns of these problems and prevent them.
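Issue #1 above, inefficient database access, most often shows up as the classic N+1 query pattern. A minimal sketch, using an in-memory SQLite database with a hypothetical orders/customers schema (the schema and data are illustrative, not from the source):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

def names_n_plus_one():
    """Anti-pattern: one query per order -- N+1 round trips to the database."""
    names = []
    for (cust_id,) in conn.execute("SELECT customer_id FROM orders ORDER BY id"):
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (cust_id,)).fetchone()
        names.append(row[0])
    return names

def names_joined():
    """Fix: push the join into the database -- one query regardless of row count."""
    return [name for (name,) in conn.execute(
        "SELECT c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id")]

# Same result, but the second version issues one query instead of N+1.
assert names_n_plus_one() == names_joined() == ['Ada', 'Grace', 'Ada']
```

With three orders the difference is invisible; with a million, the per-row round trips become exactly the kind of crash-under-load pattern described above.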
- People do not understand the implications of performance on their business. We correlate performance to KPIs. We're able to see each step of the buyer's journey and determine how the performance of the site or the app affects it. Page load time + page resources = conversion rate. Most people have no clue how to measure performance. Webpages continue to get more complex.
- Customers with issues know we can help them at the business level. We discover what the problems are and use tools to dig in and analyze. We’ll use our tools on our customers’ systems for a month to identify the technical problems and the business problems are typically cultural. Businesses try to solve problems from the top down. A lot of top down requests are supposed to make things work better but they don’t. People that are affected are afraid to speak up. You need to talk to CSRs, helpdesk and production to learn how people are solving problems. We build reports that enable clients to manage better and improve their workflow.
- The internet is evolving very quickly, and the decisions you make with regard to the internet today will not be relevant in a week, a month, and certainly not in a year. Everything is expanding nodes in a dynamic environment. It's constantly changing and your work is never done. We make it as easy to manage as possible, avoiding noise at the micro level while identifying problem areas that need to be addressed. All of this connectivity creates a challenge.
- The constraints are always changing; we don’t see a common issue. Maybe databases but the stack is so deep and there are so many elements it’s difficult to identify a consistent area of concern. Some are application performance and others are team performance. People are consistently surprised by what becomes a problem. We’re able to find problems more regularly with a combination of performance monitoring tools.
- Capacity planning that does not account for the amount of traffic hitting the database. Look at JavaScript in the end-user experience. How long does it take for a page to load? See what's happening in the browser itself. You can see where the bottlenecks are. This makes it possible to identify bad design.
- Latency. There are so many places it can hide in a distributed systems architecture – load balancing, queues, key-value stores. Understand what works and what introduces latency. Be able to quantify the latency of the pieces you are working on and predict where problems may arise given the architectural design.
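One way to quantify latency per component, as the quote suggests, is to compare median against a tail percentile for each hop. The samples below are simulated with assumed distributions (real numbers would come from instrumentation around the load balancer, queue, and key-value store):

```python
import random
import statistics

random.seed(42)  # deterministic simulation for the example

# Hypothetical per-request latencies (ms) for three components.
samples = {
    "load_balancer": [abs(random.gauss(2, 0.5)) for _ in range(1000)],
    "queue":         [abs(random.gauss(15, 8)) for _ in range(1000)],
    "kv_store":      [abs(random.gauss(1, 0.2)) for _ in range(1000)],
}

def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    return ordered[int(0.99 * (len(ordered) - 1))]

for component, lat in samples.items():
    print(f"{component:14s} median={statistics.median(lat):6.2f}ms "
          f"p99={p99(lat):6.2f}ms")
```

The point of reporting p99 alongside the median is that tail latency is where distributed-system problems hide: a component can look fine on average while its worst 1% of requests dominates end-to-end response time.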
- Customers want to solve the domain of operational monitoring in one fell swoop, but you need two different types of products to do that because they're two different jobs. We focus on the network but provide information at the service level of the application. We can note code-level issues like an APM tool does, but we don't go as deep as an APM solution.
- Systems are glued together from many different parts. There's no longer a single box. There are microservices and multiple platforms – a distributed system that you don't understand or control. The smoking gun is almost always something you didn't expect, and much more subtle: a software cache miss, running out of memory, something very application-specific, a server that can't talk to another server because LDAP is down, service degradation.
What are the most common issues you see affecting performance and monitoring?
Opinions expressed by DZone contributors are their own.