Opportunities to Improve Performance and Monitoring
Automatically reacting to and correcting issues, combined with more elegant, thoughtful design and testing, results in an optimal UX.
To gather insights on the state of performance optimization and monitoring today, we spoke to 12 executives from 11 companies that provide performance optimization and monitoring solutions for their clients.
Here's what they told us when we asked, "Where do you think the biggest opportunities are for improvement in performance optimization and monitoring?"
Awareness. Continue to focus on better application design and graceful degradation; embracing degradation will help improve how we handle it. We need better sharing of information, and integrated IT operations teams representing different domains give us the vision to see how everything works together. Have mutually agreed-upon metrics to share regarding performance.
As architecture and digital strategies decompose into finer grains, there’s an evolution in monitoring and managing. Keep it meaningful with useful data. Increase anomaly detection. Highlight and cross-correlate problems. Automatically react to issues. Know the difference between mitigating and fixing. Reroute users and then fix the problem. Detect and mitigate in real time. Access, evaluate, and fix without the building being on fire. Manage the impact of failure.
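The "reroute users and then fix the problem" idea above can be sketched in a few lines. This is an illustrative sketch, not any vendor's implementation: `mitigate_then_fix`, the probe callback, and the pool names are all hypothetical.

```python
def check_health(pool, probe):
    """Return the subset of endpoints in `pool` that pass the health probe."""
    return [e for e in pool if probe(e)]


def mitigate_then_fix(primary, standby, probe, fix_queue):
    """Mitigation and repair are distinct steps: reroute users first,
    then queue the failed endpoints for out-of-band investigation."""
    healthy = check_health(primary, probe)
    if healthy:
        return healthy                  # no action needed
    fix_queue.extend(primary)           # fix later, out of the hot path
    return check_health(standby, probe) # mitigate now: serve from standby
```

For example, if every primary endpoint fails its probe, users are served from the standby pool immediately while the failed endpoints sit in `fix_queue` for repair, which is exactly the mitigating-versus-fixing distinction drawn above.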
Collect more and more data. Know how to identify a dynamic system to correlate what the problem may be before it affects the customer. When adopting a DevOps mindset, expand beyond IT to security and customer support. See how the entire pipeline is configured.
Make the data presented more manageable with automated analysis. There are not enough data scientists. Customers love the monitoring product, but it needs to be more accessible to a wider group of business users. There is not a depth of training or skill set in heuristics and analysis for root cause analysis and benchmarking. We need the ability to know the hosting provider is down or DDOS is the root cause faster with less skilled people. Gain insights that provide a direct benefit to the business like knowing whether to put in emergency contingency plans versus waiting for a self-correction.
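A first step toward the automated analysis described above is flagging metric values that deviate sharply from a trailing baseline, so less-skilled operators are pointed at the anomaly without a data scientist in the loop. A minimal sketch, assuming a simple rolling mean and standard deviation (the function name and thresholds are illustrative):

```python
from statistics import mean, stdev


def detect_anomalies(series, window=10, threshold=3.0):
    """Return indexes of points more than `threshold` standard deviations
    from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # A flat baseline (sigma == 0) is skipped; a production system
        # would handle that case explicitly.
        if sigma and abs(series[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies
```

Real tools use far more robust models, but even this baseline separates "the hosting provider just fell over" from normal jitter automatically.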
Design applications with higher-level programming and better tools. Companies always need something done yesterday. Database architectures like MongoDB and Redis are wonderful and have changed how you scale applications. They have alleviated the need for all databases to be atomic.
One day, you’ll be able to put APM on auto-pilot to identify the root cause of the problem and proceed to fix the problem. We’re going to the full-stack horizontally. We started with Hadoop, now Spark, adding Kafka, Cassandra, and MongoDB. Companies are mixing and matching big data solutions. We need visibility from ingestion to usage to visualization.
Clever load tests could give a better answer to the SUT’s (system under test) performance questions. To compose such a test, it is beneficial to run in-house real-user testing and analyze their behavior and usage patterns. For example, it might not be immediately clear that the application usage is higher on weekends and evenings and lower during the night (or vice versa!). Or, having some sort of functionality on a time-recurring basis (such as synchronizations of various kinds) could introduce an unwanted usage spike and is likely better off distributed evenly. Looking ahead, potentially, machine learning could completely change the way performance testing is done and measured.
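The usage-pattern ideas above can be turned directly into a load-test schedule. A hypothetical sketch (the function names, rates, and peak hours are illustrative assumptions, not measured data): weight the generated load by hour and day to mirror the evening/weekend peaks, and spread a recurring sync over a window instead of letting every client fire at once.

```python
def target_rps(hour, is_weekend, base_rps=50):
    """Requests per second the load test should generate for a given hour,
    shaped to mimic observed real-user patterns."""
    rate = base_rps
    if 18 <= hour <= 23:        # evening peak seen in real-user data
        rate *= 2
    elif 0 <= hour <= 6:        # overnight trough
        rate //= 2
    if is_weekend:              # weekends run hotter across the board
        rate = int(rate * 1.5)
    return rate


def stagger_syncs(client_ids, window_minutes=60):
    """Offset (in seconds) for each client's recurring sync, spread evenly
    across the window to avoid an artificial usage spike at minute zero."""
    n = len(client_ids)
    return {cid: i * (window_minutes * 60) // n
            for i, cid in enumerate(client_ids)}
```

With four clients and a one-hour window, the syncs land fifteen minutes apart rather than stacking into a single spike, which is the even distribution recommended above.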
The key is to enable and empower DevOps and Continuous Delivery models, as these are increasingly analogous to “agile business.” All monitoring products will need to monitor across the hybrid data center, including both monitoring the on-premise deployed applications as well as those deployed in the public cloud. By design, this requires and forces cross-silo collaboration. Improvements centered on collaborative enablement, ease of use, and learning-based analytics will be a key focus for all monitoring and optimization vendors.
First things to tackle when improving performance:
- Use database indexes (crucial).
- Use the latest version of the servers your application relies on. For instance, if a new database version is released with improved performance, use that one.
- Use the latest version for your application dependencies. You can gain performance by just updating a database driver to the latest version.
- Use caching. Cache in memory the most accessed data structures. No matter how fast a DB is, keeping things in memory will be faster.
- No matter what caching technology you use, it will eventually need to read from or write to the disk. Use fast disks. Use SSDs, unless you cannot afford them; then use whatever you can. After all, the disk is the most important component, performance-wise.
- Stress-test your product. Go for the parts that perform badly. Sometimes underperforming modules are not mission-critical, so it is better to focus on the parts of the product that are most used to get the biggest bang for your buck.
I see a big opportunity in the field of monitoring the “cattle.” Applications are shifting toward deployment on a temporary, elastic pool of anonymous servers. Monitoring and triaging this type of application is difficult, and orchestration tools provide only rudimentary health monitoring (mainly identifying failed nodes and replacing them). Many of the popular monitoring tools today are not suited to this new method, so there is a big opportunity here.
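One common answer to the "cattle" problem is to make node identity incidental: ephemeral instances push metrics tagged with a service label, and dashboards aggregate by label rather than hostname. A minimal sketch of that idea (the event shape and `aggregate` function are illustrative assumptions, not a specific tool's API):

```python
from collections import defaultdict


def aggregate(events):
    """Roll per-node metric events up to the service level; the `node`
    field is ignored on purpose, because instances come and go."""
    grouped = defaultdict(list)
    for e in events:
        grouped[(e["service"], e["metric"])].append(e["value"])
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


events = [
    {"service": "checkout", "metric": "latency_ms", "value": 120, "node": "i-0a1"},
    {"service": "checkout", "metric": "latency_ms", "value": 80, "node": "i-9f2"},
]
```

Here an alert fires on checkout latency as a whole, which still works after both of those anonymous nodes have been replaced.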
Better overall user experience and happier partners. Fewer subjective conversations about how something is perceived. Alignment both internally and externally. When our monitoring and optimization techniques allow us to set appropriate baselines that we can validate with our community, certain conversations are so much easier. There are far fewer conversations about a release seeming slower or buggier. Data allows everyone to review and validate what is really happening. With this information, everyone can work together to figure out what is causing perception issues and more efficiently work toward a resolution.
By the way, here's who we spoke to!
- Josh Gray, Chief Architect, Cedexis.
- Jeff Bishop, General Manager, ConnectWise Control.
- Bryan Jenks, CEO and Co-Founder, DropLit.io.
- Doru Paraschiv, Co-Founder, IRON Sheep TECH.
- Yoav Landman, Co-Founder and CTO, JFrog.
- Jim Frey, V.P. Strategic Alliances, Kentik.
- Eric Sigler, Head of DevOps, PagerDuty.
- Nick Kephart, Senior Director Product Marketing, ThousandEyes.
- Kunal Agarwal, CEO, Unravel Data.
- Len Rosenthal, CMO, Virtual Instruments.
- Alex Rysenko, Lead Software Engineer, and Eugene Abramchuk, Senior Performance Engineer, Waverley Software.
Opinions expressed by DZone contributors are their own.