System deployments and upgrades usually involve many actions. If we can detect and speed up the time-consuming steps, we get better customer satisfaction or a shorter maintenance window. However, when a large number of steps are involved, how can we easily examine them and find the bottleneck?
Permanent Link: http://dennyzhang.com/list_slowest_steps
You may think that if the log messages for all critical actions used the same timestamp format, we could work out the time elapsed for each step.
The answer is yes and no. A deployment usually runs automation scripts for several components or modules, e.g. some in bash, some in Chef/Puppet/Ansible, some in Python. It's hard to enforce one convention across all of them, especially a timestamp format. The good news is that professional tools and engineers already log most, if not all, critical actions. So the missing piece is a unified timestamp attached to those log lines.
Fortunately Jenkins has a useful plugin called Timestamper. It can add timestamps to the Console Output of Jenkins jobs.
Here is the idea:
- Automate deployment procedure as a bash script and run it as a Jenkins job.
- Enable Jenkins Timestamper plugin properly for this Jenkins job.
- Calculate the time taken by each step by parsing the Jenkins Console Output line by line.
- Sort the steps by time taken, in descending order.
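The last two steps can be sketched in a few lines of Python. This is only an illustration, not the actual DiagnosticJenkinsJobSlow implementation: it assumes each console line starts with an `HH:MM:SS` prefix (the real prefix depends on how Timestamper is configured), and the function name `slowest_steps` and the sample log are made up for the example.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: "HH:MM:SS <message>". Adjust the pattern to match
# the timestamp format your Timestamper setup actually emits.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def slowest_steps(console_text, top=5):
    """Return (seconds, message) pairs, slowest first.

    A step's duration is the gap between its timestamp and the timestamp
    of the next line, so the final line never gets a duration.
    """
    stamped = []
    for line in console_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # lines without a timestamp prefix are skipped
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            stamped.append((when, match.group(2)))

    durations = []
    for (start, message), (end, _) in zip(stamped, stamped[1:]):
        delta = end - start
        if delta < timedelta(0):
            delta += timedelta(days=1)  # the job ran past midnight
        durations.append((int(delta.total_seconds()), message))

    durations.sort(key=lambda item: item[0], reverse=True)
    return durations[:top]


# Hypothetical console output, for illustration only.
sample = """\
10:00:00 Install packages
10:02:30 Run database migration
10:02:45 Restart services
10:03:00 Deployment finished
"""

for seconds, message in slowest_steps(sample):
    print(f"{seconds:>6}s  {message}")
```

Because a time-only prefix carries no date, this sketch can only handle a single midnight rollover between two consecutive lines; a format that includes the date would avoid that limitation.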
For a better user experience, I've defined a Jenkins job: DiagnosticJenkinsJobSlow. Below is a real example of how it works.