9 Key Tips for Production Environment Maintenance

Denny Zhang explains his nine key tips for maintaining production environments and providing developers with meaningful feedback.

To break down silos and improve availability, DevOps and Ops teams should regularly and actively collect useful feedback from production environment maintenance. They should also make this feedback easy for developers to access, so the whole team can improve the feedback loop together.

How to Provide Developers with Meaningful Feedback


Image from http://dennyzhang.com/continuous_feedback

1. Monitor at Both the OS and Process Levels

2. Detect Resource Leaks in Your Applications

  • Memory leaks: This defect often goes hand in hand with service outages. If a process's memory usage keeps rising steadily, alert your dev team.
  • Stale file handles: Files may have been deleted, but your application still holds open handles to them, or even keeps reading and writing them. Detect deleted files that are still memory-mapped with "pmap -x $pid | grep deleted"; "lsof -p $pid | grep deleted" also catches files that are merely held open.
  • Excessive network sockets: Either your application can't serve requests fast enough, or it fails to release socket file descriptors. Check with "lsof -p $pid | grep -iE 'TCP|pipe|socket|anon_inode'". If many TCP sockets are stuck in the CLOSE_WAIT state, it's a bad sign: the remote peer has closed the connection, but your application never closed its end.
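As a rough illustration, the three checks above could be scripted along these lines. This is a minimal Python sketch assuming a Linux host with /proc mounted; the function names and the "never drops, ends higher" leak heuristic are illustrative choices, not taken from any particular monitoring tool. It reads /proc/<pid>/fd directly, so deleted files show up whether or not they are memory-mapped, and it decodes /proc/net/tcp, where state code 08 means CLOSE_WAIT. Note that /proc/net/tcp covers every IPv4 TCP socket in the network namespace, not a single process (IPv6 sockets are in /proc/net/tcp6).

```python
import os

# Hex codes in the "st" column of /proc/net/tcp (from the Linux kernel's tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def rss_kib(pid="self"):
    """Return the resident set size of a process in KiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def looks_like_leak(samples, min_samples=5):
    """Heuristic: enough samples, never decreasing, and strictly higher at the end."""
    if len(samples) < min_samples:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


def deleted_open_files(pid="self"):
    """List paths a process still holds open although they were deleted."""
    fd_dir = f"/proc/{pid}/fd"
    stale = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the fd was closed between listdir() and readlink()
        if target.endswith(" (deleted)"):
            stale.append(target)
    return stale


def tcp_state_counts(proc_net_tcp_text):
    """Count sockets per TCP state in the text of /proc/net/tcp."""
    counts = {}
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = TCP_STATES.get(fields[3].upper(), fields[3])
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    print("RSS (KiB):", rss_kib())
    print("Deleted but open:", deleted_open_files())
    with open("/proc/net/tcp") as f:
        print("CLOSE_WAIT sockets:", tcp_state_counts(f.read()).get("CLOSE_WAIT", 0))
```

In practice you would sample rss_kib() on a schedule and feed the last few readings to looks_like_leak(); a sustained rise over hours says more than a burst over seconds.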

3. Always Be on Top of Logfiles

Believe it or not, I've seen applications diligently write hundreds of messages to logfiles every second. This eats up disk space quickly, even before logrotate takes effect.

For application logs, alert developers to any major errors or exceptions found. For syslog, DevOps/Ops are usually the only gatekeepers.
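A minimal Python sketch of both concerns follows: counting error-looking log lines to report to developers, and estimating how long the disk lasts at the current logging rate. The regular expression assumes ERROR/FATAL level keywords and Java-style exception names; your log format will likely need a different pattern. The 500 messages/s at about 200 bytes each in the example are made-up numbers.

```python
import re
from collections import Counter

# Assumed conventions: a level keyword such as ERROR or FATAL, or a
# Java-style exception class name like NullPointerException. Adjust to your format.
ERROR_PATTERN = re.compile(r"\b(?:ERROR|FATAL)\b|\b\w+(?:Exception|Error)\b")


def summarize_errors(lines):
    """Count error-looking lines, keyed by the first matching token."""
    counts = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def hours_until_disk_full(free_bytes, log_bytes_per_sec):
    """Rough estimate of how long the disk lasts if logs keep growing at this rate."""
    if log_bytes_per_sec <= 0:
        return float("inf")
    return free_bytes / log_bytes_per_sec / 3600


if __name__ == "__main__":
    import shutil

    sample = [
        "2016-05-01 10:00:00 INFO request served",
        "2016-05-01 10:00:01 ERROR failed to write cache",
        "Caused by: java.io.IOException: No space left on device",
    ]
    print(summarize_errors(sample))
    # Hypothetical rate: 500 messages/s at roughly 200 bytes each.
    free = shutil.disk_usage("/").free
    print(round(hours_until_disk_full(free, 500 * 200), 1), "hours of disk left")
```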

4. Monitor Slow DB Queries

Slow queries impose intermittent or constant performance penalties on your applications. If we can collect this information for developers, it becomes very valuable input for their troubleshooting.

5. Track the Change History of Production Environments

A clear and complete change list for production environments helps developers identify root causes quickly. See how to Automatically Track All Change History.
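As one minimal illustration of change tracking, not necessarily the approach described in the linked post, the Python sketch below fingerprints every file under a set of directories with SHA-256 and diffs two snapshots. Storing a timestamped snapshot after every deployment would give a crude change history for files such as configuration; it does not capture package upgrades, ad hoc commands, or anything outside the scanned directories.

```python
import hashlib
import os


def snapshot(directories):
    """Map every readable file under the given directories to its SHA-256 digest."""
    digests = {}
    for top in directories:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        digests[path] = hashlib.sha256(f.read()).hexdigest()
                except OSError:
                    continue  # unreadable or vanished file; skip it
    return digests


def diff_snapshots(before, after):
    """Report which files were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }


if __name__ == "__main__":
    import json

    # "/etc/myapp" is a hypothetical config directory.
    baseline = snapshot(["/etc/myapp"])
    # ... deploy or change something, then compare ...
    print(json.dumps(diff_snapshots(baseline, snapshot(["/etc/myapp"])), indent=2))
```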

6. Observe Machine Reboots and Service Restarts

Not all developers know, or remember, that the contents of /tmp may not survive a machine reboot (many systems clear it at boot or mount it as tmpfs). Anything that relies on files there breaks once the machine does reboot. Scan the source code for /tmp and alert developers if necessary.
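The scan can be as simple as the Python sketch below, which walks a source tree and reports lines that reference /tmp as a path root. The list of file extensions is an assumption to adjust per project, and a match only flags code for a human to review; writing short-lived scratch files to /tmp is often perfectly fine.

```python
import os
import re

# Matches "/tmp" as a path root, but not "/var/tmp" or "/tmpfs".
TMP_REF = re.compile(r"(?<!\w)/tmp\b")
SOURCE_EXTENSIONS = (".py", ".java", ".go", ".js", ".sh", ".conf", ".yml", ".yaml")


def find_tmp_references(root, extensions=SOURCE_EXTENSIONS):
    """Yield (path, line number, line) for every /tmp reference under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if TMP_REF.search(line):
                            yield path, lineno, line.strip()
            except OSError:
                continue  # unreadable file; skip it
```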

Restarting services can be scary. Stopping a service triggers a clean shutdown that does a lot behind the scenes: it might drop in-flight requests, flush data to disk, and so on. Starting a service might be slow, or even fail, because of complicated service dependencies. Some of these behaviors might not match developers' assumptions; for example, a service might take too long to stop or start, or hang for a long time with no obvious way to recover. DevOps/Ops should therefore observe restarts carefully and report what they see to developers.
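One way to make restart behavior visible is to time each step. The Python sketch below assumes a systemd host, wraps systemctl stop and systemctl start with timeouts, and records exit codes and durations that Ops can pass on to developers. The unit name and the timeout values are hypothetical placeholders.

```python
import subprocess
import time


def timed_run(cmd, timeout_sec):
    """Run cmd; return (exit code or None if it timed out, elapsed seconds)."""
    start = time.monotonic()
    try:
        code = subprocess.run(cmd, capture_output=True, timeout=timeout_sec).returncode
    except subprocess.TimeoutExpired:
        code = None  # subprocess.run kills the child when the timeout expires
    return code, time.monotonic() - start


def observe_restart(service, stop_timeout=60, start_timeout=120):
    """Stop then start a systemd unit, recording how long each step took."""
    report = {}
    for action, timeout in (("stop", stop_timeout), ("start", start_timeout)):
        code, elapsed = timed_run(["systemctl", action, service], timeout)
        report[action] = {"exit_code": code, "seconds": round(elapsed, 1)}
    return report


if __name__ == "__main__":
    # "myapp.service" is a hypothetical unit name; run with enough privileges.
    print(observe_restart("myapp.service"))
```

Keeping these reports over time makes regressions obvious, for instance when a release suddenly takes three times longer to shut down.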

7. Enable Coredump When Applications Crash

A core dump helps developers pinpoint which thread and which function caused a crash.
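On Linux, a core dump is only written if the process's RLIMIT_CORE allows it, and /proc/sys/kernel/core_pattern decides where it goes (on systemd hosts it is often piped to systemd-coredump, browsable with coredumpctl). The minimal Python sketch below assumes it runs inside the process, or the launcher whose children should dump core; it raises the soft limit to the hard limit and reports the current pattern. Raising the hard limit itself, or changing core_pattern, needs root and is usually done in the service manager or sysctl configuration instead.

```python
import resource


def enable_core_dumps():
    """Raise this process's soft core-file size limit to its hard limit.

    Roughly the in-process equivalent of `ulimit -c unlimited` when the hard
    limit allows it; child processes started afterwards inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return where the Linux kernel writes (or pipes) core files, if readable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("RLIMIT_CORE (soft, hard):", enable_core_dumps())
    print("core_pattern:", core_pattern())
```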

8. Examine JVM for Key Metrics

When operating Java applications, the JDK's troubleshooting tools can help detect suspicious issues. Get familiar with tools such as jps, jstack, and jmap.
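For example, a quick thread-state summary from jstack often shows whether an application is stuck on locks. The Python sketch below assumes that jstack from the same JDK as the target JVM is on PATH and can attach to the process (usually meaning it runs as the same user); the BLOCKED-thread threshold is a hypothetical number to tune.

```python
import re
import subprocess
from collections import Counter

# jstack prints one line like "   java.lang.Thread.State: BLOCKED (on object monitor)"
# per Java thread.
STATE_LINE = re.compile(r"java\.lang\.Thread\.State:\s+(\w+)")


def jstack(pid):
    """Capture a thread dump with the JDK's jstack tool (must be on PATH)."""
    result = subprocess.run(["jstack", str(pid)], capture_output=True, text=True, check=True)
    return result.stdout


def thread_state_counts(dump_text):
    """Count Java threads per state in a thread dump."""
    return Counter(STATE_LINE.findall(dump_text))


if __name__ == "__main__":
    import sys

    counts = thread_state_counts(jstack(sys.argv[1]))
    print(counts)
    if counts.get("BLOCKED", 0) > 10:  # hypothetical alert threshold
        print("Many BLOCKED threads: send this dump to the developers")
```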

9. Simulate Production Environments at a Reasonable Cost

Last but not least: if DevOps can simulate production environments quickly, developers get a safe playground for running tests or dry-running patches. Common obstacles to achieving this include:

  • Budget: We may need to start quite a few VMs just to get a minimal production-like environment.
  • Automation: We need to automate not only cluster deployment but also data export and import.
  • Fidelity: Simulating production environments as closely as possible is the most difficult part, and it varies across projects.

More Reading: Generate Common DB Data Report By ELK

Published at DZone with permission of Denny Zhang, DZone MVB. See the original article here.
