If you work for a company that has separate development and operations teams, you know how important it is for the two teams to communicate in order to build a robust architecture. By a robust architecture I mean a well-written application backed by an effective monitoring solution. Based on my experience working on the operations team of a large IT company, I am listing a few guidelines / best practices that I feel developers should follow to help us do a better job and help create such an architecture.
1. Frequent meetings with the operations team regarding the application architecture
Nothing helps the operations team more than interacting with the developers. The developers should involve the operations team in architecture review and product discussion meetings. This will bring out the challenges that the operations team might face in building good monitoring solutions for the product. It will also bring to light the issues that the developers might face in deploying new features or products in the current environment, as nobody has a better understanding of the production setup than the operations team.
Such meetings also help the operations team understand the known issues with the product. You don't want the operations team paging the developers in the middle of the night for a known issue just because they were not aware of it.
2. Effective logging with support for log rotation
In the world of operations, logging is very important. To debug any issue, the first thing we look at is the logs. So it is imperative for the developers to create an application that logs all its activities. Ideally, the application should have an application log and an error log. Another important aspect of logging is log rotation. The application should be capable of rotating its logs. This has three benefits:
- Since the log gets rotated, the current log that the application is writing to stays small. Opening and then writing to huge logs is a burden that can be avoided, so the application runs more smoothly.
- Smaller log files mean that it is easier and quicker to find what we are looking for.
- Purging old log files is easy. With one single log file, we may need to stop the application first and then delete the log file. Also, if you want to delete all the log entries before a specified date / time, that is very difficult to do with a monolithic log file.
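To illustrate the last point, here is a minimal sketch of purging by date once the logs are rotated. It assumes rotated files carry their date in the name, for example app.2024-01-31.log. The function name purge_rotated_logs, the file name pattern and the cutoff parameter are my own assumptions for illustration, not part of any particular tool.

```python
import os
import re
from datetime import date, datetime

# Hypothetical naming scheme: app.YYYY-MM-DD.log, optionally app.YYYY-MM-DD.N.log
ROTATED_NAME = re.compile(r"^app\.(\d{4}-\d{2}-\d{2})(?:\.\d+)?\.log$")


def purge_rotated_logs(log_dir, cutoff):
    """Delete rotated logs dated strictly before `cutoff`; return their names."""
    removed = []
    for name in sorted(os.listdir(log_dir)):
        match = ROTATED_NAME.match(name)
        if not match:
            continue  # skips the live app.log and any unrelated files
        try:
            file_date = datetime.strptime(match.group(1), "%Y-%m-%d").date()
        except ValueError:
            continue  # looks like a date but is not a valid one
        if file_date < cutoff:
            os.remove(os.path.join(log_dir, name))
            removed.append(name)
    return removed
```

Calling purge_rotated_logs("/var/log/myapp", date(2024, 1, 15)) would remove only the rotated files older than that date (the directory is a placeholder). The live log is never touched, so the application does not have to be stopped.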
A good way to make your application support log rotation is to make it handle the SIGHUP signal. Whenever the application receives this signal, it briefly pauses processing, renames the current log file to something like app.date.log, and creates a new log file. This also solves the issue of application downtime, since the application does not need to be stopped. Ideally, I would prefer to have an external log rotation program send SIGHUP to the application, forcing it to rotate its logs. The log rotation program could have additional features such as the number of days it should retain the logs, compression of the logs, etc.
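The SIGHUP approach can be sketched roughly as follows. This is only one possible shape, assuming a POSIX system and a single-process application. The class RotatingLog and the helper install_sighup_handler are hypothetical names I made up for the example. The signal handler only sets a flag, and the actual rename happens just before the next log line is written, so no line is ever split across two files.

```python
import os
import signal
import time


class RotatingLog:
    """Append-only log file that rotates itself when asked to."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")
        self._rotate_requested = False

    def request_rotation(self, signum=None, frame=None):
        # Runs inside the signal handler, so it only sets a flag.
        self._rotate_requested = True

    def write(self, message):
        # Rotate at a safe point: just before the next line is written.
        if self._rotate_requested:
            self._rotate()
        self._fh.write(message + "\n")
        self._fh.flush()

    def close(self):
        self._fh.close()

    def _rotate(self):
        self._rotate_requested = False
        self._fh.close()
        base, ext = os.path.splitext(self.path)
        stamp = time.strftime("%Y-%m-%d")
        target = f"{base}.{stamp}{ext}"  # e.g. app.2024-01-31.log
        counter = 1
        while os.path.exists(target):  # never overwrite an earlier rotation
            target = f"{base}.{stamp}.{counter}{ext}"
            counter += 1
        os.rename(self.path, target)
        self._fh = open(self.path, "a", encoding="utf-8")


def install_sighup_handler(log):
    """Rotate `log` whenever the process receives SIGHUP (POSIX only)."""
    signal.signal(signal.SIGHUP, log.request_rotation)
```

With this in place, the external rotation program only has to send the signal (for example with kill -HUP and the process id) and then compress or purge the renamed files on its own schedule.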
3. Expose key aspects of the application as an API
Consider a web crawler as an example. Let's say that the crawler is expected to crawl at least 200 URLs per minute. If we can read this metric, we can put monitoring in place for it, for example an alert when the rate drops below that threshold. This is possible only if the developers expose the metric through an API that operations personnel can call as part of their monitoring scripts.
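As a rough sketch of what such an API could look like for the crawler example, the code below keeps a sliding window of crawl timestamps and serves the count as JSON over HTTP using only the Python standard library. Everything here is an assumption for illustration: the names CrawlStats and start_metrics_server, the /metrics path, the JSON field names and port 8080 are all hypothetical.

```python
import json
import threading
import time
from collections import deque
from http.server import BaseHTTPRequestHandler, HTTPServer


class CrawlStats:
    """Thread-safe count of URLs crawled within a sliding time window."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self._times = deque()  # crawl timestamps, oldest first
        self._lock = threading.Lock()

    def record_crawl(self, now=None):
        now = time.time() if now is None else now
        with self._lock:
            self._times.append(now)

    def urls_in_window(self, now=None):
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        with self._lock:
            # Drop timestamps that have fallen out of the window.
            while self._times and self._times[0] <= cutoff:
                self._times.popleft()
            return len(self._times)


def make_handler(stats):
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            payload = {
                "urls_crawled_in_window": stats.urls_in_window(),
                "window_seconds": stats.window_seconds,
            }
            body = json.dumps(payload).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the example quiet; a real service would log requests

    return MetricsHandler


def start_metrics_server(stats, host="127.0.0.1", port=8080):
    """Serve the metric from a background thread; returns the server object."""
    server = HTTPServer((host, port), make_handler(stats))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The crawler would call stats.record_crawl() after each URL, and a monitoring script could poll /metrics once a minute and raise an alert when urls_crawled_in_window falls below 200.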
Many more points could be added to this article depending on the type of application you support. I hope this article has listed a few key points to keep in mind while building a product / application.