Logging As a Last Resort
Design your application so that it reports errors as part of its normal operation. You should only need to look at logs when all else fails.
In software development, one often ends up investigating issues. Depending on the type of application, various sources of information may be available:
- verbal description of the problem
- logs (application, framework, container, application server, OS, ...)
- thread and heap dumps
In an ideal world, the exact inputs that caused an issue and the code that failed would be immediately available. In a typical case though, it can take hours of digging through logs and code to understand what happened. Would it not be nice to avoid that?
Don't Exclusively Rely on Logs
Logging Well Is Hard
Logs: looking for a needle in a haystack. There might be a lot of hay. And no needle.
It is easy to log too much inconsequential data, ending up with megabytes of junk on disk. It is also easy to log too little. In fact, when writing code, it is very difficult to know exactly how much to log. Should I log the whole payload or just some fragment (e.g. the record ID)? How big is a typical record? Log levels help somewhat (log the full payload at debug level), but then it is up to Ops to configure the application properly. And if at some point the operator decides that the logs are taking up too much disk space and reduces the logging level, there might be insufficient information to understand what happened during a post-mortem.
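As a minimal sketch of the log-level trade-off described above, the following Python snippet logs only the record ID at INFO level and the full payload at DEBUG level. The logger name `orders` and the function `process_record` are hypothetical and exist only for illustration.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def process_record(record: dict) -> None:
    # Always log a small, stable identifier at INFO level.
    logger.info("processing record id=%s", record.get("id"))
    # Log the full payload only at DEBUG level. The guard skips the call
    # entirely when DEBUG is switched off.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("record payload: %r", record)
```

Note that this only moves the decision to whoever configures the level: if the level is raised to save disk space, the payload is gone when it is needed.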
Accessing Logs May Be Hard
Getting logs from the production environment may require help from an Ops person. Just to get you the .tar.gz of today's logs, someone may need to retrieve system credentials, connect to a machine, locate the logs, and transfer the logs to an intermediate server, before sending them to you.
If a log aggregation system is available, things might be easier. Then again, getting to play around with the production instance might also require jumping on a VPN, getting user credentials, and perhaps even some training. Again, from an Ops person. All of this takes valuable time: yours and somebody else's. And while it might be fun (or unavoidable) to do this, there are usually other, better ways to spend time.
What to Do Instead
Make Failure Data Easy to Get
Design your application with the expectation that something will fail and treat it as a normal thing. Use any available means to communicate to the user why their action resulted in a failure. What this means exactly depends on the type of application and its runtime environment.
- If the application runs on a managed platform, leverage what the platform provides. If it offers alerting or events, perhaps these can be used to explain an abnormal situation.
- For REST APIs, use appropriate HTTP status codes and informative error messages.
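For the REST API case, one possible shape is sketched below in Python, using only the standard library. The exception classes, the `STATUS_BY_ERROR` mapping, and the `error_response` helper are all assumptions made for this example and are not tied to any particular web framework.

```python
import json
from http import HTTPStatus


class RecordNotFound(Exception):
    """Hypothetical domain error: the requested record does not exist."""


class InvalidInput(Exception):
    """Hypothetical domain error: the request data failed validation."""


# Assumed mapping from domain errors to HTTP status codes.
STATUS_BY_ERROR = {
    RecordNotFound: HTTPStatus.NOT_FOUND,
    InvalidInput: HTTPStatus.BAD_REQUEST,
}


def error_response(exc: Exception) -> tuple[int, str]:
    """Turn an exception into a (status code, JSON body) pair."""
    status = STATUS_BY_ERROR.get(type(exc), HTTPStatus.INTERNAL_SERVER_ERROR)
    if status is HTTPStatus.INTERNAL_SERVER_ERROR:
        # Do not leak internal details for unexpected errors.
        message = "Unexpected server error"
    else:
        message = str(exc)
    body = {"status": status.value, "error": status.phrase, "message": message}
    return status.value, json.dumps(body)
```

A handler could call `error_response(RecordNotFound("record 42 does not exist"))` and send the returned status and body to the client, so the caller learns what went wrong without anyone opening a log file.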
Make Failure Data Useful
Can your application explain itself in case of failure? Can it tell you:
- The context. What is the high-level task that failed?
- Input values. What data was provided by the user or upstream systems?
- Application state. What intermediate results were calculated successfully before failing?
- Failure reason. What caused the error?
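The four questions above could be answered by an error object that carries the data with it. The Python sketch below is one way to do that under stated assumptions: the `TaskFailure` class, its `report` method, and the `import_prices` task are hypothetical names invented for this example.

```python
class TaskFailure(Exception):
    """Hypothetical exception carrying context, inputs, state, and reason."""

    def __init__(self, context, inputs, state, reason):
        super().__init__(f"{context} failed: {reason}")
        self.context = context  # the high-level task that failed
        self.inputs = inputs    # data provided by the user or upstream
        self.state = state      # intermediate results before the failure
        self.reason = reason    # what caused the error

    def report(self) -> dict:
        """Return the failure data in a form that is easy to show or send."""
        return {
            "context": self.context,
            "inputs": self.inputs,
            "state": self.state,
            "reason": self.reason,
        }


def import_prices(rows):
    """Hypothetical task: convert price strings to integer cents."""
    converted = []
    for row in rows:
        try:
            converted.append(round(float(row["price"]) * 100))
        except (KeyError, TypeError, ValueError) as exc:
            raise TaskFailure(
                context="import prices",
                inputs=row,
                state={"rows_converted": len(converted)},
                reason=repr(exc),
            ) from exc
    return converted
```

When a row cannot be converted, the caller receives the failing row, the number of rows converted so far, and the underlying cause in one place, and can pass `report()` on to the user or an alerting channel.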
Logging is an essential part of any application. But relying solely on logs can be more painful than necessary once a problem occurs. Using other available means to communicate application errors, in addition to logs, can provide more confidence in the application, reduce the time needed to troubleshoot, and allow for faster application improvement.