To trouble-shoot software, logging of some kind is essential. But for most systems, it is simply not possible to log everything that happens. Many systems and logging frameworks let you limit the amount of data by giving a logging level (e.g. error, warning, info, debug) and by specifying where in the code logging should be done. An alternative way of limiting the data is to use session-based logging. You then get all data pertaining to a specific session, but nothing for any of the other sessions.
For example, at Symsoft we make systems for mobile phone operators to process calls and text messages (SMS). A typical session in our case is a phone call, or the sending of an SMS. With session-based logging, one log entry contains everything that happens during the session. It contains the initial message that triggers the processing in the system, all log statements relating to the internal logic, and all external signaling (and responses) that are part of the processing. Here is an example log of the sending of an SMS. This is an extract:
-- 6, smrouter: MsRouter: Analyse SMS control for service key 1565 (Collected Info (orig)) -- 7, smrouter: MsRouter: Invoking SMS control function SM Quota Control -- 7, smquotacontrol: Process message -- 8, smquotacontrol: Unconfigured service key 1565 -- 8, smrouter: MsRouter: SMS Control Continue -- 8, smrouter: MsRouter: Analyse destination 46704123456 [Numbering plan, MSISDN] [Type of number, INTERNATIONAL], RI=0
The numbers to the left are the milliseconds since the session started. In the extract above, we can see that the software module smquotacontrol was involved, and that the service key 1565 was not configured. We can also see that the destination phone number is 46704123456. In the complete log of this SMS sending, we can for example see the complete contents of all SS7 messages sent and received during the delivery, the text of the SMS, that the delivery was successful, and that the whole process took 128 milliseconds.
At the start of the session, a check is made to see if logging should be done (based on, for example, the subscriber phone number, user id or similar). The logging statements look like regular logging statements:
There is no logging level – the logging is always done at the most detailed level. At the end of the session, the complete session log is output.
Session-based logging means that you get very detailed information on everything that happens in a certain session. There is no need to grep through lots of log statements, looking for the parts you are interested in. Conversely, there are no missing logging statements that would help understand what happened – everything is included.
In an SMS processing system, there can be several thousand SMSs handled each second. Outputting detailed logs for each SMS is clearly not feasible. But with session-based logging, you can set a trigger for e.g. phone number 46702000029. As soon as that subscriber sends an SMS, a detailed log of the complete session is output.
With this kind of logs, troubleshooting is often quite straightforward. You can see all the data that came in. You can follow the logic, and see if messages are sent out over external interfaces. If there is no reply, it is immediately evident (the session will time out after, say, 15 seconds, print a message and exit). The time-stamps make it easy to see if any part of the processing took too long. In short, it is a great tool for seeing what is going on in a live system.
Some logging frameworks allow you to attach a session identifier (for example a user id or a phone number) to the executing thread, and to use that as a filtering criteria for what to output. This is a move in the right direction. However, the session concept often extends beyond the current thread. For example, when sending out an external message over an interface (for example a HTTP request), the response is typically processed by another thread. To capture the complete session, there needs to be a mechanism to transfer the session identifier to the next thread serving the session.
In our system at work, we have a useful variation of the session logging calledTrace On Error. When activated, if an exception (e.g. a NullPointerException) is thrown, the complete trace of the session up to that point is output, in addition to the stack trace. It works by starting session logging for every session, but only outputting the result if an exception occurs. If not, all the collected session information (basically a bunch of Strings) is discarded. This puts some pressure on the heap and the garbage collection, but not an awful lot. We have used this mechanism on live system several times in order to get more information on difficult to solve bugs.
I should also mention that in addition to session-based logging, we also have traditional, statement-based logging. This type of logging is used for events like configuration changes, periodic outputs etc.
Is your system using session-based logging, or individual logging statements (using a logging framework, or custom made), or something else? Let me know in the comments.