In the previous post, we explored the various monitoring categories in an enterprise architecture. We also looked at how error tracking solutions can help quickly identify issues in the architecture.
To recap, monitoring as a whole encompasses the following categories:
- Error or exception tracking (covered in a previous post)
- Log management and tracking (covered in this post)
- Application performance monitoring
- Distributed tracing
- User experience tracking
- Infrastructure monitoring
In this post, we will look into another related area: log management.
What Is a Log Management Solution?
A log management solution lets the components of a software architecture log data. It then collects, retains, and manages that data so it can be visualized and reported on as needed.
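To make the split between logging, collecting, and retaining concrete, here is a minimal in-process sketch using Python's standard library. The language choice and all names (`ListHandler`, the `orders` and `billing` loggers) are our assumptions for illustration. Each component writes only to a queue through `logging.handlers.QueueHandler`, while a `QueueListener` plays the role of the collector and forwards records to a central store, which here is just a list.

```python
import logging
import logging.handlers
import queue

# Stand-in for the transport between components and a central collector
# (a real deployment would use a network agent or message broker).
log_queue = queue.Queue()

# Stand-in for the central log store: an in-memory list of formatted lines.
collected = []


class ListHandler(logging.Handler):
    """Appends formatted records to a list (hypothetical log store)."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append(self.format(record))


store = ListHandler(collected)
store.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

# Collector side: a background thread drains the queue into the store.
listener = logging.handlers.QueueListener(log_queue, store)
listener.start()

# Component side: each service logger only knows about the queue.
for service in ("orders", "billing"):
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    logger.info("service started")

listener.stop()  # processes any records still queued, then stops the thread
# collected == ["orders INFO service started", "billing INFO service started"]
```

A real solution would replace the queue with a network transport and the list with durable, searchable storage; the point of the sketch is only that producers and the collector are decoupled.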
When designing a service-oriented architecture or picking components for an enterprise service bus, one of the more common mistakes is to neglect the design of centralized, standardized logging.
The key drivers for adopting a centralized log management solution in an enterprise architecture tend to be operations, auditing, and security (in that order). Often, the decision to adopt one is made much later, after all the other business aspects of the architecture have been dealt with.
This can lead to long troubleshooting windows, sub-optimal logging, no standard logging mechanisms or naming conventions for logged data, poor performance, unclear audit trails, and more.
What to Look for in a Log Management Solution
When choosing a log management and tracking solution, here are some key elements to consider in the architecture:
- Volume: How much is logged and how often, plus log rotation and sizing.
- Sources: Where logs come from (mobile devices vs. on-premises vs. cloud).
- Encryption: Requirements for encrypting log data, including data at rest.
- Reliability: Understand what it takes to log reliably.
  - Review the reliability requirements of logging systems and related infrastructure.
  - Define how and whether logging data needs to be backed up.
- Security: Restrict access to log data to only those who should have it.
- Libraries: Choice of logging libraries or systems across various languages.
- Management: Interfaces for controlling these loggers (turning them on or off) and changing log levels at runtime in production.
- Log levels: Default log levels in production (typically INFO) and the performance implications of logging in hot paths in production.
- Format: Outline the format of logged data across the enterprise.
- Auditing: Functional audit requirements (e.g., login events, call detail records, or similar records).
- Searching and reporting: Understand what it takes to easily search through and report on logged data.
- Alarming: Certain log errors or warnings may require alarming. Consider using an error tracking solution for this instead.
- Sampling: The ability to sample logs (e.g., enable logging for a sample of 10% of users).
Given these considerations, it may be wise to investigate a centralized, structured log management solution if the budget permits.
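Sampling by user, as described in the Sampling item above, works best when it is deterministic. That way a sampled user's logs are complete rather than randomly thinned. Below is a sketch using Python's standard `logging` module. `UserSamplingFilter` and the `user_id` record attribute are hypothetical names, and hashing with SHA-256 is just one reasonable way to bucket users.

```python
import hashlib
import logging


class UserSamplingFilter(logging.Filter):
    """Passes records only for a deterministic sample of users (hypothetical).

    Hashing the user ID instead of calling random() means a given user is
    always either in or out of the sample, so their logs stay complete.
    """

    def __init__(self, rate: float) -> None:
        super().__init__()
        self.rate = rate  # fraction of users to keep, e.g. 0.10 for 10%

    def in_sample(self, user_id: str) -> bool:
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        return bucket < self.rate

    def filter(self, record: logging.LogRecord) -> bool:
        user_id = getattr(record, "user_id", None)
        if user_id is None:
            return True  # records not tied to a user are never sampled out
        return self.in_sample(str(user_id))
```

To use it, you would attach it with `handler.addFilter(UserSamplingFilter(0.10))` and pass the user through `extra`, e.g. `logger.info("page viewed", extra={"user_id": "u123"})`. Records without a `user_id` pass through unchanged, so system-level events are not dropped.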
What Information to Log and at What Level
Once a log management solution is picked, it is also important to have a framework that lets developers choose what information to log and at what level. This, too, is often overlooked, which leads to ad hoc choices of what to log and at which level.
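As one possible starting point for such a framework, the sketch below uses Python's `logging.config.dictConfig` to keep a single shared configuration. It sets INFO as the production default and adds a per-module override. It also includes a helper for changing a level at runtime, in the spirit of the Management item above. The logger names (`app.cache`, `app.orders`) and `set_level_at_runtime` are illustrative assumptions, not part of any product.

```python
import logging
import logging.config

# One shared configuration intended for every service (hypothetical names).
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "standard": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "standard"},
    },
    # Production default: INFO for everything...
    "root": {"level": "INFO", "handlers": ["console"]},
    "loggers": {
        # ...except a chatty module that is kept at WARNING.
        "app.cache": {"level": "WARNING"},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)


def set_level_at_runtime(logger_name: str, level_name: str) -> None:
    """Change one logger's level without a restart (e.g. from an admin endpoint)."""
    logging.getLogger(logger_name).setLevel(level_name.upper())
```

Keeping the configuration as data makes it easy to version it once and share it across services, which also supports the common-format guideline below.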
Here are some aspects of what gets logged that should be documented as part of the architecture:
- Decide on a common logging output format across services. Preferably, use the same logging configuration across the entire architecture.
- Outline information that must not be logged. For example, PII (personally identifiable information) should not be logged at all, or should be encrypted before it is logged.
- Ensure contextual data associated with errors is readily available in the logs. The last thing you want is to have to turn on debug logging after an error has already happened.
- Take care in defining which information is too verbose to log at the INFO level rather than the DEBUG level, especially when INFO is the default level.
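The first two guidelines, a common output format and keeping PII out of logs, can be enforced in code rather than by convention alone. This Python sketch pairs two pieces. One is a formatter that emits every record as a single JSON object with a fixed set of keys. The other is a filter that masks email addresses. Treating email addresses as the only PII and the exact field names are assumptions for illustration; real PII rules are usually broader.

```python
import json
import logging
import re

# Hypothetical PII rule: anything that looks like an email address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class RedactPIIFilter(logging.Filter):
    """Masks email addresses in the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = EMAIL_RE.sub("[REDACTED]", record.getMessage())
        record.args = None  # the message is already fully formatted
        return True


class JsonFormatter(logging.Formatter):
    """Emits one JSON object per line using a fixed, enterprise-wide schema."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

Attaching both to a handler, for example via `handler.addFilter(RedactPIIFilter())` and `handler.setFormatter(JsonFormatter())`, applies them to every record that handler writes, regardless of which logger produced it.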
While there are several tools out there that let you log data, Trakerr.IO is a great platform for production: it not only lets you log events in a structured way but also captures related information, such as errors and performance data, on a single platform.
Trakerr.IO meets many of the log management requirements outlined earlier by plugging into logging libraries in many popular languages.
What Kind of Data to Log to Trakerr.IO
We recommend starting out by logging only critical events, in a structured way, to Trakerr.IO. Why? If you perform debug- or trace-level logging for every call on a production service where performance is critical, it may hurt performance at scale. Depending on your use case, it may still be possible to log everything to Trakerr.IO.
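Much of the performance concern above comes from work spent building log messages that are then discarded. A common mitigation in Python's `logging` module has two parts. First, pass arguments lazily. Second, guard genuinely expensive debug output with `isEnabledFor`. The sketch below shows both. The `app.checkout` logger and the helper functions are hypothetical.

```python
import logging

logger = logging.getLogger("app.checkout")  # hypothetical hot-path module


def expensive_summary(cart: dict) -> str:
    # Stand-in for costly work such as serializing a large object.
    return ", ".join(f"{k}={v}" for k, v in sorted(cart.items()))


def process(cart: dict) -> int:
    # Lazy %-style arguments: the message string is only built if emitted.
    logger.debug("processing cart with %d items", len(cart))
    # For expensive arguments, check the level first so the work itself is
    # skipped when DEBUG is off (the usual production setting).
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("cart contents: %s", expensive_summary(cart))
    # Critical business event: logged at INFO.
    logger.info("checkout completed")
    return len(cart)
```

Lazy arguments alone avoid string formatting, but not the cost of computing the arguments; that is why the explicit level check is used for `expensive_summary`.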
Each event in Trakerr.IO is identified by an event type and classification that can be supplied through the API/SDK. The event type and classification are completely customizable and can represent anything within the application.
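The event type plus classification idea can be illustrated without any vendor SDK. The sketch below models an event as a small Python dataclass and logs it as one JSON line. `AppEvent`, `log_event`, and the example values are hypothetical and do not reflect the Trakerr.IO SDK's actual API.

```python
import json
import logging
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class AppEvent:
    """Hypothetical structured event; not the Trakerr.IO SDK's types."""

    event_type: str      # e.g. "user.login", "db.call"
    classification: str  # e.g. "audit", "performance"
    attributes: dict[str, Any] = field(default_factory=dict)


def log_event(logger: logging.Logger, event: AppEvent) -> str:
    """Serialize the event as one JSON line, log it at INFO, and return it."""
    line = json.dumps(asdict(event), sort_keys=True)
    logger.info(line)
    return line
```

Because every event carries the same two identifying keys, downstream tools can group and filter events consistently no matter which service emitted them.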
So what defines a critical event, or something worth logging to Trakerr.IO? Here are a few examples:
- Page view events.
- User click events.
- Application installation events.
- Application start/shutdown events.
- Database connections or calls.
- User login/logoff events.
- Any troubleshooting-related event.
With each of the above, Trakerr.IO’s SDK also lets you log performance, user, and session data along with the log statement.
For example, you can log that a database call took place, how long the operation took to complete (in milliseconds), and which database served the call.
This produces a more structured log statement that can be logged the same way across different microservices.
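Here is a vendor-neutral sketch of the database example in Python. A context manager times the operation with `time.perf_counter` and logs one record. The record's `extra` fields (`event_type`, `operation`, `database`, `duration_ms`) can be turned into separate, searchable keys by a structured formatter. These field names and `timed_db_call` are assumptions, not Trakerr.IO's schema or API.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("app.db")  # hypothetical logger name


@contextmanager
def timed_db_call(operation: str, database: str) -> Iterator[dict]:
    """Time a database operation and log it as one structured record.

    Field names are illustrative, not a vendor schema.
    """
    fields = {"event_type": "db.call", "operation": operation, "database": database}
    start = time.perf_counter()
    try:
        yield fields  # callers may add their own keys, e.g. a row count
    finally:
        fields["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
        # `extra` sets each key as an attribute on the LogRecord, so a
        # structured formatter can emit them as separate fields.
        logger.info("database call", extra=fields)
```

Usage would look like `with timed_db_call("SELECT", "orders-db"): run_query()`, where `run_query` stands for your own data-access code. Because the log call sits in `finally`, the timing is recorded even if the query raises.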
Once this data is logged to Trakerr.IO, you now have powerful search, segmentation, and alarming capabilities.
Because this data is indexed, searching is now super easy.
You can now also view related events for a specific user or session.
When you use the Trakerr.IO SDK, additional data is captured automatically along with each event, including OS information, IP address, and CPU and memory utilization.
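If you are not using an SDK that does this for you, a logging filter can attach similar context to every record. This Python sketch adds OS and host information using only the standard library. `HostContextFilter` is a hypothetical name. It records the hostname rather than an IP address. It also leaves out CPU and memory utilization, because the standard library has no portable API for them.

```python
import logging
import platform
import socket


class HostContextFilter(logging.Filter):
    """Adds OS and host details to every record (hypothetical, stdlib only)."""

    def __init__(self) -> None:
        super().__init__()
        # Computed once: these values rarely change during a process's life.
        self.os_info = f"{platform.system()} {platform.release()}"
        self.hostname = socket.gethostname()

    def filter(self, record: logging.LogRecord) -> bool:
        record.os_info = self.os_info
        record.hostname = self.hostname
        return True  # enrich only; never drop records
```

Once attached with `handler.addFilter(HostContextFilter())`, the new attributes can be referenced in a format string such as `"%(hostname)s %(os_info)s %(message)s"`.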
Trakerr.IO can integrate with many popular tools to make your life simpler.
Find out more: sign up for a free trial!