Best Practices for Efficient Log Management and Monitoring

Check out these best practices for efficient log management and monitoring.

By Twain Taylor · Stefan Thies · May. 09, 19 · Presentation

When managing cloud-native applications, it's essential to have end-to-end visibility into what's happening at any given time. This is especially true because of the distributed and dynamic nature of cloud-native apps, which are often deployed using ephemeral technologies like containers and serverless functions.

With so much flux and complexity across a cloud-native system, it's important to have robust monitoring and logging in place to control and manage the inevitable chaos. This post discusses what we consider to be some of the best practices and standards to follow when logging and monitoring cloud-native applications.

1. Use a Managed Logging Solution Vs. Building Your Own Infrastructure

First off, logging should reflect your applications. In a world of cloud-native applications, logging solutions should be built on the same principles that lay the foundation for the applications themselves: high availability, distributed processing, and intelligent failover. This is what differentiates modern cloud-native apps from legacy monolithic apps.

The tools to implement this approach include Elasticsearch, Fluentd, Kibana (which, together, are often called the EFK stack), and others. They are architected to handle large-scale data analysis and deliver results in real time. They facilitate complex search queries over data and enable open API-based integration with other tools. However, though the raw materials are available, bringing it all together and making sure it meets your purposes is a whole other challenge.

Rather than build out this system on your own, it makes sense to use a managed logging solution that is built and scaled by a vendor. We go over that in detail in 5 Reasons to Run Elastic Stack in the Cloud. With ready-made integrations, all you need to do is connect your sources and destinations, and you're all set to analyze application logs the easy way. This leaves you free to spend more time monitoring and logging your application rather than building out logging infrastructure.

2. Know What Logs to Monitor, and What Not to Monitor

Know what not to log. Just because you can log something doesn't mean you should — and logging too much data can make it harder to find the data that actually matters. It also adds complexity to your log storage and management processes because it gives you more logs to manage.

Thus, consider carefully what you actually need to log. Any types of production-environment data that are critical for compliance or auditing purposes should certainly be logged. So should data that helps you troubleshoot performance problems, solve user-experience issues, or monitor security-related events.

On the other hand, there are categories of data that you do not need to log, such as data from test environments that are not an essential part of your software delivery pipeline. There are also some kinds of data that you should not log for compliance or security reasons. For example, if a user has enabled a do-not-track setting, you should not log data associated with that user. Similarly, you should avoid logging highly sensitive data, such as credit card numbers, unless you are certain that your logging and storage processes meet the security requirements for that data.
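As a minimal sketch of the two exclusions above, the following filter for Python's standard `logging` module drops records that belong to do-not-track users and masks anything that looks like a card number. The `user_id` attribute, the `do_not_track_ids` set, and the card pattern are assumptions made for illustration; a real deployment would need a stricter detection rule.

```python
import logging
import re

# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


class SensitiveDataFilter(logging.Filter):
    """Drop records for do-not-track users and mask card-like numbers."""

    def __init__(self, do_not_track_ids):
        super().__init__()
        self.do_not_track_ids = set(do_not_track_ids)

    def filter(self, record):
        # `user_id` is a hypothetical attribute supplied via extra={"user_id": ...}.
        if getattr(record, "user_id", None) in self.do_not_track_ids:
            return False  # suppress the record entirely
        # Render the message once, then mask anything that looks like a card number.
        record.msg = CARD_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = None
        return True
```

Attaching the filter with `handler.addFilter(SensitiveDataFilter({"u42"}))` applies it to every record that handler processes.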

3. Implement Log Security and Retention Policy

Logs contain sensitive data. A log security policy should identify the sensitive data your logs may contain, such as your clients' personal data or internal access keys for APIs. Make sure that sensitive data gets anonymized or encrypted before you ship logs to any third party. GDPR log management best practices cover how to protect sensitive and personal data in web server logs. To transport log data securely to log management servers, set up encrypted endpoints (TLS or HTTPS) on both the client and the server side.
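One way to treat sensitive fields before shipping could look like the sketch below. It replaces personal fields with a keyed hash and removes secrets outright. The field names and the `sanitize_event` helper are hypothetical, and note that a keyed hash is pseudonymization rather than full anonymization, since anyone holding the key can recompute the tokens.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to whatever your log events actually carry.
PERSONAL_FIELDS = {"email", "client_ip"}
SECRET_FIELDS = {"api_key"}


def sanitize_event(event, secret_key):
    """Return a copy of `event` that is safer to ship to a third party."""
    clean = {}
    for field, value in event.items():
        if field in SECRET_FIELDS:
            # Secrets are never useful downstream, so drop the value.
            clean[field] = "[REMOVED]"
        elif field in PERSONAL_FIELDS:
            # A keyed hash keeps events for the same user correlatable
            # without exposing the raw value.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            clean[field] = digest.hexdigest()[:16]
        else:
            clean[field] = value
    return clean
```

Because the same input and key always produce the same token, events from one user can still be grouped during troubleshooting.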

Logs from different sources might require different retention times. Some applications are only relevant for troubleshooting for a few days. Security-related or business transaction logs require longer retention times. Therefore, a retention policy should be flexible, depending on the log source.
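A per-source retention policy can be sketched as a simple lookup. The source names and retention periods below are illustrative assumptions, not recommendations; the right values depend on your compliance and troubleshooting needs.

```python
from datetime import datetime, timedelta, timezone

# Illustrative values only: each log source gets its own retention period.
RETENTION = {
    "debug": timedelta(days=7),
    "security": timedelta(days=365),
    "transactions": timedelta(days=730),
}
DEFAULT_RETENTION = timedelta(days=30)


def is_expired(source, logged_at, now=None):
    """Return True if a log entry from `source` is past its retention period."""
    now = now or datetime.now(timezone.utc)
    return now - logged_at > RETENTION.get(source, DEFAULT_RETENTION)
```

A cleanup job could call `is_expired` for each stored entry (or each daily index) and delete the ones that return `True`.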

4. Log Storage

Capacity planning for log storage should account for high load peaks. When systems run well, the amount of data produced per day is nearly constant and depends mainly on system utilization and the number of transactions per day. In the case of critical system errors, we typically see accelerated growth in the log volume. If the log storage hits its limit, you lose the latest logs, which are essential for fixing system errors. The log storage must therefore work as a cyclic buffer that deletes the oldest data first, before any storage limit is reached.
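The cyclic-buffer behavior can be illustrated with a small in-memory sketch. The `CyclicLogStore` class is hypothetical and ignores persistence; it only shows the eviction order, where the oldest lines go first so the newest lines always fit.

```python
from collections import deque


class CyclicLogStore:
    """Keep the newest log lines within a byte budget, evicting the oldest first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = deque()

    def append(self, line):
        size = len(line.encode("utf-8"))
        # Evict the oldest entries until the new line fits. A single line
        # larger than the whole budget is still stored on its own.
        while self._entries and self.used_bytes + size > self.max_bytes:
            oldest = self._entries.popleft()
            self.used_bytes -= len(oldest.encode("utf-8"))
        self._entries.append(line)
        self.used_bytes += size

    def lines(self):
        return list(self._entries)
```

For log files on disk, Python's standard `logging.handlers.RotatingFileHandler` (with `maxBytes` and `backupCount`) applies a similar idea at file granularity.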

Design your log storage so that it's scalable and reliable. There is nothing worse than having system downtime and a lack of information for troubleshooting, which in turn can prolong the downtime.

Log storage should have a separate security policy. Attackers will try to avoid leaving traces in log files, or to delete them. Therefore, you should ship logs in real time to central log storage. If an attacker gains access to your infrastructure, logs that have already been sent off-site, e.g., to a logging SaaS, help keep the evidence untampered.

5. Review & Constantly Maintain Your Logs

Unmaintained log data could lead to longer troubleshooting times, risks of exposing sensitive data or higher costs for log storage. Review the log output of your applications and adjust it to your needs. Reviews should cover usability, operational and security aspects.

Create Meaningful Log Messages

Readable and useful log messages are key for faster troubleshooting. If logs contain only error codes or 'cryptic' error messages, they can be difficult to understand. As a developer, you can save your organization a lot of time by providing a meaningful log message.

Use Structured Log Formats

The log format should be structured (e.g., JSON or key/value format), with fields such as timestamp, severity, and message, plus any other relevant data fields like process ID, transaction ID, etc. If you don't use a uniform log format for all your applications, normalize the logs in the log shipper: parse the logs and store them in a structured format.
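As a minimal sketch of a structured format, the formatter below emits each record as one JSON object using Python's standard `logging` module. The `JsonFormatter` name and the `transaction_id` attribute are assumptions for illustration; the other fields come from the standard `LogRecord`.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        event = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "process_id": record.process,
        }
        # `transaction_id` is a hypothetical field supplied via extra={...}.
        transaction_id = getattr(record, "transaction_id", None)
        if transaction_id is not None:
            event["transaction_id"] = transaction_id
        return json.dumps(event)
```

To use it, set the formatter on a handler with `handler.setFormatter(JsonFormatter())` and pass `extra={"transaction_id": ...}` when logging.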

Make Log Level Configurable

Some application logs are too verbose, while others don't provide enough information about the application's activities. Adjustable log levels are the key to configuring the verbosity of logs. Another topic for log reviews is the balance between logging relevant information and not exposing personal data or security-related information. If log messages must contain such data, make sure that those messages can be anonymized or encrypted.
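One common way to make verbosity adjustable without a code change is to read the level from the environment. The sketch below assumes an environment variable named `LOG_LEVEL` and a logger named `app`; both names are illustrative.

```python
import logging
import os

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(default_level="INFO"):
    """Set the 'app' logger's level from the LOG_LEVEL environment variable."""
    requested = os.environ.get("LOG_LEVEL", default_level).upper()
    # Fall back to INFO when the variable holds an unknown level name.
    level = LEVELS.get(requested, logging.INFO)
    logger = logging.getLogger("app")
    logger.setLevel(level)
    return logger
```

Starting the process with `LOG_LEVEL=DEBUG` then raises verbosity for a troubleshooting session, and removing the variable restores the default.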

Inspect Audit Logs Frequently

Acting on security issues is crucial, so you should always keep an eye on audit logs. Set up security tools such as auditd or OSSEC agents. These tools implement real-time log analysis and generate alert logs pointing to potential security issues. On top of such audit logs, you should define alerts on logs in order to be notified quickly of any suspicious activity. For more details, check out a quick tutorial on using auditd, plus you'll find some complementary frameworks too.
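To show what an alert on logs amounts to, here is a standalone sketch of a threshold rule: it fires when a keyword appears a given number of times within a time window. This is not the rule format of auditd or OSSEC; the `ThresholdAlert` class and its parameters are hypothetical.

```python
from collections import deque


class ThresholdAlert:
    """Fire when `threshold` matching messages arrive within `window_seconds`."""

    def __init__(self, keyword, threshold, window_seconds):
        self.keyword = keyword
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._hits = deque()

    def observe(self, timestamp, message):
        """Feed one log line; return True if the alert should fire."""
        if self.keyword not in message:
            return False
        self._hits.append(timestamp)
        # Forget matches that have fallen out of the time window.
        while self._hits and timestamp - self._hits[0] > self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.threshold
```

For example, `ThresholdAlert("authentication failure", 3, 60)` would fire on the third failed login seen within one minute.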

Use a Checklist for Log Reviews:

  • Is the log message meaningful for users?
  • Does the log message include context for troubleshooting?
  • Are the log messages structured, and do they include:
    • Timestamp
    • Severity/log level
    • Message
    • Additional troubleshooting information in separate fields
  • Are third-party logs parsed and structured (configure log shipper)?
  • Are log levels configurable?
  • Does the log message include personal data or security-related data?
  • Inspect audit logs and adjust log alert rules
  • Set up alerts on logs

6. Don't Do Log Analysis in a Silo: Correlate All Data Sources

Connect the dots. Logging is one part of an entire monitoring strategy. To practice truly effective monitoring, you need to complement your logging with other types of monitoring like monitoring based on events, alerts, and tracing. This is the only way to get the whole story of what's happening at any point in time. Logs are great for giving you high-definition detail on issues, but this is useful only once you've seen the forest and are ready to zoom into the trees. Metrics and events at an aggregate level may be more effective, especially when starting to troubleshoot an issue.

Don't look at logs in a silo. Complement them with other types of monitoring like APM, network monitoring, infrastructure monitoring, and more. See APM vs. Log Management for more detail. This also means that the monitoring solution you use should be comprehensive enough to provide all your monitoring information in one place, or flexible enough to easily integrate with other tools that provide this information. This way, as a user, you have a single-pane view of your entire stack.

7. View Logging as an Enabler of GitOps

For busy DevOps teams, it's easy to view logging as a nice-to-have, or an add-on that you can embrace once you've figured out automated CI/CD pipelines and are releasing more frequently. However, another way to look at logging is to see it as an enabler of DevOps and CI/CD. To practice automation at every step of the development pipeline, you need the visibility to know where issues are introduced, and what the main sources of these issues are: faulty code, dependency issues, external attacks, insufficient resources, or something else. The causes can be innumerable, but logging gives you the insight you need to find and fix these issues.

As continuous integration increasingly becomes about enabling GitOps at the very start of the pipeline, there's a need to not compromise on quality and security authentication in the name of automation and speed.

8. Get Real-Time Feedback on Any Type of Event

Automated testing and new approaches like headless testing are making it possible to get real-time feedback on every single code change in a developer environment, even before a commit. As testing shifts left, and there is an increasing focus on the start of the pipeline, logging is essential to gain visibility and enable GitOps. Without the appropriate testing and logging, you'll be left with runaway releases and deployment hell.

9. Use Logging to Identify Automation Opportunities and Trends

Logging helps to catch issues early on in the pipeline and saves your team valuable time and energy. It also helps you find opportunities for automation. You can set up custom alerts to trigger when something breaks, and even set up automated actions to be initiated when these alerts are triggered. Whether it's through Slack, a custom script, or a Jenkins automation plugin, you can drive automation in your GitOps process using logs. For all these reasons, you need to view logging as an enabler and driver of GitOps rather than an add-on.
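The idea of automated actions triggered by alerts can be sketched as a small dispatch table. The alert names and the two action functions are hypothetical placeholders; in practice they would call a chat webhook, run a script, or trigger a CI job.

```python
def notify_chat(details):
    # Hypothetical stand-in for posting to a chat channel such as Slack.
    return f"chat: {details['summary']}"


def run_cleanup_script(details):
    # Hypothetical stand-in for invoking a custom remediation script.
    return f"cleanup requested on {details['host']}"


# Assumed alert names mapped to the actions they should trigger, in order.
ACTIONS = {
    "disk_full": [notify_chat, run_cleanup_script],
    "build_failed": [notify_chat],
}


def dispatch(alert_name, details, actions=ACTIONS):
    """Run every action registered for `alert_name` and collect the results."""
    return [action(details) for action in actions.get(alert_name, [])]
```

Keeping the mapping in one table makes it easy to review which alerts are already automated and which still need a manual response.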

Conclusion and Next Steps

In conclusion, logging is an essential part of building and managing cloud-native apps. For logging to be successful, it should reflect the state of your applications and be able to scale along with them. Logging should never be done in a silo. This is why a monitoring solution for cloud-native applications should consider other types of monitoring and metrics. Logging can often be viewed as an afterthought, but teams that want to go all the way with GitOps see logging as a driver and enabler of observability, and hence, as indispensable.


Published at DZone with permission of Twain Taylor, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
