Aggregating Error Logs to Send a Warning Email When Too Many of Them – Log4j, Stat4j, SMTPAppender

By Jakub Holý · Oct. 19, 2011

Our development team wanted to be notified as soon as something goes wrong in our production system, a critical Java web application serving thousands of customers daily. The idea was to let it send us an email when there are too many errors, which usually indicates a problem with a database, an external web service, or something really bad in the application itself. In this post I present a simple solution we have implemented using a custom Log4j appender based on Stat4j and an SMTPAppender (which is more difficult to configure and troubleshoot than you might expect). In the following post I explore how to achieve the same effect with the open-source Hyperic HQ monitoring software.

The Challenge

We faced the following challenges with the logs:

  • It’s unfortunately normal to have a certain number of exceptions (customers select search criteria yielding no results, temporary and unimportant outages of external services, etc.) and we certainly don’t want to be spammed because of that. So the solution must have a configurable threshold and only send an alert when it is exceeded.
  • The failure rate should be computed for a configurable period (long enough not to trigger an alert because of an outage lasting a few minutes, yet short enough for the team to be informed ASAP when something serious happens).
  • Once an alert is sent, no further alerts should be sent for some time (ideally until the original problem is fixed); we don’t want to be spammed because of a problem we already know about.

The Solution

We’ve based our solution on Lara D’Abreo’s Stat4j, which provides a custom Log4j appender that uses the logs to compute configurable measures and triggers alerts when they exceed their warning or critical thresholds. It is a couple of years old, alpha-quality (regarding its generality and flexibility) open-source library, which is fortunately simple enough to be modified easily for one’s needs.

So we have tweaked Stat4j to produce alerts when the number of errors exceeds the thresholds and to keep quiet thereafter, and combined that with a Log4j SMTPAppender that listens for the alerts and sends them via e-mail to the team.

Stat4j Tweaking

The key components of Stat4j are the Stat4jAppender for Log4j itself, calculators (measures) that aggregate the individual logs (e.g. by counting them or extracting some number from them), statistics that define which logs to consider via regular expressions and how to process them by referencing a calculator, and finally alerts that log a warning when the value of a statistic exceeds its limits. You can learn more in an article that introduces Stat4j.

We have implemented a custom measure calculator, RunningRate (to count the number of failures in the last N minutes), and modified Stat4j as follows:

  • We’ve enhanced Alert to support a new attribute, quietperiod, so that once an alert is triggered, subsequent alerts are ignored for that duration (unless the previous alert was just a warning while the new one is critical)
  • We’ve modified the appender to include the log’s Throwable together with the log message, which is then passed to the individual statistics calculators, so that we can filter more precisely what we want to count
  • Finally, we’ve modified Alert to log alerts as errors instead of warnings so that the SMTPAppender doesn’t ignore them
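
To make the interplay of the running rate, the two thresholds, and the quiet period more concrete, here is a minimal, self-contained sketch. It is not the code of the modified Stat4j: the class ErrorRateAlertSketch, its method onError, and the sliding-window implementation are hypothetical names and choices made up for illustration, and the real RunningRate calculator may count differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration only; not the actual Stat4j classes. */
final class ErrorRateAlertSketch {
    enum Level { NONE, WARN, CRITICAL }

    private final long periodMs;       // window over which errors are counted
    private final int warnAt;          // error count that triggers a warning
    private final int criticalAt;      // error count that triggers a critical alert
    private final long quietPeriodMs;  // time during which repeated alerts are suppressed
    private final Deque<Long> errorTimes = new ArrayDeque<>();
    private Level lastSent = Level.NONE;
    private long lastSentAt;

    ErrorRateAlertSketch(long periodMs, int warnAt, int criticalAt, long quietPeriodMs) {
        this.periodMs = periodMs;
        this.warnAt = warnAt;
        this.criticalAt = criticalAt;
        this.quietPeriodMs = quietPeriodMs;
    }

    /** Records one error at time nowMs and returns the alert to send, or NONE. */
    Level onError(long nowMs) {
        errorTimes.addLast(nowMs);
        // Forget errors that have fallen out of the counting window.
        while (!errorTimes.isEmpty() && errorTimes.peekFirst() <= nowMs - periodMs) {
            errorTimes.removeFirst();
        }
        int count = errorTimes.size();
        Level level = count >= criticalAt ? Level.CRITICAL
                    : count >= warnAt ? Level.WARN
                    : Level.NONE;
        if (level == Level.NONE) {
            return Level.NONE;
        }
        // Within the quiet period only a warning -> critical escalation gets through.
        boolean quiet = lastSent != Level.NONE && nowMs - lastSentAt < quietPeriodMs;
        boolean escalation = lastSent == Level.WARN && level == Level.CRITICAL;
        if (quiet && !escalation) {
            return Level.NONE;
        }
        lastSent = level;
        lastSentAt = nowMs;
        return level;
    }

    public static void main(String[] args) {
        // 10 min window, warn at 3 errors, critical at 10, 100 min quiet period.
        ErrorRateAlertSketch alert = new ErrorRateAlertSketch(600_000L, 3, 10, 6_000_000L);
        for (int i = 1; i <= 12; i++) {
            System.out.println("error " + i + " -> " + alert.onError(i * 1_000L));
        }
    }
}
```

In this run, with one error per second, only the 3rd error (warning) and the 10th error (critical) produce an alert; everything else is swallowed either because the count is below the warning threshold or because the quiet period is still running.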

Get our modified Stat4j from GitHub (sources or a compiled jar). Disclaimer: it is one day’s hack and I’m not proud of the code.

Stat4j Configuration

Take the example stat4j.properties and put it on the classpath. It is already configured with the correct calculator, statistic, and alert. See this part:

...
### Jakub Holy - my config
calculator.minuterate.classname=net.sourceforge.stat4j.calculators.RunningRate
# period is in [ms] 1000 * 60 * 10 = 10 min:
calculator.minuterate.period=600000

statistic.runningerrorrate.description=errors per 10 minutes
statistic.runningerrorrate.calculator=minuterate
# regular expression to match "<throwable.toString> <- <original log message>"
statistic.runningerrorrate.first.match=.*Exception.*

# error rate
alert.toomanyerrorsrecently.description=too many errors in the log
alert.toomanyerrorsrecently.statistic=runningerrorrate
alert.toomanyerrorsrecently.warn= >=3
alert.toomanyerrorsrecently.critical= >=10
alert.toomanyerrorsrecently.category=alerts
# ignore following warnings (or criticals, after the first critical) for the given amount of time:
# 1000 * 60 * 100 = 100 min
alert.toomanyerrorsrecently.quietperiod=6000000

The important config params are:

  • calculator.minuterate.period (in ms): count errors over this period and reset the count at its end; a reasonable value may be 10 minutes
  • alert.toomanyerrorsrecently.warn and .critical: trigger the alert when this many errors have been encountered in the period; reasonable values depend on your application’s normal error rate
  • alert.toomanyerrorsrecently.quietperiod (in ms): don’t send further alerts for this period, so as not to spam during a persistent failure; a reasonable value depends on how quickly you usually fix problems, 1 hour would seem OK to me
  • statistic.runningerrorrate.first.match is a regular expression defining which logs to count; “.*” would include any log, “your\.package\..*Exception” any exception in the package, and so on. You can even specify logs to exclude using a negative lookahead ( (?! x ))
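
The effect of such match expressions can be tried out with plain java.util.regex. The sketch below is only an illustration: the class MatchPatternSketch, the helper counts, the com.example package, and the sample log lines are all made up, and it assumes that the whole "<throwable.toString> <- <original log message>" string must match the expression (which is what a pattern like ".*Exception.*" suggests).

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the kinds of match expressions described above. */
final class MatchPatternSketch {

    /** Returns true if the whole line matches the regular expression. */
    static boolean counts(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        // Made-up lines in the "<throwable.toString> <- <original log message>" format.
        String dbFailure = "java.sql.SQLException: connection refused <- Failed to load customer";
        String noResults = "com.example.search.NoResultsException: nothing found <- Search failed";

        String anyException = ".*Exception.*";
        String onlyOurPackage = "com\\.example\\..*Exception.*";
        // Negative lookahead: count exceptions except the harmless NoResultsException.
        String allButNoResults = "(?!.*NoResultsException).*Exception.*";

        System.out.println(counts(anyException, dbFailure));     // true
        System.out.println(counts(anyException, noResults));     // true
        System.out.println(counts(onlyOurPackage, dbFailure));   // false
        System.out.println(counts(onlyOurPackage, noResults));   // true
        System.out.println(counts(allButNoResults, dbFailure));  // true
        System.out.println(counts(allButNoResults, noResults));  // false
    }
}
```

One thing to watch for when moving such an expression into a properties file: if the file is read with java.util.Properties, a single backslash is treated as an escape character and dropped, so "\." has to be written as "\\." to reach the regular expression engine intact.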

Log4j Configuration

Now we need to tell Log4j to use the Stat4j appender to count error occurrences and to send alerts via email:

log4j.rootCategory=DEBUG, console, fileappender, stat4jappender
...
### stat4jappender & emailalertsappender ###
# collects statistics about logs and sends alerts when there
# were too many failures in cooperation with the emailalertsappender

## stat4jappender
log4j.appender.stat4jappender=net.sourceforge.stat4j.log4j.Stat4jAppender
log4j.appender.stat4jappender.Threshold=ERROR
# for configuration see stat4j.properties

## emailalertsappender
# beware: SMTPAppender ignores its thresholds and only ever sends ERROR or higher messages
log4j.category.alerts=ERROR, emailalertsappender
log4j.appender.emailalertsappender=org.apache.log4j.net.SMTPAppender
log4j.appender.emailalertsappender.To=dummy@example.com
# beware: the address below must have a valid domain or some receivers will reject it (e.g. gmail)
log4j.appender.emailalertsappender.From=noreply-stat4j@google.no
log4j.appender.emailalertsappender.SMTPHost=172.20.20.70
log4j.appender.emailalertsappender.BufferSize=1
log4j.appender.emailalertsappender.Subject=[stat4j] too many exceptions in log
log4j.appender.emailalertsappender.layout=org.apache.log4j.PatternLayout
log4j.appender.emailalertsappender.layout.ConversionPattern=%d{ISO8601} %-5p %X{clientidentifier} %c %x - %m%n

Comments

  • Line 8: specify the Stat4j appender
  • Line 9: only send errors to Stat4j, we are not interested in less serious exceptions
  • Line 14: “alerts” is the log category used by Stat4jAppender to log alerts (the same one you would obtain via Logger.getLogger("alerts")); as mentioned, SMTPAppender will only process errors and higher, regardless of the configuration

Issues With the SMTPAppender

It is quite tricky to get the SMTPAppender working. Some pitfalls:

  • SMTPAppender ignores all logs that are not ERROR or higher, regardless of how you set its threshold
  • If you specify a non-existent From domain, then some recipients’ mail servers may just delete the email as spam (e.g. Gmail)
  • To send emails, you of course need mail.jar (and for older JVMs also activation.jar); here are instructions for Tomcat

And one $100 tip: to debug it, run your application in debug mode and set a method breakpoint on javax.mail.Transport#send (you don’t need the source code). When execution stops there, set this.session.debug to true to get a very detailed log of the subsequent SMTP communication in the server log.

Sidenote

The fact that this article is based on Log4j doesn’t mean I’d personally choose it; it just came with the project. I’d at least consider using the newer and shiny Logback instead :-) .

Conclusion

Stat4j + SMTPAppender are a very good base for a rather flexible do-it-yourself alerting system based on logs and e-mail. You can achieve the same thing out-of-the-box with Hyperic HQ, plus much, much more (provided that you get your admins to open two ports for it), which I will describe in the next blog post.

Links

  • An alternative for preventing the SMTPAppender from spamming in persistent failure situations (aside from its built-in buffer size): log4j-email-throttle
  • EventConsolidatingAppender, announced via mailing list in 2/2011: “the purpose of this appender is to consolidate multiple events that are received by a single logger within a specified number of seconds into a single event; this single consolidated event is then forwarded to a ‘downstream’ appender”

from http://theholyjava.wordpress.com/2011/10/15/aggregating-error-logs-to-send-a-warning-email-when-too-many-of-them-log4j-stat4j-smtpappender/
