Handle Errors in your Batch Job… Like a Champ!

By Mariano Gonzalez · Jun. 26, 14


Fact: Batch Jobs are tricky to handle when exceptions are raised. The problem is the huge amount of data these jobs are designed to take. If you're processing 1 million records, you simply can't log everything: the logs would become huge and unreadable, not to mention the performance toll it would take. On the other hand, if you log too little, it's impossible to know what went wrong, and if 30 thousand records failed, not knowing what's wrong with them can be a royal pain. This is a trade-off that is not simple to overcome.

We took the feedback we got from the early access releases around this issue, and now that Mule 3.5.0 and the Batch module have gone GA with the May 2014 release, small but significant improvements are in place to help you deal with these errors. Let's take a look at some simple scenarios.

Before we start

This post assumes you're familiar with the new Batch module included in Mule 3.5. If that's not the case, here are some links you might want to read before continuing:

  • When ETL met the ESB: Introducing the Batch Module
  • Batch Module Reloaded
  • Batch Module documentation

Records Failing in a Step

So, let's take a simple scenario: I have a simple step, and records tend to fail there. This is the simplest manifestation of the trade-off we talked about at the beginning of this post: log a whole lot of unmanageable text, or log so little that it's barely useful. The strategy we took here is not to log every exception, but to log distinct stack traces. Yep, you read correctly. When we find an exception in a step, we apply the following algorithm:

  • Get the exception's full stack trace.
  • Strip any messages from that stack trace. Even if all records always throw the same exception, those messages probably contain record-specific information, e.g., "user Walter White is not eligible because he's wanted by the DEA." The user Jesse Pinkman would produce a slightly different message, but the bottom line is that the exception is actually the same.
  • Verify whether this stack trace has already been logged in the current step. If this is the first time the stack trace is seen in the current step, we log it, adding a nice message saying that we will not log it again for this step. Otherwise, if we have already seen that stack trace in the given step, we simply don't log it again but keep count of how many times the exception was found.
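To make the scenario concrete, here is a minimal configuration sketch of a step in which records might fail. The job name is made up, and it assumes the FraudValidator class from the log below is a custom message processor that throws WantedByDEAException for some records:

<batch:job name="fraudCheckBatch">
    <batch:process-records>
        <batch:step name="Batch_Step">
            <!-- may throw org.mule.blog.WantedByDEAException for some records -->
            <custom-processor class="org.mule.blog.FraudValidator"/>
        </batch:step>
    </batch:process-records>
</batch:job>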

Let’s see an example of how the WantedByDEAException would be logged:

This is the first record to show this exception on this step for this job instance. Subsequent records with the same failures will not be logged for performance and log readability reasons:

********************************************************************************
Message               : null (org.mule.blog.WantedByDEAException). Message payload is of type: String
Code                  : MULE_ERROR--2
--------------------------------------------------------------------------------
Exception stack is:
1. null (org.mule.blog.WantedByDEAException)
  org.mule.blog.FraudValidator:11 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/blog/WantedByDEAException.html)
2. null (org.mule.blog.WantedByDEAException). Message payload is of type: String (org.mule.api.MessagingException)
  org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor:32 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/api/MessagingException.html)
--------------------------------------------------------------------------------
Root Exception stack trace:
org.mule.blog.WantedByDEAException
	at org.mule.blog.FraudValidator.process(FraudValidator.java:11)
	at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:24)
	at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:58)
    + 3 more (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
********************************************************************************

GOTCHA: This is done on a per-step basis. If another step also throws the same exception, it WILL be logged once more for that step.

Exception Summary

The above is good: we get to see the exception without clogging the logs. However, because I only see the exception once, I don't really know how many times it happened in total. For that, we added an exception summary that is logged when the on-complete phase is reached. This summary shows, for every exception type, how many times it happened in each step. Suppose we now have a job with two steps, and both are capable of throwing BatchException. The exception summary would look like this:

**************************************************************************************************
*             - - + Exception Type + - -             *  - - + Step + - -   *  - - + Count + - -  *
**************************************************************************************************
* com.mulesoft.module.batch.exception.BatchException * Batch_Step          * 10                  *
* com.mulesoft.module.batch.exception.BatchException * Second_Batch_Step   * 9                   *
**************************************************************************************************

Here we can see that the first step failed 10 times and the second failed 9.

Getting more out of the logs

As this post said at the start, the motivation for all of this is the trade-off between large amounts of data and logging efficiency. That's why everything described above works with Mule's default configuration of only logging INFO-level messages (if you're not familiar with logging levels, take a look at this).

There are special cases, like debugging, or when your motivation to use batch is not the ability to handle large datasets but recoverability, resilience, management, etc. (see this post for more examples). For those cases, you can get a whole lot of extra and extremely verbose information by setting the batch category to DEBUG level in your app's log4j configuration, like this:

log4j.logger.com.mulesoft.module.batch=DEBUG

CAUTION: Use the DEBUG level with care. It is definitely not advised for production environments unless you're absolutely sure that your jobs will not handle significantly large datasets.

Error Handling MEL functions

The early access release of the Batch module already shipped with some error handling MEL functions that can be used in the context of a batch step. Let's refresh those real quick (a small usage sketch follows the list):

  • #[isSuccessfulRecord()]: Boolean function that returns true if the current record has not thrown an exception in any prior step.
  • #[isFailedRecord()]: Boolean function that returns true if the current record has thrown an exception in any prior step.
  • #[failureExceptionForStep(String)]: This function receives the name of a step as a String argument. If the current record threw an exception in that step, it returns the actual Exception object. Otherwise, it returns null.
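For example, here is a minimal sketch of these functions used inside a step. The step names are made up, and the step uses accept-policy="ALL" so that both failed and successful records reach it:

<batch:step name="Audit_Step" accept-policy="ALL">
    <!-- log the failure from a (hypothetical) earlier step, or a short note for successful records -->
    <logger level="WARN"
            message="#[isFailedRecord() ? failureExceptionForStep('Validation_Step') : 'record is fine']"/>
</batch:step>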

Again, the above is just fine. However, we quickly learned that those fell short in some cases. Let's consider the following scenario:

  • A batch job that polls files containing contact information. Those contacts are transformed using DataMapper and fed into the job.
  • The first step transforms them again and inserts them into Salesforce.
  • The second step transforms them again and inserts them into Google Contacts.

So far so good. The challenging new requirement is to write a message to a JMS dead-letter queue for each record that has failed. To keep it simple, let's just say that the message will be the exception itself. In case you haven't noticed, there's a gotcha in this requirement: each record could have failed in both steps, which means that the same record would translate into two JMS messages.

To support this and other similar scenarios, we added the following functions, which also need to be executed in the context of a batch step (a quick example follows the list):

  • #[getStepExceptions()]: Returns a java Map<String, Exception> in which each key is the name of a batch step in which the current record has failed and the value is the exception itself. If the record hasn't failed in any step, this Map will be empty, but it will never be null. The Map contains no entries for steps in which the record hasn't failed.
  • #[getFirstException()]: Returns the Exception for the first step in which the current record failed. If the record hasn't failed in any step, it returns null.
  • #[getLastException()]: Returns the Exception for the last step in which the current record failed. If the record hasn't failed in any step, it returns null.
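Here's a quick sketch of the last two functions in action (the step name is made up):

<batch:step name="Failure_Report_Step" accept-policy="ONLY_FAILURES">
    <!-- logs the exceptions of the first and last steps in which this record failed -->
    <logger level="ERROR"
            message="#['first failure: ' + getFirstException() + ' / last failure: ' + getLastException()]"/>
</batch:step>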

So let's take a look at how this application is put together.

It's a pretty standard Batch Job. Because this is an error handling post, I won't go into the details of the whole app; I'll just focus on the last step. Let's take a closer look:

Since the goal of this step is to gather failures, it makes sense to skip the records that don't have any, so we set the accept policy to failures only.

The first thing we do is use the set-payload processor together with the getStepExceptions() MEL function so that the message payload is set to the exceptions map.

Remember how we said before that a given record could have more than one exception? That's the nice thing about this map: it gives you access to the exceptions found in all steps. Since our goal is to send the exceptions through JMS and we don't care about the steps, we can simply use a foreach to iterate over the map's values (the exceptions) and send them through a JMS outbound endpoint:
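Here is a rough XML sketch of how that last step could be wired up. The job, step, and queue names are assumptions, and the actual Salesforce/Google Contacts logic as well as the JMS connector configuration are omitted:

<batch:job name="contactsSyncBatch">
    <batch:process-records>
        <batch:step name="Salesforce_Step">
            <!-- transform and insert the contact into Salesforce (omitted) -->
            <logger level="DEBUG" message="#['inserting ' + payload + ' into Salesforce']"/>
        </batch:step>
        <batch:step name="Google_Contacts_Step">
            <!-- transform and insert the contact into Google Contacts (omitted) -->
            <logger level="DEBUG" message="#['inserting ' + payload + ' into Google Contacts']"/>
        </batch:step>
        <batch:step name="Dead_Letter_Step" accept-policy="ONLY_FAILURES">
            <!-- the payload becomes the Map of step name to Exception for this record -->
            <set-payload value="#[getStepExceptions()]"/>
            <!-- one JMS message per exception, no matter which step it failed in -->
            <foreach collection="#[payload.values()]">
                <jms:outbound-endpoint queue="failedContacts"/>
            </foreach>
        </batch:step>
    </batch:process-records>
</batch:job>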

And we’re done! We have happily aggregated exceptions!

Summary

The above features were introduced in the GA release of Mule 3.5.0. We tried our best to help you find an equilibrium between useful, meaningful messages and undesired verbosity. Have better ideas or suggestions? Please reach out!


Published at DZone with permission of Mariano Gonzalez, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
