Processing 200,000 Records in Under 3 Hours

A developer tells how he was able to use MuleSoft, Salesforce, and some integration know-how to solve a time-crunch issue for his team.

By Jason Estevan · Mar. 20, 18 · Analysis

"Pay Per Use" is a common billing strategy where customers are charged depending on how many times they use a certain feature. Recently, our current monthly batch process that loads usages records into Salesforce was failing due to the ESB system being overwhelmed. The system was originally designed to handle 20,000 records a month but we had just crossed the 200,000 record count last month, a 10x increase of what was expected. While a great problem to have for the business, as a developer and architect I had to find a way to keep a handle on this load and keep the process running.

Our current ESB system (name omitted), which runs in a 4-server cluster, simply couldn't handle the size of the input. During the first hour it would process a consistent 22 transactions a second; however, the longer it ran, the slower it got. Often we would find it crawling along at as low as 2 transactions a second. At that point, we would need to quickly restart the whole ESB before timeouts started occurring and records were lost.

I pitched an idea to my Product Owner: let me move this process to MuleSoft's Anypoint Platform. He agreed, as long as I could get all 200,000 usage records loaded in under 3 hours. Challenge accepted!

Below is my first attempt at rewriting the batch job using Mule 4. For simplicity, I've removed the components that don't demonstrate the major time-saving actions I took.

[Image: first attempt at the Mule 4 batch job]

The job starts when a newly uploaded CSV file is detected. After parsing each line in the file, which represents a usage record, we look up the customer's cost per usage (rate) in our local database. Then we combine the rate information with the payload and upload it to Salesforce.
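
For readers who would rather see the flow as XML than as a canvas screenshot, here is a minimal sketch of what such a Mule 4 configuration might look like. The flow name, directory, polling frequency, SQL query, field names, and the Usage_Record__c object are all hypothetical stand-ins, and namespace declarations are omitted for brevity:

    <flow name="usage-records-batch-flow">
        <!-- Trigger: poll a directory for newly uploaded CSV files.
             outputMimeType tells Mule to parse each row as a record. -->
        <file:listener directory="/data/usage/incoming" outputMimeType="application/csv" autoDelete="true">
            <scheduling-strategy>
                <fixed-frequency frequency="60" timeUnit="SECONDS"/>
            </scheduling-strategy>
        </file:listener>

        <batch:job jobName="usageRecordsBatchJob">
            <batch:process-records>
                <!-- Step 1: look up the customer's cost per usage (rate)
                     in the local database and merge it into the payload. -->
                <batch:step name="lookupRateStep">
                    <db:select config-ref="Database_Config" target="rateRows">
                        <db:sql>SELECT rate FROM customer_rates WHERE customer_id = :id</db:sql>
                        <db:input-parameters><![CDATA[#[{ id: payload.customerId }]]]></db:input-parameters>
                    </db:select>
                    <set-payload value="#[payload ++ { rate: vars.rateRows[0].rate }]"/>
                </batch:step>
                <!-- Step 2: upload each enriched record to Salesforce
                     (the create operation takes its records from the payload). -->
                <batch:step name="uploadToSalesforceStep">
                    <salesforce:create type="Usage_Record__c" config-ref="Salesforce_Config"/>
                </batch:step>
            </batch:process-records>
        </batch:job>
    </flow>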

Results: Processing 200,000 usage records took 3 hours and 25 minutes. The run was stable, unlike the current process, but it needed to be faster.

[Image: database connector configuration with the connection pool increased to 5]

After our first run, our database logs were full of messages complaining that the driver was not able "to create a connection." It turns out that, by default, Mule uses 16 threads to process records simultaneously. However, our database configuration allowed only a single connection at a time, forcing requests from multiple threads to wait until the connection was free to query the database. As shown above, I increased the database connection pool to 5 and reran the process.
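
In Mule 4 terms, that change amounts to a pooling profile on the database connection. A minimal sketch, assuming a MySQL connection with placeholder credentials:

    <db:config name="Database_Config">
        <db:my-sql-connection host="localhost" port="3306" user="app" password="secret" database="billing">
            <!-- The key change: allow up to 5 pooled connections so several of
                 Mule's 16 batch threads can query the database concurrently. -->
            <db:pooling-profile minPoolSize="1" maxPoolSize="5"/>
        </db:my-sql-connection>
    </db:config>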

Results: Processing 200,000 usage records took 2 hours and 47 minutes (-38 minutes). Much improved, but it needed to be faster.

[Image: the bulk upload operation wrapped in a Batch Aggregator scope]

Recording timestamps on entry to and exit from each batch step let me figure out where most of the processing time was being spent, and it was definitely the call out to Salesforce. After browsing through the Salesforce Connector documentation, I found an operation to upload objects in bulk. Awesome! I dragged that component onto the batch step, wrapped it inside a Batch Aggregator scope, and reran the process. Fingers crossed!
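
As a sketch of that change, the second batch step might now look like the snippet below. The aggregator size of 200 is an assumed value, not one from the article; the idea is that the Batch Aggregator scope hands the Salesforce operation a list of records, so one call out covers many records instead of one:

    <batch:step name="uploadToSalesforceStep">
        <!-- Accumulate records and send them to Salesforce in groups,
             turning 200 individual calls into a single aggregated call. -->
        <batch:aggregator size="200">
            <salesforce:create type="Usage_Record__c" config-ref="Salesforce_Config"/>
        </batch:aggregator>
    </batch:step>

(Depending on the connector version, a dedicated Bulk API operation may be available in place of create; the aggregation pattern is the same either way.)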

Results: Processing 200,000 usage records took 1 hour and 54 minutes (-53 minutes).

I did it! The process handled 200,000 records in under 2 hours, averaging 29 transactions a second. That is comfortably below the 3-hour limit the business needed, leaving room to accommodate future growth with no server crashes. I went back to my Product Owner and told him we no longer had to worry about incorrect customer billing. He was quite impressed with the results and shared the good news, along with praise for my work, across the organization. Thank you, Mule, for making me look good.


Opinions expressed by DZone contributors are their own.
