5 Extremely Stupid Mistakes I Recently Made

We all make mistakes from time to time... and sometimes we make multiple mistakes all at once! MVB Grzegorz Ziemoński provides us with an enjoyable and candid look back at a critical Java application release and the mistakes made along the way.

Michael Tharrington from the DZone crew recently suggested that I write about a few of the mistakes that I've made and learned from as a developer. Well, the moment couldn't be more perfect—my team just went live with a critical application and everything that could go wrong, went wrong! Of course, none of the solutions were written solely by me; some were created while pair programming, and everything went through a rigorous code review process, which makes the whole thing even more ridiculous.

Background

We were developing a simple web app exposing a REST API that communicates with an external provider and persists some of the results, so that we don't make too many calls—we have to pay for each of them. The application relies mostly on Spring Boot's web and data capabilities. Two instances of the application are deployed behind a load balancer on the company's servers.

Mistake 1 - Insulting the Concurrency Lords

The call to the external provider takes a long time, which our client applications are not willing to wait for. Because of this, we agreed to do background processing—the first time we're called, we return a DONT_KNOW_YET_COME_BACK_LATER message¹; the next time, we return a persisted result. Of course, the question popped up: what if we get two identical calls within a short enough time frame? We have to somehow protect ourselves from calling the external provider twice with the same data. Here's what I did:

private Set<Long> currentlyProcessedIds = ConcurrentHashMap.newKeySet();

// further down in the code:
if (currentlyProcessedIds.add(id)) {
    doTheJob(id);
    currentlyProcessedIds.remove(id);
}

Can you spot what's wrong with this piece of code? Take your time if my stupidity isn't visible to you right away.

Of course, the whole thing breaks if doTheJob throws an exception! Dear Concurrency Lords, such an obvious violation... four pairs of eyes looked at this and nobody noticed that we could block an id from ever being processed correctly again. Here's the more concurrently-correct version:

if (currentlyProcessedIds.add(id)) {
    try {
        doTheJob(id);
    } finally {
        currentlyProcessedIds.remove(id);
    }
}

Mistake 2 - Load Was Balanced, Solution Was NOT

If you're smart enough, you've probably noticed what's wrong already. I intentionally wrote "concurrently-correct" above, because this code is still far from correct. We have two independent instances of the application behind a load balancer. This means that the problem is not only concurrent. It's also distributed!

We haven't implemented a distributed solution yet, but we obviously need to synchronize the currently processed set between the instances².
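We haven't settled on an approach yet, but one candidate is to let the shared database arbitrate: a small lock table with a unique constraint on the id, so whichever instance inserts the row first gets to do the job. A rough sketch using Spring's JdbcTemplate; the processing_lock table and the class around it are made up for illustration, not something from our codebase:

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;

// Sketch only: assumes a processing_lock table with a UNIQUE (or primary key) constraint on id.
public class ProcessingLock {

    private final JdbcTemplate jdbcTemplate;

    public ProcessingLock(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public boolean tryAcquire(long id) {
        try {
            // The unique constraint guarantees that only one instance can insert the row.
            jdbcTemplate.update("INSERT INTO processing_lock (id) VALUES (?)", id);
            return true;
        } catch (DuplicateKeyException e) {
            return false; // another instance is already processing this id
        }
    }

    public void release(long id) {
        jdbcTemplate.update("DELETE FROM processing_lock WHERE id = ?", id);
    }
}

// Usage mirrors the in-memory version:
if (processingLock.tryAcquire(id)) {
    try {
        doTheJob(id);
    } finally {
        processingLock.release(id);
    }
}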

Mistake 3 - Morning Coffee Performance

The idea of background processing came from my observation that the old solution relied mostly on database performance and, even in the worst cases, wasn't as slow as the external call. At first, we suggested that the guys responsible for the client application simply time out the first (long) call and try again later. But this wouldn't be enough when the whole system is under heavy load. I took responsibility for measuring how much time our application would have to respond. I found all the database queries in the old solution and ran them one by one against our databases. I collected the results, made some calculations, and produced a nice table on the company's wiki.

What could go wrong there? Well, me running queries by hand while sipping my morning coffee³ obviously produced different performance results than all systems running under heavy load. This meant my calculations were overly optimistic, and the first thing we saw after go-live was a wall of timeouts on the client application's side.

Mistake 4 - Off by Three Orders of Magnitude

System load was only one of the reasons we overestimated our performance. The second one is even more embarrassing. All of the testing between our application and the client's was done on a limited data set—around 1000 records. The results were promising, no warning signs.

One little detail that we somehow omitted was that just before the go-live we were supposed to fill the database with data migrated from the old solution—a million records! Increasing the data set by three orders of magnitude revealed that we forgot to set up a crucial database index—that's something you'd much rather do before you go live with a critical application!
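For the record, if the persistence layer is Spring Data JPA (ours is built on Spring Boot's data support), the index can be declared right on the entity, so it exists in the generated schema before a million rows show up. A hedged sketch; the entity and column names here are invented for illustration, and the index is only created automatically if schema generation is enabled (otherwise it still documents what the migration script needs to add):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Index;
import javax.persistence.Table;

// Illustrative entity only; the names are made up, not our actual schema.
@Entity
@Table(name = "provider_result",
       indexes = @Index(name = "idx_provider_result_customer_id", columnList = "customer_id"))
public class ProviderResult {

    @Id
    private Long id;

    // The column we look results up by on every request: this is the one that needed the index.
    @Column(name = "customer_id")
    private Long customerId;

    // other fields, getters, and setters omitted
}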

Mistake 5 - Metrics Ain't Statistics

Business people asked us to prepare a report about the requests that we make to the external provider. Since we make a lot of these calls and they are expensive, we wanted to give the business a way to calculate expected costs and validate the invoice that we get from the provider. Of course, business guys ain't into SQL, so they asked for an Excel report sent via email. Oh, mama... us generating an Excel file and sending emails? It's 2016. Look, we have this pretty dashboard that we use for metrics. We'll just add a counter for requests there and you'll have what you need!

Since we had those initial problems mentioned above, we issued some hotfixes and released the application again. After redeployment, the dashboard with counters went (seemingly) mad. What happened to the number of requests? It turned out that metrics are kept on the application side and sent at regular intervals to the metric service. This means that each time we restart the application, the counter value is gone. Holy crap!

Luckily, we have a means of providing correct information about the requests without the nice counter on the dashboard, but we're still trying to implement a real counter to keep statistics for the business.
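One idea being discussed is to persist the counter instead of keeping it in memory: an atomic increment of a per-day row in the database, so neither restarts nor the second instance can lose or skew the numbers. A rough sketch, assuming a hypothetical request_stats table with columns (day, request_count):

import java.sql.Date;
import java.time.LocalDate;
import org.springframework.jdbc.core.JdbcTemplate;

// Sketch only: assumes a request_stats table with columns (day DATE PRIMARY KEY, request_count BIGINT).
public class RequestStats {

    private final JdbcTemplate jdbcTemplate;

    public RequestStats(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Called once per external provider call; the count survives restarts because it lives in the database.
    public void recordProviderCall() {
        Date today = Date.valueOf(LocalDate.now());
        int updated = jdbcTemplate.update(
                "UPDATE request_stats SET request_count = request_count + 1 WHERE day = ?", today);
        if (updated == 0) {
            // First call of the day: create the row. A concurrent first call may still race here,
            // so a database-specific upsert (MERGE / ON CONFLICT) would be the robust version.
            jdbcTemplate.update(
                    "INSERT INTO request_stats (day, request_count) VALUES (?, 1)", today);
        }
    }
}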

Summary

Just before the go-live, I told my colleagues that I was worried because, from the start of development up until the end of the testing phase, we'd had no major issues. Everything went so smoothly, with no major impediments in our way. It turned out I was right: something was bound to go wrong! While the mistakes I mention in this post may seem obvious now, they weren't so obvious when we were developing and testing things in a hurry. It's a bit embarrassing to write about them now, so I hope you at least had some fun and won't make the mistakes we did!

  1. Actually, it's "NEW_CUSTOMER", but I didn't want to confuse you with details.
  2. Some part of me wants to give up on this safety mechanism altogether, but you know... safety first!
  3. I don't actually drink coffee, just wanted to make it funnier.

Published at DZone with permission of Grzegorz Ziemoński, DZone MVB. See the original article here.

