Bringing down servers, one character at a time

Last week, I mentioned a bug that I caused, and a few people asked me for details about it, so here is what happened.

The code in question parses a calendar and processes meetings. Here is a shortened version of what it looks like:

while (maxDays < 7 && (eventCount < 20) || attendeeCount < 40) {
    // process meetings
    // increment maxDays whenever we cross a day boundary
    // possibly increment eventCount or attendeeCount based on the meeting
}

The bug went unnoticed in production for a few days until, suddenly, the servers started pinning all their CPUs and, shortly thereafter, depleting the memory of each instance. A restart was the only remedy.

This code loops through someone's calendar and stops once we've gathered enough information. The maxDays variable is what I call a "safety valve": a condition used to guarantee that the code doesn't run amok in case its normal processing flow never finishes. And yet, the crash seems to indicate that the code was doing exactly that.

With this introduction out of the way, here is now the line that caused the bug:

Actually... you've already seen it, it's the code I pasted above. Did you spot the problem?

That's right, a simple set of parentheses in the wrong place. Because they surround the wrong expression, they completely defeat the purpose of the safety valve: && binds more tightly than ||, so the loop keeps running for as long as attendeeCount < 40, no matter what maxDays says. The bug only triggers for people with empty calendars, which explains why it took some time to show up.
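To make the precedence issue concrete, here is a minimal sketch that evaluates the condition as written next to what the intended condition presumably looks like. The class and method names are hypothetical, and the "presumed intent" version is an assumption on my part: it is simply the same expression with the closing parenthesis moved so that maxDays guards both counters.

```java
// Hypothetical helper; only the two boolean expressions matter.
final class LoopCondition {
    // The condition as written in the article. && binds tighter than ||, so this
    // reads as (maxDays < 7 && eventCount < 20) || attendeeCount < 40.
    static boolean asWritten(int maxDays, int eventCount, int attendeeCount) {
        return maxDays < 7 && (eventCount < 20) || attendeeCount < 40;
    }

    // Presumed intent (an assumption): the safety valve guards both counters.
    static boolean presumedIntent(int maxDays, int eventCount, int attendeeCount) {
        return maxDays < 7 && (eventCount < 20 || attendeeCount < 40);
    }

    public static void main(String[] args) {
        // Empty calendar after a week: no events, no attendees, maxDays has reached 7.
        System.out.println(asWritten(7, 0, 0));      // true: the loop keeps going
        System.out.println(presumedIntent(7, 0, 0)); // false: the safety valve stops it
    }
}
```

With a busy calendar both versions behave the same, because attendeeCount eventually reaches 40 and the loop ends on its own. Only the sparse case exposes the difference.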

Two characters to bring down an entire server...

Can we learn anything from this bug, besides the fact that testing needs to cover edge cases? A static analysis tool can't be of much help here since we're completely in the realm of runtime behavior, but is there some way that an automated tool could have warned me about the flaw in this termination expression?
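On the testing side, the empty-calendar edge case could be pinned down with a check along these lines. This is only a sketch under stated assumptions: the loop body is reduced to "cross a day boundary, find no meetings", and the class name, the Condition interface and the ITERATION_CAP constant are all hypothetical. The cap is there so that the check itself can never hang.

```java
// Hypothetical test harness; it simulates the loop rather than calling the real code.
final class EmptyCalendarCheck {
    // Hypothetical cap so that a non-terminating condition is reported instead of hanging.
    static final int ITERATION_CAP = 1_000;

    interface Condition {
        boolean keepGoing(int maxDays, int eventCount, int attendeeCount);
    }

    // Simulates the loop over an empty calendar: every iteration crosses a day
    // boundary and finds no meetings, so only maxDays ever changes.
    static int iterationsOnEmptyCalendar(Condition condition) {
        int maxDays = 0;
        int eventCount = 0;
        int attendeeCount = 0;
        int iterations = 0;
        while (condition.keepGoing(maxDays, eventCount, attendeeCount)
                && iterations < ITERATION_CAP) {
            maxDays++;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // The condition as written in the article.
        int asWritten = iterationsOnEmptyCalendar((d, e, a) -> d < 7 && (e < 20) || a < 40);
        // The presumed intent (an assumption): maxDays guards both counters.
        int presumed = iterationsOnEmptyCalendar((d, e, a) -> d < 7 && (e < 20 || a < 40));
        System.out.println(asWritten); // 1000: only the cap stopped the loop
        System.out.println(presumed);  // 7: the safety valve stopped the loop
    }
}
```

A test asserting that the iteration count stays below the cap would fail for the condition as written and pass for the presumed fix.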

Published at DZone with permission of Cedric Beust, DZone MVB. See the original article here.
