Over a million developers have joined DZone.

A "Principled", Blameful Post-Mortem

DZone user Joe Wolf dissects a production bug he went through and finds there's a lot of blame to go around in the post-mortem.

· DevOps Zone

Discover how to optimize your DevOps workflows with our cloud-based automated testing infrastructure, brought to you in partnership with Sauce Labs

(I also considered calling this post "Why I Hate Dates: Reason #761.")

I encountered a production bug a few months back. An error was occuring during a system-to-system transaction executed through an HTTP POST. The server responded with a 200 OK, but the client failed to parse the XML payload, resulting in a failed transaction. The failure happened only on occassion, however, and there was no discernable pattern for when they occurred.

The server-side implemented the HTTP endpoint through a Spring Surf Webscript (which immediately tells you that this is an Alfresco Share customization — nothing else in the known universe is built with Spring Surf). Surf Webscripts use Freemarker for rendering the view. The template for formatting the sometimes problematic XML payload looked innocuous:

 <document><#setting number_format="0"><#setting time_zone="GMT">
    <objectId>${doc.objectId!}</objectId>
    <sequenceId>${doc.sequenceId!-1}</sequenceId>
    <filingDate>${doc.filingDate?datetime?iso_ms("GMT")}</filingDate>
</document>

Although I had skimmed through the Freemarker documentation while writing the template, I failed to notice an important paragraph pertaining to date formatting:

Note that when using ms, the milliseconds are displayed as fraction seconds (following the standard) and will not have trailing 0-s. Thus, if the the millisecond part happens to be 0, the whole fraction second part will be omitted.

I assumed that datetime?iso_ms("GMT") would render the milliseconds value of the datetime using exactly three digits. Instead, as clearly stated in the manual, Freemarker truncates trailing 0s, causing timestamps such as 12:34:56.700 to be rendered as 12:34:56.7. That's fine for HTML output meant for human eyeballs, but not-so-fine for a finicky client parser that, as I did, assumed all dates would be of the same, fixed length.

As a consequence of the truncation, when the HTTP POST happened at a time when the millisecond value was divisible by 10, the date string would contain fewer characters, the client's parser would consequently choke, and the transaction would fail.

The Principles

After the fix was checked in, I performed an individual, blameful post-mortem (i.e. I couldn't stop thinking about this bug). I reckoned that the bug could be considered the consequence of violating of number of software development principles. In no particular order...

Read The Freemarker(?) Manual

RTFM suggests that incorrect expectations about a component's behavior are due to a failure to check its manual beforehand. Indeed, the Freemark documentation clearly explained the truncation behavior of the "iso_ms" dateformat milliseconds. Further, if I were truly educated on the ISO 8601:2004 standard, I would have known that it does not specify a convention for formatting fractional seconds. Therefore, any time I use a library offering an "ISO" formatting option, I must take note on how that library handles fractional seconds.

The Principle of Least Astonishment

Also referred to as the Principle of Least Surprise, POLA advocates that "a component of a system should behave in a manner consistent with how users of that component are likely to expect it to behave." Maybe I've lead a sheltered development career, but anytime I've seen formatted dates with millisecond precision, it's always been with exactly three digits. I thought that, by convention, time elements more precise than hours were never truncated.

As a quick sanity check, I looked at the JavaDocs for Java 8's DateTimeFormatter.ISO_INSTANT; by my read, it uses 3, 6, or 9 digits to represent the fractional seconds, depending on the precision (milli, micro, or nano). Vindication! So, I argue that not truncating is the least astonishing thing to do.

Keep It Simple, Stupid

The KISS principle contends that "most systems work best if they are kept simple rather than made complicated." Which is simpler? Formatting milliseconds with three fixed digits or a varying number of digits depending on trailing zeros? Which approach is simpler to document? Which is simpler to implement? To get some insight into the latter question, I looked at the Freemarker code that truncates the millisecond values (it's part of a fairly large method):

     if (accuracy >= ACCURACY_MILLISECONDS) {
          x = cal.get(Calendar.MILLISECOND);
          int forcedDigits = accuracy == ACCURACY_MILLISECONDS_FORCED ? 3 : 0;
          if (x != 0 || forcedDigits != 0) {
               if (x > 999) {
                    // Shouldn't ever happen...
                    throw new RuntimeException(
                              "Calendar.MILLISECOND > 999");
               }
               res[dstIdx++] = '.';
               do {
                    res[dstIdx++] = (char) ('0' + (x / 100));
                    forcedDigits--;
                    x = x % 100 * 10;
               } while (x != 0 || forcedDigits > 0);
          }
     }

I'm impressed by its terseness, but much less so by its readability. In fact, I had to extract the code into a separate method in order to test and verify its actual behavior. Here's my quick attempt at a fixed length formatting implementation, without any refactoring:

     if (accuracy >= ACCURACY_MILLISECONDS) {
          x = cal.get(Calendar.MILLISECOND);
          int forcedDigits = accuracy == ACCURACY_MILLISECONDS_FORCED ? 3 : 0;
          if (forcedDigits != 0) {
               if (x > 999) {
                    // Shouldn't ever happen...
                    throw new RuntimeException(
                              "Calendar.MILLISECOND > 999");
               }
               res[dstIdx++] = '.';
               for (int i = forcedDigits - 1; i >= 0; --i) {
                    int power = Math.pow(10, i);
                    int place = x / power;
                    res[dstIdx++] = (char) ('0' + place);
                    x = x - (place * power);
               }
          }
     }

OK, so the readability isn't any better, but to this implementation's credit, microsecond and nanosecond accuracy could be easily accommodated by setting forcedDigits to 6 or 9, respectively (and removing the odd x > 999 check). The original implementation would need to be rewritten or, worse, copy-pasted. All things considered, I feel the fixed-size formatting is the simpler approach.

The Robustness Principle

The Robustness principle is best stated as "be conservative in what you send, be liberal in what you accept." Given the system's use of ISO 8601-formatted dates, the client should have, in principle, been more lenient when parsing fractional seconds since the specification does not perscribe a particular approach nor did we establish an approach for our system. A lenient client may have been able to recover from a parsing failure and attempted a fallback strategy (e.g. ignore anything after the seconds field) to prevent failing the entire transaction.

Pick The Right Tool For The Job

Picking the right tool for the job is the antidote for the The Law of the Instrument, which causes one with a hammer to treat everything as a nail. In this case, Freemarker was the Spring Surf creators' hammer, a single general-purpose technology for rendering Webscript output in HTML, XML, JSON, or whatever format you want. But structured formats like XML require a certain strictness that is not suited for a glorified String interpolator (and might I add, I do not believe XSLT is well-suited for generating HTML pages of a web application).

It would have been more appropriate to allow Webscript authors the choice of "The Right Tool for the Job" pertinent to the desired output format. Had I the option to select a legitimate XML marshalling technology, such as JAXP, I would have ended up relying on XMLGregorianCalendar, which uses three digits for milliseconds, and never had encountered the bug.

Conclusion

My blameful post-mortem found fault with myself, the client-side development team, the Freemarker developers, and the Spring Surf designers. While we're at it, I'd also would like to lay a little blame on the ISO 8601 specification authors for not being prescriptive about a fractional seconds format.

Truthfully, though, software development is just as much about assembling code you didn't write as it is writing new code from scratch. Even if the architecture constrains you to assemble from some less-than-desirable technologies created by others who may not consistently follow software development principles, a responsible developer must correctly understand and use them--and follow sound software development princples his or herself.

Thus, the only party truly responsible for this bug's occurrence is... the QA team. Man, they really dropped the ball on this one!

Download “The DevOps Journey - From Waterfall to Continuous Delivery” to learn learn about the importance of integrating automated testing into the DevOps workflow, brought to you in partnership with Sauce Labs.

Topics:
date formatting ,principles ,bug fix ,freemarker ,java ,xml

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}