Hidden Precision in Big Data

Martin Fowler walks us through the dangers of hidden precision, which undercut the common assumption that more precision is always better.

Sometimes, when I work with some data, that data is more precise than I expect. One might think that would be a good thing — after all, precision is good, so more is better. But hidden precision can lead to some subtle bugs.

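```javascript
const { assert } = require("chai");  // JavaScript, using chai's assert

// Date-only ISO strings parse as midnight UTC: instants, not dates
const validityStart = new Date("2016-10-01");
const validityEnd = new Date("2016-11-08");
const isWithinValidity = aDate => (aDate >= validityStart && aDate <= validityEnd);
const applicationTime = new Date("2016-11-08T08:00Z");  // "Z" keeps it in UTC like the bounds

assert.notOk(isWithinValidity(applicationTime));  // NOT what I want
```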

In the above code, I intended to create an inclusive date range by specifying the start and end dates. However, I didn't actually specify dates but instants in time. So I'm not marking the end date as November 8th; I'm marking the end as the time 00:00 on November 8. As a consequence, any time (other than midnight) within November 8 falls outside a date range that's intended to include it.
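Printing the end of the range makes the hidden precision visible. This is a minimal sketch using only the built-in `Date` object; it relies on JavaScript parsing a date-only ISO string such as `"2016-11-08"` as midnight UTC.

```javascript
const validityEnd = new Date("2016-11-08");

// The "date" is really an instant, carrying hours, minutes, seconds and milliseconds
console.log(validityEnd.toISOString());  // 2016-11-08T00:00:00.000Z

// Any later instant on the same UTC day compares as after the end of the range
const applicationTime = new Date("2016-11-08T08:00Z");
console.log(applicationTime > validityEnd);  // true
```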

Hidden precision is a common problem with dates because it's (sadly) common to have a date creation function that actually provides an instant like this. It's an example of poor naming, and indeed of generally poor modeling of dates and times.
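One way to avoid the problem is to model a date as a calendar day with no time of day at all, and convert instants to days before comparing. The sketch below is only an illustration: `CalendarDate`, `ofInstant` and `key` are hypothetical names, and taking the day of an instant in UTC is an assumption (a real system has to decide which time zone defines "the day").

```javascript
// Hypothetical date-only value type: a calendar day with no time of day
class CalendarDate {
  constructor(year, month, day) {
    this.year = year;
    this.month = month;  // 1-12
    this.day = day;
  }

  // Calendar day of an instant, read in UTC (an assumption for this sketch)
  static ofInstant(instant) {
    return new CalendarDate(instant.getUTCFullYear(), instant.getUTCMonth() + 1, instant.getUTCDate());
  }

  // A single comparable number, e.g. 2016-11-08 becomes 20161108
  key() {
    return this.year * 10000 + this.month * 100 + this.day;
  }

  compareTo(other) {
    return this.key() - other.key();
  }
}

const validityStart = new CalendarDate(2016, 10, 1);
const validityEnd = new CalendarDate(2016, 11, 8);

// Inclusive range over whole days: every instant on November 8 falls inside
const isWithinValidity = instant => {
  const date = CalendarDate.ofInstant(instant);
  return date.compareTo(validityStart) >= 0 && date.compareTo(validityEnd) <= 0;
};

console.log(isWithinValidity(new Date("2016-11-08T08:00Z")));  // true
console.log(isWithinValidity(new Date("2016-11-09T00:00Z")));  // false
```

Because the range bounds are days rather than instants, an inclusive end now means what it says, and no time of day can sneak past it.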

Dates are a good example of the problems of hidden precision, but another culprit is floating point numbers.

```javascript
const { assert } = require("chai");  // JavaScript, using chai's assert

const tenCharges = [
  0.10, 0.10, 0.10, 0.10, 0.10,
  0.10, 0.10, 0.10, 0.10, 0.10,
];
const discountThreshold = 1.00;
const totalCharge = tenCharges.reduce((acc, each) => acc + each, 0);
assert.ok(totalCharge < discountThreshold);   // NOT what I want
```

When I just ran it, a log statement showed that totalCharge was 0.9999999999999999. This is because floating point doesn't exactly represent many values, leading to a little invisible precision that can show up at awkward times.
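Asking JavaScript for more digits than it prints by default exposes that invisible precision. This minimal sketch uses only `Number.prototype.toFixed` and plain addition.

```javascript
// 0.1 has no exact binary representation; extra digits reveal the stored value
console.log((0.1).toFixed(20));  // 0.10000000000000000555

// Each addition rounds to the nearest double, and the small errors accumulate
let total = 0;
for (let i = 0; i < 10; i++) {
  total += 0.1;
}
console.log(total);        // 0.9999999999999999
console.log(total === 1);  // false
```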

One conclusion from this is that you should be extremely wary of representing money with a floating point number. (If you have a fractional currency part like cents, then it's usually best to use integers that count the fractional unit, representing €5.00 with 500, preferably within a money type.) The more general conclusion is that floating point is tricky when it comes to comparisons, which is why test frameworks' floating point assertions take a precision (a tolerance) for the comparison.

Published at DZone with permission of
