Over a million developers have joined DZone.

The Importance of a Data Format Part 6 — When Two Orders of Magnitude Aren't Enough

So, check out these result. Our current version can index 18,000 documents in 22 ms vs. the JSON approach which takes 1.8 seconds. Read on to learn more.

· Database Zone

Learn NoSQL for free with hands-on sample code, example queries, tutorials, and more.  Brought to you in partnership with Couchbase.

So, our current version can index 18,000 documents in 22 ms vs. the JSON approach which takes 1.8 seconds. That is an amazing improvement, but I got to tell you, I got a good feeling about it.

Without running a profiler, I'm guessing that a major part of the 22 ms cost is going to be comparisons during a binary search process. We store the property names as a sorted array, and when you need to get a property by name, we need to do a binary search in the properties name to match it.

Our test case gets 3 properties from a total of 18,801 documents. We do this 30 times, to reduce the chance of jitter getting in the way. Here are the profiler results:


As you can see, the cost is indeed where I expected it to be. Basically, the ReadNumber and GetComparerFor are the expressions of the number of binary searches.

But, here we can take advantage of the fact that we know that most of those documents are going to have the same structure. In other words, once we have found the position of a particular property, there is a strong chance that the next time we look at a property with the same name, we'll find it in the same position. Caching that and using this as the initial position for doing the binary search means that most of the time, we hit the right property right on the nose, and in the worst case, we have a single extra comparison to make.

This single change gives us a really nice impact:


That is about 40% improvement in speed from this single change.

We have been following similar patterns throughout the development of this feature. Write the feature, make sure it passes all its tests, then profiler it. Usually that indicates a problem that we need to solve, either by optimizing code or changing the way we do things in the code... sometimes radically. But, that is a topic for the next blog post.

The Getting Started with NoSQL Guide will get you hands-on with NoSQL in minutes with no coding needed. Brought to you in partnership with Couchbase.

data format ,data access object ,database performance

Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}