So, our current version can index 18,000 documents in 22 ms vs. the JSON approach which takes 1.8 seconds. That is an amazing improvement, but I got to tell you, I got a good feeling about it.
Without running a profiler, I'm guessing that a major part of the 22 ms cost is going to be comparisons during a binary search process. We store the property names as a sorted array, and when you need to get a property by name, we need to do a binary search in the properties name to match it.
Our test case gets 3 properties from a total of 18,801 documents. We do this 30 times, to reduce the chance of jitter getting in the way. Here are the profiler results:
As you can see, the cost is indeed where I expected it to be. Basically, the ReadNumber and GetComparerFor are the expressions of the number of binary searches.
But, here we can take advantage of the fact that we know that most of those documents are going to have the same structure. In other words, once we have found the position of a particular property, there is a strong chance that the next time we look at a property with the same name, we'll find it in the same position. Caching that and using this as the initial position for doing the binary search means that most of the time, we hit the right property right on the nose, and in the worst case, we have a single extra comparison to make.
This single change gives us a really nice impact:
That is about 40% improvement in speed from this single change.
We have been following similar patterns throughout the development of this feature. Write the feature, make sure it passes all its tests, then profiler it. Usually that indicates a problem that we need to solve, either by optimizing code or changing the way we do things in the code... sometimes radically. But, that is a topic for the next blog post.