RavenDB 4.0 Unsung Heroes: Indexing Related Data
Even when following proper modeling procedures, there are still cases when you want to search a document by its relationship. Learn how to handle this in RavenDB.
RavenDB is a nonrelational database, which means that you typically don't model documents as having strong relations. A core design principle for modeling documents is that they should be independent, isolated, and coherent — or more specifically:
- Independent, meaning a document should have its own separate existence from any other documents.
- Isolated, meaning a document can change independently from other documents.
- Coherent, meaning a document should be legible on its own, without referencing other documents.
That said, even when following proper modeling procedures, there are still cases when you want to search for a document by its relationships. For example, you might want to find all the employees whose manager's name is John, regardless of whether that's John Doe or John Smith.
RavenDB allows you to handle this scenario by using LoadDocument during the index phase. That creates a relationship between the two documents and ensures that whenever the referenced document is updated, the referencing documents will be reindexed to catch up with the new details. It is quite an elegant feature, if I do say so myself, and I'm really proud of it.
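To make the idea concrete, here is a toy model in Python. It is not RavenDB's API, and real RavenDB index definitions do not look like this. The function load_document is a hypothetical stand-in for LoadDocument: while indexing an employee, it reads the manager document, and the map step copies the manager's first name into the employee's index entry. It also records that the employee depends on the manager, which is the information needed to reindex the employee later. The data, and every name below, are made up for illustration.

```python
# Toy document store: id -> body. Managers are ordinary employee documents here.
store = {
    "employees/1": {"Name": "Alice", "Manager": "employees/3"},
    "employees/2": {"Name": "Bob", "Manager": "employees/4"},
    "employees/3": {"Name": "John Doe", "Manager": None},
    "employees/4": {"Name": "John Smith", "Manager": None},
    "employees/5": {"Name": "Carol", "Manager": "employees/6"},
    "employees/6": {"Name": "Jane Roe", "Manager": None},
}

entries = {}      # employee id -> {"ManagerFirstName": ...}, i.e. the index entries
references = {}   # referenced id -> ids of documents that loaded it while being indexed


def load_document(source_id, referenced_id):
    """Hypothetical stand-in for LoadDocument: read a related document and
    remember that source_id must be reindexed if referenced_id changes."""
    if referenced_id is None:
        return None
    references.setdefault(referenced_id, set()).add(source_id)
    return store.get(referenced_id)


def map_employee(doc_id):
    """Index-time map: copy the manager's first name into the employee's entry."""
    manager = load_document(doc_id, store[doc_id]["Manager"])
    first_name = manager["Name"].split()[0] if manager else None
    entries[doc_id] = {"ManagerFirstName": first_name}


def query_by_manager_first_name(name):
    """Answered from the index entries alone; nothing is loaded at query time."""
    return sorted(d for d, e in entries.items() if e["ManagerFirstName"] == name)


for doc_id in store:
    map_employee(doc_id)

print(query_by_manager_first_name("John"))  # ['employees/1', 'employees/2']
print(references["employees/3"])            # {'employees/1'}
```

The relationship is resolved once, at indexing time, so the query itself never has to follow references.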
It is also the source of much abuse in the wild. If you don't model properly, it is often easy to paper over the problem by using LoadDocument in your indexes.
The problem is that in RavenDB 3.x, an update to a document that was referenced using LoadDocument also had to touch all of the documents referencing it. This slowed down writes, which is something we generally try really hard to avoid, and could also cause availability issues if there were enough referencing documents (as in, all of them, which happened more frequently than you might think).
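The write amplification is easy to see in a toy model. The sketch below assumes a single global map from each referenced document to the documents that reference it, and makes every write to a referenced document bump the etag of each referencing document. The class and method names are hypothetical and are not RavenDB 3.x internals; the reference map is filled in by hand to keep the example short.

```python
class GlobalReferenceStore:
    """Toy store where the write path must touch every referencing document."""

    def __init__(self):
        self.docs = {}           # document id -> body
        self.etags = {}          # document id -> etag of its last modification
        self.referenced_by = {}  # referenced id -> ids of documents that reference it
        self.last_etag = 0

    def _bump(self, doc_id):
        self.last_etag += 1
        self.etags[doc_id] = self.last_etag

    def put(self, doc_id, body):
        """Stores a document and returns how many documents the write modified."""
        self.docs[doc_id] = body
        self._bump(doc_id)
        modified = 1
        # "Touch" each referencing document so that indexes will process it again.
        for ref_id in self.referenced_by.get(doc_id, ()):
            self._bump(ref_id)
            modified += 1
        return modified


store = GlobalReferenceStore()
store.put("managers/1", {"Name": "John Doe"})
for i in range(1, 10_001):
    employee_id = f"employees/{i}"
    store.put(employee_id, {"Manager": "managers/1"})
    # Filled in by hand for this toy model.
    store.referenced_by.setdefault("managers/1", set()).add(employee_id)

# Renaming one manager becomes 10,001 document modifications on the write path.
writes = store.put("managers/1", {"Name": "John Smith"})
print(writes)  # 10001
```

The cost lands on the writer, and it grows with the number of referencing documents regardless of how many indexes actually care about the change.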
With RavenDB 4.0, we knew that we had to do better. We did this by completely changing how we handle LoadDocument tracking. Instead of handling the relevant references globally, we now track them on a per-index basis. In each index, we track the relevant references on a per-collection basis, and as part of indexing, we check whether there have been any updates to any of the documents that we have referenced. If a document has a lot of referencing documents, it will still take some time to reindex all of them, but that cost is now limited to just the index in question.
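The 4.0 approach can be sketched with the same kind of toy model. In this hypothetical version (again, not RavenDB's actual data structures), a write touches only the document being written. The index keeps its own reference map and, for the referenced collection (here an assumed Managers collection), remembers the last etag it has checked. Each indexing batch looks for referenced documents that changed since then and reindexes only the employees that loaded them.

```python
from collections import defaultdict


class Store:
    """Toy document store: a write bumps a global etag and modifies nothing else."""

    def __init__(self):
        self.docs = {}      # document id -> (collection, body)
        self.etags = {}     # document id -> etag of its last write
        self.last_etag = 0

    def put(self, doc_id, collection, body):
        """Stores a document and returns how many documents the write modified."""
        self.last_etag += 1
        self.docs[doc_id] = (collection, body)
        self.etags[doc_id] = self.last_etag
        return 1

    def changed_since(self, collection, etag):
        """Ids of documents in `collection` written after `etag` (a full scan, for simplicity)."""
        return [d for d, (c, _) in self.docs.items()
                if c == collection and self.etags[d] > etag]


class ManagerNameIndex:
    """Indexes Employees by the name of the manager document they load from Managers."""

    def __init__(self, store):
        self.store = store
        self.entries = {}                   # employee id -> indexed manager name
        self.references = defaultdict(set)  # manager id -> employee ids that loaded it
        self.loaded = {}                    # employee id -> manager id it loaded last time
        self.employees_etag = 0             # last Employees etag this index has processed
        self.managers_etag = 0              # last Managers etag this index has checked

    def _index(self, employee_id):
        _, body = self.store.docs[employee_id]
        # Forget the reference recorded the previous time this employee was indexed.
        previous = self.loaded.pop(employee_id, None)
        if previous is not None:
            self.references[previous].discard(employee_id)
        manager_id = body.get("Manager")
        name = None
        if manager_id is not None:
            # LoadDocument-style step: read the related document, record the dependency.
            self.references[manager_id].add(employee_id)
            self.loaded[employee_id] = manager_id
            name = self.store.docs[manager_id][1]["Name"]
        self.entries[employee_id] = name

    def run_batch(self):
        """Reindexes changed employees plus those whose manager changed; returns the count."""
        dirty = set(self.store.changed_since("Employees", self.employees_etag))
        # Per-collection reference check: which Managers changed since the last batch?
        for manager_id in self.store.changed_since("Managers", self.managers_etag):
            dirty |= self.references.get(manager_id, set())
        for employee_id in dirty:
            self._index(employee_id)
        self.employees_etag = self.managers_etag = self.store.last_etag
        return len(dirty)

    def query(self, first_name):
        return sorted(e for e, name in self.entries.items()
                      if name is not None and name.split()[0] == first_name)


store = Store()
store.put("managers/1", "Managers", {"Name": "John Doe"})
store.put("managers/2", "Managers", {"Name": "Jane Roe"})
for i in range(1, 1001):
    store.put(f"employees/{i}", "Employees",
              {"Manager": "managers/1" if i <= 500 else "managers/2"})

index = ManagerNameIndex(store)
first_batch = index.run_batch()          # 1000: initial indexing of every employee
johns_before = len(index.query("John"))  # 500

# Renaming a manager is a single-document write...
writes = store.put("managers/2", "Managers", {"Name": "John Smith"})  # 1
# ...and the catch-up work happens only inside this index's next batch.
second_batch = index.run_batch()         # 500
johns_after = len(index.query("John"))   # 1000
```

Renaming a manager still costs work proportional to the number of employees who reference that manager, but the work shows up as indexing for this one index rather than as extra writes, and an index that never loads Managers would do nothing extra. A real engine would presumably find changed documents through an etag-ordered lookup rather than the full scan that changed_since does here.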
You can still create an index that slows itself down in this manner, but the pay-to-play model is much nicer: there is no effect on the write speed for documents and no general impact on the whole database, which is pretty sweet. The only way you would ever notice this feature is if you ran into this problem in 3.x and had to work around it; in 4.0, that workaround is no longer necessary (although the same modeling concerns still apply).
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.