Entity Framework - Second Level (2L) Cache - I
Entity Framework does not support a second-level cache out of the box. For those who are not familiar with the concept, the definition is quite simple: it is called a second-level cache because there is already a caching mechanism in place at the context level (the first level), which operates per context instance. A second-level cache sits above it and is shared across context instances. Another unfortunate aspect of Entity Framework is that it has very few (few, few, few) extension points compared with NHibernate. One of the most important extension points is the provider, so in order to implement a 2L cache mechanism in Entity Framework you have to write a custom provider for it. Thanks to Jarek Kowalski this part is already done, so you can check out his EF Caching Provider project. Julie Lerman has extended the EF Caching Provider and made it work with Windows AppFabric Server; you can check out her work as well.
OK, you might say that there is nothing new under the sun so far. Let me give you a hint... there might be something. EF Caching Provider is limited to query caching and nothing more. What does this mean? It means that it creates a key based on the query string generated by Entity Framework and caches the query result under that key. Query caching has big limitations, the most important one being the lack of granularity.
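To make the idea of query caching concrete, here is a minimal, library-agnostic sketch in Python. It is not the EF Caching Provider's code: the `QueryCache` class, the key format and the sample SQL are all assumptions made for illustration. It only shows the principle described above, where the key is derived from the generated query text (plus its parameters) and the whole result is stored under that key.

```python
import hashlib


class QueryCache:
    """Toy query-level cache (hypothetical, not the EF Caching Provider API)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def make_key(sql, params):
        # The key depends only on the query text and its parameter values.
        raw = sql + "|" + repr(sorted(params.items()))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_load(self, sql, params, load):
        key = self.make_key(sql, params)
        if key not in self._entries:
            # Cache miss: execute the query and store the raw result.
            self._entries[key] = load()
        return self._entries[key]

    def invalidate_all(self):
        # The only invalidation this design can offer without extra metadata.
        self._entries.clear()

    def size(self):
        return len(self._entries)


database_hits = []


def run_query():
    # Stand-in for a real database round trip.
    database_hits.append(1)
    return [{"ProductId": 1, "Name": "Laptop"}]


cache = QueryCache()
sql = "SELECT * FROM Products WHERE Price > @p0"  # assumed sample query
first = cache.get_or_load(sql, {"p0": 100}, run_query)
second = cache.get_or_load(sql, {"p0": 100}, run_query)  # served from cache
third = cache.get_or_load(sql, {"p0": 200}, run_query)   # new parameter, new entry
```

Note that the cache knows nothing about which entities a result contains. Two queries that return the same product under different filters produce two unrelated entries, which is the lack of granularity mentioned above.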
Allow me to be more explicit by using a concrete example. Let us assume that we have a DbContext which holds a set of Customer, Order, Product and Image entities. Each product has multiple images (an image entity might not be the best modelling choice, but let's stick with it for now). This is basically a one-to-many relation between product entities and images. A customer has multiple orders and an order has multiple products. Now, when you execute various queries with different filtering parameters in order to retrieve a list of products and their associated images, each raw query result will be cached through EF Caching Provider.
All clear so far: in this scenario you will have multiple query results cached. Here comes the interesting part. What should happen to a cached query result when one of the associated images is updated (say the image description has to be changed)? What about when a new image is added to the context and inserted into the database? What about when a product is added to the context and inserted into the database, or when it is updated? With query caching you have two options: either invalidate the entire cache, or invalidate only the cache entries related to the affected entities (assuming you also have queries that retrieved instances of other entities from your model: customers, orders, etc.). The first option can be ruled out immediately: what is the point of having a cache if every insert or update invalidates it? We are talking about a cache with global scope (your entire application), so if any one of the X users performs an insert or update, the cache is gone.
For the second option, Julie Lerman's cache adapter relies on the native support for regions in Windows AppFabric Server. For each entity affected by a query, a region is created if it doesn't exist, and the corresponding data is stored at that region level. This means that for the products query mentioned earlier two regions will be created, one for the Product entity and one for the Image entity, where the Product region holds product-related data and the Image region holds image-related data. This pattern has a nice advantage. Let's assume that the AppFabric server currently has three regions: Customer, Product and Image. When a product is updated in the database, the Product region is invalidated along with all the product data it contains, while the other cache regions are not affected.
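The region pattern can be sketched with a small in-memory model. This is not the AppFabric API or Julie Lerman's adapter: `RegionCache`, its methods and the sample keys are hypothetical names used only to show the idea of one region per entity type and invalidation of a single region.

```python
from collections import defaultdict


class RegionCache:
    """Toy region-based cache (hypothetical; not the AppFabric client API)."""

    def __init__(self):
        # region name -> {cache key: cached value}
        self._regions = defaultdict(dict)

    def put(self, region, key, value):
        # The region is created on first use if it does not exist yet.
        self._regions[region][key] = value

    def get(self, region, key):
        # Plain dict.get does not create a region as a side effect.
        return self._regions.get(region, {}).get(key)

    def invalidate_region(self, region):
        # Drops every entry stored in this region and nothing else.
        self._regions.pop(region, None)

    def region_names(self):
        return sorted(self._regions)


cache = RegionCache()
cache.put("Customer", "customer:1", {"Name": "Alice"})
cache.put("Product", "product:7", {"Name": "Laptop"})
cache.put("Image", "image:3", {"Description": "Front view"})

# A product was updated in the database: only the Product region is dropped.
cache.invalidate_region("Product")
remaining = cache.region_names()
```

After the call, the Customer and Image regions are untouched, which is the advantage over clearing the whole cache.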
But wait... what if there is a relation between a customer that resides in the Customer region of the cache and the product that just got updated? Is the cached customer data still valid? Well, I think it is not. The Customer region has to be invalidated as well, and if we do that we are going to lose a big chunk of the cache for nothing. This is just a simple example of why caching raw query data is far from being an optimal solution. So far I have spoken only about AppFabric, but what if you would like to use something else for your cache, like Memcached for example, which does not support the concept of regions?
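A rough way to see how much cache is lost is to follow the relations between regions. The sketch below is an assumption-laden illustration, not part of any caching library: the `EMBEDDED_IN` map is a hypothetical description of the example model (images belong to products, products to orders, orders to customers), and it assumes that cached data in a parent region may embed data from its children.

```python
# Hypothetical map for the example model:
# region -> regions whose cached data may embed entities from it.
EMBEDDED_IN = {
    "Image": ["Product"],
    "Product": ["Order"],
    "Order": ["Customer"],
    "Customer": [],
}


def regions_to_invalidate(changed):
    """Return the changed region plus every region reachable from it."""
    seen = []
    stack = [changed]
    while stack:
        region = stack.pop()
        if region not in seen:
            seen.append(region)
            # Anything that may embed this region is stale as well.
            stack.extend(EMBEDDED_IN.get(region, []))
    return seen


after_product_update = regions_to_invalidate("Product")
after_image_update = regions_to_invalidate("Image")
```

Under these assumptions, updating a single product also drops the Order and Customer regions, and changing one image description drops all four regions, even though most of the cached rows in them were never affected.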
These are the issues I am trying to tackle this week, and I hope that by the end of it I will have a project ready that provides a proof of concept of how the previously mentioned issues can be solved.