Solving the Detached Many-to-Many Problem with the Entity Framework
Join the DZone community and get the full member experience.
Join For FreeIntroduction
This article is part of the ongoing series I’ve been writing recently, but can be read as a standalone article. I’m going to do a better job of integrating the changes documented here into the ongoing solution I’ve been building.
However, considering how much time and effort I put into solving this issue, I’ve decided to document the approach independently in case it is of use to others in the interim.
The Problem Defined
This issue presents itself when you are dealing with disconnected/detached Entity Framework POCO objects,. as the DbContext doesn’t track changes to entities. Specifically, trouble occurs with entities participating in a many-to-many relationship, where the EF has hidden a “join table” from the model itself.
The problem with detached entities is that the data context has no way of knowing what changes have been made to an object graph, without fetching the data from the data store and doing an entity-by-entity comparison – and that assuming it’s possible to fetch the same way as it was originally.
In this solution, all the entities are detached, don’t use proxy types and are designed to move between WCF service boundaries.
Some Inspiration
There are no out-of-the-box solutions that I’m aware of which can process POCO object graphs that are detached.
- I did find an interesting solution called GraphDiff which is available from github and also as a NuGet package, but it didn’t work with the latest RC version of the Entity Framework (v6).
- I also found a very comprehensive article on how to implement a generic repository pattern with the Entity Framework, but it was unable to handle detached many-to-many relationships. In any case, I highly recommend a read of this article, it was inspiration for some of the approach I’ve ended up taking with my own design.
The Approach
This morning I put together a simple data model with the relationships that I wanted to support with detached entities. I’ve attached the solution with a sample schema and test data at the bottom of this article. If you prefer to open and play with it, be sue to add the Entity Framework (v6 RC) via NuGet, I’ve omitted it for file size and licensing reasons).
Here’s a logical view of the model I wanted to support:
Here’s the schema view from SQL Server:
Here’s the Entity Model which is generated from the above SQL schema:
In the spirit of punching myself in the head, I’ve elected to have one table implement an identity specification (meaning the underlying schema allocated PK ID values) whereas the other two tables the ID must be specified.
Theoretically, if I can handle the entity types in a generic fashion, then this solution can scale out to larger and more complex models.
The scenarios I’m specifically looking to solve in this solution with detached object graphs are as follows:
- Add a relationship (many-to-many)
- Add a relationship (FK-based)
- Update a related entity (many-to-many)
- Update a related entity (FK-based)
- Remove a relationship (many-to-many)
- Remove a relationship (FK-based)
Per the above, here’s the scenarios within the context of the above data model:
- Add a new Secondary entity to a Primary entity
- Add an Other entity to a Secondary entity
- Update a Secondary entity by updating a Primary entity
- Update an Other entity from a Secondary entity (or Primary entity)
- Remove (but not delete!) a Secondary entity from a Primary entity
- Remove (but not delete) a Other entity from a Secondary entity
Establishing Test Data
Just to give myself a baseline, the data model is populated (by default) with the following data. This gives us some “existing entities” to query and modify.
More Work for the Consumer
Although I tried my best, I couldn’t come to a design which didn’t require the consuming client to do slightly more work to enable this to work properly. Unfortunately the best place for change tracking to occur with disconnected entities is with the layer making changes – be it a business layer or something downstream.
To this effect, entities will need to implement a property which reflects the state of the entity (added, modified, deleted etc.). For the object graph to be updated/managed successfully, the consumer of the entities needs to set the entity state properly. This isn’t at all as bad as it sounds, but it’s not nothing.
Establishing some Scaffolding
After generating the data model, the first thing to be done is ensure each entity derives from the same base class. (“EntityBase”) this is used later to establish the active state of an entity when it needs to be processed. I’ve also created an enum (“ObjectState”) which is a property of the base class and a helper function which maps ObjectState to an EF EntityState. In case this isn’t clear, here’s a class view:
Constructing Data Access
To ensure that the usage is consistent, I’ve defined a single Data Access class, mainly to establish the pattern for handling detached object graphs. I can’t stress enough that this is not intended as a guide to an appropriate way to structure your data access – I’ll be updating my ongoing series of articles to go into more detail – this is only to articulate a design approach to handling detached object graphs.
Having said all that, here’s a look at my “DataAccessor” class, which can be used with generic data access entities (by way of generics):
As with my ongoing project, the Entity Framework DbContext is instantiated by this class on construction, and implements IDisposable to ensure the DbContext is disposed properly upon construction. Here’s the constructor showing the EF configuration options I’m using:
public DataAccessor() { _accessor = new SampleEntities(); _accessor.Configuration.LazyLoadingEnabled = false; _accessor.Configuration.ProxyCreationEnabled = false; }
Updating an Entity
We start with a basic scenario to ensure that the scaffolding has been implemented properly. The scenario is to query for a Primary entity and then change a property and update the entity in the data store.
[TestMethod] public void UpdateSingleEntity() { Primary existing = null; String existingValue = String.Empty; using (DataAccessor a = new DataAccessor()) { existing = a.DataContext.Primaries.Include("Secondaries").First(); Assert.IsNotNull(existing); existingValue = existing.Title; existing.Title = "Unit " + DateTime.Now.ToString("MMdd hh:mm:ss"); } using (DataAccessor b = new DataAccessor()) { existing.State = ObjectState.Modified; b.InsertOrUpdate<Primary>(existing); } using (DataAccessor c = new DataAccessor()) { existing.Title = existingValue; existing.State = ObjectState.Modified; c.InsertOrUpdate<Primary>(existing); } }
You’ll noticed that there is nothing particularly significant here, except that the object’s State is reset toModified between operations.
Updating a Many-to-Many Relationship
Now things get interesting. I’m going to query for a Primary entity, then I’ll update both a property of thePrimary entity itself, and a property of one of the entity’s relationships.
[TestMethod] public void UpdateManyToMany() { Primary existing = null; Secondary other = null; String existingValue = String.Empty; String existingOtherValue = String.Empty; using (DataAccessor a = new DataAccessor()) { //Note that we include the navigation property in the query existing = a.DataContext.Primaries.Include("Secondaries").First(); Assert.IsTrue(existing.Secondaries.Count() > 1, "Should be at least 1 linked item"); } //save the original description existingValue = existing.Description; //set a new dummy value (with a date/time so we can see it working) existing.Description = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss"); existing.State = ObjectState.Modified; other = existing.Secondaries.First(); //save the original value existingOtherValue = other.AlternateDescription; //set a new value other.AlternateDescription = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss"); other.State = ObjectState.Modified; //a new data access class (new DbContext) using (DataAccessor b = new DataAccessor()) { //single method to handle inserts and updates //set a breakpoint here to see the result in the DB b.InsertOrUpdate<Primary>(existing); } //return the values to the original ones existing.Description = existingValue; other.AlternateDescription = existingOtherValue; existing.State = ObjectState.Modified; other.State = ObjectState.Modified; using (DataAccessor c = new DataAccessor()) { //update the entities back to normal //set a breakpoint here to see the data before it reverts back c.InsertOrUpdate<Primary>(existing); } }
If we actually run this unit test and set the breakpoints accordingly, you’ll see the following in the database:
Database at Breakpoint #1 / Database at Breakpoint #2
Database when Unit Test completes
You’ll notice at the second breakpoint that the description of the first entities have both been updated.
Examining the Insert/Update Code
The function exposed by the “data access” class really just passes through to another private function which does the heavy lifting. This is mainly in case we need to reuse the logic, since it essentially processes state action on attached entities.
public void InsertOrUpdate<T>(params T[] entities) where T : EntityBase { ApplyStateChanges(entities); DataContext.SaveChanges(); }
Here’s the definition of the ApplyStateChanges function, which I’ll discuss below:
private void ApplyStateChanges<T>(params T[] items) where T : EntityBase { DbSet<T> dbSet = DataContext.Set<T>(); foreach (T item in items) { //loads related entities into the current context dbSet.Attach(item); if (item.State == ObjectState.Added || item.State == ObjectState.Modified) { dbSet.AddOrUpdate(item); } else if (item.State == ObjectState.Deleted) { dbSet.Remove(item); } foreach (DbEntityEntry<EntityBase> entry in DataContext.ChangeTracker.Entries<EntityBase>() .Where(c => c.Entity.State != ObjectState.Processed && c.Entity.State != ObjectState.Unchanged)) { var y = DataContext.Entry(entry.Entity); y.State = HelperFunctions.ConvertState(entry.Entity.State); entry.Entity.State = ObjectState.Processed; } } }
Notes on this Implementation
What this function does is to iterate through the items to be examined, attach them to the current Data Context (which also attaches their children), act on each item accordingly (add/update/remove) and then process new entities which have been added to the Data Context’s change tracker.
For each newly “discovered” entity (and ignoring entities which are unchanged or have already been examined), each entity’s DbEntityEntry is set according to the entity’s ObjectState (which is set by the calling client). Doing this allows the Entity Framework to understand what actions it needs to perform on the entities when SaveChanges() is invoked later.
You’ll also note that I set the entity’s state to “Processed” when it has been examined, so we don’t act on it more than once (for performance purposes).
Fun note: the AddOrUpdate extension method is something I found in theSystem.Data.Entity.Migrations namespace and it acts as an ‘Upsert’ operation, inserting or updating entities depending on whether they exist or not already. Bonus!
That’s it for adding and updating, believe it or not.
Corresponding Unit Test
The following unit test establishes the creation of a new many-to-many entity, it is then removed (by relationship) and then finally deleted altogether from the database:
[TestMethod] public void AddRemoveRelationship() { Primary existing = null; using (DataAccessor a = new DataAccessor()) { existing = a.DataContext.Primaries.Include("Secondaries") .FirstOrDefault(); Assert.IsNotNull(existing); } Secondary newEntity = new Secondary(); newEntity.State = ObjectState.Added; newEntity.AlternateTitle = "Unit"; newEntity.AlternateDescription = "Test"; newEntity.SecondaryId = 1000; existing.Secondaries.Add(newEntity); using (DataAccessor a = new DataAccessor()) { //breakpoint #1 here a.InsertOrUpdate<Primary>(existing); } newEntity.State = ObjectState.Unchanged; existing.State = ObjectState.Modified; using (DataAccessor b = new DataAccessor()) { //breakpoint #2 here b.RemoveEntities<Primary, Secondary>(existing, x => x.Secondaries, newEntity); } using (DataAccessor c = new DataAccessor()) { //breakpoint #3 here c.Delete<Secondary>(newEntity); } }
Test Results:
Pre-Test – Breakpoint #1 / Breakpoint #2
Breakpoint #3 / Post execution (new entity deleted)
SQL Profile Trace
Removing a Many-to-Many Relationship
Now this is where it gets tricky. I’d like to have something a little more polished, but the best I have come up with to date is a separate operation on the data provider which exposes functionality akin to “remove relationship”.
The fundamental problem with how the EF POCO entities work without any modifications, is when they are detached, to remove a many-to-many relationship, the relationship to be removed is physically removed from the collection.
When the object graph is sent back for processing, there’s a missing related entity, and the service or data context would have to make an assumption that the omission was on purpose, not to mention that it would have to compare against data currently in the data store.
To make this easier, I’ve implemented a function called “RemoveEnttiies” which alters the relationship between the parent and the child/children. The one bug catch is that you need to specify the navigation property or collection, which might make it slightly undesirable to implement generically. In any case, I’ve provided two options – with the navigation property as a string parameter or as a LINQ expression – they both do the same thing.
public void RemoveEntities<T, T2>(T parent, Expression<Func<T, object>> expression, params T2[] children) where T : EntityBase where T2 : EntityBase { DataContext.Set<T>().Attach(parent); ObjectContext obj = DataContext.ToObjectContext(); foreach (T2 child in children) { DataContext.Set<T2>().Attach(child); obj.ObjectStateManager.ChangeRelationshipState(parent, child, expression, EntityState.Deleted); } DataContext.SaveChanges(); }
Notes on this Implementation
The “ToObjectContext” is an extension method, and is akin to (DataContext as IObjectContextAdapter).ObjectContext. This is to expose a more fundamental part of the Entity Framework’s object model. We need this level of access to get to the functionality which controls relationships.
For each child to be removed (note: not deleted from the physical database), we nominate the parent object, the child, the navigation property (collection) and the nature of the relationship change (delete).
Note that this will NOT WORK for Foreign Key defined relationships – more on that below.
To delete entities which have active relationships, you’ll need to drop the relationship before attempting to delete or else you’ll have data integrity/referential integrity errors, unless you have accounted for cascading deletion (which I haven’t).
Example execution:
using (DataAccessor c = new DataAccessor()) { //c.RemoveEntities<Primary, Secondary>(existing, "Secondaries", s); //(or can use an expression): c.RemoveEntities<Primary, Secondary>(existing, x => x.Secondaries, s); }
Removing FK Relationships
As mentioned above, you can’t just edit the relationship to remove an FK-based relationship. Instead, you have to follow the EF practice of setting the FK entity to NULL. Here’s a Unit Test which demonstrates how this is achieved:
Secondary s = ExistingEntity(); using (DataAccessor c = new DataAccessor()) { s.Other = null; s.OtherId = null; s.State = ObjectState.Modified; o.State = ObjectState.Unchanged; c.InsertOrUpdate<Secondary>(s); }
We use the same “Insert or Update’ call – being aware that you have to set the ObjectState properties accordingly.
Note: I’m in the process of testing the reverse removal – i.e. what happens if you want to remove a Secondaryentity from an Other entity’s collection.
Deleting Entities
This is fairly straightforward, but I’ve taken a few more precautions to ensure that the entity to be deleted is valid no the server side.
public void Delete<T>(params T[] entities) where T : EntityBase { foreach (T entity in entities) { T attachedEntity = Exists<T>(entity); if (attachedEntity != null) { var attachedEntry = DataContext.Entry(attachedEntity); attachedEntry.State = EntityState.Deleted; } } DataContext.SaveChanges(); }
To understand the above, you should take a look at the implementation of the “Exists” function which essentially checks the data store and local cache to see if there is an attached representation:
protected T Exists<T>(T entity) where T : EntityBase { var objContext = ((IObjectContextAdapter)this.DataContext) .ObjectContext; var objSet = objContext.CreateObjectSet<T>(); var entityKey = objContext.CreateEntityKey(objSet.EntitySet.Name, entity); DbSet<T> set = DataContext.Set<T>(); var keys = (from x in entityKey.EntityKeyValues select x.Value).ToArray(); //Remember, there can by surrogate keys, so don't assume there's //just one column/one value //If a surrogate key isn't ordered properly, the Set<T>().Find() //method will fail, use attributes on the entity to determine the //proper order. //context.Configuration.AutoDetectChangesEnabled = false; return set.Find(keys); }
This is a fairly expensive operation which is why it’s pretty much reserved for deletes and not more frequent operations. It essentially determines the target entity’s primary key and then checks whether the entity exists or not.
Note: I haven’t tested this on entities with surrogate keys, but I’ll get to it at some point. If you have surrogate key tables, you can define the PK key order using attributes on the model entity, but I haven’t done this (yet).
Summary
This article is the culmination of about two days of heavy analysis and investigation. I’ve got a whole lot more to contribute on this topic, but for now, I felt it was worthy enough to post as-is. What you’ve got here is still incredibly rough, and I haven’t done nearly enough testing.
To be honest, I was quite excited by the initial results, which is why I decided to write this post. there’s an incredibly good chance that I’ve missed something in the design and implementation, so please be aware of that. I’ll be continuing to refine this approach in my main series of articles with much cleaner implementation.
In the meantime though, if any of this helps anyone out there struggling with detached entities, I hope it helps. There’s precious few articles and samples that are up to date, and very few that seem to work. This is provided without any warranty of any kind!
If you find any issues please e-mail me rob.sanders@sanderstechnology.com and I’ll attempt to refactor/debug and find ways around some of the inherent limitations. In the meantime, there are a few helpful links I’ve come across in my travels on the WWW. See below.
Example Solution Files [ Files ]
Note: you’ll need to add the Entity Framework v6 RC package via NuGet, I haven’t included it in the archive.
Helpful Links
- http://blog.magnusmontin.net/2013/05/30/generic-dal-using-entity-framework/
- https://github.com/refactorthis/GraphDiff
- http://stackoverflow.com/questions/11686225/dbset-find-method-ridiculously-slow-compared-to-singleordefault-on-id
- http://stackoverflow.com/questions/10381106/cannot-update-many-to-many-relationships-in-entity-framework
- http://stackoverflow.com/questions/8413248/how-to-save-an-updated-many-to-many-collection-on-detached-entity-framework-4-1
- http://stackoverflow.com/questions/6018711/generic-way-to-check-if-entity-exists-in-entity-framework
Published at DZone with permission of Rob Sanders, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments