Data Layer, the Right Way
Join the DZone community and get the full member experience.
Join For Free
The goal with this post is to give you a better understand about how you can design your data layer and why it’s important to create a complete abstraction layer.
Whats wrong?
From the very beginning the applications has been data centric. That is, the database has been the very heart of the applications. Everything that was built was (and in some means still is) just a glorified administration interface for the database (especially CRUD applications). To make things worse, some still believe that stored procedures are essential and that parts (if not all) business rules should be stored in them.
To date most tables in the database are modeled according to 3NF (the third normal form). We get a clean database with very few repeated records.
The only thing that really has happened is that the database has been hidden behind an OR/M and we say that our applications is using classes modeled after the business (since the OR/M generate classes & mappings for us).
Building an application
Let’s start by examining the typical creation of an application (reminds me of a quote that I read a long time ago: The process of building application is like the making of sausages: It should never be watched).
Data Layers
Typical problems
Here are some typical problems which can surface when you are using an OR/M without proper abstractions.
- Have you ever struggled with a LINQ query which just doesn’t want to work?
- Poor performance due to lazy loading
- The generated SQL turns out to be slow
- Breaking changes in new version of the OR/M
- Complex and/or highly coupled models thanks to all relations in the db
What is an abstraction?
Wikipedia defines “abstraction” like this:
Abstraction is the process by which data and programs are defined with a representation similar in form to its meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time
So unlike what many believes, you do not create abstractions just to be able to switch OR/M or database engine. You create it to reduce complexity.
Before using an abstraction you have to remember all details for both layers when coding. The amount of information that you have to remember depends on the kind of tool/library you use in your data layer. For OR/Ms you have to remember how the data layer handles lazy loading, how to specifically eager load items and how each entity has been configured. For ADO.NET you have to remember the table structure and all data types. If you use your OR/M you have probably auto generated all entities. And by doing so you have forced yourself to use the data model as your business model. If you have used an OR/M for a while you have probably noticed that it’s not the best thing for anything but trivial applications.
So before using an abstraction you have to look at your code like this:
By using an abstraction you don’t have to care about how your OR/M or database engine works. All you need to know is that the data layer has a set of classes and methods which you invoke to fetch and store information.
Your OR/M is not an abstraction
If you use your OR/M as the data layer, your probably have a lot of code like:
public void Summarize(int userId) { // loading from db var carts = _dbSession.Query<ShoppingCart>().Where(x => x.UserId == userId); // some business logic var summary = 0d; foreach (var cart in carts) { summary += cart.Total; } return summary; }
It looks simple enough.
But how do we test it?
We have to make sure that it calls the DB with the correct set of parameters, right? Mocking LINQ providers is not fun, so that’s out of the question. We can create a fake memory DB by using SQLite or similar. But that doesn’t guarantee that everything works with the production db (SQL Server or whatever).
How about the mappings? To guarantee that everything works we have to do integration tests. That is to use real queries against a real DB.
So it might look like the method is only responsible of doing the business logic. But it’s in fact also responsible of reading the data in a correct way too. The OR/M is just a convenience layer which loads the information from the DB for us. It doesn’t abstract away the data access. Any change to the OR/M or the database WILL affect every piece of code that uses the OR/M and the database.
Our requirements will always change. Those changes will always affect your code. By using abstraction it’s easier to handle those (since you do it in one place at a time).
Using a correct abstraction
Let’s instead change the code to:
public void Summarize(int userId) { // loading from db var carts = _shoppingCartRepos.FindForUser(userId); // some business logic var summary = 0d; foreach (var cart in carts) { summary += cart.Total; } return summary; }
Now we are using an abstraction. The actual data access is hidden. We
do not have to know OR/M implementation details like mappings,
eager/lazy loading or OR/M specific LINQ statements. We just know that
we should call the FindForUser
method to get the data that we need.
We can now test the method easily. The only tests we need for the repositories is integration tests to make sure that all statements are working correctly.
Summary
Complete abstractions do take a longer time to write. But ask yourself. What will you spend most time on? Maintaining the application or writing it from scratch? Let’s say that the application got an expected lifetime of four years and that it takes one year to write it. Then you got three years in which you should maintain it. Do you think that those three years will be easier or harder with abtractions? It might be worth to spend some time extra in the beginning…
The right way
Hopefully you have now understood why we need to create some kind of abstraction from the database and that the OR/M isn’t it. But that is only part of the solution. Ask yourself: Why are you creating the application? Is it to get a nicely modeled database that your application has to follow? Or is it to solve the problem that the client has?
Brad tells us:
The database
The crazy thing is that we have let the database (as in RDBMS) dictate our applications way to long. And we have modeled our databases in the same way (using the normal forms) since 1971. That’s 42 years you lazy coders (I used a calculator =)). That’s INSANE!
It’s time to end the madness. Dare to say no. The database IS just storage.
If that’s not helping. Listen to Queen Elisabeth (no disrespect meant).
Is the database all bad then? No. But we have to mentally switch from letting it control our application to just be a utility in which we store our information. How can we do that?
Model the domain first
One thing has been mentioned before, and that is to model the domain /
business first and then the database. We create a set of classes like Person
, ChatMessage
etc in the first iteration. We then test them and demonstrate the application to the client.
After that we add the representations of those classes in the database. That is, we model the DB iterative too.
Denormalize
Normalization is not all good. Storage is cheap today. Think twice before you normalize if normalization would increase the complexity in your application.
If you for instance only store the shipping address in the DB (i.e.
you do not search or index it) there is really no need for a separate
table / relation. Store the address in a ShippingAddress
column of the Orders
table. (The user can even have a ShippingAdress
column in the User table, which you copy it from when the Order is placed).
SRP applies to the DB too
Your application may have different target audiences who do different things in the application. For instance accounting takes care of all invoices while shipping takes care of packaging and sending of the packages. And finally you have the users who places orders.
In a situation like that it would be tempting to create a Users
table which contains all fields for the accounting employees, the users
and the shipping employees. A more subtle approach would be to let the
user table represent the application users and then create a table AccountingUsers
with a FK to the Users
table.
Those approaches increases complexity and indirectly makes all persons in the system dependent of changes to the Users table.
It’s far better to create three different kind of person tables with no relation between them. You might even go so far as creating different databases. One for shipping, one for accounting and one for the core application. We do use code to make the different parts communicate with each other, right? No need for a single database.
Replace it
There are several schemaless (NoSQL) databases out there today. Many of them have matured enough to be used in production. If I can choose I would in most cases pick a NoSQL db. It lowers the friction between storage and code a lot.
Mix!
Nothing says that a single database solution (be RDBMS or NoSQL) are the right tool for the job. NoSQL might be a good fit for shipping while accounting should use RDBMS so that they can do their analytics.
Summary
Since the data source(s) are abstracted away we get a lot more freedom in our applications. We don’t have to try to make our applications fit the database but can instead fit database after our application.
Building applications
So if the database is not the thing that we should start with, what should we start with? How about something like this:
On Building Applications
You have hopefully got something to think about now. You don’t have to go about and change how you build your applications, but hopefully you think twice before choosing to go down a specific path.
The data access alternatives
When working with the database in the data layer we have in fact several approaches to choose from. The most basic approach is ADO.NET and the OR/Ms are by far the most complex solution. So why is it that the OR/Ms have gained so much momentum? It’s because they generate a lot of stuff for you. You are lazy. We all are. The thing is that what we gain in the beginning may be lost during the road. That’s why you need to know about the alternatives.
This blog post became a bit larger than I intended. I’m therefore moving the alternatives to seperate posts. They are however:
- Vanilla ADO.NET
- Data mappers
- OR/M
- NoSQL
Data layer patterns
There are also some patterns which can be used for the data layer (or involves the data layer). Those that I recommend are:
- Repository pattern
- Query object
- CQS (Command/query seperation)
- CQRS (Command/Query Segregation Principle)
Those will also be discussed in a separate post.
Published at DZone with permission of Jonas Gauffin, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
What Is mTLS? How To Implement It With Istio
-
Building A Log Analytics Solution 10 Times More Cost-Effective Than Elasticsearch
-
How To Design Reliable IIoT Architecture
-
What Is Test Pyramid: Getting Started With Test Automation Pyramid
Comments