What to Consider When Dealing With Microservices Data
When working with microservices, it helps to consider what it means to be a service. This look at how microservices and their data are set up can help make things clear.
Useful applications collect, munge, and present data to their users. Data becomes the lifeblood of an application, so to speak. As developers of an application with a single database, we are afforded many helpful abstractions: atomic transactions, tunable concurrency, indexes, materialized views, and more. This “single database view” of the world simplifies things for application developers. As soon as we add more databases to our application (a single application with multiple backends/databases), we have to deal with data challenges within the application itself.
For example, if our application’s main database is a MySQL database with all of the transactional workloads going through it, we may decide that for a particularly sensitive area of our application, we want to use something like Oracle, which may have better support for encryption of data at rest. Now our application has to make data calls to two different databases, run queries against both and join the results inside application code, and figure out how best to handle atomicity on updates (e.g. distributed transactions, self-managed eventual consistency, triggers, or nontransactional datastores).
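To make the application-side join concrete, here is a minimal sketch. It assumes two in-memory SQLite databases as stand-ins for the MySQL and Oracle backends, and the table names, column names, and the customers_with_payment_details function are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Two separate in-memory SQLite databases stand in for the MySQL and Oracle
# backends. Table and column names are hypothetical.
main_db = sqlite3.connect(":memory:")
secure_db = sqlite3.connect(":memory:")

main_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
main_db.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
)

secure_db.execute(
    "CREATE TABLE payment_details (customer_id INTEGER, card_last4 TEXT)"
)
secure_db.executemany(
    "INSERT INTO payment_details VALUES (?, ?)", [(1, "4242")]
)


def customers_with_payment_details():
    """Join rows from both databases in application code."""
    customers = main_db.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    cards = dict(
        secure_db.execute(
            "SELECT customer_id, card_last4 FROM payment_details"
        ).fetchall()
    )
    # No single query planner spans both databases, so the join happens here.
    return [(cid, name, cards.get(cid)) for cid, name in customers]
```

Note that the two reads are separate calls with no shared transaction, so nothing guarantees they reflect the same moment in time. That gap is exactly the kind of problem a single database would have handled for us.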
Now, let’s imagine that we want to move to a microservices architecture. I’m sure you’ve heard the claim that each microservice should have its own database or datastore. What happens to our data?
As we start to break functionality into separate services, we’ll quickly confront these challenges. There are two main things to understand here. First, as Pat Helland reminds us, data on the inside of our service must be treated differently from data outside our service. Data inside of a service can still take advantage of the conveniences and abstractions afforded to us by the database that we decide to use (atomicity, query planning, concurrency controls, etc.). When services communicate with each other and send data outside a service boundary, we’re inherently dealing with stale data. Said a different way, as soon as data leaves a service, there is no guarantee it’s the most recent version of that data.
The second thing to understand: Since the data on the outside of our services cannot come with recency guarantees (it’s stale), there is a component of time to this equation. Microservices involved in this system will “eventually” see the updates of other services and must factor this into their application design. Some would describe this as an “eventually consistent” system.
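One way to picture the inside/outside distinction and the role of time is the sketch below. It is only an illustration under stated assumptions: OwningService, Snapshot, and the version counter are hypothetical names, and a real system would more likely stamp copies with a timestamp or a log offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Data on the outside: an immutable copy stamped with its version."""
    key: str
    value: str
    version: int


class OwningService:
    """Hypothetical service that owns the current state (data on the inside)."""

    def __init__(self):
        self._records = {}  # key -> (value, version)

    def put(self, key, value):
        _, version = self._records.get(key, (None, 0))
        self._records[key] = (value, version + 1)

    def snapshot(self, key):
        value, version = self._records[key]
        return Snapshot(key, value, version)


owner = OwningService()
owner.put("order-1", "PLACED")
copy_held_elsewhere = owner.snapshot("order-1")  # data leaves the boundary
owner.put("order-1", "SHIPPED")                  # the owner moves on

# The copy is still a true statement about the past, but no longer current.
is_stale = copy_held_elsewhere.version < owner.snapshot("order-1").version
```

The holder of the copy cannot tell that it is stale without asking the owner again, which is why consumers have to be designed to tolerate seeing updates only eventually.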
How can we design around these two factors? To wit, are there design principles, patterns, and practices that take data on the inside/outside and time into account when building a system?
Domain-driven design fits this mindset quite well, and forces us to think more about how the business operates, the language it uses to describe its complexity, and how best to model this in software. It encourages us to look closely at the natural transaction boundaries that exist in the domain (in business terms, not technology terms), draw a boundary around those (i.e. the bounded context), and map the interactions between these boundaries (i.e. context mapping). For example, if we take a naive, purely technological approach to a solution, without regard for the business or domain, we may end up building services around “User,” “Account,” “Order,” etc., each with its own database. Now, any time we need to refer to a User, Account, or Order, we need to consult the respective service, and these “services” end up being very hollow or anemic CRUD services doing little more than data access. Is that what a service is? What if we spent a little more effort to understand how the business thinks of these concepts? What is a User? What is an Order? What is a “thing?”
In my talks, I like to illustrate this complexity with a very simple example (borrowed from William Kent) by asking a question: What is a “book?” How would we describe what a “book” is for a fictitious online bookstore? A book has pages, a cover, and an author. I’ve written a book. So, would there be one entry in the system for the book I wrote? I have about 20 or so copies of that book next to my desk, and effectively unlimited copies as e-books online. Is each one of those a “book?” Is the e-book not a book until someone downloads it? Some books are so big they have to be broken down into smaller volumes. Is the whole thing a book? Or just the individual volumes? Which is it? In our online bookstore, what a book is depends on the domain. A book may be represented and treated differently in the order/checkout part of the system than in the recommendation engine. For ordering/checkout, we do care about each individual physical/electronic book. For the recommendation engine, we may just care about metadata, how it relates to other metadata, and its possible relevance. So maybe the services we have are the ordering/checkout service, catalog search, and recommendations. Each service will have the understanding of “book” that it needs in order to do its job.
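As a rough sketch of what two of these models could look like side by side, consider the classes below. Every name and field here is an assumption made for illustration, including the idea that an ISBN is the identifier the two contexts share; the point is only that the shapes differ because the questions each service answers differ.

```python
from dataclasses import dataclass


@dataclass
class CheckoutBookItem:
    """Ordering/checkout view: one concrete copy that can be sold."""
    isbn: str
    copy_id: str       # identifies an individual physical or electronic copy
    book_format: str   # for example "paperback" or "ebook"
    price_cents: int


@dataclass
class RecommendationBook:
    """Recommendation view: metadata only, no notion of individual copies."""
    isbn: str
    title: str
    topics: frozenset


def related(a, b):
    """Treat two titles as related when they share at least one topic."""
    return bool(a.topics & b.topics)
```

Neither class is the “real” book. Checkout never needs topics, and the recommendation engine never needs to know which copy was shipped, so forcing both into one shared Book model would couple the two services for no benefit.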
Identifying these nuances in the domain and drawing boundaries around them allows us to focus on the inside versus the outside of the data. If we make changes to a book, an order, or an account within the bounded context, we expect that change to align with a transactional boundary and to be strictly consistent. When we make a change to the Order, any reads or writes afterward see that change. As the book example shows, though, these concepts may be shared across multiple services, and their representation may differ slightly (or dramatically) from one service to the next. So how do we communicate changes to data that is similar but modeled differently in each service?
DDD theory isn’t very opinionated about how the data is shared. The discussion in the DDD community revolves around interaction relationships like “customer/supplier,” “conformist,” “anti-corruption layer,” etc. Even so, a lot of practical implementations of these ideas end up going down the route of an event-driven architecture, raising events when interesting things happen within a bounded context and letting other bounded contexts interpret those events. An “event” here announces a fact, something that happened in the past (note the relationship to time and inherent staleness), in which other parts of the system may also be interested. For example, in our Checkout bounded context, if we successfully process an Order, we can store that within our own transactional boundary and then raise an event named “CheckoutPurchaseCompleted” with a reference to the ID of the book that was purchased. Other systems interested in this fact, maybe the Search service, can capture that event and make some decisions based on it; maybe Search decreases its locally stored count of that book’s inventory and uses the count as a factor in whether to display the book in search results. This way, the Search service doesn’t have to call an Inventory or Book Availability service every time a search result includes a particular book.
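The Checkout and Search interaction described above can be sketched roughly as follows. This is a simplified illustration rather than a production design: the EventBus class is a hypothetical in-process stand-in for a message broker, delivery is synchronous, and all class, method, and field names other than CheckoutPurchaseCompleted are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckoutPurchaseCompleted:
    """A fact about the past: an order for this book was completed."""
    order_id: str
    book_id: str


class EventBus:
    """In-process stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


class CheckoutService:
    def __init__(self, bus):
        self._bus = bus
        self.orders = {}  # data on the inside of Checkout

    def complete_order(self, order_id, book_id):
        # Store the order inside our own boundary first, then announce it.
        self.orders[order_id] = book_id
        self._bus.publish(CheckoutPurchaseCompleted(order_id, book_id))


class SearchService:
    def __init__(self, bus, stock):
        self.stock = dict(stock)  # Search's own local copy of inventory counts
        bus.subscribe(CheckoutPurchaseCompleted, self._on_purchase)

    def _on_purchase(self, event):
        current = self.stock.get(event.book_id, 0)
        self.stock[event.book_id] = max(0, current - 1)

    def visible(self, book_id):
        # The local count is one factor in whether the book is displayed.
        return self.stock.get(book_id, 0) > 0
```

A short usage example:

```python
bus = EventBus()
search = SearchService(bus, {"book-42": 1})
checkout = CheckoutService(bus)

checkout.complete_order("order-1", "book-42")
search.visible("book-42")  # False: Search answered from its own local data
```

With a real broker, delivery would be asynchronous and could be delayed or duplicated, so Search’s count would lag behind Checkout for a while. That lag is the “time” component the design has to accept.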
By taking an event-driven approach combined with DDD, we make Pat Helland’s “data on the inside vs. data on the outside” a core part of the design, which encourages us to think more closely about the “time” aspects of distributed systems. If we can comfortably live in this environment, we can achieve the holy grail of autonomous microservices, which in turn allows us to make changes more quickly and more independently of the rest of the system.
Published at DZone with permission of Christian Posta, DZone MVB.