The Question of Multiple Databases and Pre-Production Complexity
All teams must accept this fact: Each environment has its own database, and each database change must be synchronized with the other environments.
You may have heard the familiar story where a team ships a performance bug in production. Then, during a retrospective, the team decides (for good reason) that this kind of bug won't make it to production again.
So, now they use a dedicated performance testing environment to catch bugs early. These performance tests reduce the number of performance bugs, but as a result the entire setup becomes more costly, more difficult to manage, and more time-consuming.
Tensions will always exist between the conflicting desires to test thoroughly, maintain speed, and avoid excessive complexity. Agile and DevOps methodologies bring their own values and perspectives to the process, as well. It becomes exceedingly difficult to satisfy all stakeholders with so many competing goals and ideas in the mix.
So, is dealing with multiple environments in general and multiple databases in particular worth the trouble?
In this post, we explore some of the factors that need to be weighed to strike a balance between these many demands while keeping your team and your environments happy and productive.
DevOps Is Great, But Only If You Can Manage It
Happy and productive teams keep the focus on shipping code and maintaining reliable production environments. DevOps teams achieve this through continuous delivery (CD), which requires automated testing throughout each stage in the promotion process all the way through to automated deployment.
In a typical pipeline, the dev environment is a playground for all of the engineers to tinker with and test changes. Quality engineers do functional testing in the QA environment before promoting code to the performance testing (perf) environment. From there, code moves on to staging, and then finally to production (prod). This is a common workflow, but it creates complexity.
Automating everything across these environments is a real challenge. The truth is, you may not need all of them. Here are some questions to consider when evaluating how many environments you need:
- Whose responsibility is it to promote code across environments?
- Who maintains the infrastructure for each environment?
- How are databases kept in sync across environments?
- How are testing environments populated with realistic data?
- How often are databases built from scratch?
- How is deployment automated across these environments?
- How do the environments fit the branching model?
- Can some environments be eliminated without impacting production?
- How flexible are you about all of the above?
Naturally, answers will vary from team to team, as they should. The key thing to remember with DevOps is to focus on making continuous, small improvements. Do not get lost trying to create the perfect solution.
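One of the questions above, how databases are kept in sync across environments, can be made concrete with a small report. The following is a minimal sketch, assuming each environment records the names of the schema migrations it has applied. The function name `pending_migrations`, the migration names, and the environment data are all hypothetical and used for illustration only.

```python
from typing import Dict, List, Set


def pending_migrations(
    all_migrations: List[str],
    applied_by_env: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, for each environment, the migrations it has not applied yet.

    The result keeps the order of `all_migrations`, so it doubles as a
    to-do list for bringing that environment up to date.
    """
    return {
        env: [m for m in all_migrations if m not in applied]
        for env, applied in applied_by_env.items()
    }


if __name__ == "__main__":
    # Hypothetical data: in practice this would be read from each database.
    migrations = ["001_create_users", "002_add_email", "003_add_index"]
    applied = {
        "dev": {"001_create_users", "002_add_email", "003_add_index"},
        "qa": {"001_create_users", "002_add_email"},
        "prod": {"001_create_users"},
    }
    for env, missing in pending_migrations(migrations, applied).items():
        print(env, missing or "in sync")
```

A report like this does not answer who owns promotion or how data is populated, but it does make drift between environments visible before a release depends on it.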
Long Pipelines Don't Mean Big Changes
The lowest-hanging fruit is simply shipping smaller changes, more often. This approach removes much of the annoyance around managing multiple environments and all other forms of process machinery.
Working in small batches means breaking up changes into smaller and smaller chunks, which are easier to develop, test, and deploy.
Consider this scenario: Would you rather test a large database schema migration bundled with new product features, or the same migration on its own, with no impact on current functionality?
The second option is clearly the better choice. It's easier on engineers, product owners, QA, and everyone else involved in shipping the software.
Working in small batches is a powerful balancing force against long release cycles and unreliable deployments. This CD tenet reduces risk and increases production reliability when it is carried out correctly. However, it is only beneficial when used with automation.
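To illustrate small batches combined with automation, here is a minimal sketch of a migration runner. It assumes SQLite (through Python's standard `sqlite3` module) purely for convenience. The `schema_migrations` table, the `apply_pending` function, and the two sample migrations are hypothetical names, not part of any particular tool.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical migrations: each one is a small, independent schema step.
MIGRATIONS: List[Tuple[str, str]] = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_add_email", "ALTER TABLE users ADD COLUMN email TEXT"),
]


def apply_pending(
    conn: sqlite3.Connection, migrations: Sequence[Tuple[str, str]]
) -> List[str]:
    """Apply migrations that are not yet recorded; return the ones applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT version FROM schema_migrations")
    }
    newly_applied = []
    for version, sql in migrations:
        if version in applied:
            continue  # already applied in this environment
        conn.execute(sql)
        # Record the step right after it runs so a rerun skips it.
        conn.execute(
            "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
        )
        conn.commit()
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(apply_pending(conn, MIGRATIONS))  # first run applies both steps
    print(apply_pending(conn, MIGRATIONS))  # second run applies nothing
```

Because each step is small and recorded as soon as it is applied, the same script can be run against every environment, and running it twice is harmless.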
Small batches work particularly well with distributed applications. These applications inevitably create more moving parts, databases, and independently deployable components.
As you can imagine, keeping all of these parts under control is extremely difficult. Smaller batches help contain the complexity that comes with these architectures.
Distributed Applications Bring Distributed Problems
Distributed applications are composed of multiple services. Each service might have multiple databases, and different services within the same application may not use the same database technology. One service may use a relational database management system (RDBMS), while another might use a NoSQL store.
This complexity spills over into other areas. Hopefully, each engineer has their own sandbox that can run versions X, Y, and Z of services A, B, and C. This setup provides each engineer with extreme flexibility, but it comes with trade-offs.
The "laptop build" is one of the more problematic trade-offs. Engineers might produce a working build on their laptop that does not behave correctly when it is deployed (for one reason or another). Common reasons include differences in database setups, subtle differences in versions, different deployment methods, variations in test data, and environmental complexity.
Working with multiple databases is the most problematic issue, since a release may need to coordinate database changes across a number of services. Distributed applications open up a whole new can of worms.
There are a few questions you should consider when it comes to distributed applications:
- How can engineers run the entire product on their machines?
- What do we consider "releases" to be (i.e. are they deployments of individual services or the sum of all configurations between services)?
- How do we roll back a deployment?
- Who owns and operates individual services?
- How do the existing approach and infrastructure scale when new services are introduced?
- Can we deploy multiple services at once, and what happens when we do?
- How does QA get production-like data?
These questions are not going away. They are, in fact, becoming more prevalent as more organizations move to microservices and other distributed architectures. Your team can succeed in this landscape with purpose-built tooling.
Tipping the Scales With Insight Tooling
Specialized automation and insight tooling tips the scale in your favor. Tooling helps everyone in the team quickly identify the state of all relevant databases and determine whether they match the relevant application. This insight keeps deployments moving since potential problems are identified early.
Such tooling is especially helpful for detecting differences between staging and prod before they impact your uptime. Tooling also clearly communicates how the deployment affects different databases.
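As a rough illustration of what drift detection between staging and prod involves, the sketch below compares table and column definitions of two databases. It assumes SQLite and Python's standard `sqlite3` module for simplicity. `read_schema` and `diff_schemas` are hypothetical helper names, and real tooling would also compare indexes, constraints, and other objects.

```python
import sqlite3
from typing import Dict, List, Tuple

# Maps table name -> ordered list of (column name, declared type).
Schema = Dict[str, List[Tuple[str, str]]]


def read_schema(conn: sqlite3.Connection) -> Schema:
    """Read table and column definitions from a SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema: Schema = {}
    for table in tables:
        # PRAGMA does not accept bound parameters; the table names used
        # here come from sqlite_master, not from user input.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


def diff_schemas(staging: Schema, prod: Schema) -> List[str]:
    """List the differences between two schemas, sorted by table name."""
    problems = []
    for table in sorted(set(staging) | set(prod)):
        if table not in prod:
            problems.append(f"{table}: missing in prod")
        elif table not in staging:
            problems.append(f"{table}: missing in staging")
        elif staging[table] != prod[table]:
            problems.append(f"{table}: columns differ")
    return problems


if __name__ == "__main__":
    staging = sqlite3.connect(":memory:")
    prod = sqlite3.connect(":memory:")
    staging.execute("CREATE TABLE users (id INTEGER, email TEXT)")
    staging.execute("CREATE TABLE orders (id INTEGER)")
    prod.execute("CREATE TABLE users (id INTEGER)")
    for problem in diff_schemas(read_schema(staging), read_schema(prod)):
        print(problem)
```

An empty result means the two schemas match at the table and column level. Anything else is a signal to investigate before promoting a change.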
This practice makes it easier to review changes and move them through the pipeline. Insight tooling also helps achieve the DevOps goal of making more changes, more often. These tools make deployments more frequent and more reliable by giving engineers the insight and automation they need to catch surprises before a release.
Managing Multiple Databases: A Matter of Art and Science
The balance between the number of database environments, engineering needs, cost, and efficiency will vary from team to team. The ultimate goal is to ensure that all commits are deployable to production. Some teams may need seven environments while others might only need three. One team may be building a monolith while another is working on a distributed application.
All teams, however, must accept this fact:
Each environment has its own database, and each database change must be synchronized with the other environments.
They must also be aware of the large amount of engineering effort that is required to maintain multiple environments. Keeping these important considerations in mind, teams should strive to minimize their total number of environments.
If this is not realistically possible, then they can turn to specialized tooling to automate database management across an increasing number of environments.
Published at DZone with permission of Yaniv Yehuda, DZone MVB.