Part of the reason for creating RavenDB was that I wanted a database that actually took how it is being used into account and provided good support for common usage scenarios. Making the process of moving between Dev > UAT > Production easier is a big part of that.
Typically, databases don’t handle that directly; they leave it to the users to figure out. You can see this in the plethora of SQL schema deployment and versioning tools you have to deal with.
With RavenDB — for indexes, in particular — we made the process very easy. Indexes are defined in code, deployed alongside the application, and versioned in the exact same manner in the exact same system.
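To make the indexes-as-code idea concrete, here is a minimal sketch of the pattern. The names (`IndexDefinition`, `deploy_indexes`, `put_index`) are hypothetical stand-ins, not the actual RavenDB client API; the point is that the definition lives in the repository next to the code that uses it and is deployed at startup:

```python
# Hypothetical sketch: an index defined in code, versioned with the application.
# IndexDefinition / deploy_indexes / put_index are illustrative names,
# not the real RavenDB client API.

class IndexDefinition:
    def __init__(self, name, map_query):
        self.name = name
        self.map_query = map_query

# Checked into the repo next to the code that queries it.
ORDERS_BY_COMPANY = IndexDefinition(
    name="Orders/ByCompany",
    map_query="from order in docs.Orders select new { order.Company }",
)

def deploy_indexes(server, definitions):
    """Called at application startup: create or update each index."""
    for d in definitions:
        server.put_index(d.name, d.map_query)  # idempotent upsert
```

Because deployment is just "run the application", dev, UAT, and production all converge on whatever definitions are in the code that was shipped there.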
But as the feature set of RavenDB grows, we need to consider the deployment scenario in additional places. We recently started talking about the development cycle of ETL processes, subscriptions, backups, and external replication. The last two are fairly rare in development/UAT scenarios, so we’ll ignore them for now. They are typically only ever set up and used in production. Sometimes, you test them on a dedicated instance, but it doesn’t make sense to deploy a backup configuration in most cases. External replication is basically just destination plus credentials, so there isn’t really all that much to track or deploy.
ETL processes and subscriptions, on the other hand, can contain quite a bit of logic in them. An ETL process that feeds into a reporting database might be composed of several distinct pieces, each of them feeding some part of the data to the reporting database. If the reporting needs change, we’ll likely need to update the ETL process, as well, which means that we need to consider exactly how we’ll do that. Ideally, we want a developer to be able to start working on the ETL process on their own machine, completely isolated. Once they are done working, they can check in their work into the code repository and move on to other tasks. At some future time, this code will get deployed, which will set up the right ETL process in production.
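To give a sense of how much logic such a process can carry, here is an illustrative shape for a reporting ETL definition with several distinct pieces, each feeding one slice of the data. The field names are simplified for the example, not the exact RavenDB configuration schema:

```python
# Illustrative shape of an ETL definition: several transform scripts, each
# feeding part of the data to the reporting database. Field names are
# simplified, not the exact RavenDB configuration schema.

reporting_etl = {
    "Name": "Orders to Reporting",
    "ConnectionStringName": "reporting-db",  # resolved per environment
    "Transforms": [
        {
            "Name": "DailyTotals",
            "Collections": ["Orders"],
            "Script": "loadToDailyTotals({ Day: this.OrderedAt, Total: this.Total });",
        },
        {
            "Name": "TopProducts",
            "Collections": ["Orders"],
            "Script": "for (var i = 0; i < this.Lines.length; i++) loadToTopProducts(this.Lines[i]);",
        },
    ],
}
```

When the reporting needs change, it is the scripts inside this document that change, which is exactly why the deployment story for it matters.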
That is a really nice story, and it's how we are dealing with indexes, but it doesn’t actually work for ETL processes. The problem is that ETL is typically not the purview of the application developer; it is in the hands of the operations team, or maybe it is owned by the team that owns the reports. Furthermore, changes to the ETL process are pretty common and typically happen outside the release cycle of the application itself. That means that we can’t tie this behavior to the code, unlike indexes, which have a pretty tight integration with the code that uses them. ETL is a background kind of operation with little direct impact on the application, so it can’t be tied to the application code the way indexes are. Even with indexes, we have measures in place to lock an index definition, so an administrator can update it on the fly without the application overwriting it with the old version.
Subscriptions are more of a middle ground. A subscription is composed of a client-side application that processes the data and some server-side logic for filtering and shaping it. On the one hand, it makes a lot of sense for the subscribing application to control its own subscription; on the other, an admin wanting to update the subscription definition is a very likely scenario, maybe as a result of a data change or new input from the business. We can update the server-side code without re-deployment, and that is usually a good idea.
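The split looks roughly like this. The query string is the server-side part that an admin can change in place; the processing function is the part the subscribing application owns. The worker plumbing is omitted and the names here are illustrative, not the exact RavenDB client API:

```python
# Sketch of the subscription split. The server-side filter/shape is a query
# the admin can edit on the server; the client-side processing is application
# code. Names and shapes here are illustrative, not the real client API.

SUBSCRIPTION_QUERY = """
from Orders as o
where o.Total > 1000
select { Id: id(o), Total: o.Total }
"""

def process_batch(items, handled):
    # Client-side logic: the subscribing application owns this part and
    # only changes when the application is redeployed.
    for item in items:
        handled.append(item["Id"])
```

Changing the threshold in `SUBSCRIPTION_QUERY` on the server takes effect without touching `process_batch`, which is exactly the "update without re-deployment" scenario.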
To make matters a bit more complex, we also have to consider secrets management. ETL processes, in particular, can contain sensitive information (i.e., connection strings), so we need a way to reference the connection string without embedding it in the ETL definition itself; otherwise, I’d have to remember to swap the connection string for my local machine out for the production one when deploying. Or, much worse, if I’m taking the ETL definition from production, I don’t want to accidentally also get the production connection string. That means that we need to use named connection strings and rely on the developer/admin to set them up properly across environments.
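The named connection string idea can be sketched in a few lines: the ETL definition carries only a name, and each environment maps that name to its own secret. The environment names and string values below are made up for the example:

```python
# Sketch: the ETL definition references a connection string by name only;
# each environment maps that name to its own secret. The environment names
# and connection string values are illustrative.

ENVIRONMENTS = {
    "dev":  {"reporting-db": "Server=localhost;Database=Reports;Trusted_Connection=True"},
    "prod": {"reporting-db": "Server=reports.internal;Database=Reports;User Id=etl;Password=***"},
}

def resolve_connection_string(environment, name):
    try:
        return ENVIRONMENTS[environment][name]
    except KeyError:
        raise KeyError(f"connection string {name!r} is not configured for {environment!r}")
```

Because the definition never contains the secret itself, copying it from production to a developer machine (or the other way) can't leak or clobber credentials; at worst, the name is unresolved and you get a loud error.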
I would really appreciate any feedback you have about how to handle this scenario.
Both ETL processes and subscriptions are just JSON documents of no great complexity, so actually moving them around between servers isn’t hard; it is the process of doing so that we are trying to flesh out. I should also mention that we are probably just going to make sure that there is a process to handle that, not something that is mandated, because some companies have very different deployment models that we need to fit into. This is us trying to think about the best way to handle the most common scenario with as little friction as possible.
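Just to show how mechanically simple the move itself is once connection strings are referenced by name, here is a toy promotion step. The stores are plain dicts standing in for two servers, not a real client:

```python
# Toy sketch of promoting a definition between environments. Because the
# document carries only a connection-string *name*, moving it really is
# just copying JSON. The dict "stores" are stand-ins, not a real client.

import json

def promote(source_store, target_store, definition_id):
    definition = source_store[definition_id]
    # Round-trip through JSON to guarantee we copy a plain document,
    # with no live references back to the source environment.
    target_store[definition_id] = json.loads(json.dumps(definition))
```

The hard part, as the post argues, is everything around this copy: who is allowed to run it, when, and how the change is tracked.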