Today, one of the hot topics in cloud is Docker. It's no surprise, given that cloud architectures are typically built around virtualization, rapid deployment, automated orchestration and resource isolation (and often a few other buzzwords besides). The point is, people are pretty sure that "using containers" is a good idea.
What everyone isn't so sure about, however, is exactly what that means. This is especially true as you look at the intersection of new-generation technologies that are designed around orchestration, independence and on-demand scale. NuoDB is just one example of this. Client-server and independent, transient services may naturally fit into a container model. Trying to contain software that is rebellious by nature (read: likes to scale on-demand or run in multiple places all at once) is harder to think about.
For those of you paying attention, you may be rolling your eyes at these recent trends. The notion of containing an application, or defining some resource boundary, is an old idea. Technologies like Solaris Zones or BSD Jails, virtualization via VMs or simply running in a chroot-ed environment have been with us a long time. What I think has sparked interest in Docker specifically, in part, is the mastering, versioning & deployment model they've wrapped around things like LXC. Opinions vary wildly about how mature this is, but it certainly fits with the mindset many of us have today on how to provision and manage systems.
Again, not something new, but increasingly a great way to build flexible, resilient infrastructure. Seen through a lens like Kubernetes or the Amazon EC2 Container Service (more on those in a separate post), containers are a way to apply multi-tenancy, tighten resource management and make distributed deployment a snap. Or, from the point of view of Apcera, containers are a way to build strong security models driven by policy. Given these different views, and given that NuoDB is designed around a distributed mindset that already uses SLAs to drive automated management, what can you do when you connect these approaches?
That's the question I've been getting a lot lately, so I ran some experiments over the last few weeks. A few of the results are captured in this post. These examples should get you thinking, and hopefully help you try NuoDB in some new ways. The focus here is on how Docker helps simplify deployment, testing and scaling while keeping you focused on effective resource management. So, let's start with a really simple example.
One container, ready to go
NuoDB doesn't have too many requirements, which makes getting started pretty simple. Let's say you want to spin up a container to test a fully-contained database. Using the Dockerfile syntax, here's a starting point building on Ubuntu (it would work for other distros too):
```dockerfile
FROM ubuntu:14.04

ENV nuodb_package nuodb_2.2.0_amd64.deb

RUN /usr/bin/apt-get update
RUN /usr/bin/apt-get -y install default-jre-headless supervisor

ADD $nuodb_package /tmp/
RUN /usr/bin/dpkg -i /tmp/$nuodb_package
RUN /bin/sed -ie 's/#domainPassword =/domainPassword = foo/' /opt/nuodb/etc/default.properties

ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf

CMD /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
```
What's happening here? This is a set of steps to create an image that has all the bits set up, with a default command to run on startup. Because NuoDB doesn't hard-code an admin password out of the box, this script sets one to get you started. That's OK for testing but not so satisfying for the real world, so see below for more on dynamic runtime configuration.
In this example the sustaining command is a package called Supervisor. This is a way to keep the container running as long as the NuoDB management agent is still active and is a pretty common pattern. Here's the simple config file to get that working:
```ini
[supervisord]
nodaemon=true

[program:nuoagent]
user=nuodb
command=/opt/nuodb/etc/nuoagent start
```
By the way, all the examples here are available on github. Go there for full scripts, documentation, and the latest versions. If you want to contribute back your own recipes, that's the place to go too! Either way, if you have these two files saved as Dockerfile and supervisord.conf respectively in a directory along with the .deb NuoDB installer then you can run docker to build an image:
```
$ sudo docker build .
...
Step 0 : FROM ubuntu:14.04
...
Successfully built 1fa0b5af059f
```
That's it! You'll see a different hash at the end, but otherwise you're all set. What could you do with this image? One obvious use is dev-test, since this is a self-contained recipe & image for spinning up instances on demand or running instances across servers. It's a repeatable way to get NuoDB running on your laptop, test VM or public cloud instance. Make sure that transparent huge pages (THP) are disabled on your host, then run your container like this:
```
$ sudo docker run -P --net=host 1fa0b5af059f
```
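Disabling THP is a host-level step that the container can't handle for itself. Here's a minimal sketch of how you might script it; the sysfs path below is the common location on recent kernels, but some distros use a vendor-specific path, so treat it as an assumption and check your host:

```shell
# Sketch: disable transparent huge pages on the Docker host before
# starting containers. The sysfs path is the common one on recent
# kernels; some distros put it elsewhere.
disable_thp() {
    root=${1:-/sys/kernel/mm/transparent_hugepage}
    for f in enabled defrag; do
        # Only touch the files if they exist and are writable.
        if [ -w "$root/$f" ]; then
            echo never > "$root/$f"
        fi
    done
}

# On a real host, run (as root): disable_thp
```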
I'll explain the networking bit in the next section. The key thing to note is that you're running a container that exposes all of its NuoDB components to the outside world, so you can speak the management and SQL protocols just as you would to any server. For many use-cases this is exactly what you want. What's missing is the ability to change anything about the configuration: for instance, the domain password, the port ranges or any existing NuoDB instances to peer with.
The quick answer is that Docker lets you define environment variables at run-time using the -e flag. By including a simple wrapper script, or taking advantage of Java's property management, it's easy to connect those variables to your contained installation. There are full examples of this in our github repository. When you fold those in you can run multiple containers on the same host using different ports, inject initial credentials or make any other run-time configuration changes you like. Simple.
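To make that concrete, here's a minimal sketch of what such a wrapper might look like. The variable names (DOMAIN_PASSWORD, PROPS) and the use of /tmp for the demo properties file are my own illustrative choices, not official NuoDB conventions; see the github repository for the real examples:

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper: fold settings passed via `docker run -e`
# into the agent's properties file before handing off to supervisord.
PROPS=${PROPS:-/tmp/default.properties}

# Seed a minimal properties file so this sketch runs stand-alone;
# in the real image this file ships with the NuoDB package.
printf '#domainPassword =\n' > "$PROPS"

# Apply the password from the environment, falling back to a default.
DOMAIN_PASSWORD=${DOMAIN_PASSWORD:-changeme}
sed -i "s/^#domainPassword =.*/domainPassword = $DOMAIN_PASSWORD/" "$PROPS"

grep domainPassword "$PROPS"
# The real wrapper would end with something like:
#   exec /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
```

With a script like this as the image's CMD, `sudo docker run -e DOMAIN_PASSWORD=secret ...` injects the credential at startup instead of baking it into the image.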
Isolationism versus Collaboration
One of the strengths that containers offer is isolation: an application sees only itself and its own OS settings, and can't accidentally interfere with other processes. In practice that's not exactly true (e.g., the comment above about host settings for THP) but it's pretty close. Part of providing that illusion is mapping incoming and outgoing ports, so that you could (for instance) run multiple web servers on the same host, where each server thinks it's using port 80 but, from the outside point of view, each server is actually reachable on a distinct incoming port.
This works well for independent processes that accept connections from outside a container. NuoDB, however, is a peer-to-peer system where the database and management processes need to discover and talk directly to each other. If I add a Transaction Engine to a running database, the host & port get communicated to existing peer processes so everyone can keep communicating correctly. Today we support advertising alternate addresses to handle name/address translation (the same kind of problem), so we're looking at the right way to do the same for ports. Look for a discussion of that in the future. That said, it raises questions about how peer architectures and dynamic orchestration play together.
Today, the simplest way to get running with NuoDB is to let its peer processes talk directly with each other and expose the endpoints for queries and management. So, in the above example, two flags were used when running the container. The first is -P, which tells Docker to map port numbers directly (port 80 inside the container is accessed as port 80 on the outside). The second is the --net=host setting, which shares the host's network interfaces with the container. If you're running multiple containers on the same host, this way they'll see which ports are already in use and can coordinate appropriately.
The way I see it, this is one of the interesting areas that hasn't seen a lot of exploration in the world of containers: how do you maintain security & isolation while supporting a p2p model that is designed around self-discovery? I think it's an increasingly important topic as networks move towards intelligence, policy and distribution. I would love to hear your thoughts and opinions!
The previous example was focused on one set of use-cases: making it easy to spin-up a full NuoDB installation, or multiple installations, on a host. That's powerful for testing, deployment, expansion, etc. There are plenty of other good reasons, however, to put things inside a container so let's consider one more example.
NuoDB has a simple model for isolated multi-tenancy. On a given host you can install our software and from a single point of management run multiple database processes that are supporting independent databases. We're building automation into the product that drives process management from an SLA point of view. Even still, if you've got a bunch of processes running on the same host you want some guarantees that a given database's process doesn't interfere with what another database needs from the system resources. Sounds like a job for a container.
NuoDB also has a simple way to define what the agent actually runs when asked to start a Transaction Engine or Storage Manager. In place of the nuodb executable you can put a wrapper script that marshals up the provided input and uses it to invoke Docker. In this model, there's a single NuoDB Agent running on a host, but all NuoDB database processes are running in their own, isolated containers. Because you don't need a full NuoDB environment to make this work, the Dockerfile gets even simpler:
```dockerfile
FROM ubuntu:vivid

ENV nuodb_package nuodb-2.2.0.linux.x86_64.tar.gz

ADD $nuodb_package /tmp/
RUN /bin/mv /tmp/nuo* /opt/nuodb

CMD /opt/nuodb/bin/nuodb --connect-key $key --agent-port $port --report-pid-as $pid
```
The documented version of this file and the wrapper script that makes the CMD work are available on github. Note that this example uses a feature that we'll be rolling out later this month in our next release. That's the --report-pid-as flag, which tells the NuoDB process to report a PID that's meaningful outside the container. Without this, the multiple containers would all be reporting the same PID for their contained processes which causes the management tier to get confused since multiple processes appear to be on the same host with the same PID.
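To give a feel for the wrapper, here's a simplified sketch of what an agent-side script like that might look like. The real, documented script is on github; the argument handling here is deliberately minimal, the image tag is an assumption, and it prints the docker command instead of executing it so you can see the result:

```shell
#!/bin/sh
# Sketch of a wrapper installed in place of the nuodb executable: it
# forwards the arguments the agent supplies into a docker run command.
# The image tag and echo-instead-of-exec are illustrative choices.
start_in_container() {
    image=${IMAGE:-nuodb/engine}
    key=""; port=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --connect-key) key=$2; shift 2 ;;
            --agent-port)  port=$2; shift 2 ;;
            *) shift ;;
        esac
    done
    # $$ is a PID that's meaningful outside the container, matching the
    # --report-pid-as mechanism described above.
    echo docker run -d --net=host \
        -e key="$key" -e port="$port" -e pid=$$ "$image"
}

start_in_container --connect-key abc123 --agent-port 48004
```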
This recipe gives you a simple way to support multiple databases on the same host OS with resource allocation guarantees. Defining strict requirements on things like CPU or memory ensures that, even under heavy load, no one database process ruins it for everyone else. From a management point of view none of the NuoDB tools or interfaces change, and all the reporting mechanisms continue to run as normal, so this is an operational drop-in.
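Docker's standard resource flags are the mechanism for those guarantees. As a sketch, this composes a run command that caps memory and CPU weight for a database container; --memory and --cpu-shares are real docker options, while the specific limits and the image hash (from the build above) are just examples:

```shell
# Compose a docker run command that caps a database container at 2 GB
# of RAM and half the default CPU weight. Echoed rather than run so
# the sketch is self-contained; prefix with sudo to execute for real.
IMAGE=1fa0b5af059f
LIMIT_CMD="docker run -d --net=host --memory 2g --cpu-shares 512 $IMAGE"
echo "$LIMIT_CMD"
```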
The 'D' in Docker does not stand for Durability
The goal in writing this was to showcase some of the ways that NuoDB can be used within containers. It was also to call-out some of the places where using containers doesn't mesh well with other infrastructure. In that spirit, this post wouldn't be complete without a discussion on data durability.
In all of the examples above, the NuoDB database processes were running in transient containers with transient filesystems. That's a really good thing for many services that don't need to write to disk. It enhances security, simplifies management and lowers cost. All good. For databases, however, this obviously isn't such a good thing.
So what are the options? For typical databases you need to bypass the container model and get at a real filesystem. Docker does support filesystem access, with varying degrees of indirection and isolation. Obviously one problem with this approach is that it breaks the illusion of an independent environment. A bigger problem is that many environments don't give you this option (e.g., the Amazon EC2 Container Service as of the time this post was written), and that's arguably an increasing trend that makes a lot of sense.
So what else can you do? Some databases are moving in the direction of providing durability & availability through replication. The focus is less on making one disk really solid and more on keeping enough copies of the data that you could lose any host and keep running. I think that's an interesting model. It's not enough for an enterprise that needs to survive complete failure but it's a cost-effective model for some applications. You can do this with NuoDB, though I wouldn't typically recommend it.
Another approach is to keep your data outside the container, but available through an interface that's reachable from wherever a container gets spun up. For instance, with NuoDB you can put data into S3, so there's no need for a local, durable disk. Traditionally, databases treat the IO path as the key optimization point, which makes this impractical, but as database architectures evolve this model is starting to be possible, as it is with our system. This is one of the reasons that I think the NuoDB model is such a good fit for the cloud.
One more approach that's possible with NuoDB is to run the storage peers directly on non-containerized hosts. In our terminology, run the TEs in containers and the SMs directly on their hosts. This gives you in-memory, on-demand scaling that's exploiting a lot of the great stuff that containers are designed to give you. It also focuses the flexibility on in-memory and lets the durable hosts, which typically are less dynamic because they're using pre-provisioned disk, run with different rules.
Bottom line: viewing the world through containers makes you think about things differently. That's especially true for durability. Really shining here takes a new view on data availability and durability, along with new data management models.
Take-aways and next steps
Like I said several times, this was just an introduction to key concepts. I wanted to share a couple of simple examples & recipes to help get you started. I also wanted to highlight some of the challenges that we're looking at next. I strongly believe that as container models evolve they will need distributed, flexible services at the data layer to scale.
The jury is definitely still out on Docker. There are plenty of questions about its security capabilities and whether the mastering & management facilities hold up under real-world scrutiny. As I've sketched out here, I still have questions about how it fits with durable or peer systems. As a way to simplify testing and build repeatable environments, however, I have to say it's a nice tool.
Going forward, we're continuing to look at the orchestration and security features that will tie these models together better. Separately, I'm also exploring some new models around data management, tracking provenance and managing residency rules. These kinds of concerns require knowledge about where a given container is running, something that is usually hidden from contained applications, so this is another area where I expect to explore new ideas. We'd love your thoughts, feedback and collaboration in all these spaces! In the meantime, have fun playing with the recipes laid out here.