Back up CockroachDB to S3 via HTTPS Proxy
In this article, see how to back up CockroachDB to S3 via an HTTPS proxy.
When running your production workloads on CockroachDB, you'll want to take regular backups. CockroachDB is frequently deployed into a cloud, such as AWS, GCP, or Azure, and these cloud environments consistently offer a highly durable "blob store." That, coupled with how well CockroachDB's backup/restore works with these blob stores, makes them an excellent choice of backup target. In certain cases, organizations choose to limit outbound traffic from their cloud workloads, so they deploy a proxy to manage outgoing HTTP and/or HTTPS requests. Having just configured this myself, I figured it made sense to share it here.
In my case, I'm running a three-node cluster right on my MacBook Pro, so I'm cheating a little bit in that it's not running in the cloud. Still, for the purposes of this experiment, I think it's okay. I should note that, in that startup procedure, the one shown on the link to the docs, you need to add this step prior to starting each of the three
Now, what's this process running on port 8888 on my Mac? That is my proxy and, for this, I chose to try one called mitmproxy. With that installed, I started it up:
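One way to picture how the nodes and the proxy fit together: each node process needs the proxy address in its environment before it starts. The sketch below is only an illustration, assuming the nodes honor the conventional HTTP_PROXY and HTTPS_PROXY variables and that the proxy listens on http://localhost:8888. The helper name proxied_env is hypothetical.

```python
import os


def proxied_env(proxy_url="http://localhost:8888", base_env=None):
    """Return a copy of the environment with the proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the node process reads the conventional proxy variables.
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    return env


if __name__ == "__main__":
    env = proxied_env(base_env={"PATH": "/usr/bin"})
    print(env["HTTPS_PROXY"])
```

The returned dictionary could be passed as the env argument of subprocess.Popen when launching each node, which keeps the proxy setting out of the parent shell.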
Once running, mitmproxy logs traffic to the terminal. I'll show what this looks like down below, once the backup completes and some log entries are present. After mitmproxy starts up, it creates a directory containing CA certificates:
To establish trust between CockroachDB's S3 client and mitmproxy, you'll need to run the following SQL command while logged in to CockroachDB as an ADMIN user:
That string is the certificate contained in the file
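Pasting a multi-line PEM certificate into a SQL string by hand is error-prone, so it can help to generate the statement. The following is a minimal sketch, assuming the relevant cluster setting is cloudstorage.http.custom_ca and that the certificate is mitmproxy's CA file (typically ~/.mitmproxy/mitmproxy-ca-cert.pem). Both function names are hypothetical and the sample certificate text is a placeholder, not a real certificate.

```python
from pathlib import Path


def custom_ca_statement(pem_text):
    """Build a SET CLUSTER SETTING statement that carries a PEM certificate."""
    # Double any single quotes so the PEM text is a valid SQL string literal.
    escaped = pem_text.strip().replace("'", "''")
    # Assumption: cloudstorage.http.custom_ca is the setting being changed.
    return f"SET CLUSTER SETTING cloudstorage.http.custom_ca = '{escaped}';"


def statement_from_file(path):
    """Read a PEM file (with ~ expanded) and wrap it in the statement."""
    return custom_ca_statement(Path(path).expanduser().read_text())


if __name__ == "__main__":
    # Placeholder text standing in for the contents of a real PEM file.
    sample = "-----BEGIN CERTIFICATE-----\nPLACEHOLDER\n-----END CERTIFICATE-----\n"
    print(custom_ca_statement(sample))
```

The printed statement can then be pasted into a SQL session opened as an admin user.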
Prior to running the backup, a couple of things need to be done within AWS:
- Create an S3 bucket to contain the backups (mine is in the
- Use the IAM interface to download security credentials.
I've already got a table, t1, that has data in it, so I won't show that. The next part is to build up the SQL backup command:
Going over this one part at a time:
- BACKUP TABLE t1: just the usual syntax to back up a specified table. You can also back up a specific database, or the entire cluster.
- TO 's3://crdb-goddard/t1: the backup files will be created within the t1 folder of the crdb-goddard bucket created earlier.
- ?: this is the separator between the URL and the parameters, and the & character separates each of the provided parameters shown below.
- AWS_ACCESS_KEY_ID=AKIAX4BORNLV64R5CHGM: this is the "Access key ID" obtained above using the IAM UI.
- AWS_SECRET_ACCESS_KEY=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: this is the "Secret access key", also downloaded via the IAM UI.
- AWS_ENDPOINT=https%3A%2F%2Fs3.us-east-1.amazonaws.com: when using a proxy and setting its cert as we have done here, it's necessary to explicitly set this parameter as well as the following one. Note that the value is percent-encoded.
- AWS_REGION=us-east-1: required for the same reason as AWS_ENDPOINT above.
- AS OF SYSTEM TIME '-30s';: this takes the backup as of 30 seconds ago, which helps to reduce contention with ongoing transactions.
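The pieces described above can be assembled programmatically, which avoids hand-encoding values such as the endpoint URL. This is a minimal sketch, not the command used in this article: the function names are hypothetical, the credentials are placeholders, and the code only formats the statement text without connecting to a cluster.

```python
from urllib.parse import quote


def s3_backup_uri(bucket, path, params):
    """Assemble an s3:// URI whose query values are percent-encoded."""
    # safe="" also encodes "/" and ":", which appear in endpoints and secret keys.
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in params.items())
    return f"s3://{bucket}/{path}?{query}"


def backup_statement(table, uri, as_of="-30s"):
    """Format the BACKUP statement text; this does not execute anything."""
    return f"BACKUP TABLE {table} TO '{uri}' AS OF SYSTEM TIME '{as_of}';"


if __name__ == "__main__":
    uri = s3_backup_uri("crdb-goddard", "t1", {
        "AWS_ACCESS_KEY_ID": "EXAMPLEKEYID",          # placeholder
        "AWS_SECRET_ACCESS_KEY": "example/secret",    # placeholder
        "AWS_ENDPOINT": "https://s3.us-east-1.amazonaws.com",
        "AWS_REGION": "us-east-1",
    })
    print(backup_statement("t1", uri))
```

Encoding every value with safe="" is a deliberate choice here: a secret key containing a slash is encoded the same way as the endpoint, so no parameter needs special handling.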
If all of this is correct, the backup should run and produce files within the S3 folder. Let's see ...
That appears to have succeeded. Here are the files produced in S3:
Finally, here is what the mitmproxy logs show:
And that's what I wanted to show. Thank you for following along. I hope this is helpful!
Published at DZone with permission of Michael Goddard. See the original article here.