Speeding up Your Docker Based Builds With Codeship
Codeship's Continuous Integration product is wonderful at enabling a dependable build process. But how can we make this process even faster? Read on for an answer.
“Codeship is awesome, but how can we make our builds faster?”
This is how a lot of conversations start once people have configured Codeship CI/CD with Docker and have their project testing and deploying successfully.
Speeding up your Docker build times isn’t a luxury. It can save your development team hours per day and improve your ability to respond to bugs and production issues, which in turn improves the experience of your customers and stakeholders. It also makes your investment in a CI/CD tool like Codeship even more valuable, letting you do more with the service every day.
At Codeship, we wanted to put together some best practices for investigating why your builds might be slow, as well as show you a few ways you can speed them up.
Where Are You Losing Time?
The first thing you want to do when beginning to troubleshoot your build speeds is to find out how long each step in your workflow is taking. Are you losing more time to image downloads, container building, fetching dependencies, or running tests?
Through the online UI, you can examine your logs in detail to see where the big jumps and gaps in your timestamps are.
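If you would rather not scan the timestamps by eye, a short script can do it for you. The following is a minimal Python sketch, assuming you have saved the log text and that every line starts with a timestamp such as 2017-03-01 12:00:05. That layout, the sample log, and the name largest_gaps are assumptions made for illustration, not Codeship's actual log format.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line (hypothetical).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19  # len("2017-03-01 12:00:05")


def largest_gaps(log_text, top=3):
    """Return the `top` longest pauses as (seconds, line the pause followed)."""
    entries = []
    for line in log_text.splitlines():
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        entries.append((stamp, line[TIMESTAMP_LENGTH:].strip()))

    gaps = []
    for (prev_stamp, prev_message), (stamp, _message) in zip(entries, entries[1:]):
        # The time until the next line is attributed to the earlier line.
        gaps.append(((stamp - prev_stamp).total_seconds(), prev_message))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


# Made-up sample log, only to show the shape of the input.
sample_log = """\
2017-03-01 12:00:00 pulling base image
2017-03-01 12:01:30 building app image
2017-03-01 12:01:50 installing dependencies
2017-03-01 12:04:50 running tests
2017-03-01 12:05:00 done
"""

for seconds, message in largest_gaps(sample_log):
    print(f"{seconds:>6.0f}s after: {message}")
```

The lines with the longest pauses after them are the steps worth investigating first.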
Once you’ve learned a little bit more about the various steps of your CI/CD process and how long they take, you should be able to lock onto a few paths for potential problems and solutions:
- If you’re spending too much time downloading base images, investigate caching, downloading images, building more efficient images, and running efficient services.
- If building your containers is taking a lot of time, look at improving how you handle dependencies and creating efficient services for CI.
- If your tests are taking the bulk of your time, consider parallelizing, build your services with tests in mind, and keep an eye on resource and infrastructure usage.
Caching is one of the most common ways to speed things up. In our case, we’re talking about caching the Docker image build layers created during your CI process. The main benefit of this is that you don’t need to rebuild complex or large images over and over. You can just download and reuse the image from last time if nothing in that image (i.e., your dependencies, codebase or assets) has changed.
It’s important to know that even if some things change (your codebase should, for instance), you don’t need to rerun everything — only everything dependent on the change. The further down the Dockerfile your volatile content is, the less time your image build will take to rerun.
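To see why ordering matters, it can help to model the cache as a chain: once one layer has to be rebuilt, every layer after it is rebuilt too. The Python sketch below is a deliberately simplified model of that behavior, not Docker's implementation, and the layer names are hypothetical.

```python
def layers_to_rebuild(layers, changed):
    """Return the layers that must rebuild: the first changed one and all after it."""
    for index, name in enumerate(layers):
        if name in changed:
            return layers[index:]
    return []  # nothing changed, so every layer comes from the cache


# Two orderings of the same hypothetical build steps.
code_first = ["base image", "copy app code", "install dependencies"]
code_last = ["base image", "install dependencies", "copy app code"]

changed = {"copy app code"}  # a typical commit only touches application code

print(layers_to_rebuild(code_first, changed))
print(layers_to_rebuild(code_last, changed))
```

With the code copied first, the dependency install reruns on every commit. With the code copied last, only the final layer is rebuilt.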
The first thing you need to do with caching is to enable it. To do that, add a simple cached: true directive to your codeship-services.yml file, as seen below, for every image you want cached. If you’ve enabled caching, you will need to specify credentials for your Docker image repo so that we have access to push up the cached image. You can find more information on configuring those credentials here.
app:
  build:
    image: repo/name
    dockerfile_path: Dockerfile
  cached: true
  encrypted_dockercfg_path: dockercfg.encrypted
The next thing you’ll want to do is see if caching is working. Push a build to your repo to kick off a new Codeship build and, once it’s complete, review your logs.
You’ll need to make sure that you’ve provided credentials for pushing to repo/name via either an encrypted dockercfg or a dockercfg_service generator associated with the relevant step. Codeship pulls a previously populated cache for the branch or tag being built; should that fail, we’ll pull the cache for your master branch.
Near the top of your logs, you should see lines displaying that your cached image is being used. If you see those lines, then your cache is working. If you instead see a message or an error indicating that the cached image could not be used, then we’ve got a problem: your cache is not working. Let’s take a look at a few reasons why caching often doesn’t work.
- When you start using caching, no cache will exist. Some repositories treat this as an error.
- The cache was invalidated due to a COPY high up in your Dockerfile, or due to upstream changes in your base image. This will not appear as an error, but rather as though you were not using caching for that part of the image build.
- The credentials provided were not valid or were missing.
Now, if you want to test your cache locally, you can do that by appending --remote-cache=true to your local jet run command, as well as specifying --ci-branch=mybranch. Usage of the remote cache when building locally is disabled by default.
Dockerfile and Dependencies
Now let’s look at your application images themselves; that is, your Dockerfiles and specifically how they build your assets and dependencies into your final images. We’ll look at a few common reasons why building images from Dockerfiles can take a bit too long, as well as a few possible solutions.
Move Complexity From Your Test Scripts Into Your Dockerfile
Your dependencies may be installing every time when they don’t need to! Typically you will want to create folders, workdir, users, etc., and then include things like apt-get update before finally installing all your dependencies. Next, run any other commands and add additional files.
The more complexity you can move from your test scripts into your Dockerfile, the more reusable your image is within a CI execution. Ideally you can split things up so that, for each command that needs to run, only the bare minimum number of files are added to support it. We also recommend grouping as many related commands together, within individual layers, as possible.
However, if you find that too many things are creating layers that are too “volatile” to cache reliably, you should experiment with separating them out into separate, logical layers. There can be a bit of push-and-pull to find the right balance for each project.
Install Dependencies Into a Private Base Image
Additionally, you can consider installing all of your dependencies into a separate, private base image that you pull in and link to your service. This way, compiling your services is not even part of your core base image, and you have fewer monolithic pieces and can optimize separately.
Split Your COPY Commands
In your Dockerfile, you can also try splitting up your COPY into several smaller COPY commands whenever only certain files are needed for something, like a RUN or a directory prep. You can also make sure that you order these sections in a way that places your most “unstable” files as far down the Dockerfile as possible, minimizing cache invalidation and reducing how many layers need to be rebuilt.
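The effect of splitting a COPY can be illustrated with a small model. Docker decides whether a COPY layer can be reused by comparing checksums of the files being copied; the Python sketch below imitates that idea in a simplified way (it hashes only paths and contents, whereas the real check also considers file metadata). The file names and contents are hypothetical, and copy_checksum is an illustrative helper, not part of any Docker API.

```python
import hashlib


def copy_checksum(files, paths):
    """Hash only the files that a single COPY instruction would add."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode())
        digest.update(files[path].encode())
    return digest.hexdigest()


# Hypothetical project contents before and after an application-code change.
before = {"Gemfile": "gem 'rails'", "app/main.rb": "puts 'v1'"}
after = {"Gemfile": "gem 'rails'", "app/main.rb": "puts 'v2'"}

# One big COPY of everything: any file change alters the checksum, so a
# dependency install placed after it would rerun.
single_copy_changed = copy_checksum(before, before) != copy_checksum(after, after)

# Split COPY of just the manifest: its checksum is unchanged, so a dependency
# install placed right after it could still come from the cache.
manifest_copy_changed = (
    copy_checksum(before, ["Gemfile"]) != copy_checksum(after, ["Gemfile"])
)

print(single_copy_changed, manifest_copy_changed)  # prints: True False
```

In this model, the code change invalidates the single large COPY but leaves the manifest-only COPY untouched, which is the behavior the split is meant to achieve.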
For more information on optimizing your Dockerfile, we have a specific guide right here.
Now we’ll take a look at a few solutions involving your codeship-services.yml file.
First, consider using step-specific minimal service files. Let’s say your Rails app requires Redis and Postgres containers. You may not need those dependent services for all of your tests, so running them is going to cost you extra time (over and over again, if they’re not utilized on multiple steps).
As a solution, you could easily define different versions of your services with different links and dependencies. Then swap those in and out throughout your CI pipeline so that each step is not starting a single container it doesn’t need. An example of this would be a “Ruby” service with just the code added for linters to use, and a full “app” service with your database and cache services linked for running tests.
Here’s a simple, high-level example:
app:
  build:
    image: myapp
    dockerfile_path: Dockerfile
  links:
    - redis
    - postgres
redis:
  image: redis:3.0.5
postgres:
  image: postgres:9.3.6
app_ruby_only:
  build:
    image: myapp
    dockerfile_path: Dockerfile
Since the build payload for the app and app_ruby_only services is identical in this scenario, the image will be built once and shared between them.
The second recommendation for optimizing your codeship-services.yml file is to consider a separate, private base image with your pre-built dependencies and link it to your main container.
As long as that separate image is being cached and your dependencies are still being installed in your original Dockerfile for redundancy and to catch any changes, this should offload the CI time for testing and building your dependency changes from your application CI process. You can even do this with just some of your dependencies, isolating the most reliable ones and leaving the ones subject-to-change in your main image build, if that use case makes sense for you.
A third way to speed up your builds based on your codeship-services.yml file is a bit more Codeship-specific. If you’re using fairly popular public base images, you can contact us about adding them to the pre-built CI environment we spin up so that they’re ready to go. We’ve got a couple dozen base images we include by default right now, and we’d be happy to add more if we know there’s demand for the increase in speed it provides.
Finally, another good option is to consider where your builds might be bottlenecking and/or hitting constraints with your infrastructure resources. We strongly recommend coming up with a few different workflows of the same pipeline — combining parallelization, nested steps, differentiated and united containers — and running several builds against each. You can then start to figure out what configurations impact your resource usage and therefore slow your build down.
Now, we’ll look at a few build speed optimizations around your
For example, parallelizing can be a great way to speed your builds up. But it also greatly increases your infrastructure usage. If you’re finding that you have slow speeds but aren’t inclined to upgrade to a more powerful instance type, do some quick tests to see if reducing your amount of parallelization actually speeds things up.
As an example of using parallel steps, you can do something like this:
- type: parallel
  steps:
  - service: app
    command: ./script/ci/ci.parallel spec
  - service: app
    command: ./script/ci/ci.parallel plugin
  - service: app
    command: ./script/ci/ci.parallel qunit
- type: serial
  name: master_deployment
  tag: master
  steps:
  - service: deploy
    command: deploy_me_to_staging
  - service: deploy
    command: validate_staging.sh
  - service: notifications
    command: notify_team
While parallelizing with Codeship is great, you can also parallelize internally. In most languages, there are packages that let you run work simultaneously, such as parallel_tests for Rails and concurrently for Node.js. This lets you parallelize actions within your existing infrastructure usage, rather than adding to the infrastructure weight. Note that all optimizations gained here are permanent within your codebase, not specific to a Codeship build or deployment.
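To give a feel for what internal parallelization looks like, here is a minimal Python sketch that runs several independent commands at once inside a single container. It uses Python's standard library as a stand-in and does not reflect how parallel_tests or concurrently work internally; the commands and the name run_in_parallel are placeholders for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(commands, max_workers=2):
    """Run each command as a subprocess, at most `max_workers` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run(command):
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Placeholder commands standing in for independent test suites.
suites = [
    [sys.executable, "-c", "print('unit tests passed')"],
    [sys.executable, "-c", "print('lint passed')"],
]

exit_codes = run_in_parallel(suites)
print("all passed" if all(code == 0 for code in exit_codes) else "failures")
```

Keeping max_workers small is intentional here: the point is to use the resources the container already has, not to request more.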
Again, if you’re hitting a wall because of how many resources you’re using (whether because of parallel steps, the number of things being built, compiled, downloaded, or pushed, or even just the size of the project), you can always contact our help desk about it. We’ll gladly take a look at just how much of your resources you’re using up and help you plan from there.
Published at DZone with permission of Ethan Jones. See the original article here.
Opinions expressed by DZone contributors are their own.