Dockerfile Tips and Tricks

If you're looking to get started with Docker, these tips will keep your Dockerfile up to snuff. Pin your dependencies, consider the order of your statements, and clean up.


Last month saw the celebration of Docker Global Mentor Week 2016, a great initiative by Docker to help users improve at all skill levels. Docker is one of the key technologies in the resin.io stack, and we've found that there are a lot of Docker-related best practices, tips, and tricks that can dramatically improve the resin.io developer experience. Docker already has a best practices collection, but not all of them apply to the resin.io use case. In the spirit of Global Mentor Week, I've collected our highest-impact Docker tips for resin.io applications and hardware devices in this blog post.

The notes below are divided into two main parts: Must Have practices that you should really use every time, and Nice to Have tips that can further improve your code and your experience, but are a bit less hard and fast.

Must Have

The following practices should save you a lot of pain during your development process.

Pin Software Versions

The clear winner of the best practices lineup is pinning the versions of all your dependencies. This includes the base images, the code you pull from GitHub, the libraries your code relies on, and so on. With pinned versions, it is much easier to tie down a known-working release of your application. Without them, your components can easily change in ways that make a previously working Dockerfile fail to build.

You can find the latest available date-tagged base image versions in the resin.io Docker Hub listing: choose your base image and look at the Tags tab. For example, see the tag listing for resin/raspberrypi3-debian. You should use a tag such as jessie-20161119 instead of the plain jessie tag, as the latter changes day to day:

FROM resin/raspberrypi3-debian:jessie-20161119  


The structure of our base images changes occasionally (rarely, but it does happen), whereas with a date tag you can rely on a known-good version of the base image (and, courtesy of Docker, those tagged versions will always be available for download).
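To catch floating tags such as a plain jessie before they cause trouble, you could run a quick check over your Dockerfile. The function below is a hypothetical helper, not part of Docker or resin.io tooling, and its rule is a heuristic assumption: a tag counts as pinned only if it contains a digit (as date tags like jessie-20161119 do) and is not latest, while digest references (image@sha256:...) and scratch are treated as pinned.

```shell
# find_unpinned_from FILE
# Hypothetical helper: print "LINE: INSTRUCTION" for every FROM line whose
# image has no tag, uses "latest", or has a tag without any digit
# (e.g. "jessie" instead of "jessie-20161119").
find_unpinned_from() {
  awk '
    toupper($1) == "FROM" {
      i = 2
      while (i <= NF && substr($i, 1, 2) == "--") i++   # skip flags such as --platform=...
      ref = $i
      if (ref == "scratch" || ref ~ /@sha256:/) next     # digests are immutable
      name = ref
      sub(/.*\//, "", name)                              # drop registry and namespace, e.g. resin/
      n = index(name, ":")
      tag = (n > 0) ? substr(name, n + 1) : ""
      if (tag == "" || tag == "latest" || tag !~ /[0-9]/) print FNR ": " $0
    }' "$1"
}
```

Running find_unpinned_from Dockerfile prints nothing when every base image is pinned. Being a heuristic, it will also report FROM lines that refer to a locally built image by a name without a tag.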

Pinning the versions of software installed through the operating system's package manager is trickier. In Debian, this means running apt-get with explicit version information, for example:

RUN apt-get update && \  
    apt-get install -yq --no-install-recommends \
      i2c-tools=3.1.1-1 \
    ...


The same goes for Alpine and Fedora packages and their respective package managers. Setting up pinned versions takes a bit more legwork if you install a decent number of packages, but it's worth it in the long run.
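One way to do that legwork on a Debian-based image is to build once without pins, then ask dpkg which versions actually got installed and copy them into your Dockerfile. The helper name print_apt_pins is hypothetical; the dpkg-query -W call with a -f format string is standard dpkg tooling, and package=version is the syntax apt-get install accepts.

```shell
# print_apt_pins PACKAGE...
# Hypothetical helper: run it inside a container built from the unpinned
# Dockerfile. Prints one "package=version" line per installed package.
print_apt_pins() {
  # Single quotes keep ${Package} and ${Version} away from the shell;
  # dpkg-query expands them itself.
  dpkg-query -W -f='${Package}=${Version}\n' "$@"
}
```

For instance, print_apt_pins i2c-tools prints the installed version in the same form as the i2c-tools=3.1.1-1 line above; the exact version depends on what your base image provides.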

Quite often you'll install software from version control (such as git/GitHub). In that case, there's no excuse for not using a specific commit, identified by a unique ID (the commit hash/SHA in git), or a tag. Here's an example of checking out a specific tagged version of the code with git:

# Can use tag or commit hash to set MRAAVERSION
ENV MRAAVERSION v1.3.0  
RUN git clone https://github.com/intel-iot-devkit/mraa.git && \  
    cd mraa && \
    git checkout -b build ${MRAAVERSION} && \
    ...
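Since a git tag can be deleted and recreated, it can also help to record which commit a tag actually resolved to. The sketch below uses a hypothetical helper name, clone_pinned; it clones a repository, checks out a tag or commit hash as a detached HEAD, and prints the resulting commit hash, which you could then use as MRAAVERSION instead of the tag.

```shell
# clone_pinned URL REF DIR
# Hypothetical helper: clone URL into DIR, check out REF (tag or commit hash)
# without a branch, and print the commit hash that REF resolved to.
clone_pinned() {
  local url="$1" ref="$2" dir="$3"
  git clone --quiet "$url" "$dir" &&
    git -C "$dir" -c advice.detachedHead=false checkout --quiet "$ref" &&
    git -C "$dir" rev-parse HEAD
}
```

For example, clone_pinned https://github.com/intel-iot-devkit/mraa.git v1.3.0 mraa prints the full hash of the commit tagged v1.3.0.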


Finally, apply pinning to every library that you install, whether through requirements.txt (Python), package.json (Node.js), Cargo.toml (Rust), or some other programming language's package manager. Always pin (also called locking or freezing) the external libraries to a version number or unique commit!
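For Python, pip freeze writes every installed package as a name==version line, which makes a good starting point for a pinned requirements.txt. To spot entries that slipped through unpinned, a rough check like the hypothetical helper below could be run before building; treating only == as "pinned" is our simplifying assumption.

```shell
# find_unpinned_requirements FILE
# Hypothetical heuristic: print "LINE:ENTRY" for every requirements.txt entry
# that is not pinned with "==". Blank lines, comments, and pip options
# (lines starting with "-", such as "-r other.txt") are skipped.
# Like grep, it exits with status 0 only if it printed something.
find_unpinned_requirements() {
  grep -nEv '^[[:space:]]*(#|-|$)' "$1" | grep -v '=='
}
```

Note that this heuristic also reports VCS entries (git+https://...) even when they point at a specific commit, so review its output rather than failing a build on it blindly.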

Clean Up After Yourself

It's common wisdom that one of the best ways to speed up a computer program is to eliminate unnecessary calculations ("make it do less"). The same goes for software deployment: the best way to speed up deploys and updates is not to ship code that is not needed. In our case: clean up after yourself and remove the unneeded bits from your container.

What are unneeded bits? Most commonly, they are temporary files left behind by the package manager, or the source code of software that is built and installed in your Dockerfile.

How to clean up after the package manager depends on the distribution used in your base image. For Debian and Raspbian, the package manager is apt-get, and Docker already has quite a bit of advice on using apt-get in a Dockerfile. It comes down to finishing the installation step by removing the temporary files:

RUN apt-get update && \  
    apt-get install -yq --no-install-recommends \
      <packages> \
    && apt-get clean && rm -rf /var/lib/apt/lists/*


The last line above runs apt-get clean, which deletes the downloaded package files, and removes the package lists fetched by apt-get update; you won't need either on your device.

If you use Alpine Linux, the apk package manager has a handy --no-cache option, which leaves nothing behind to clean up:

RUN apk add --no-cache <package>  


For Fedora, the dnf package manager can be handled similarly to apt-get:

RUN dnf makecache && \  
    dnf install -y \
      <packages> \
    && dnf clean all && rm -rf /var/cache/dnf/*


Cleaning up the source code of installed software is usually simple: remove the directories created in earlier steps of the build process. Continuing the MRAA example above, this is one way to clean up after a git checkout:

ENV MRAAVERSION v1.3.0
RUN git clone https://github.com/intel-iot-devkit/mraa.git && \
    cd mraa && \
    git checkout -b build ${MRAAVERSION} && \
    <some build steps> && \
    make install && \
    cd .. && rm -rf mraa


Also make sure to keep all the cleanup commands in the same RUN instruction as the commands that created the files. Otherwise, the files will appear to be removed but will still be present in the final Docker image as ballast.

Combine RUN Statements

The last note above leads me to the final Must Have practice: combining RUN statements logically within your Dockerfile. Steps that logically belong together should go in the same statement, to avoid a couple of common problems, mostly related to caching and unnecessary disk usage. First, caching can lead to unexpected build outcomes. If your apt-get update step is in a separate RUN from your apt-get install <package> step, the update step might be served from the cache and not re-run when you expect it to be, leaving the install step to work from stale package lists. Similar things can happen if you separate your git clone from the actual build.

Second, files deleted in a separate, later RUN step are retained in the final image: they are no longer accessible, but they still take up space (ballast).

The Docker documentation has a few more notes and background on this advice.
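The cleanup and combine-RUN rules can be checked mechanically to some degree. The function below is a rough, hypothetical heuristic rather than an official linter: it joins backslash-continued lines into whole instructions, then reports any RUN instruction that calls apt-get install without also removing /var/lib/apt/lists in that same instruction. It only recognizes the uppercase RUN keyword and the rm -rf spelling used in this post.

```shell
# find_uncleaned_apt_runs FILE
# Hypothetical heuristic: report RUN instructions that install with apt-get
# but do not remove /var/lib/apt/lists in the same instruction. A cleanup in
# a later, separate RUN does not count, since the files stay in the earlier layer.
find_uncleaned_apt_runs() {
  awk '
    {
      if (buf == "") start = FNR           # first physical line of an instruction
      line = $0
      if (sub(/\\[ \t]*$/, "", line)) {    # trailing backslash: keep collecting
        buf = buf line " "
        next
      }
      instr = buf line
      buf = ""
      if (instr ~ /^[ \t]*RUN[ \t]/ && instr ~ /apt-get install/ &&
          instr !~ /rm -rf \/var\/lib\/apt\/lists/)
        print "line " start ": apt-get install without apt lists cleanup"
    }' "$1"
}
```

Given a Dockerfile where one RUN installs packages and a later RUN deletes the lists, the install instruction is still reported, matching the ballast problem described above.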

Nice to Have

The following practices are highly recommended and usually take your experience from good to great, but they are not necessarily bottlenecks for getting things done.

Order Dockerfile Statements

Docker tries to cache every step in your Dockerfile that has not changed, but if you change any statement, all the steps following it are redone. You can save quite a bit of build time by arranging your Dockerfile from least likely to most likely to change, whenever possible. For example, general setup such as setting the working directory, enabling the init system, and setting the maintainer should come early:

LABEL maintainer="Awesome Developer <awesome@developer.net>"
WORKDIR /usr/src/app
ENV INITSYSTEM on


These statements can be followed by installing packages with the operating system's package manager, then compiling your dependencies, enabling system services, and other setup. For example, towards the end of this section of your Dockerfile, you would install your Python dependencies:

COPY requirements.txt ./  
RUN pip install -r requirements.txt  


Or your Node.js dependencies:

COPY package.json ./  
RUN npm install  


Copying your application source code should come near the end, as it is the part most likely to change often. It could be a simple "copy everything" command, such as:

COPY . ./  


This way you speed up the build and deployment process, and your Dockerfile is easier to read as well! The examples above are just for reference; the logical order can depend greatly on your particular application.

Use .dockerignore

Following on from the previous tip, always define a .dockerignore file to tell our builders which content from your source code does not need to go onto the device itself, so that it is not copied in the COPY . ./ step. The ignored content can be the README.md or other documentation, images included with that documentation, or any other pieces that are not required for your application to function but that you keep in the same repository for one reason or another.
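A .dockerignore file lists one pattern per line, relative to the build context, and lines starting with # are comments. The snippet below writes a minimal one from the repository root; the docs/ and images/ directory names are hypothetical examples, so adjust the entries to what your repository actually contains.

```shell
# Write a minimal .dockerignore next to the Dockerfile (run from the
# repository root). "docs/" and "images/" are hypothetical directory names.
cat > .dockerignore <<'EOF'
# version control metadata
.git
# documentation and its images are not needed on the device
README.md
docs/
images/
EOF
```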

Use a Start Script

This one is personal advice from having created (and debugged) a large number of projects: don't call your application directly from the CMD step; call a start script there instead:

CMD ["bash", "start.sh"]  


And then, inside your start.sh, you can have, for example, python app.py or any other way to start your application. The advantage is that it's much easier to expand or add debugging steps to the start script than to constantly rewrite the CMD step. You want to emit some debug info before your main code starts? Just add as many lines and as much testing logic to your start script as you like.
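As an illustration of the debug output a start script could emit, here is a small hypothetical helper; its name and output format are our own assumptions. You would call it near the top of start.sh and then launch the application, for example with exec python app.py, so that the application process replaces the shell.

```shell
# print_startup_info
# Hypothetical helper for start.sh: print a few facts about the runtime
# environment to the container logs before the application starts.
print_startup_info() {
  echo "start.sh: running as $(id -un) (uid $(id -u)) in $(pwd)"
  echo "start.sh: kernel $(uname -r) on $(uname -m)"
}
```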

You can also speed up your development and testing using resin sync. Resin sync copies your application source code onto a running device and updates it in place (without rebuilding from the Dockerfile), then restarts the container with the updated code. However, it can only do that effectively for content that Docker has not baked into the image, such as a command written directly into CMD; keeping the startup logic in start.sh lets resin sync pick up your changes.

Create a Non-Root User

By default, the code in your application container runs as root. As a good preventive security practice, it's recommended to create a non-root user and grant it only as much privilege as needed. For example:

RUN useradd --user-group --shell /bin/false resin  
USER resin  


This creates a user called resin, and all subsequent RUN, CMD, and ENTRYPOINT instructions run as that user. See more on this in the Docker docs, or this blog post.
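If you want the container to fail loudly when it ends up running as root anyway (for example, because the USER line was dropped from the Dockerfile), a small guard at the top of start.sh could help. This is an optional addition of ours rather than something Docker or resin.io requires, and the function name is hypothetical.

```shell
# require_non_root
# Hypothetical guard for the top of start.sh: fail early if the process is
# running as root (UID 0), e.g. because the USER instruction was removed.
require_non_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo "start.sh: refusing to run as root; check the USER instruction" >&2
    return 1
  fi
}
```

In start.sh you would call require_non_root || exit 1 before starting the application.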

Summary

For further research, check our documentation on Build Optimization or the Docker Best practices for writing Dockerfiles (those that apply). You might also want to take a look at the Dockerfile Linter for general improvements and advice.

Do you have any other Docker best practices on resin.io that you would like to share? Leave your advice here in the comments, chat with us on Gitter, or drop by the forums! We would love to hear from you!

Published at DZone with permission of Gergely Imreh, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

