Understanding and Creating Effective Docker Images
If you've ever wanted to know more about Docker Images, you've come to the right article. And take a screenshot; it'll last longer.
Have you ever wondered how to speed up your image builds, or how to build minimal-sized Docker images for your application on a CI server or on your laptop so you can develop and deploy apps quickly? This post might help.
To build effective Docker images for your applications, it is best to first look at the anatomy of a Docker image and how things are organized inside it.
The Image Technology
It is very likely that you have used an ISO file before, either received from a friend or downloaded directly from the Internet, to install a particular piece of software such as Ubuntu or Windows on your system. The ISO file is also known as an image file for that piece of software. The image contains everything you need to install the software, and you can install (instantiate) multiple running instances of that software from the same image on any number of computers.
An image is a snapshot of a piece of software at a particular point in time. The image itself does not consume computing resources, but once you install the software from that image, the running software does.
In the same way, you may have used or built images for virtual machines in a public or on-premises cloud, such as AMIs in AWS, VHDs in Azure, or OVAs in VMware. A VM image is likewise a snapshot of a virtual machine at a particular point in time.
Continuing the ISO analogy: if we install Ubuntu on a system from the ISO file, the system is a running instance of that ISO image, and it is the instance that consumes computing resources. Likewise, if we launch an EC2 instance from an AMI in AWS, that EC2 instance is a running instance of the AMI, consuming the computing resources we pay for.
Docker images and containers are related to each other in the same way: a container is a running instance of a Docker image, and it is the container that consumes computing resources. A Docker image represents a snapshot of an application at a particular point in time.
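As a quick sketch of this relationship (assuming Docker is installed locally; the container names are made up for illustration), several independent containers can be started from one image:

```shell
# Pull the image once; on disk it is just a read-only snapshot.
docker image pull ubuntu:latest

# Start two independent containers from the same image.
# Only these running instances consume CPU and memory.
docker container run --name app-a -d ubuntu:latest sleep infinity
docker container run --name app-b -d ubuntu:latest sleep infinity

# List the running instances created from the image.
docker container ls
```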
But how are Docker images different from other image technologies such as ISO files, AMIs, and VHDs, and how do we create effective Docker images for our applications? This is where we need to understand the Docker image layered file system.
Docker Images & Layers
A Docker image is composed of layers and is created from a Dockerfile: a file containing a series of instructions used to build your application image. To understand an image's layered file system, consider the analogy of a building. Say we have a four-story building housing a company's offices, and each floor is occupied by one department of the company.
If we treat each floor as a layer, we can say the building has four layers built on top of each other. The bottom-most layer is called the base layer of the building. On top of the base layer, we add additional layers to form the complete building.
Suppose we want to make a change to layer 2 of the building. To do this, we have to destroy and rebuild the 3rd and 4th layers as well, but the bottom-most base layer is untouched.
A similar concept applies to Docker images. Docker images are composed of layers built on top of each other. If we modify any layer below the topmost layer, all the layers above it have to be rebuilt, but the layers beneath the modified layer remain unchanged.
Let's take a simple example to see how image layers are created and organized. Consider the Dockerfile below:
```dockerfile
FROM microsoft/dotnet:latest
WORKDIR /app
COPY . /app
RUN ["dotnet", "restore"]
ENTRYPOINT ["/bin/bash"]
```
We can easily tell what this Dockerfile does simply by reading the instructions. It is for a .NET Core application, and it contains 5 instructions, so it will create 5 image layers. We run the command `docker image build -t <img-name> .` at the root of the project to build the image. Watch the output carefully:
```
Sending build context to Docker daemon  70.66kB
Step 1/5 : FROM microsoft/dotnet:latest
 ---> 7d4dc5c258eb
Step 2/5 : WORKDIR /app
 ---> f155edccaebc
Removing intermediate container b9d453e30500
Step 3/5 : COPY . /app
 ---> 5e8829f8e16a
Step 4/5 : RUN dotnet restore
 ---> Running in 18c1895b1882
Restore completed in 63.42 ms for /app/app.csproj.
 ---> 8aa5ee29da9e
Removing intermediate container 18c1895b1882
Step 5/5 : ENTRYPOINT /bin/bash
 ---> Running in f5fcc6b37b77
 ---> ce49ab5a2c9c
Removing intermediate container f5fcc6b37b77
Successfully built ce49ab5a2c9c
Successfully tagged kjanshair/app1:latest
```
The way Docker creates an image is that it first creates a container from the base image, called an intermediate container (you can see the base image ID at line 3 of the output). It then executes the second instruction of the Dockerfile in that intermediate container, creates another image layer (with the ID on line 5), and destroys the intermediate container (line 6). In the same way, it creates another intermediate container from the last image layer (the one with the ID at line 5), executes the 3rd instruction, creates another image layer, and destroys the intermediate container. It keeps doing this for every instruction in the Dockerfile until it produces the final image (with the ID at line 18) and assigns a tag to it (line 19).
These layers are built on top of each other. You can use the `docker image history <img-name>` command to see all the layers of a particular image. The nice thing about these layers is that each one is cached by the Docker Engine while the image is being built. If we, for example, change the instruction at line 3 of the Dockerfile and then rebuild the image, the Docker Engine will only create new layers for the changed instruction and all the layers above it; the layers below the modified one won't be rebuilt but will be taken from the cache instead (remember the building analogy).
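As a sketch (using the image name `kjanshair/app1` from the build output above), the layers and their cache entries can be examined like this:

```shell
# Show each layer of the image along with the instruction that created it.
docker image history kjanshair/app1

# The same listing without truncating long instructions.
docker image history --no-trunc kjanshair/app1

# Alternatively, list the layer digests recorded in the image metadata.
docker image inspect --format '{{json .RootFS.Layers}}' kjanshair/app1
```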
To check whether image layers are rebuilt by the Docker Engine, we need to modify the source code a bit and then rebuild the image. Modifying the source code will affect the 3rd instruction of the Dockerfile (`COPY . /app`). Before changing the code, run the `docker image history <img-name>` command and note down the layer IDs of the image:
```
ce49ab5a2c9c - Layer 5
8aa5ee29da9e - Layer 4
5e8829f8e16a - Layer 3
f155edccaebc - Layer 2
7d4dc5c258eb - Layer 1
```
Now let's modify the source code, rebuild the image, and examine the output:
```
Sending build context to Docker daemon  70.66kB
Step 1/5 : FROM microsoft/dotnet:latest
 ---> 7d4dc5c258eb
Step 2/5 : WORKDIR /app
 ---> Using cache
 ---> f155edccaebc
Step 3/5 : COPY . /app
 ---> eb76f1a774b4
Step 4/5 : RUN dotnet restore
 ---> Running in 05be553ca0f0
Restore completed in 17.94 ms for /app/app.csproj.
 ---> c4125def5c1f
Removing intermediate container 05be553ca0f0
Step 5/5 : ENTRYPOINT /bin/bash
 ---> Running in 83d3eb5046c9
 ---> 64a97e217018
Removing intermediate container 83d3eb5046c9
Successfully built 64a97e217018
Successfully tagged kjanshair/app1:latest
```
Notice `Using cache` at line 5 of the output. The source-code change did not affect this layer (`WORKDIR /app`), so it remained unchanged and Docker took it from the cache; every instruction after it, however, was executed again. Run the `docker image history <img-name>` command again and check the layer IDs:
```
64a97e217018 - Layer 5
c4125def5c1f - Layer 4
eb76f1a774b4 - Layer 3
f155edccaebc - Layer 2
7d4dc5c258eb - Layer 1
```
Notice that the bottom-most 2 layer IDs remained unchanged, but the 3 layers above them changed because of the change in the source code.
It is important to note that each RUN statement in a Dockerfile executes its command in a new layer.
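Because every RUN instruction produces a layer, a common technique (sketched here for a hypothetical Debian-based image; the package name is only an example) is to chain related commands into a single RUN so that they form one cacheable layer and cleanup happens before the layer is committed:

```dockerfile
FROM debian:stretch

# One RUN, one layer: update, install, and clean up together so that
# the apt package cache never gets baked into an intermediate layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```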
So now you have an idea of how Docker images, layers, and the cache work. A basic understanding of images and layers is important because it will help you build optimized images for your applications.
Creating Optimized Docker Images
Did you notice that there was a problem while creating the image for our application? The NuGet packages were restored again merely because the source code changed. It should not work this way: having to restore all packages on every code change results in slower build times.
As stated earlier, changing the source code affects the COPY instruction in the Dockerfile and therefore forces every layer above it to be rebuilt. Another way of writing the Dockerfile for the same application is:
```dockerfile
FROM microsoft/dotnet:latest
WORKDIR /app
COPY app.csproj /app
RUN ["dotnet", "restore"]
COPY . /app
ENTRYPOINT ["/bin/bash"]
```
This adds an extra layer to the Docker cache, but while building the image you will see that the NuGet packages are no longer restored when only the source code changes. Because the `.csproj` file is copied and restored before the rest of the sources, a code change only invalidates the final `COPY . /app` and ENTRYPOINT layers, which makes image builds for our application much faster.
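A related optimization (a sketch; the exact entries depend on your project) is a `.dockerignore` file next to the Dockerfile, which keeps build artifacts and other noise out of the build context that `COPY . /app` would otherwise pick up:

```
# .dockerignore - hypothetical entries for a .NET Core project
bin/
obj/
.git/
*.md
```

A smaller build context uploads faster to the Docker daemon and makes the COPY layer's cache less likely to be invalidated by irrelevant file changes.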
So the second approach to Dockerizing the application is faster for development and deployment. Understanding this helps you decide which parts of your application should be cached in their own layers and which should not, depending on your particular case.
Understanding how Docker organizes images and layers is very helpful for creating effective Docker images. The image built with the second approach is slightly larger on disk but builds faster than the first approach for the same application. Again, everything depends on the situation and the structure of the application you want to Dockerize, and the solution above may change in your particular case.
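If image size matters as much as build speed, a multi-stage build is also worth considering. The sketch below assumes the same hypothetical .NET Core app and the `sdk`/`runtime` variants of the `microsoft/dotnet` image: the SDK stage compiles and publishes the app, and only the published output is copied into a smaller runtime image.

```dockerfile
# Stage 1: restore, build, and publish with the full SDK image.
FROM microsoft/dotnet:sdk AS build
WORKDIR /app
COPY app.csproj .
RUN dotnet restore
COPY . .
RUN dotnet publish -c Release -o /out

# Stage 2: copy only the published output into the runtime image.
# The SDK, sources, and NuGet cache are left behind in the build stage.
FROM microsoft/dotnet:runtime
WORKDIR /app
COPY --from=build /out .
ENTRYPOINT ["dotnet", "app.dll"]
```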