CHAPTER 2
To package your own Docker image, you write a text document called a Dockerfile that lists all the steps needed to make the image, and then you use the Docker command line to build it. The Dockerfile uses a very simple domain-specific language with only a handful of instructions. Code Listing 10 shows a perfectly valid Dockerfile.
Code Listing 10: A Simple Dockerfile
FROM ubuntu
RUN apt-get update && apt-get install nano
When you build an image from that Dockerfile and run a container from the image, you’ll be working in an Ubuntu container with the nano package installed. The FROM instruction specifies the base image so that your image will start from there and layer on the changes in the rest of your Dockerfile. In this case, it will run two apt commands to install Nano.
In order to build an image, you use the docker image build command. You need to specify both a repository name to identify the image, and the path Docker should use as the context for building the image. You can also tag images with labels, which explains how you can have multiple image versions in a repository (like ubuntu:12.04 and ubuntu:14.04). Code Listing 11 builds an image using a file called Dockerfile in the local directory.
Code Listing 11: Building the Docker Image
$ docker image build --tag dockersuccinctly/ubuntu-with-nano .
Sending build context to Docker daemon 2.048 kB
Step 1/2 : FROM ubuntu
latest: Pulling from library/ubuntu
Digest: sha256:34471448724419596ca4e890496d375801de21b0e67b81a77fd6155ce001edad
Status: Downloaded newer image for ubuntu:latest
 ---> ccc7a11d65b1
Step 2/2 : RUN apt-get update && apt-get install nano
 ---> Running in 53cccf9021fb
...
 ---> 9ef68677ce6a
Removing intermediate container 53cccf9021fb
Successfully built 9ef68677ce6a
Successfully tagged dockersuccinctly/ubuntu-with-nano:latest
Tip: Dockerfile (with no extension) is the default filename Docker looks for, but you can call your Dockerfile anything and identify it with the --file option. This means you can call your file server.dockerfile and build it with docker image build --file server.dockerfile . (the trailing period is the path to the build context, which is still required).
The image build command gets executed by the Docker server—the client simply sends the details through. That’s why you must specify a path (using . for the current working directory in this case). The client sends the contents of the path to the server, and the server stores it in a working folder it uses to build the image. That folder is called the build context, and later in this chapter we’ll see why it’s important.
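Because the whole context folder is sent to the server, its size affects how long a build takes to start. The following is a rough sketch, in Python, of how you might estimate that size before building. It assumes a plain directory walk is a fair approximation: the real client packs the context into an archive and honours a .dockerignore file, neither of which is modeled here. The names context_size_bytes and demo are hypothetical helpers, not part of any Docker tooling.

```python
import os
import tempfile


def context_size_bytes(context_dir: str) -> int:
    """Sum the sizes of all regular files under context_dir.

    Rough estimate only: archive overhead and .dockerignore rules,
    which the real Docker client applies, are ignored in this sketch.
    """
    total = 0
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def demo() -> int:
    """Create a throwaway context with a Dockerfile and one data file."""
    with tempfile.TemporaryDirectory() as context:
        with open(os.path.join(context, "Dockerfile"), "wb") as f:
            f.write(b"FROM ubuntu\n")      # 12 bytes
        with open(os.path.join(context, "data.bin"), "wb") as f:
            f.write(b"\0" * 1000)          # 1000 bytes
        return context_size_bytes(context)
```

Calling context_size_bytes(".") from the folder you intend to build gives a quick indication of whether large files you don’t need are about to be sent to the server.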
You can also give your image a repository name. When you build locally you can call your image anything you like, but the convention is to use a format such as {user}/{application}, where the user part is your account ID on Docker Hub.
The tag is the unique identifier for a particular image within a repository, which means that in the public registry on the Docker Hub you’ll see images with repository names like microsoft/azure-cli and sixeyed/hadoop-dotnet that each have many image versions. If you don’t specify a version in a tag, Docker uses the default latest.
Note: Some images on the Hub don’t have a user in the repository name; for example, the Ubuntu image is simply called ubuntu rather than canonical/ubuntu. These are from official repositories that are curated, verified, and security scanned. You should prefer official images for your base image.
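The naming rules above can be summarized in a few lines of Python. This is a simplified sketch, not Docker’s own parser: it assumes names of the form {user}/{application}:{tag} and deliberately ignores registry hostnames and image digests. ImageRef and parse_image_ref are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    user: Optional[str]   # None for official repositories such as "ubuntu"
    application: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split '{user}/{application}:{tag}' into its parts.

    Simplified on purpose: registry hostnames (which may contain ':'
    for a port) and '@sha256:...' digests are not handled.
    """
    name, _, tag = ref.partition(":")
    tag = tag or "latest"          # the default when no tag is given
    user, slash, application = name.partition("/")
    if not slash:
        # No '/' means an official repository with no user part.
        return ImageRef(None, name, tag)
    return ImageRef(user, application, tag)
```

For example, parse_image_ref("dockersuccinctly/ubuntu-with-nano") yields the user dockersuccinctly, the application ubuntu-with-nano, and the tag latest.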
When you successfully build an image, it’s stored in the Docker server’s local cache and you can run containers from it. You can also push it to a shared image registry such as Docker Hub or your own registry (which we’ll cover in Chapter 3, Image Registries and the Docker Hub).
The Docker CLI can list all the images stored in the Docker image cache with the image ls command, as shown in Code Listing 12.
Code Listing 12: Listing Images in the Engine Cache
$ docker image ls
REPOSITORY                          TAG      IMAGE ID       CREATED         SIZE
dockersuccinctly/ubuntu-with-nano   latest   b06d1e92b27e   3 minutes ago   165.9 MB
ubuntu                              latest   f8d79ba03c00   8 days ago      126.4 MB
hello-world                         latest   c54a2cc56cbb   7 weeks ago     1.848 kB
nginx                               alpine   5ad9802b809e   8 weeks ago     69.3 MB
The output from the docker image ls command tells you the repository name and the tag for each image, its unique image ID, when it was created in the cache, and the size. In this output, I have three images downloaded from the Hub for the containers I ran in Chapter 1, and I have my own newly built image with the dockersuccinctly account name. When you start using larger images, your local cache can use a lot of disk space—we’ll see how to manage that in Chapter 4, Data Storage in Docker.
The only required Dockerfile instruction is FROM, which specifies the base image on top of which a new image will be built. That doesn’t do much on its own, but to build useful production-grade images you need only a few more instructions, such as RUN, ENV, COPY, EXPOSE, VOLUME, and CMD.
Here’s a very simple Dockerfile that shows all the main instructions—this image is for a basic app that listens for input on a specific port and echoes out any input it receives to a file. Code Listing 13 shows the Dockerfile in full.
Code Listing 13: Dockerfile for an Echo Application
FROM ubuntu
RUN apt-get update && \
    apt-get install -y netcat-openbsd
ENV LOG_FILE echo.out
COPY ./echoserver.sh /echoserver.sh
RUN chmod +x /echoserver.sh
EXPOSE 8082
VOLUME /server-logs
CMD /echoserver.sh
Note: The full code is on GitHub at SyncfusionSuccinctlyE-Books/Docker-Succinctly, and a built image is available on the Docker Hub in the repository dockersuccinctly/echoserver. The order of instructions is important, as we’ll see soon, but in this example the instructions are ordered to make them easy to follow.
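As Docker builds that image, it processes each instruction in turn: FROM starts from the ubuntu base image; the first RUN refreshes the package list and installs netcat; ENV sets the LOG_FILE environment variable to echo.out; COPY copies echoserver.sh from the build context into the root of the image; the second RUN makes the script executable; EXPOSE records that the container listens on port 8082; VOLUME declares /server-logs as a volume; and CMD sets the script as the command to run when a container starts.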
Note: It’s important to understand that those commands are happening inside containers during the build process, not on your local machine. After you’ve built this image on your machine, the image will have netcat installed and have a value set for the LOG_FILE environment variable, but your local machine won’t.
Everything except the final CMD instruction gets executed during the build. When you run a container from the image, Docker uses the CMD instruction to tell it how to start, in this case by running the echoserver.sh script. That script starts netcat listening on port 8082 and redirects the output from client connections to a file. The file path uses the volume named in the Dockerfile and the log file name from the environment variable. Code Listing 14 shows how to start the echo server container in the background with port 8082 published to the host.
Code Listing 14: Running the Echo Server Container
$ docker container run --detach --publish 8082:8082 --name echo-server dockersuccinctly/echoserver
7a372ff9f350995b4bb8a84215cd8020bd87dbd196367935dab568ed1939cc5f
The container is now running netcat listening on port 8082, and Docker is forwarding requests to port 8082 on the local host into the container.
In Code Listing 15, we connect to the container using netcat on the host, specifying the localhost address and the published port 8082. Then we write a string to netcat and exit the connection.
Code Listing 15: Connecting to the Echo Server from the Host
$ nc localhost 8082
Hello, Docker Succinctly!
^C
Note: If you’re running on Windows, the netcat utility won’t be available, but you can install a version from https://eternallybored.org/misc/netcat/.
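If installing netcat isn’t convenient, the client side of Code Listing 15 can be approximated with Python’s standard socket module. This is a minimal sketch, assuming Python 3 is available on the host. The helpers send_line and demo_roundtrip are hypothetical names; demo_roundtrip uses a throwaway local listener in place of the container so that the example can run without Docker.

```python
import socket
import threading


def send_line(host: str, port: int, text: str) -> None:
    """Connect like `nc host port`, send one line, then close."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((text + "\n").encode("utf-8"))


def demo_roundtrip(text: str = "Hello, Docker Succinctly!") -> bytes:
    """Send `text` to a temporary local listener and return what arrived."""
    received = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def accept_once() -> None:
        conn, _addr = listener.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:          # empty read: the client has closed
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=accept_once, daemon=True)
    worker.start()
    send_line("127.0.0.1", port, text)
    worker.join(timeout=5)
    listener.close()
    return received[0]
```

With the echo server container running, send_line("localhost", 8082, "Hello, Docker Succinctly!") plays the same role as the netcat session.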
And lastly, in Code Listing 16, we use docker container exec to run a command inside the container and write the output back to the host. In this case, we read the contents of the output file that netcat on the container is using—this is an echo of the string we sent from the client.
Code Listing 16: Viewing the Echo Server’s Log File
$ docker container exec echo-server cat /server-logs/echo.out
Hello, Docker Succinctly!
By using a volume for the output location and an environment variable for the file name, we can change where the echo data is written for different instances of the container when we run them. We’ll look more closely at Docker volumes in Chapter 4, Data Storage in Docker.
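The way the output path is assembled can be sketched as follows. The echoserver.sh script itself isn’t reproduced in this chapter, so this Python fragment models only the behavior described above: the directory comes from the volume, and the file name comes from the LOG_FILE environment variable. The function name resolve_log_path is hypothetical.

```python
import posixpath
from typing import Mapping


def resolve_log_path(environ: Mapping[str, str],
                     volume_dir: str = "/server-logs") -> str:
    """Join the volume directory with the file name taken from LOG_FILE."""
    # Fall back to the value the Dockerfile sets with its ENV instruction.
    file_name = environ.get("LOG_FILE", "echo.out")
    # posixpath is used because paths inside the container are Linux paths.
    return posixpath.join(volume_dir, file_name)
```

With the default environment this gives /server-logs/echo.out, the file read in Code Listing 16; a container started with a different LOG_FILE value would write to a different file in the same volume.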
This is a simple example of a standard approach to Dockerfiles. A typical Dockerfile for packaging an application will state a minimal base image, install the application platform, copy in the application source, compile the application, and specify how the application starts.
You can vary that approach. For example, if your platform has an official image, you can use that as the base, and if you can publish your app with all its dependencies, you can compile first, then copy the binaries into the container. There’s a balance between the portability of the Dockerfile, dependencies on third-party resources, and the size of the built image.
Docker uses a layered filesystem for images. Starting from the base image, the Docker server runs a temporary container from the image for each instruction in the Dockerfile, executes the instruction, then saves the temporary container as a new image, adding it to the local image cache. Docker uses the cache during the build process, which means that if it finds an image matching the current instruction stack—that is, one that matches the state you’re asking Docker to create—it will reuse the cached image.
You can write your Dockerfile to make maximum use of the cache by ensuring that the Dockerfile is correctly structured and that the Dockerfiles for different applications each have similar structures. This way, they will use cached images as much as possible. Ideally, when you build apps with similar dependencies, Docker will need only to execute instructions in new layers that are specific to the application. For instance, the Dockerfiles in Code Listing 17 and Code Listing 18 are identical up to the final COPY instruction.
Code Listing 17: Dockerfile ‘A’
FROM ubuntu
RUN touch /setup.txt
RUN echo init > /setup.txt
COPY file.txt /a.txt
Code Listing 18: Dockerfile ‘B’
FROM ubuntu
RUN touch /setup.txt
RUN echo init > /setup.txt
COPY file.txt /b.txt
When you build an image from the first Dockerfile, it will create new layers for both of the RUN instructions and the final COPY instruction. Code Listing 19 builds an image from the first Dockerfile by using the -f flag to specify the source Dockerfile name.
Code Listing 19: Building Dockerfile ‘A’
$ docker image build -t dockersuccinctly/a -f a.dockerfile .
Sending build context to Docker daemon 4.096 kB
Step 1 : FROM ubuntu
 ---> f8d79ba03c00
Step 2 : RUN touch /setup.txt
 ---> Running in c9761757ff3c
 ---> e4d6c1754277
Removing intermediate container c9761757ff3c
Step 3 : RUN echo init > /setup.txt
 ---> Running in 0f63b9763bef
 ---> 3050c9fc2760
Removing intermediate container 0f63b9763bef
Step 4 : COPY file.txt /a.txt
 ---> f339e6dd38bb
Removing intermediate container c1c6ba5469a5
Successfully built f339e6dd38bb
Let’s note a few things here. The first instruction in Step 1 finds a match in the cache because we’ve already downloaded the ubuntu image, which means Docker simply writes the ID of the cached image it’s going to use (starting f8d).
For Step 2 there is no match, so Docker runs a temporary container from the f8d image, executes the command, and saves the temporary container to a new image with the ID starting e4d. Similarly, for Steps 3 and 4, there is no match in the cache, which means Docker runs a temporary, intermediate container from the image in the previous step, saves the container as a new image, and removes the intermediate container.
The docker image history command displays all the layers in an image, as in Code Listing 20, which shows the layer history for the dockersuccinctly/a image.
Code Listing 20: History of the ‘A’ Image
$ docker image history dockersuccinctly/a
IMAGE          CREATED         CREATED BY                                      SIZE
e03337199b8c   3 seconds ago   /bin/sh -c #(nop) COPY file:9363c0e5fcfd8d7ad   8 B
715ac6bf594b   4 seconds ago   /bin/sh -c echo init > /setup.txt               5 B
1e5d518d70c9   5 seconds ago   /bin/sh -c touch /setup.txt                     0 B
f8d79ba03c00   8 days ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 days ago      /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
<missing>      8 days ago      /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
<missing>      8 days ago      /bin/sh -c set -xe && echo '#!/bin/sh' > /u     745 B
<missing>      8 days ago      /bin/sh -c #(nop) ADD file:a2427e00553ce3905b   126.4 MB
There is a lot of detail in there. The “missing” layers mean we don’t have the intermediate layers in our cache, because they’re part of the base Ubuntu image. Docker downloaded that image from the Hub—we didn't build it locally, which means we don’t have all the layers. But we can see part of the instructions that went into building the final Ubuntu image, which is the one with ID f8d that our image started with. There are three layers above that, one for each of the instructions in the Dockerfile.
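One way to picture those layers is as a stack of lookups in which the newest layer wins. The sketch below models the image from Dockerfile ‘A’ with Python’s ChainMap. It is only an analogy: real layers store filesystem changes rather than dictionaries, the base layer’s contents are a placeholder, and the 8-byte content used for the copied file is a hypothetical stand-in, since the contents of file.txt aren’t shown.

```python
from collections import ChainMap

# Each layer maps a path to file contents. Contents are illustrative.
base_layer = {"/bin/bash": b"<ubuntu base image files>"}  # FROM ubuntu
touch_layer = {"/setup.txt": b""}                          # RUN touch /setup.txt
echo_layer = {"/setup.txt": b"init\n"}                     # RUN echo init > /setup.txt
copy_layer = {"/a.txt": b"8 bytes\n"}                      # COPY (hypothetical content)

# ChainMap searches left to right, so the newest layer goes first.
image_a = ChainMap(copy_layer, echo_layer, touch_layer, base_layer)


def layer_sizes(*layers: dict) -> list:
    """Bytes added by each layer, in the order the layers are passed."""
    return [sum(len(content) for content in layer.values())
            for layer in layers]
```

Looking up /setup.txt in image_a returns the contents written by the second RUN instruction, because that layer sits above the one created by touch. The sizes of the three new layers in this model are 0, 5, and 8 bytes, in line with the SIZE column in the history output.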
If you now build an image from the second Dockerfile, it will find matching images in the cache for the two RUN instructions. Docker tells you it’s found a cache hit in the build output, as we see in Code Listing 21.
Code Listing 21: Building Dockerfile ‘B’
$ docker image build -t dockersuccinctly/b -f b.dockerfile .
Sending build context to Docker daemon 4.096 kB
Step 1 : FROM ubuntu
 ---> f8d79ba03c00
Step 2 : RUN touch /setup.txt
 ---> Using cache
 ---> 1e5d518d70c9
Step 3 : RUN echo init > /setup.txt
 ---> Using cache
 ---> 715ac6bf594b
Step 4 : COPY file.txt /b.txt
 ---> 8cd641b1af84
Removing intermediate container 6d924ab8d087
Successfully built 8cd641b1af84
The Docker server has run only an intermediate container to execute the COPY instruction in Step 4 because there was no match in the cache for that. Everything up to that point can come from the cache because the Dockerfile instructions are identical to the cached layers. If we look at the layers in the dockersuccinctly/b image, we’ll see the bottom seven layers match the dockersuccinctly/a image and that only the final layer is different, as in Code Listing 22.
Code Listing 22: History of the ‘B’ Image
$ docker image history dockersuccinctly/b
IMAGE          CREATED              CREATED BY                                      SIZE
8cd641b1af84   About a minute ago   /bin/sh -c #(nop) COPY file:9363c0e5fcfd8d7ad   8 B
715ac6bf594b   5 minutes ago        /bin/sh -c echo init > /setup.txt               5 B
1e5d518d70c9   5 minutes ago        /bin/sh -c touch /setup.txt                     0 B
f8d79ba03c00   8 days ago           /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 days ago           /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
<missing>      8 days ago           /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
<missing>      8 days ago           /bin/sh -c set -xe && echo '#!/bin/sh' > /u     745 B
<missing>      8 days ago           /bin/sh -c #(nop) ADD file:a2427e00553ce3905b   126.4 MB
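The cache behavior seen in the two builds can be modeled roughly in Python. This is a toy model under stated assumptions, not Docker’s implementation: the cache is keyed on the parent image and the instruction text, image IDs are made up by hashing that key, and COPY is matched on its text alone, whereas Docker also compares checksums of the copied files. The build function and Cache type are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

# Cache key: (parent image ID, instruction text) -> resulting image ID.
Cache = Dict[Tuple[str, str], str]


def build(instructions: List[str], cache: Cache) -> List[Tuple[str, bool]]:
    """Return (image_id, cache_hit) for each instruction after FROM."""
    parent = instructions[0]      # treat the FROM line as the base image ID
    steps = []
    for instruction in instructions[1:]:
        key = (parent, instruction)
        hit = key in cache
        if not hit:
            # Cache miss: "run" the step and store the new image ID.
            digest = hashlib.sha256("|".join(key).encode("utf-8")).hexdigest()
            cache[key] = digest[:12]
        parent = cache[key]       # the next step builds on this image
        steps.append((parent, hit))
    return steps


dockerfile_a = ["FROM ubuntu", "RUN touch /setup.txt",
                "RUN echo init > /setup.txt", "COPY file.txt /a.txt"]
dockerfile_b = dockerfile_a[:3] + ["COPY file.txt /b.txt"]

cache: Cache = {}
steps_a = build(dockerfile_a, cache)   # empty cache: every step is a miss
steps_b = build(dockerfile_b, cache)   # the two RUN steps are reused
```

In this model the second build reuses the image IDs of both RUN steps and produces a new ID only for the final COPY, which mirrors the Using cache lines in Code Listing 21.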
The image cache saves a huge amount of time when you’re building many images with similar instructions. But you need to be careful with the image cache. Each instruction in a Dockerfile is a candidate for adding to the image cache, which effectively preserves the state at the time the image was built.
If Docker finds a match in the cache, it will use that match, which means you need to be sure you only cache layers that are good candidates for reuse and that don’t have contents that will become stale. For example, in Code Listing 23 we have the apt-get update command in its own RUN instruction.
Code Listing 23: A Dockerfile Which Will Become Stale
FROM ubuntu
RUN apt-get update
When this image gets built, it will cache a layer with an updated package list. If you build an image from a different Dockerfile that starts with the same instructions, it will use the cached image with the saved package list. If we build the new image several months after building the original image, the new image will pick up the old cached layer—it will not run apt-get update for the new image. Any subsequent instructions that install packages will be using an old package list.
Tip: How you structure your Dockerfile impacts the speed of the build process, but the image cache can also have a functional impact on the contents of the image. Docker’s resource Best practices for writing Dockerfiles is worth getting to know. One of its key recommendations is that you combine multiple commands in a single RUN statement in order to prevent unintended cache hits.
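The stale-layer problem and the effect of combining commands can be illustrated with another small Python model. As before, this is a sketch under assumptions rather than a description of Docker’s internals: a layer is reused whenever its parent and instruction text match, regardless of age, and days are plain integers. The function package_list_day and the example instruction lists are hypothetical.

```python
from typing import Dict, List, Tuple

# Cache maps (parent, instruction text) -> the day that layer was built.
LayerCache = Dict[Tuple[str, str], int]


def package_list_day(instructions: List[str], today: int,
                     cache: LayerCache) -> int:
    """Return the day on which the layer holding the package list was built."""
    parent = instructions[0]
    built_on = today
    for instruction in instructions[1:]:
        key = (parent, instruction)
        if key not in cache:
            cache[key] = today        # cache miss: the command runs today
        if "apt-get update" in instruction:
            built_on = cache[key]     # age of the package list we end up with
        parent = parent + "|" + instruction
    return built_on


cache: LayerCache = {}
separate_old = ["FROM ubuntu", "RUN apt-get update"]
separate_new = ["FROM ubuntu", "RUN apt-get update",
                "RUN apt-get install -y nano"]
combined_new = ["FROM ubuntu",
                "RUN apt-get update && apt-get install -y nano"]

day_first = package_list_day(separate_old, today=0, cache=cache)
day_stale = package_list_day(separate_new, today=90, cache=cache)
day_fresh = package_list_day(combined_new, today=90, cache=cache)
```

Here the build on day 90 that keeps apt-get update in its own instruction still gets the package list from day 0, while the combined instruction has no match in the cache and therefore refreshes the list on day 90.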
Packaging applications into a Docker image is simple—you specify the base image, install dependencies, copy in your application files, and tell Docker the command to run when a container starts from the image. You will need to consider the workflow for your Dockerfile so that it builds efficiently, and you’ll also need to be aware of the Docker image cache and how it can impact your builds.
The majority of the work that goes into the Dockerfile is about optimizing your image. Typically, for a production image you want the smallest possible image in order to keep your app secure and make it easy to move around. The official base images on the Docker Hub are a good place to start, and in the next chapter we’ll have a closer look at the public Hub and other image registries.