Deep Dive into Docker Cache Mechanism with Layers

8 min read · Dec 7, 2022

Containerization and Docker

Containerization is a method of packaging and deploying applications in containers. Docker is a tool for creating, deploying, and running applications in lightweight, portable containers. A Dockerfile is a text file with instructions for building a Docker image, a read-only template from which containers are created. The Dockerfile specifies the steps needed to create the image, such as installing dependencies and copying files into it, using a simple syntax that automates the build. The docker build command reads the Dockerfile and produces an image that can then be run as a container. Docker provides consistency and reproducibility in the deployment of applications, which makes it useful for building and deploying microservices and for continuous integration and deployment in software development.

Docker images can be easily moved between different environments and platforms, making it possible to deploy applications consistently across different environments. This allows for flexibility and agility in the deployment process, and makes it easier to build and deploy applications in a continuous integration and deployment workflow.

Containers also help with scalability and resiliency. Container orchestration platforms like Kubernetes make it possible to scale and manage large distributed applications built from Docker images, which allows for efficient resource allocation and deployment in a distributed environment.

To create a Dockerfile, you first need to create a new text file. By convention it is named Dockerfile, with no file extension, which is the name docker build looks for by default. Then, you can add the instructions for building your Docker image to the file. Here is an example of a simple Dockerfile that creates an image based on the ubuntu:latest image, a tag that points to the most recent Ubuntu LTS release:

# Use the image tagged ubuntu:latest as the base image
FROM ubuntu:latest
# Update the package manager and install the NGINX web server
RUN apt-get update && apt-get install -y nginx
# Copy the contents of the local "html" directory to the container
COPY html /var/www/html
# Document that the web server listens on port 80 (publishing it to the host is done with -p at run time)
EXPOSE 80
# Start the NGINX web server when the container is run
CMD ["nginx", "-g", "daemon off;"]

In this example, the Dockerfile uses the FROM instruction to specify the base image for the Docker image. The RUN instruction runs commands to update the package index and install the NGINX web server. The COPY instruction copies files from the local "html" directory to the image. The EXPOSE instruction records that the container listens on port 80; it does not publish the port by itself, which is why the docker run command later in this article uses the -p option. The CMD instruction specifies the command to run when the container is started.

To build a Docker image from this Dockerfile, you can use the docker build command. For example:

$ docker build -t demo-nginx-image .

This command will build an image with the name “demo-nginx-image” based on the instructions in the Dockerfile. Once the image is built, you can use the docker run command to create a container from the image and run it. For example:

$ docker run -p 8080:80 demo-nginx-image

This command will create a new container from the “demo-nginx-image” image, map port 8080 on the host to port 80 in the container, and start the container. This will make the NGINX web server available on port 8080 on the host.

Cache Mechanism with Layers in Docker Filesystem

In Docker, a layer represents a change to the filesystem of a Docker image. Instructions that modify the filesystem, such as RUN, COPY, and ADD, each generate a new layer in the resulting image. Other instructions, such as EXPOSE and CMD, only add metadata to the image configuration. For example, the FROM instruction sets the base image, whose layers become the bottom layers of the new image, and the RUN instruction creates a new layer by running a command and committing the resulting changes to the filesystem.
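To make the distinction concrete, here is the NGINX Dockerfile from earlier with each instruction annotated. This is an illustrative sketch: the comments describe the usual behavior of the classic builder and BuildKit, and the image name demo-nginx-image is the one assumed in the build example above.

```dockerfile
# Base image: its existing layers form the bottom of the new image
FROM ubuntu:latest
# Changes the filesystem, so it produces a new layer
RUN apt-get update && apt-get install -y nginx
# Adds files to the filesystem, so it produces a new layer
COPY html /var/www/html
# Metadata only: recorded in the image configuration, no filesystem content added
EXPOSE 80
# Metadata only: sets the default command for containers
CMD ["nginx", "-g", "daemon off;"]
```

After building, running docker history demo-nginx-image lists one row per instruction, and the metadata-only instructions should show a size of 0B.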

Docker uses a build cache to speed up image builds. For each instruction in the Dockerfile, Docker checks whether its cache already holds a layer produced by the same instruction on top of the same parent layer. For most instructions, such as RUN, the comparison is based on the instruction text; for COPY and ADD, Docker also compares checksums of the files being copied. On a cache hit, Docker reuses the cached layer instead of rebuilding it. On the first cache miss, that instruction and every instruction after it are rebuilt. This can save a significant amount of time when building images, since layers that have not changed do not need to be rebuilt.

For example, suppose you have a Dockerfile that looks like this:

FROM ubuntu:latest
RUN apt-get update && apt-get install -y nginx
COPY html /var/www/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

If you build this image for the first time, Docker will need to download the base Ubuntu image, run the apt-get commands to install NGINX, and copy the HTML files to the image. This will create several layers in the resulting Docker image.

If you build the same image again without changing anything, Docker finds a cached layer for every instruction, so it skips downloading the base image, running the apt-get commands, and copying the files, and the build finishes almost immediately. If you have only edited files in the html directory, the FROM and RUN steps are still served from the cache, and Docker rebuilds from the COPY instruction onward. The build is still much faster, since the expensive apt-get step is not repeated.
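Because a cache miss forces every later instruction to be rebuilt, the order of instructions matters. As a contrast, here is a hypothetical reordering of the same Dockerfile (not a recommended layout) in which the frequently edited HTML files are copied before NGINX is installed:

```dockerfile
# Cache-unfriendly ordering, shown only for comparison
FROM ubuntu:latest
# Any edit under html/ changes the checksum of the copied files, causing a cache miss here ...
COPY html /var/www/html
# ... so this step is rebuilt as well, even though the command itself is unchanged
RUN apt-get update && apt-get install -y nginx
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```

With this ordering, every change to the HTML files repeats the apt-get step. Keeping rarely changing instructions near the top and frequently changing ones near the bottom, as the original Dockerfile does, lets the cache absorb most of the work.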

Working with Filesystem of the Docker Images

When you use the COPY instruction in a Dockerfile, you are copying files from the build context into the filesystem of the Docker image being built. The build context is the directory passed to docker build (the "." in the earlier example), so source paths are resolved relative to that directory and cannot point to arbitrary locations on the host.

For example, suppose you have a Dockerfile like this:

FROM ubuntu:latest
COPY html /var/www/html

This Dockerfile uses the COPY instruction to copy the contents of the "html" directory in the build context to the /var/www/html directory in the Docker image. The copied files become part of a new image layer, so the image no longer depends on the original files on the host.
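The relationship between the build context and COPY can be sketched as follows. The project layout in the comments is a hypothetical one chosen for illustration; only the html directory and the image path come from the example above.

```dockerfile
# Assumed (hypothetical) project layout, built from the project root with:
#   docker build -t demo-nginx-image .
#
#   ./Dockerfile
#   ./html/index.html
FROM ubuntu:latest
# The source path "html" is resolved relative to the build context (the "." above)
COPY html /var/www/html
# A source path such as ../other-project would be rejected,
# because it lies outside the build context
```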

Once the Docker image is built, you can use the docker run command to create a container from the image and run it. This starts a new process in the container, with the image's layers mounted read-only as the container's root filesystem and a thin writable layer added on top. When you access files in the container, you are reading them from the image layers, not from the host filesystem, and any changes the container makes are stored in its writable layer without modifying the image.

Instructions in a Dockerfile

In a Dockerfile, the CMD, RUN, and ENTRYPOINT instructions all specify commands, but they run at different times. RUN executes while the image is being built, whereas CMD and ENTRYPOINT define what is executed when a container is started. Each of these instructions serves a different purpose and has a specific use case.

The CMD instruction is used to specify the default command that will be executed when a container is run. This command can be overridden by passing a different command after the image name in docker run. When an ENTRYPOINT is also defined, CMD instead provides the default arguments for the ENTRYPOINT command.

The RUN instruction is used to execute commands during the build process of a Docker image. The RUN instruction creates a new layer in the resulting Docker image and commits the changes to the filesystem.

The ENTRYPOINT instruction is used to specify the executable that will be run when a container is started. Unlike CMD, it is not replaced by arguments passed to docker run; those arguments are appended to the ENTRYPOINT command instead. It can still be overridden explicitly with the --entrypoint option of docker run. When both are present, ENTRYPOINT defines the command and CMD provides its default arguments.

Here is an example of how these instructions can be used in a Dockerfile:

FROM ubuntu:latest

# Install the Python 3 interpreter
RUN apt-get update && apt-get install -y python3

# Copy the contents of the local "src" directory to the container
COPY src /app

# Set the executable that runs when the container is started
ENTRYPOINT ["python3"]

# Set the default argument passed to the ENTRYPOINT executable
CMD ["/app/main.py"]

In this example, the RUN instruction is used to install the Python 3 interpreter and the COPY instruction is used to copy the source code for a Python application to the image. The ENTRYPOINT instruction specifies the executable that will be run when the container is started, and the CMD instruction supplies its default argument. Note that CMD contains only the script path: because the two are combined, repeating python3 in CMD would result in python3 python3 /app/main.py, which fails.

When the container is run, the ENTRYPOINT command will be executed with the default arguments specified in the CMD instruction. This runs the Python 3 interpreter with /app/main.py as its argument, which executes the script and starts the application.

If you want to run a different script, you can override the default arguments from CMD by adding them after the image name in the docker run command. For example, assuming the image was built with the tag my-python-app:

$ docker run -it my-python-app /app/other.py

This still runs the python3 command specified in the ENTRYPOINT instruction, but replaces the default arguments from the CMD instruction with /app/other.py, so the other.py script is executed instead of main.py.
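The way ENTRYPOINT, CMD, and docker run arguments combine can be summarized in one annotated sketch. The image tag my-python-app and the script names are the illustrative ones used in this section, and the comments show the command each invocation is expected to run inside the container.

```dockerfile
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3
COPY src /app
# Exec form: the executable that is always run
ENTRYPOINT ["python3"]
# Exec form: default arguments, replaced by anything passed after the image name
CMD ["/app/main.py"]

# docker run my-python-app
#   -> python3 /app/main.py
# docker run my-python-app /app/other.py
#   -> python3 /app/other.py
# docker run -it --entrypoint /bin/bash my-python-app
#   -> /bin/bash (the ENTRYPOINT is replaced and the image's CMD is not used)
```

The last invocation is useful for debugging, since it opens a shell in the image instead of starting the application.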

The benefits of using Docker and Dockerfiles include isolation, portability, scalability, and resiliency. These benefits make Docker a valuable tool for building and deploying applications in a variety of environments and for different types of applications.

Key Takeaways

A Dockerfile is a text file that contains instructions for building a Docker image.
Instructions that change the filesystem, such as RUN, COPY, and ADD, each generate a new layer in the resulting Docker image; other instructions only add metadata.
Layers in a Docker image are stored in a union filesystem, which allows files and directories from multiple layers to be combined into a single virtual filesystem.
Docker uses a cache mechanism to improve the performance of building images. If a cache hit occurs, Docker will use the cached layer instead of rebuilding the layer from scratch.
When you use the COPY instruction in a Dockerfile, you are copying files from the build context (the directory passed to docker build) to the filesystem of the Docker image being built.
Once the Docker image is built, you can use the docker run command to create a container from the image and run it. This will start a new process in the container and mount the filesystem of the Docker image as the root filesystem of the container.
In a Dockerfile, the CMD, RUN, and ENTRYPOINT instructions all specify commands, but RUN executes at build time while CMD and ENTRYPOINT take effect when a container is started.
The CMD instruction is used to specify the default command that will be executed when a container is run. This command can be overridden when the container is started.
The RUN instruction is used to execute commands during the build process of a Docker image.
The ENTRYPOINT instruction is used to specify the executable that will be run when a container is started. Unlike CMD, it is not replaced by arguments passed to docker run, although it can be overridden with the --entrypoint option.
The CMD and ENTRYPOINT instructions can be used together: ENTRYPOINT defines the command to execute, and CMD provides its default arguments.