Let me start by saying that docker does a lot of things right. It has increased the portability of software, to my knowledge it greatly improves the security/sandboxing of the software running inside it, and it is fairly lightweight compared to a full-fledged VM. I would argue, however, that it does a poor job of offering reproducibility, and as a result many docker images in the wild are not actually repeatable, at least when you consider the full software development lifecycle.
To get to why that is, let’s first talk about what docker is trying to do conceptually, which is to emulate an operating system. It models this problem as layers: both the build steps and the images themselves are layers. And this makes sense! Using this model we can compose layers together to build new layers!
What is the problem that most software developers are using docker to solve though? I’d wager most of them aren’t building operating systems. I’m sure not! Instead most of us are building packages, which we distribute and run inside of docker containers.
So let’s examine packages for a second then. A package is an executable program or library that depends on 0..n other packages. If we think about how to model that problem, the first thing that springs to mind for me is a tree. Each node (package) can have 0..n children (its dependencies).
So the model used by docker to represent an OS is layers, and the obvious model for packages is a tree. These are very different models! It stands to reason that docker may not be very good at representing packages. This becomes pretty clear when we consider that most docker images leverage the package managers that accompany the given stack, and rely on them for reproducibility. Let’s pick on node, because it’s fairly common. The node package manager (npm) uses a lockfile to ensure that it accurately reproduces the set of dependencies the package depends on. However, this only accounts for dependencies that are also node packages. It’s not uncommon to also depend on system packages, which are often installed using the operating system’s package manager.
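To make that gap concrete, here is a small Python sketch of the tree model. It is only an illustration: the `Package` class, the `ecosystem` tag and every package name are made up by me, and `npm_lockfile` is a stand-in for the idea of a lockfile, not a description of how npm works internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """One node in a dependency tree (all names here are hypothetical)."""
    name: str
    version: str
    ecosystem: str  # "npm" or "system"
    deps: List["Package"] = field(default_factory=list)


def walk(pkg):
    """Yield pkg and every package below it, depth first."""
    yield pkg
    for dep in pkg.deps:
        yield from walk(dep)


def npm_lockfile(root):
    """Pin only what an npm-style lockfile can see: the npm nodes."""
    return {p.name: p.version for p in walk(root) if p.ecosystem == "npm"}


def unpinned(root):
    """Names of packages in the tree that the lockfile does not cover."""
    locked = npm_lockfile(root)
    return sorted({p.name for p in walk(root) if p.name not in locked})


# A made-up app: an npm package whose dependency needs a system library.
libimage = Package("libimage", "8.14", "system")
image_lib = Package("image-lib", "2.1.0", "npm", deps=[libimage])
app = Package("my-app", "1.0.0", "npm", deps=[image_lib])

print(npm_lockfile(app))  # {'my-app': '1.0.0', 'image-lib': '2.1.0'}
print(unpinned(app))      # ['libimage']
```

The system library is part of the tree, but nothing in the lockfile records which version of it we built against.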
So now we have the OS, which is provided by the base image; system packages, which are provided through the OS package manager; package dependencies, which are provided through the package-specific package manager; and code, which is provided by us. To truly achieve reproducibility throughout the software development lifecycle, we need to be able to change one of these without affecting the other three. And yet what I often see in docker images is a step where we update and install packages through the OS package manager. This violates the previously mentioned principle, as now we cannot update code or install a package without also changing the system dependencies. To avoid this we could just not update system packages, but we also don’t control the base image! It could change! Not to mention that outdated system packages pose a real security risk.
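Here is a toy Python model of that principle. Everything in it is invented for illustration: the `INDEX` of package versions, the package names, the base image reference and the helper functions. It also holds the base image fixed, which, as noted, we usually can’t count on. The point is only to show how an "install latest" step drags the system packages along when all we meant to change was the code.

```python
# A toy package index: what "latest" means depends on when you ask.
# Package names and versions are invented for illustration.
INDEX = {
    "january": {"libfoo": "1.0", "libbar": "2.0"},
    "june": {"libfoo": "1.3", "libbar": "2.0"},
}


def install_latest(names, build_date):
    """Resolve each name to whatever the index holds on build_date."""
    return {name: INDEX[build_date][name] for name in names}


def install_pinned(pins):
    """Resolve from explicit pins; the build date plays no part."""
    return dict(pins)


def build(base_image, system_packages, lockfile, code):
    """A build is the combination of the four inputs from the text."""
    return {
        "base_image": base_image,
        "system_packages": system_packages,
        "package_dependencies": lockfile,
        "code": code,
    }


def changed_inputs(old, new):
    """List which of the four inputs differ between two builds."""
    return sorted(key for key in old if old[key] != new[key])


base = "base@digest-aaa"          # hypothetical pinned base image
lock = {"some-npm-dep": "4.2.0"}  # hypothetical lockfile contents
pins = {"libfoo": "1.0", "libbar": "2.0"}

# Unpinned: we only meant to ship new code in June.
jan = build(base, install_latest(["libfoo", "libbar"], "january"), lock, "v1")
jun = build(base, install_latest(["libfoo", "libbar"], "june"), lock, "v2")
print(changed_inputs(jan, jun))  # ['code', 'system_packages']

# Pinned: the same code change touches nothing else.
jan_pinned = build(base, install_pinned(pins), lock, "v1")
jun_pinned = build(base, install_pinned(pins), lock, "v2")
print(changed_inputs(jan_pinned, jun_pinned))  # ['code']
```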
To solve this with docker alone, we would need to create our own base image that we control, update and install our system packages, upload this image to a registry and use it as the base image instead. That sounds like a lot of work, and it is! That said, it may be better than the alternative: finding yourself unable to deliver code because you have no way of reproducing the state of the operating system your code was running on! And yes, this does actually happen, and has happened to me.
What is the real problem here though? Docker is delegating package management to package managers that weren’t designed to accurately reproduce a dependency tree! Operating system package managers weren’t designed for that purpose! What if we used a package manager that was designed for this purpose? One with a lockfile? That integrated with docker nicely? Supported multiple distros? Modelled packages as a tree? Maybe even managed our package’s dependencies too?
Nix does all this and more. Which is not to say it doesn’t have its rough edges, but it is a fantastic tool to address this problem. You also get cacheable builds, so your CI pipelines can avoid building the package multiple times, as well as a development environment that matches production but runs directly on your machine. It’s well worth checking out.