Build Images¶
This document describes how Read the Docs uses Docker images and how they are named. It also proposes a new way to create and name our Docker build images that shares as many image layers as possible and supports installing OS-level packages as well as extra requirements.
Introduction¶
We use Docker images to build users’ documentation. Each time a build is triggered, one of our VMs picks up the task and goes through these steps:
run some application code to spin up a Docker image into a container
execute git inside the container to clone the repository
analyze and parse files (.readthedocs.yaml) from the repository outside the container
spin up a new Docker container based on the config file
create the environment and install docs’ dependencies inside the container
execute build commands inside the container
push the output generated by build commands to the storage
All those steps depend on specific command versions: git, python, virtualenv, conda, etc.
Currently, we pin only a few of them in our Docker images, and that has caused issues
when re-deploying these images with bugfixes: the images are not reproducible over time.
Note
The reproducibility of the images will be better once these PRs are merged, but OS packages still won’t be 100% the exact same versions.
To allow users to pin the image we ended up exposing three images: stable, latest and testing.
With that naming, we were able to bugfix issues and add more features
on each image without asking the users to change the image selected in their config file.
Then, when a completely different image appeared and after testing the testing image enough,
we discarded stable, old latest became the new stable and old testing became the new latest.
This caused issues for people pinning their images to any of these names because, with this change,
we changed the images for all users at once and many build issues arose!
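The rotation problem can be illustrated with a small sketch. The image identifiers (image-A to image-D) and the rotate helper are hypothetical, and the document does not say what the new testing image was, so it is passed in as a parameter.

```python
def rotate(aliases, new_testing):
    """Discard stable, promote latest -> stable and testing -> latest."""
    return {
        "stable": aliases["latest"],
        "latest": aliases["testing"],
        "testing": new_testing,
    }


# Hypothetical image identifiers; the real tags are not given in this document.
before = {"stable": "image-A", "latest": "image-B", "testing": "image-C"}
after = rotate(before, "image-D")

# Names whose underlying image changed, even though no user edited a config file.
changed = sorted(name for name in before if before[name] != after[name])
print(changed)
```

Every alias resolves to a different image after the rotation, which is why pinning one of these names did not protect a project from the change.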
Goals¶
release completely new Docker images without forcing users to change their pinned image
allow users to stick with an image “forever” (~years)
use a base image with the dependencies that don’t change frequently (OS and base requirements)
base image naming is tied to the OS version (e.g. Ubuntu LTS)
allow us to add/update a Python version without affecting the base image
reduce size on builder VM disks by sharing Docker image layers
allow users to specify extra dependencies (apt packages, node, rust, etc)
automatically build & push all images on commit
deprecate stable, latest and testing
new images won’t contain old/deprecated OS (eg. Ubuntu 18) and Python versions (eg. 3.5, miniconda2)
Non goals¶
allow creation/usage of custom Docker images
allow executing arbitrary commands via hooks (eg. pre_build)
New build image structure¶
ubuntu20-base
  labels
  environment variables
  system dependencies
  install requirements
  LaTeX dependencies (for PDF generation)
  other languages version managers (pyenv, nodenv, etc)
  UID and GID
The following images are all based on ubuntu20-base:
ubuntu20-py*
  Python version installed via pyenv
  default Python packages (pinned versions): pip, setuptools, virtualenv
  labels
ubuntu20-conda*
  same as -py* versions
  Conda version installed via pyenv
  mamba executable (installed via conda)
Note that all these images only need to run pyenv install ${PYTHON_VERSION}
to install a specific Python/Conda version.
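A minimal sketch of the naming scheme, inferred from the image names used in this document (ubuntu20-py37, ubuntu20-py310, ubuntu20-conda47). The image_name helper is hypothetical, and reading conda47 as Conda 4.7 is an assumption.

```python
def image_name(os_name, flavor, version):
    """Build an image name such as ubuntu20-py39 or ubuntu20-conda47."""
    if flavor not in ("py", "conda"):
        raise ValueError(f"unknown flavor: {flavor!r}")
    # Drop the dots: "3.10" -> "310", "4.7" -> "47" (assumed convention).
    return f"{os_name}-{flavor}{version.replace('.', '')}"


print(image_name("ubuntu20", "py", "3.9"))
print(image_name("ubuntu20", "py", "3.10"))
print(image_name("ubuntu20", "conda", "4.7"))
```

Versions are passed as strings on purpose: as a number, 3.10 would collapse to 3.1 and produce the wrong image name.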
Specifying extra user’s dependencies¶
Different users may have different requirements. We have already been asked to install
swig, imagemagick, libmysqlclient-dev, lmod, rust, poppler-utils, etc.
People with specific dependencies will be able to install them as APT packages or as extras
using the .readthedocs.yaml config file. Example:
build:
  image: ubuntu20
  python: 3.9
  system_packages:
    - swig
    - imagemagick
  extras:
    - node==14
    - rust==1.46
Important highlights:
users won’t be able to use custom Ubuntu PPAs to install packages
all APT packages installed will be from official Ubuntu repositories
not specifying build.image will pick the latest OS image available
not specifying build.python will pick the latest Python version available
Ubuntu 18 will still be available via stable and latest images
all node (major) pre-compiled versions on nodenv are available to select
all rust (minor) pre-compiled versions on rustup are available to select
knowing exactly what packages users are installing could allow us to prebuild extra images: ubuntu20-py37+node14
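The defaulting rules above could be sketched as follows, operating on an already-parsed build mapping. The catalogues AVAILABLE_IMAGES and AVAILABLE_PYTHONS and the helpers parse_extra and resolve_build are hypothetical names used only for illustration; they are not part of the Read the Docs code base.

```python
# Hypothetical catalogues, ordered oldest to newest.
AVAILABLE_IMAGES = ["ubuntu20"]
AVAILABLE_PYTHONS = ["2.7", "3.6", "3.9"]


def parse_extra(spec):
    """Split an extras entry like 'node==14' into ('node', '14')."""
    name, sep, version = spec.partition("==")
    if not sep or not name or not version:
        raise ValueError(f"expected 'name==version', got {spec!r}")
    return name, version


def resolve_build(build):
    """Fill in defaults: latest image and latest Python when not specified."""
    return {
        "image": build.get("image", AVAILABLE_IMAGES[-1]),
        "python": str(build.get("python", AVAILABLE_PYTHONS[-1])),
        "system_packages": list(build.get("system_packages", [])),
        "extras": [parse_extra(spec) for spec in build.get("extras", [])],
    }


resolved = resolve_build(
    {"system_packages": ["swig"], "extras": ["node==14", "rust==1.46"]}
)
print(resolved)
```

Parsing extras into name and version pairs is also what would make it possible to count popular combinations and decide which extra images are worth prebuilding.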
Implementation
We talked about using a Dockerfile.custom and building it on every build.
However, at this point that would require extra work to change our build pipeline.
We decided to install OS packages from the application itself for now, using
the Docker API to call docker exec as the root user.
This reduces the amount of work required and also allows us to add this feature
to our existing images (they require a rebuild to add nodenv and rustup).
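As a rough sketch of this approach, the builder could validate the requested package names and then assemble the command to run as root. The example builds the equivalent docker exec command line rather than calling the Docker API, and apt_install_command is a hypothetical helper. The name pattern follows the Debian package naming rules (lowercase letters, digits, plus, minus and period, starting with an alphanumeric character).

```python
import re

# Debian package names: at least two characters, lowercase letters, digits,
# '+', '-' and '.', starting with an alphanumeric character.
PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.\-]+")


def apt_install_command(container, packages):
    """Return the argv that installs `packages` as root inside `container`."""
    if not packages:
        raise ValueError("no packages requested")
    for package in packages:
        # Rejects shell metacharacters and anything that looks like an option.
        if not PACKAGE_NAME.fullmatch(package):
            raise ValueError(f"invalid APT package name: {package!r}")
    return [
        "docker", "exec", "--user", "root", container,
        "apt-get", "install", "--yes", *packages,
    ]


command = apt_install_command("build-container", ["swig", "imagemagick"])
print(" ".join(command))
```

Validating names before building the command matters here because the values come from a user-controlled config file and the command runs as root.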
Updating versions over time¶
How do we add/upgrade a Python version?¶
Python patch versions can be upgraded on the affected image.
As the base image won’t change in this case, only the layers after it will be modified.
All the OS package versions will remain the same.
In case we need to add a new Python version, we just need to build a new image based on the base image:
ubuntu20-py310, which will contain Python 3.10; none of the other images are affected.
This also allows us to test new Python versions (eg. 3.11rc1) without breaking people’s builds.
How do we upgrade system versions?¶
We usually don’t upgrade these dependencies unless we upgrade the Ubuntu version. So, they will only be upgraded when we go from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS, for example.
Examples of these versions are:
doxygen
git
subversion
pandoc
swig
latex
This case will introduce a new base image, for example ubuntu22-base in 2022.
Note that these images will be completely isolated from the rest, so the existing images don’t need to be rebuilt.
This also allows us to test new Ubuntu versions without breaking people’s builds.
How do we add an extra requirement?¶
In case we need to add an extra requirement to the base image,
we will need to rebuild all the images based on it.
The new images may have different package versions since there may be updates in the Ubuntu repositories.
This carries a small risk, but in general we shouldn’t need to add packages to the base images.
Users with specific requirements could use build.system_packages and/or build.extras in the config file.
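The difference between changing a Python image and changing the base image can be sketched with a small dependency map. The image names come from this document; the PARENTS mapping and images_to_rebuild helper are illustrative assumptions.

```python
# Parent of each image; None marks a base image.
PARENTS = {
    "ubuntu20-base": None,
    "ubuntu20-py27": "ubuntu20-base",
    "ubuntu20-py36": "ubuntu20-base",
    "ubuntu20-py39": "ubuntu20-base",
    "ubuntu20-conda47": "ubuntu20-base",
}


def images_to_rebuild(changed):
    """Return the changed image plus every image built on top of it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for image, parent in PARENTS.items():
            if parent in affected and image not in affected:
                affected.add(image)
                grew = True
    return sorted(affected)


print(images_to_rebuild("ubuntu20-py39"))
print(images_to_rebuild("ubuntu20-base"))
```

Changing ubuntu20-py39 affects only that image, while changing ubuntu20-base pulls in every image in the map, which is where the risk of drifting OS package versions comes from.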
How do we remove an old Python version?¶
At some point an old version of Python will be deprecated (eg. 3.4) and will be removed.
To achieve this, we can just remove the Docker image affected: ubuntu20-py34,
once there are no users depending on it anymore.
We will know which projects are using these images because they pin them in the config file. We could show a message in the build output page and also send them an email with the EOL date for this image.
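A minimal sketch of that check, assuming the pinned image of each project is available as a simple mapping. The project slugs and both helper functions are hypothetical.

```python
# Hypothetical data: project slug -> image pinned in its config file.
PINNED_IMAGES = {
    "project-a": "ubuntu20-py34",
    "project-b": "ubuntu20-py39",
    "project-c": "ubuntu20-py34",
}


def projects_to_notify(pinned_images, deprecated_image):
    """Return the projects that still pin the image scheduled for removal."""
    return sorted(
        slug for slug, image in pinned_images.items() if image == deprecated_image
    )


def can_remove(pinned_images, deprecated_image):
    """The image can be removed once no project depends on it."""
    return not projects_to_notify(pinned_images, deprecated_image)


print(projects_to_notify(PINNED_IMAGES, "ubuntu20-py34"))
print(can_remove(PINNED_IMAGES, "ubuntu20-py34"))
```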
Deprecation plan¶
It seems we have ~50 GB free on builders’ disks. The new images will have approximately these sizes (built locally as a test):
ubuntu20-base: ~5 GB
ubuntu20-py27: ~150 MB
ubuntu20-py36: ~210 MB
ubuntu20-py39: ~20 MB
ubuntu20-conda47: ~713 MB
which is about 6 GB in total, so we still have plenty of space.
We could keep stable, latest and testing for some time without worrying too much.
New projects shouldn’t be able to select these images and they will be forced to use ubuntu20
if they don’t specify one.
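The disk estimate can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1000 MB) and that the sizes listed for the Python and Conda images are the extra layers on top of the shared base image, which is what the ~6 GB total implies.

```python
MB_PER_GB = 1000  # assumption: decimal units

BASE_MB = 5 * MB_PER_GB  # ubuntu20-base
EXTRA_LAYERS_MB = {
    "ubuntu20-py27": 150,
    "ubuntu20-py36": 210,
    "ubuntu20-py39": 20,
    "ubuntu20-conda47": 713,
}

# With shared layers the base is stored once.
shared_total = BASE_MB + sum(EXTRA_LAYERS_MB.values())

# Without sharing, every image would carry its own copy of the base.
unshared_total = sum(BASE_MB + extra for extra in EXTRA_LAYERS_MB.values())

free_after = 50 * MB_PER_GB - shared_total

print(shared_total, unshared_total, free_after)
```

Under these assumptions the shared layout needs about 6 GB, compared with roughly 21 GB if each image duplicated the base, and leaves about 44 GB of the ~50 GB free.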
We may want to keep the two latest Ubuntu LTS releases available in production. At the moment of writing this they are:
Ubuntu 18.04 LTS (our stable, latest and testing images)
Ubuntu 20.04 LTS (our new ubuntu20)
Once Ubuntu 22.04 LTS is released, we should deprecate Ubuntu 18.04 LTS, and give users 6 months to migrate to a newer image.
Work required¶
There is a lot of work to do here. However, we want to prioritize it based on users’ impact.
allow users to install packages with APT
  update config file to support build.system_packages config
  modify builder code to run apt-get install as root user
allow users to install extras via config file
  update config file to support build.extras config
  modify builder code to run nodenv install / rustup install
  re-build our current images with pre-installed nodenv and rustup
  make sure that all the versions are the same we have in production
  deploy builders with newer images
pre-build commands (not covered in this document)
new structure
  update config file to support new image names for build.image
  automate Docker image building
  deploy builders with newer images
Conclusion¶
There is no need to differentiate the images by their state (stable, latest, testing); they should be distinguished by their main differences: OS and Python version. The OS version changes many library versions, LaTeX dependencies, and basic required commands like git, so it doesn’t seem useful to have the same OS version in different states.
Allowing users to install system dependencies and extras will cover most of the support requests we have had in the past. It will also let us learn more about how our users are using the platform, so we can base future decisions on this data. Showing users how we want them to use our platform will allow us to maintain it longer than giving them total freedom over the Docker image.