Every “works on my machine” ML repo eventually bites you. This is my baseline Docker setup for keeping experiments reproducible without shipping a 9 GB image.

# syntax=docker/dockerfile:1
FROM python:3.12-slim
# Don't set PIP_NO_CACHE_DIR: it disables pip's cache and would make the cache mount below useless.
ENV PYTHONUNBUFFERED=1
WORKDIR /work
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip pip install --require-hashes -r requirements.txt
COPY . .
ENTRYPOINT ["python", "train.py"]

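Pin everything in requirements.txt with hashes (pip-compile --generate-hashes, from pip-tools). Once the file carries hashes, pip verifies every downloaded archive and refuses to install anything that doesn't match. Then mount the data at run time instead of baking it into the image. List data/ and runs/ in .dockerignore as well, otherwise COPY . . bakes them in anyway: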

docker run --rm --gpus all \
  -v "$PWD/data:/work/data:ro" \
  -v "$PWD/runs:/work/runs" \
  ml-exp:latest --epochs 30 --bs 256
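Arguments that follow the image name in docker run are appended to the exec-form ENTRYPOINT, so --epochs 30 --bs 256 reach train.py as its own command-line flags. A minimal sketch of the receiving side, assuming train.py parses them with argparse; the --data-dir and --runs-dir options and every default value are hypothetical placeholders, with the directory defaults chosen to match the mount targets above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the flags that docker run appends after the image name."""
    parser = argparse.ArgumentParser(description="Train inside the ml-exp container.")
    # Placeholder defaults (hypothetical); the docker run line overrides them.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--bs", type=int, default=64, help="batch size")
    # Hypothetical options whose defaults match the bind-mount targets under /work.
    parser.add_argument("--data-dir", type=Path, default=Path("/work/data"))
    parser.add_argument("--runs-dir", type=Path, default=Path("/work/runs"))
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"epochs={args.epochs} bs={args.bs} data={args.data_dir} runs={args.runs_dir}")
```

Keeping the paths as flags with container-side defaults means the same script also runs outside Docker by pointing --data-dir and --runs-dir at local folders.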

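The cache mount (a BuildKit feature; BuildKit has been Docker Engine's default builder since 23.0) keeps rebuilds fast: when requirements.txt changes and the install layer has to rerun, pip reuses the wheels it already downloaded. The read-only data mount keeps a careless script from corrupting your dataset at 3 a.m. The runs mount means checkpoints outlive the container, which --rm deletes on exit. Boring, but I haven’t lost a run to environment drift since I started doing this.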
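The mount layout can also enforce itself at startup. A minimal sketch, assuming it runs at the top of train.py: it fails fast if the data directory is missing or the runs directory isn't writable, and warns if the data directory is writable. It relies on os.access, which reports a :ro bind mount as not writable because access(2) returns EROFS on a read-only mount even for root. The check_mounts name and the warn-versus-fail policy are illustrative assumptions, not part of the setup above.

```python
import os
import warnings
from pathlib import Path


def check_mounts(data_dir: Path, runs_dir: Path) -> None:
    """Sanity-check the bind mounts before any training work starts."""
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory {data_dir} not found; is the data volume mounted?")
    # A :ro bind mount makes access(2) fail with EROFS, so os.access returns False here.
    if os.access(data_dir, os.W_OK):
        warnings.warn(f"{data_dir} is writable; mount it with :ro to protect the dataset")
    # Checkpoints must land on the runs mount, so an unwritable runs dir is fatal.
    runs_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(runs_dir, os.W_OK):
        raise PermissionError(f"cannot write checkpoints to {runs_dir}")
```

Calling it before any data loading means a forgotten or mistyped -v flag surfaces in seconds instead of after the first epoch.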

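One more habit: record the image digest and the git SHA into the run directory at startup. When a result looks wrong three weeks later, the SHA tells you which code ran and the digest points at the exact image that produced it. Rebuilding from the SHA alone isn't quite enough, because the python:3.12-slim tag moves as new patch releases land.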