Python *.pyc files in a Docker image

Out of curiosity, I wanted to open a discussion on whether Python cache files, e.g. *.pyc, are desirable in a Docker context.

From what I have read online, the rationale for avoiding them is that the time needed to generate them at runtime is smaller than the size/time overhead they add when deploying the Docker context to execute the program. Is there any production reference that I can read to evaluate whether this proposition is true in my case?

In my virtual environment (${VIRTUAL_ENV}), the size of the Python cache files is on the order of 100MB:
find ${VIRTUAL_ENV} -depth -type f -name '*.pyc' -exec du -ch {} + | grep total
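
The same measurement can be sketched in Python, which may be handy where find/du are not available. This is only an illustration: the helper name pyc_total_bytes is made up for this example, and it sums apparent file sizes (st_size), so the figure can differ somewhat from the disk usage that du reports.

```python
from pathlib import Path


def pyc_total_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular *.pyc files below root."""
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*.pyc")
        if p.is_file()
    )
```

For example, pyc_total_bytes(os.environ["VIRTUAL_ENV"]) / 1e6 gives the total in MB for the active virtual environment (after import os).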

Assuming that I wish to avoid *.pyc files when building the image, what is the correct way to handle them?

  1. Avoiding creating them. What is the correct way using pip?

  2. Deleting them after creation, running something like:
    find ${VIRTUAL_ENV} -depth -type f -a -name "*.pyc" -delete
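
For option 2, a rough Python equivalent of the find command is sketched below. The function name delete_pyc is hypothetical, and unlike the find one-liner it also removes __pycache__ directories that end up empty.

```python
from pathlib import Path


def delete_pyc(root: str) -> int:
    """Delete every *.pyc file below root and return how many were removed."""
    removed = 0
    # Materialise the list first so we never delete while still walking the tree.
    for p in list(Path(root).rglob("*.pyc")):
        if p.is_file():
            p.unlink()
            removed += 1
    # Clean up __pycache__ directories that are now empty.
    for d in list(Path(root).rglob("__pycache__")):
        if d.is_dir() and not any(d.iterdir()):
            d.rmdir()
    return removed
```

Note that in a Dockerfile the deletion has to happen in the same RUN instruction that created the files; removing them in a later layer does not shrink the image.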

For reference, the official Python Docker image does delete them; see for instance lines L82-L87:

Assuming that I wish to avoid *.pyc files when running the image (the context of the Docker container is not saved anyway), what is the correct way to handle them?

I should set the environment variable with ENV PYTHONDONTWRITEBYTECODE=1 in the Docker image, right? See here: 1. Command line and environment — Python 3.11.3 documentation


I found out that I can use the --no-compile command-line option of pip; see here: pip install - pip documentation v23.1.2 (pypa.io)

What is the difference between using it and setting the Python environment variable PYTHONDONTWRITEBYTECODE?

For instance:

ENV PYTHONDONTWRITEBYTECODE=1
RUN python -m pip install mypackage

versus

RUN python -m pip install --no-compile mypackage

--no-compile controls whether pip generates bytecode at installation time, and PYTHONDONTWRITEBYTECODE controls whether the bytecode is written to disk when the code is run by Python. Since the goal here is to reduce image size, you probably don’t want to set the environment variable (since the bytecode being written to disk at runtime is fine and may still be a performance gain), only the pip flag.
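
The runtime half of that distinction can be observed with a small experiment. The sketch below is only illustrative: demo_mod is a throwaway module invented for the test, and the helper imports it in a child interpreter with or without PYTHONDONTWRITEBYTECODE set, then checks whether a __pycache__ directory appeared.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_creates_pycache(dont_write: bool) -> bool:
    """Import a throwaway module in a child interpreter and report
    whether a __pycache__ directory appeared next to it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")  # hypothetical module
        env = dict(os.environ)
        # Start from a clean slate so the host configuration does not interfere.
        for name in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX"):
            env.pop(name, None)
        env["PYTHONPATH"] = tmp  # make demo_mod importable in the child
        if dont_write:
            env["PYTHONDONTWRITEBYTECODE"] = "1"
        subprocess.run(
            [sys.executable, "-c", "import demo_mod"], env=env, check=True
        )
        return Path(tmp, "__pycache__").is_dir()
```

Calling import_creates_pycache(False) and import_creates_pycache(True) should show the cache directory being created only in the first case. The pip flag is a separate matter: it only affects whether pip precompiles at install time.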


I’m thinking about setting PYTHONDONTWRITEBYTECODE=1 because the storage is not persistent with the images: I would be writing to the disk without exploiting it later, wouldn’t I? If I don’t write the bytecode, is it kept in RAM?


It’s not in RAM but on disk. I don’t follow the rest of your comment, sorry. Whether that affects what you want to do is up to you.

What I meant is: assuming PYTHONDONTWRITEBYTECODE=1 is set (no .py[co] files are written to the disk), does Python “cache” bytecode in RAM?

In other words, where can I read more about the internal usage of these “cache” files in Python with respect to run-time performance? Thank you.

It does not cache the bytecode specifically, but the underlying structures the bytecode is designed to describe. The bytecode is specifically so things load faster at startup; once the thing is loaded for the first time into memory, the performance is the same.
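
As a rough illustration of that point (a sketch, not what the import system literally runs): a .pyc file is essentially a small header followed by a marshalled code object, so loading from the cache only skips the compile step. The SOURCE string and the names below are made up for the example.

```python
import marshal

SOURCE = "def double(x):\n    return 2 * x\n"  # hypothetical module source

# Step 1: compile the source text into a code object. This is the work
# that has to be done when no usable .pyc exists.
code = compile(SOURCE, "<demo>", "exec")

# Step 2: the payload of a .pyc file is this marshalled blob
# (the real file also carries a short header before it).
blob = marshal.dumps(code)

# Step 3: loading from the cache skips step 1 and just unmarshals.
cached_code = marshal.loads(blob)

# Either way, executing the code object builds the same in-memory objects.
ns_fresh, ns_cached = {}, {}
exec(code, ns_fresh)
exec(cached_code, ns_cached)
```

Once either namespace has been populated, calling double behaves identically in both, which matches the statement that performance after loading is the same.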
