For the sake of curiosity, I wanted to open a discussion on whether Python cache files, e.g. *.pyc, should be included in a Docker context.
From what I have read online, the rationale behind avoiding them is that the time needed to generate them at runtime is smaller than the size/time overhead they add when deploying the Docker context to execute the program. Are there any production references that I can read to evaluate whether this proposition is true or not in my case?
In my virtual environment (${VIRTUAL_ENV}), the size of the Python cache files is of the order of 100 MB: find ${VIRTUAL_ENV} -depth -type f -name '*.pyc' -exec du -ch {} + | grep total
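The same measurement can be sketched in Python, which may be handy when du is not available in a slim image. This is only an illustration: pyc_total_bytes is a hypothetical helper name, the demo runs on a throwaway directory rather than a real ${VIRTUAL_ENV}, and it sums apparent file sizes, so the figure can differ slightly from du's block-based disk usage.

```python
import tempfile
from pathlib import Path


def pyc_total_bytes(root):
    """Return the combined apparent size, in bytes, of every *.pyc file under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*.pyc") if p.is_file())


# Self-contained demo on a temporary directory (a stand-in for ${VIRTUAL_ENV}).
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "pkg" / "__pycache__"
    cache_dir.mkdir(parents=True)
    (cache_dir / "mod.cpython-311.pyc").write_bytes(b"\0" * 2048)  # fake bytecode file
    (Path(tmp) / "pkg" / "mod.py").write_text("x = 1\n")  # source file, not counted
    demo_total = pyc_total_bytes(tmp)
    print(f"{demo_total / 1024:.1f} KiB of .pyc files")
```

To measure a real environment, one would call pyc_total_bytes(os.environ["VIRTUAL_ENV"]) instead of using the temporary directory.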
Assuming that I wish to avoid *.pyc files when building the image, what is the correct way to handle them?
Avoiding creating them. What is the correct way using pip?
Deleting them after creation, running something like: find ${VIRTUAL_ENV} -depth -type f -a -name "*.pyc" -delete
For reference, the official Python Docker image does delete them; see, for instance, lines L82-L87 of its Dockerfile.
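As a separate illustration (this is not the official image's script), the delete-after-install option could also be sketched in Python. The helper name remove_bytecode is hypothetical, and the demo works on a temporary directory instead of a real virtual environment.

```python
import tempfile
from pathlib import Path


def remove_bytecode(root):
    """Delete *.pyc files and any __pycache__ directories left empty under root.

    Returns the number of .pyc files removed.
    """
    removed = 0
    # Materialise the matches first so we do not delete while still walking the tree.
    for pyc in list(Path(root).rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
            removed += 1
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir() and not any(cache_dir.iterdir()):
            cache_dir.rmdir()
    return removed


# Self-contained demo: one source file and one fake bytecode file.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    (pkg / "__pycache__").mkdir(parents=True)
    (pkg / "mod.py").write_text("x = 1\n")
    (pkg / "__pycache__" / "mod.cpython-311.pyc").write_bytes(b"\0" * 16)
    removed_count = remove_bytecode(tmp)
    source_kept = (pkg / "mod.py").exists()
    cache_gone = not (pkg / "__pycache__").exists()
    print(removed_count, source_kept, cache_gone)
```

In a Dockerfile this kind of cleanup only reduces the image size if it runs in the same layer (the same RUN step) as the installation that created the files.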
Assuming that I wish to avoid *.pyc files when running the image (the context of the Docker container is not saved anyway), what is the correct way to handle them?
--no-compile controls whether pip generates bytecode on installation, and PYTHONDONTWRITEBYTECODE controls whether bytecode is written to disk when the code is run by Python. Since the goal here is to reduce image size, you probably don’t want to set the environment variable (since the bytecode being written to disk at runtime is fine and may still be a performance gain), only the pip flag.
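The runtime half of that distinction can be checked with a small experiment. The sketch below imports a throwaway module in a child interpreter, once with PYTHONDONTWRITEBYTECODE=1 and once without, and reports whether a __pycache__ directory appeared. The names import_creates_cache and demo_mod are hypothetical, and it assumes the temporary directory is writable. It does not exercise pip's --no-compile flag, which only affects install time.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_creates_cache(dont_write):
    """Import a throwaway module in a child interpreter and report
    whether a __pycache__ directory was written next to it."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "demo_mod.py").write_text("VALUE = 42\n")
        env = dict(os.environ)
        # Start from a clean slate so inherited settings do not skew the result.
        env.pop("PYTHONDONTWRITEBYTECODE", None)
        env.pop("PYTHONPYCACHEPREFIX", None)
        env["PYTHONPATH"] = tmp  # make demo_mod importable in the child
        if dont_write:
            env["PYTHONDONTWRITEBYTECODE"] = "1"
        subprocess.run([sys.executable, "-c", "import demo_mod"], env=env, check=True)
        return (Path(tmp) / "__pycache__").exists()


cache_written_by_default = import_creates_cache(dont_write=False)
cache_written_when_disabled = import_creates_cache(dont_write=True)
print(cache_written_by_default, cache_written_when_disabled)
```

In a Dockerfile the variable would typically be set with an ENV instruction, while the pip flag is passed on the pip install command line.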
I’m thinking about setting PYTHONDONTWRITEBYTECODE=1 because storage is not persistent in containers: I would be writing to disk without exploiting it later, wouldn’t I? If I don’t write the bytecode, is it kept in RAM?
Python does not cache the bytecode as such in memory, but the underlying structures the bytecode is designed to describe. The on-disk bytecode exists specifically so that things load faster at startup; once a module has been loaded into memory for the first time, the performance is the same.
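A small sketch of that behaviour, assuming CPython: with bytecode writing disabled for the current process, a module is still compiled and kept in memory, a second import simply returns the already loaded module object from sys.modules, and its functions keep working even after the source directory is gone. The module name ram_demo_mod is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

previous_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # same effect as PYTHONDONTWRITEBYTECODE=1, for this process only

try:
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "ram_demo_mod.py").write_text("def answer():\n    return 42\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new file is seen by the import system
        try:
            first = importlib.import_module("ram_demo_mod")   # compiled from source, held in RAM
            second = importlib.import_module("ram_demo_mod")  # served from sys.modules
        finally:
            sys.path.remove(tmp)
        wrote_cache = (Path(tmp) / "__pycache__").exists()
finally:
    sys.dont_write_bytecode = previous_flag

same_object = first is second
in_sys_modules = sys.modules["ram_demo_mod"] is first
# The compiled code object lives on the function; the source directory is already deleted here.
code_in_memory = first.answer.__code__.co_code
result_after_cleanup = first.answer()
print(wrote_cache, same_object, in_sys_modules, result_after_cleanup)
```

So skipping the .pyc files costs one compilation per module per process start, not a slowdown while the process is running.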