tl;dr
In the below Detail I discuss what I perceive to be two issues with Python. Both issues relate to the fact that venv (or some other virtual environment management system) appear to be necessary to perform both deployment of an application and shared library creation during development.
The end result of what I propose below I believe would remove the requirement for virtual environments except if other external libraries are required during the development process. I do not propose removing it completely during the development process, because I see it has utility in managing external libraries and dependencies.
Detail
I want to put two ideas out there which I think could solve what I see as the two most significant issues with Python at the present time.
Those issues are:
- Python does not support libraries containing shared code (see below)
- There is no convenient and standardized way to deploy a Python application. This is related to the above point
These issues are related to the fact that one might attempt to structure a the source tree for a Python project in the following way:
python-project/
bin/
application1.py
application2.py
src/
lib_example/
__init__.py
In this example structure, application1.py and application2.py depend on some common library code which is deferred to a directory called src.
This is a fairly standard project structure. I don’t know of a more logical way to structure any software project, regardless of whether the development language is Python or something else.
As you may already know, the above project structure will not actually work. The reason is because src by default is not part of PYTHONPATH, so the interpreter cannot find the code which lives there.
The solution to this is usually to use venv, although there are other possibilities.
I want to propose the following two arguments:
venv, and virtual environments in general, should be used to manage external packages, installed usually either via a wheel or a source distributionvenvshould not be required for your Python application within a project to find the source code that it depends on if that source code is part of the same project
At the moment, the system, at a philosophical level, works like this:
- You build your library code
- You install your built library code into the local virtual environment
- You can now run your Python application from within the local virtual environment
Which doesn’t make a lot of sense. Why should you have to build your libraries and in particular install them for those libraries to be usable?
You could argue that Python just works that way and it’s just different to how other languages tend to do things.
That’s fair enough, but the reason I raise this is I think there is a simpler solution.
It would work something like this:
- There would need to be a way to declare that an executable Python script is part of a “project”. A “project” is one or more executable Python scripts which depend on some code which is deferred to a library.
- This could be a line at the start of the file, or it could be a file in the same directory containing some details.
For example:
#!/usr/bin/env python3
dependencies project.toml
def main():
print('hello world')
if __name__ == '__main__':
main()
# project.toml (in `bin` directory)
dependencies ../project.toml
# project.toml (in `python-project` dir)
libs src # declares a dependency of type libs which exists in `./src`
By using a series of such files, or some other mechanism, it would be possible to tell the Python interpreter where all the relative paths containing code are. There could be other possible ways of doing this, this is just one possible idea for how it might be implemented
All of the above is related to point (1). I will now address (2):
I have been trying to find information about how to “deploy” a Python application, and I haven’t found anything particularly compelling.
Let me summarize some options.
Docker
- Build a custom Docker Container. Write a Dockerfile which will copy the important parts of your application (code) to the Container and set the entry point to run one Python script. You need one Docker Container per executable script, because each Container can only have one entry point. This does not require a venv in the Container because you can install your libraries “system wide” within the Container, and the Container provides the environment isolation.
- But: Do you really want to depend on Docker just to deploy a Python application to some local system. Are you going to be able to manage it with systemd? Probably not easily. You’re basically stuck doing process management with Docker. That might be ok if you want to use Docker to distribute your app to a cluster. Otherwise, you have the disadvantage of needing to move around a whole Docker image, which is a pretty large object (100s MB?) compared to your Python code which is probably a few 10 kB, at most.
Others
- I don’t really know what other alternatives there are. Copy your entire project directory to
/optor something? - Perhaps you can do something slightly better by writing a script which puts your code into a tar.gz which you can then move to some target location, untar it and have the code setup in the correct place. You still can’t run that code, because you then need to setup a venv in that location.
It’s possible I’m missing something which should be obvious. I’m sure someone will tell me if I am.
Just imagine this is the objective we want to achieve:
We have finished working on our project in /home/dave/python-project and we’re ready to call this a “version 1” and we want to deploy it.
“Deploy” for us just means moving it to somewhere like /opt/ and writing some service files for systemd which point at the right Python interpreters and executable Python scripts such that we can manage the processes with systemd. Sure, if you are on some other OS you will not be using systemd but I’m sure there’s going to be some equivalent system which you could apply this discussion to.
But even after moving the code to /opt we are not done. We need to now create a venv in a nearby location and “install” whatever code is part of our “shared library” (in src) into the venv.
This is why I say there is no convenient and standardized way to deploy a Python application. Because really, what is this?
Why can’t we instead run some standard tool - call it pybuild - and have a single executable Python script come out (if nothing else) which contains all the code for the executable and local libraries.
In other words, why don’t we have a standard tool which just bundles all the code from application1.py together with the code in src in some kind of rudimentary “build” system?
We would then have just one file which could be run by a Python interpreter. Copy it to /opt and you’re done. No venv required.
I am not sure whether or not this idea would work if parts of those libraries are written in C like numpy. Someone who knows more about the internal workings of Python can perhaps comment on this.
As a final general point, I would be interested to hear about how others are managing this kind of deployment process.
- Are you doing it the manual build script + venv way?
- Do you resort to Docker?
- Do you use some other tool or method which I’m not currently aware of?