Does Python have two major problems? (Deployment and libraries)

tl;dr

In the Detail section below I discuss what I perceive to be two issues with Python. Both relate to the fact that venv (or some other virtual environment management system) appears to be necessary both to deploy an application and to create shared libraries during development.

I believe the end result of what I propose below would remove the requirement for virtual environments except where external libraries are needed during development. I do not propose removing virtual environments from the development process entirely, because they have clear utility in managing external libraries and dependencies.

Detail

I want to put two ideas out there which I think could solve what I see as the two most significant issues with Python at the present time.

Those issues are:

  1. Python does not support libraries containing shared code (see below)
  2. There is no convenient and standardized way to deploy a Python application. This is related to the above point

These issues are related to the fact that one might attempt to structure the source tree for a Python project in the following way:

python-project/
  bin/
    application1.py
    application2.py
  src/
    lib_example/
      __init__.py

In this example structure, application1.py and application2.py depend on some common library code which is kept in a directory called src.

This is a fairly standard project structure. I don’t know of a more logical way to structure any software project, regardless of whether the development language is Python or something else.

As you may already know, the above project structure will not actually work. The reason is that src is not part of PYTHONPATH by default, so the interpreter cannot find the code which lives there.

The solution to this is usually to use venv, although there are other possibilities.

I want to propose the following two arguments:

  • venv, and virtual environments in general, should be used to manage external packages, usually installed via either a wheel or a source distribution
  • venv should not be required for a Python application to find source code it depends on when that source code is part of the same project

At the moment, the system, at a philosophical level, works like this:

  1. You build your library code
  2. You install your built library code into the local virtual environment
  3. You can now run your Python application from within the local virtual environment
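
Spelled out as commands, that workflow looks roughly like this (a sketch assuming a conventional pyproject.toml-based setup; the exact commands vary by tool):

python3 -m venv .venv              # 1. create the local virtual environment
source .venv/bin/activate
pip install -e .                   # 2. "install" the project's own library code (editable install)
python bin/application1.py         # 3. only now does `import lib_example` succeed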

This doesn’t make a lot of sense. Why should you have to build your libraries, and in particular install them, for those libraries to be usable?

You could argue that Python just works that way and it’s just different to how other languages tend to do things.

That’s fair enough, but the reason I raise this is that I think there is a simpler solution.

It would work something like this:

  • There would need to be a way to declare that an executable Python script is part of a “project”. A “project” is one or more executable Python scripts which depend on some code that is factored out into a library.
  • This could be a line at the start of the file, or it could be a file in the same directory containing some details.

For example:

#!/usr/bin/env python3

dependencies project.toml

def main():
    print('hello world')

if __name__ == '__main__':
    main()

# project.toml (in `bin` directory)

dependencies ../project.toml

# project.toml (in `python-project` dir)

libs src # declares a dependency of type libs which exists in `./src`

By using a series of such files, or some other mechanism, it would be possible to tell the Python interpreter where all the relative paths containing code are. There could be other ways of doing this; this is just one possible idea for how it might be implemented.
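
Nothing like this exists today, but the effect can be approximated by hand, which also shows roughly what the interpreter would have to do. A minimal sketch, assuming a hypothetical project.toml containing a line such as libs = ["../src"] (the file format, key name and bootstrap module are all made up for illustration; tomllib requires Python 3.11+):

# bootstrap.py -- hypothetical helper, not an existing tool.
# Reads a made-up project.toml next to this file and puts the declared
# library directories on sys.path before anything else is imported.
import sys
import tomllib                     # Python 3.11+; older versions would need the third-party tomli
from pathlib import Path

def add_project_libs() -> None:
    here = Path(__file__).resolve().parent
    config = here / "project.toml"
    if not config.exists():
        return
    with config.open("rb") as f:
        data = tomllib.load(f)
    for rel in data.get("libs", []):           # e.g. libs = ["../src"]
        sys.path.insert(0, str((here / rel).resolve()))

add_project_libs()

Each executable script would then start with import bootstrap, which is exactly the kind of boilerplate the proposal wants the interpreter or a standard mechanism to take care of.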

All of the above is related to point (1). I will now address (2):

I have been trying to find information about how to “deploy” a Python application, and I haven’t found anything particularly compelling.

Let me summarize some options.

Docker

  • Build a custom Docker Container. Write a Dockerfile which copies the important parts of your application (code) into the Container and sets the entry point to run one Python script (see the sketch after this list). You need one Docker Container per executable script, because each Container can only have one entry point. This does not require a venv in the Container, because you can install your libraries “system wide” within the Container, and the Container provides the environment isolation.
  • But: do you really want to depend on Docker just to deploy a Python application to some local system? Are you going to be able to manage it with systemd? Probably not easily. You’re basically stuck doing process management with Docker. That might be OK if you want to use Docker to distribute your app to a cluster. Otherwise, you have the disadvantage of needing to move around a whole Docker image, which is a pretty large object (hundreds of MB?) compared to your Python code, which is probably a few tens of kB at most.
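
To make the first bullet concrete, the Dockerfile amounts to something like this (a rough sketch; the base image tag, paths and entry point are assumptions, not a recommendation):

FROM python:3.12-slim

WORKDIR /app
# copy the project's code into the image
COPY bin/ ./bin/
COPY src/ ./src/
# make the shared library importable "system wide" inside the container
ENV PYTHONPATH=/app/src

# one entry point per container
ENTRYPOINT ["python", "bin/application1.py"]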

Others

  • I don’t really know what other alternatives there are. Copy your entire project directory to /opt or something?
  • Perhaps you can do something slightly better by writing a script which puts your code into a tar.gz, which you can then move to some target location and untar so the code is set up in the correct place. You still can’t run that code, because you then need to set up a venv in that location.
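
For what it’s worth, the archiving step is a one-liner with the standard library (a sketch; the archive name and path are just for illustration):

import shutil

# bundle /home/dave/python-project into python-project-1.0.tar.gz
shutil.make_archive(
    "python-project-1.0",                  # base name of the archive to create
    "gztar",                               # produce a .tar.gz
    root_dir="/home/dave/python-project",
)

But as the bullet above says, this only moves files around; it does nothing about the venv that still has to be created at the destination.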

It’s possible I’m missing something which should be obvious. I’m sure someone will tell me if I am.

Just imagine this is the objective we want to achieve:

We have finished working on our project in /home/dave/python-project and we’re ready to call this a “version 1” and we want to deploy it.

“Deploy” for us just means moving it to somewhere like /opt/ and writing some service files for systemd which point at the right Python interpreters and executable Python scripts, such that we can manage the processes with systemd. Sure, if you are on some other OS you will not be using systemd, but I’m sure there is some equivalent system to which you could apply this discussion.

But even after moving the code to /opt we are not done. We need to now create a venv in a nearby location and “install” whatever code is part of our “shared library” (in src) into the venv.

This is why I say there is no convenient and standardized way to deploy a Python application. Because really, what is this?

Why can’t we instead run some standard tool - call it pybuild - and have a single executable Python script come out (if nothing else) which contains all the code for the executable and local libraries?

In other words, why don’t we have a standard tool which just bundles all the code from application1.py together with the code in src in some kind of rudimentary “build” system?

We would then have just one file which could be run by a Python interpreter. Copy it to /opt and you’re done. No venv required.

I am not sure whether or not this idea would work if parts of those libraries are written in C, as numpy is. Someone who knows more about the internal workings of Python can perhaps comment on this.

As a final general point, I would be interested to hear about how others are managing this kind of deployment process.

  • Are you doing it the manual build script + venv way?
  • Do you resort to Docker?
  • Do you use some other tool or method which I’m not currently aware of?

I would have said that venv is a standardized and convenient way to deploy a Python application, so that would appear to be a matter of personal opinion. Depending on how you structure your source tree, there will not be any problems with Python finding where to load your shared code from. Python projects are deployed in many different ways, so I would not recommend assuming that your deployment scenario is common.

You can very easily adjust sys.path in your main application code, so unless there are other complexities you’re not explaining, I don’t think this is a fair description of the situation.
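
For example, a few lines at the top of application1.py are enough to make the src directory importable, with no environment setup at all (a sketch based on the layout shown earlier):

# application1.py -- make ../src importable relative to this file's location
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "src"))

import lib_example                         # found in python-project/src/lib_example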

Yes, there are issues with the standard practices for deploying Python applications, but this isn’t one of them (IMO).

You don’t have to put your code in the src directory. The primary purpose of putting it there is precisely so that it is not on sys.path. If that is not what you want then just put your modules or packages at top level.

You might be interested in zipapp.
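
For reference, zipapp (in the standard library) bundles a directory of Python code into a single runnable .pyz file, which is close to the pybuild idea above. A minimal sketch, assuming the entry script and the lib_example package have been copied into a build/ directory and that lib_example.app defines a main() function (both names are hypothetical):

import zipapp

# bundle everything under build/ into one self-contained, runnable archive
zipapp.create_archive(
    "build",                               # directory containing the code to bundle
    target="application1.pyz",             # single output file
    interpreter="/usr/bin/env python3",    # shebang, so the file is directly executable
    main="lib_example.app:main",           # hypothetical entry point
)

The result can be copied to /opt and run as a single file. Pure-Python dependencies can be bundled the same way, but C extensions such as numpy generally cannot be imported from inside a zip archive, which is the caveat raised earlier about compiled code.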

I should have given more detail on this - what you suggest will work, up to a point.

As soon as you have a more complex project, it breaks:

python-project/
  bin/
    thing-a/
      process-a-1.py
      process-a-2.py
    thing-b/
      process-b-1.py
      process-b-2.py

If thing-a and thing-b have common library code, this no longer works. You need some other location to put the shared stuff.

Also, do you really want your modules and packages mixed in with your executables in the same directory? Probably not. It’s not going to scale very well.

If you approach the discussion with this attitude, it is guaranteed to go nowhere. A little humility, please.

4 Likes

It seems that the problems described are caused by a directory structure that is unpythonic.

1 Like

Moving this to the Help category as it appears to be a rant about problems the author is having rather than a constructive and actionable idea. Additionally, I’ve temporarily silenced the author for rude behavior directed at others. Remember that participation in any Python space is bound by the Python Software Foundation Code of Conduct as a minimum baseline. Strive to behave politely and professionally beyond what the CoC requires.

6 Likes

The current state of Python packaging is not great at handling applications. It is more geared towards dealing with libraries. This is a known point; there is not much controversy about it.

But there are solutions. You could look at this “Overview of Python Packaging” document for some pointers. I have not read it myself in a while, and I have no doubt that it is outdated; some new developments are probably missing. As far as I can tell, most of the work regarding applications is slightly outside the realm of PyPA and its PEPs.

Some pointers from me:

There is a whole range of solutions, more or less involved, for all kinds of use cases.

1 Like

Try to make your grandmother install and use your pyqt application using venv. I’d say that “convenient” is not the right adjective to describe deploying with venv.

1 Like

Is this one of those situations where Python does something less well than some other tool and therefore we conclude the Python way of doing something is “pythonic” so that we don’t have to say “it’s less good”?

These kind of comments are really unhelpful.

What does a pythonic directory structure look like? Can you show us what you would do, please? At least then we have something concrete to look at.

1 Like

Just to give another, general comment, relating to this reply.

I don’t take much issue with having to use something like a venv for deployment. I’m not actually against using it.

The point of my post really was to say - “ok there are some systems which by design want you to use venv to install Python packages”. Fair enough. Debian is one such example. If we are strongly directed towards using it, so be it.

The question is really more about local libraries. Should we really be installing these using the same venv? We’re not really installing anything at all, just providing a way to reference those files containing library code from some other directory where we run some Python executable script.

It might be that there is a very compelling reason why this should be done.

I did read that “Overview of Python Packaging” document. I don’t really understand what value it has to offer. What should I conclude from reading it, for example?

Finally, let me ask one specific question about deploying with venv.

Let’s say you copy the whole project directory to some target machine. You want to set up a systemd service to run and manage the processes. Normally, a systemd service definition file has an ExecStart= line. This points to some executable. It could be a bash script, a compiled binary, or a Python script.

My question: how do we get that ExecStart= to activate the virtual environment before running a Python executable? The only way I can see to do it is to write a bash script for each executable which activates the venv and then runs the script.

Then the question is how to handle ExecStop=. I don’t have any ideas about this.

Use the Python binary from inside the venv. While running in the venv, sys.executable is that binary, so it’s trivially easy to make an installer script that uses it to build the service file. This also takes care of other ways you could have multiple installations of Python.

This is nothing new or surprising. It’s basics of having multiple interpreters installed.
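
A sketch of what such an installer script could look like (the unit name, description and paths are placeholders):

# install_service.py -- run it with the venv's interpreter, e.g.
#   /opt/python-project/.venv/bin/python install_service.py
# It writes a systemd unit whose ExecStart= points at that same interpreter,
# so no "activation" step is needed at runtime.
import sys
from pathlib import Path

UNIT = f"""\
[Unit]
Description=python-project application1

[Service]
ExecStart={sys.executable} /opt/python-project/bin/application1.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""

Path("/etc/systemd/system/application1.service").write_text(UNIT)
print("Unit written; run: systemctl daemon-reload && systemctl start application1")

(As for ExecStop=: systemd stops the service by signalling the process itself, so for a simple case no ExecStop= line is needed at all.)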

Would that be without activating the virtual environment via the activate file?

Yes. Try it.

All that “activating” a venv does is:

  • remember old environment variable state to undo it later
  • reconfigure the prompt via some environment variable (I forget which; irrelevant to making the code work)
  • add the venv’s bin directory to PATH
  • set up a deactivate command

So all you need is the ability to run /path/to/venv/bin/python entry_point.py etc. In the rare case that the script in turn shells out to Python again, and thus cares about the PATH setting, use PATH=/path/to/venv/bin:$PATH python entry_point.py.