On the assumption that, with the provisional acceptance of PEP 723, we’re going to revisit this discussion, can I suggest that we start by thinking about what workflows we want to support here? I can see the following potential arrangements of “a directory with a pyproject.toml and some Python code” that could be candidates, and I think it’s worth being very explicit about which we support. Ideally, we could also do with agreeing on some terminology - “project” is over-used, “application” seems to mean different things to different people, and “project that will be built into a wheel” is clumsy…
- A directory that will be built into a single wheel. That wheel may be intended as a library, or as an executable application that is run via an entry point and installed using something like pipx, but either way the key is that there is a build process that generates a wheel.
- A directory that contains the source code for a standalone application. This is also built, but not using a typical build backend, and the resulting distributable artefact is a runnable binary of some sort. GUI applications often work like this, and CLI programs would probably work well this way, but the overheads involved often mean developers prefer to distribute CLI applications as wheels (see previous case).
- A directory that contains an application that is run “in place”, typically using some form of management script. Web applications often take this form.
- A directory where the developer executes many inter-related tasks, all aimed at one fundamental goal, but often independent. Some tasks may not even be written in Python, and the tasks may have very little in common beyond the overall goal (data acquisition, cleansing, analysis and reporting tasks are very different, for example). Data science projects often take this form.
- A directory containing scripts that all run in the same environment, but often with very little else in common. This may be the same as the previous case, although the lack of a unified goal may be significant. Automation projects often work like this.
- A monorepo project, containing multiple sub-projects. I don’t know much about this workflow, except that such projects seem to have trouble even with the existing capabilities of pyproject.toml.
There are probably other common workflows (scientific analysis may well be another - I’m assuming it’s similar to data science, but my experience of scientific programming is very limited, so that’s a guess).
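To make the first case concrete, here’s what a minimal pyproject.toml for a directory that builds into a single wheel with a CLI entry point might look like (the project name, entry point, and choice of backend are all illustrative, not a recommendation):

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "example-tool"        # illustrative name
version = "0.1.0"
dependencies = ["requests>=2.28"]

# The entry point that makes the wheel runnable, e.g. via `pipx install example-tool`
[project.scripts]
example-tool = "example_tool.cli:main"
```

The other project types on the list have no equivalent “canonical” pyproject.toml shape today, which is arguably the gap under discussion.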
Key features of these different project types that we need to consider include the following:
- Some types want fully pinned dependencies, where being able to reproduce one or more exact runtime environments is critical. Other types want broad dependencies, to work in as many environments as possible. Pinning is part of the build (or project management) step for the former, and the install step for the latter (at least in types that have build or install steps). Some types may not particularly care - reproducibility is not always a key feature in every project.
- Some types have one clear “thing” that is executed to use the project. Others have multiple scripts, commands and processes.
- Some projects may have a single license/readme/maintainer, others may not.
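The pinning distinction in the first point can be sketched side by side - the same dependency expressed both ways (package names and versions are purely illustrative):

```toml
# Broad constraint, as a library would declare it in pyproject.toml.
# Resolution to exact versions happens at install time, in the consumer's environment:
[project]
dependencies = ["requests>=2.28,<3"]
```

```
# Fully pinned, as an application lockfile (requirements.txt style).
# Reproducing one exact environment is part of the project's own workflow:
requests==2.31.0
certifi==2023.7.22
```

The first form lives naturally in pyproject.toml today; the second currently has no standardised home there, which is part of why these use cases pull in different directions.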
We may or may not choose to support all of these project types with pyproject.toml. I don’t think it’s necessarily a given that it’s a suitable solution for everything. But we do need to be clear what our target use cases are, and right now, I get the feeling that a lot of the confusion in the discussion comes from people talking from the perspective of different (and possibly even incompatible) use cases, without realising it.