Projects that aren't meant to generate a wheel and `pyproject.toml`

pradyunsg · July 14, 2023, 8:45pm

And… I think using project.dependencies is OK even in projects that don’t intend to generate a wheel. Even for projects that aren’t intended to be wheels, specifying this information in this form seems like a good idea to me.

That said, PEP 621 states that name and version are required. Further, given the lack of pip install --only-deps at the moment, there isn’t really any way for users to only-install-dependencies when specified in this manner.

Assuming that feature happens^[1], do folks have any concerns with having users use pyproject.toml even for projects that they never intend to package into a wheel?

[1] Ignoring the details of whether it involves merely parsing pyproject.toml, or generating metadata for the package.

pf_moore · July 14, 2023, 9:03pm

I would personally hate being expected to specify a version for an adhoc script.

Also, isn’t the set of dependencies for a standalone app/script actually (the input for) a lockfile? So shouldn’t we be thinking about this in terms of how we want to structure the input to a lockfile solution (if/when we actually get one)?

I’m not averse to a proposal to adapt PEP 621 to cover the “standalone script/application” use case, but IMO it would be a change and as such would probably need a new PEP to cover it.

brettcannon · July 14, 2023, 10:44pm

Usually, if someone is bothering with a lock file.

Sure, but we all know how lock files went the time we attempted that. Does that mean you think having pyproject.toml work for this use case is a half measure and you would rather try and tackle the bigger lock file discussion again? Because if “input to a lockfile solution” to you is more than listing the top-level dependencies then we have to have a much broader discussion of what lock files are meant to encode to know what should be written down, what’s input e.g. on the command-line, etc.

abravalheri · July 15, 2023, 12:49am

I am not sure if I understood the question (so please correct me if I am wrong), but I am against making pyproject.toml more flexible for things that are not supposed to be installed as “distributions” (and therefore eventually be turned into a wheel, even if they are only made available as sdists or source trees).

The way things work right now, the existence of pyproject.toml (especially if it contains a project section) is a reasonable indicator that the user intends the source code to be installed as a distribution. That is the primary reason for the existence of the file (it was created with that single objective and only later made flexible enough to accept other stuff under tool) and it can also be used by tools to detect the context in which they should treat the working directory.

I don’t think there is a need for having a single file being used to configure everything in the Python ecosystem. If other tools think that the structure given by PEP 621 is good for things other than packaging, they can simply copy the same structure in a file that is not called pyproject.toml. It would be better in terms of separation of concerns, wouldn’t it?

pradyunsg · July 15, 2023, 12:51am

I disagree – storing configuration for black/ruff etc is not a reasonable indicator that code contained in that directory is intended to be installed as a distribution.

The project section, yes, is currently only usable with distribution; but it’s certainly not an indicator without that.

abravalheri · July 15, 2023, 12:55am

That’s why I think that making PEP 518 flexible was a bit problematic in the first place, but that ship has already sailed… So my concern is not to make things even worse.

jamestwebber · July 15, 2023, 1:14am

Just as a data point: I do a ton of “standalone project” stuff in the context of scientific analyses, and the PEP 621 structure works pretty well for me. I don’t personally need an --only-deps flag, because I still need to install my own project code. My workflow is:

Create a repo for a project, with pyproject.toml and requirements.txt
Write project-specific code as a package and upload to GitHub for sharing
For analysis I’ll clone the repo^[1], use pip install -e to install my code, and perform data analysis in jupyter notebooks, importing the code in the notebook as needed

Maybe this structure is already a level of complexity beyond what’s under discussion. And maybe I do it this way as a result of how packages work now, and I’d do it differently if standalone scripts were supported more explicitly. I’m not sure.

I will say that this structure is way more organized than a lot of my colleagues. I guess I partially agree with @abravalheri in that there’s no need to make it easier to dump a bunch of spaghetti in a folder and call it a pyproject, but I do think that the use case of “organized code w/ project-like configuration, but not intended for a wheel” is pretty common in research. The reproducibility of the structure is a big feature there.

[1] Typically on a VM in the cloud.

ketozhang · July 15, 2023, 2:06am

I am in support of this. I’ve been using pyproject.toml for a few years now, and it comes as a surprise to me that pyproject.toml was only meant for generating wheels.

Same as @jamestwebber, almost all my script directories (incl. DAGs and Flask projects) are converted to packages because it is much easier to work with.

This would remove the need for the workarounds that are necessary when developing and using scripts without installing the script directory^[1]. Without the scripts installed, imports behave differently (e.g., relative imports break), which requires changing sys.path or using python -m.

[1] It’s a package, but I have no intention of creating a dist.

ofek · July 15, 2023, 2:55am

This is useful sometimes. For example, we have a large Python monorepo that defines many CLI scripts/entry points that can be used once the project is installed in the virtual environment.

pradyunsg · July 15, 2023, 8:14am

What do you consider a PEP-worthy change here?

Taking a step back from the “what we intended” and looking at what functional pieces we have right now, it’s unclear to me what a PEP would propose changing. I expect that we’re not changing PEP 621 and the only piece changing is a pip feature for unrelated reasons.

You and I share that but, as you say, the ship sailed on that one.

Worse in what manner?

The concern was (as I understand it) that pyproject.toml indicates code that will be installed as a distribution. That isn’t the case and it’s not a packaging-exclusive marker file today; and it hasn’t been since its introduction.

There are reasons that some folks don’t like that pip opts them into “pyproject.toml-based builds” which involve build-isolation if they have a pyproject.toml; and this is a big part of that.

Indeed. That is IMO a better framing for this – I have a bunch of scripts and related things in a folder, and this thing has external dependencies.

From a user’s perspective, this is a “Python project” even if it isn’t intended to be a redistributable Python package. Even if this project did create wheels, it’s unlikely that users would use those wheels beyond treating them as an intermediary that they don’t care about which pulls in their dependencies.

Functionally, these users can (and do) rely on requirements.txt (and conda’s environment.yml) files. We don’t really have a tool-agnostic way to do this though.

pf_moore · July 15, 2023, 10:05am

Not precisely. It means that I think using the existing [project.dependencies] field locks us into the PEP 621 metadata in a way that might not be appropriate for all use cases of lockfiles (which I see as basically the same as the use cases for this proposal). I’m not suggesting we have another go at lockfiles, just that we look at this proposal in the context of what people want lockfiles for.

Specifically, making name and/or version optional. Conversely, if name and version are not optional, I don’t understand how the motivating VS Code feature will work - I certainly don’t want to be asked to specify a name and version every time I start a new project, and I’d be very unhappy if VS Code put some sort of “dummy” value in there - that would be just as bad as setuptools’ current behaviour of using UNKNOWN-0.0.0.

Definitely. But there are also other use cases. For example, I have a folder of “adhoc work”, that contains a number of independent scripts. These each have their own dependencies, and have no relationship to each other, apart from all being too small or temporary to warrant a “project directory”.

The point here is that while VS Code may want to ease beginners into a “directory per project” model, and may choose to only support that model, the Python packaging ecosystem doesn’t have the same freedom to declare existing workflows unsupported. So there’s a conflict of priorities, in that the VS Code motivating example has a limited set of use cases that matter, but the packaging standards can’t ignore the bigger picture.

Speaking very much as a user, I have things that I absolutely consider a “Python project” that are nothing more than a single script, or a single Jupyter notebook. These might well be stored in a directory that contains lots of other files and projects, none of which are in any way related to each other or to the Python project in question.

In terms of the title of this thread, a single script can very much be a “project that isn’t meant to generate a wheel”. And for such a script, there’s absolutely no way to associate a particular pyproject.toml with it. So I think that any solution to “how do we specify a project’s dependencies” must include some provision to explicitly name a project’s “dependency file”. That’s a benefit of the requirements.txt format that isn’t shared by pyproject.toml, for better or worse.

Agreed. And having such a thing would be good. But can we agree that any solution that mandates a specific named file would lose one important benefit of the requirements format, which is that the filename is up to the user? And if we don’t cover that, there will be people who need to continue using requirements files, because the new format doesn’t handle their needs.

One other issue I have, which is vaguely related to the “putting everything in pyproject.toml” comments, is that pyproject.toml is fundamentally a user-edited file. That’s not mandated by any spec, but PEP 518 listed “Reliable comments” and “Easy for humans to edit” as key factors in the choice of TOML as a format. If we allow project dependencies to be specified in pyproject.toml, we open up the possibility of tools expecting to write dependencies as a result of some sort of “add dependency” command, and IMO we need to be much more explicit that it’s not acceptable to lose comments or user formatting of pyproject.toml in the process. I have no idea if this is something that has been considered for the VS Code extension, so I don’t want to make it specifically about that particular use case, but given that format-preserving TOML editors aren’t common (neither tomli-w nor toml do this as far as I can see) it’s a non-trivial requirement.

abravalheri · July 15, 2023, 11:33am

Worse in the sense of not even being able to rely on the fact that a folder that has a pyproject.toml file which includes a project table is meant to go through the build process and (eventually) become a distribution.

By “eventually become a distribution”, I mean things in a broad sense: even if a project doesn’t want to distribute files via an index, if people still want to install it with pip install <source code path> or pip install -e, my view is that it is indeed intended to go through the build process and will generate a wheel (even if only as a transient file; a step towards installation). I believe that use cases similar to the one described by @jamestwebber fall in this category (unless I misunderstood that), and it seems to be in line with the practices we have in place nowadays.

Right now pyproject.toml’s project table is a way users have to communicate with the build backend (and as an optimisation, other tools can also look into it to extract metadata without having to go through the entire build process). I think we should not mix things, and leave this as it is. I am not against having a standard that fulfills the “improved requirements.txt” (or even better “improved requirements.in”) niche. I just don’t think we need to overload pyproject.toml with that.

Btw, project.dependencies may not even be the best tool for the job here. One of the most common questions we get all the time is how to include index information in the dependency specification. Since project.dependencies is bounded by PEP 440 and 508, the answer to that question is always “so far, you can’t” (and to be honest I think this works well in terms of supply chain trust and security for projects meant to produce distributions).

pf_moore · July 15, 2023, 12:58pm

I note the “which includes a project table” - what’s wrong with “which includes a build-system table”? It doesn’t handle legacy projects, but nor does checking for project.

My use cases certainly don’t fall into the category of things I’d install in any sense. I just run python mainscript.py.

I agree we shouldn’t be mixing concerns here. I’m not against putting the new data in pyproject.toml if that’s the consensus (although see my other comment about keeping it human-editable), but I think a new use case deserves a new section. To be concrete, how about [run] with a dependencies key for dependencies? It could potentially be extended with other items like env-vars for setting environment variables, or virtualenv-name to explicitly name the project virtualenv, but let’s leave that until genuine use cases come up.

jamestwebber · July 15, 2023, 6:13pm

Yeah, this is why I wasn’t sure if my use-case was beyond the scope here–I don’t think of it as “meant to generate a wheel” but I do generate one transiently.

My concern here is that there’d be so much overlap between the [run] section and the existing [project] section that it’d be confusing for newcomers to decide which one they needed. Over time, their project can become complex enough that they move from scripts to a package.

Would run and project be mutually exclusive? If my project has entry points, do I need a [run] section? Why isn’t my project a project anyway? I’m not sure it helps.

pf_moore · July 15, 2023, 8:57pm

(I just said the same thing in the context of the VS Code proposal on the other thread - sorry for the duplication, but it applies to both discussions)

This is essentially saying that we’ll use [project.dependencies] as a sort of “requirements-lite” for cases where there’s a pyproject.toml, with the added confusion that when there’s a build backend, the build backend may need to be invoked to get the actual dependencies.

But pyproject.toml cannot be the final standard replacement for requirements files, because it doesn’t support all the use cases. Even if we ignore all the stuff that isn’t requirements in a requirements file, there’s still:

Two independent projects sharing the same requirements file.
Multiple projects sharing a directory, each with their own set of requirements.
Requirements for parts of a project (docs-requirements.txt, vendor.txt, …)

So if we use [project.dependencies], aren’t we just setting things up so there will never be a single solution for requirements? Aren’t we better keeping the status quo for now? I don’t see the benefit we’d get from allowing non-installable projects (ones that happen to have a pyproject.toml file) to specify their requirements in pyproject.toml rather than in requirements.txt.

layday · July 16, 2023, 12:57pm

IMO a standalone dependency group table (or “run” with a “dependencies” sub-key, or whatever) would be a much better alternative to requirements.txt which can serve both dist-less projects (apps) and libraries which might be misguidedly using extras for this purpose. Lots of prior art in this space to draw from (Poetry, PDM, pacmans in other languages). The nomenclature is unfortunate (“project” can stand in for both packages and apps) but we should not reinterpret the project table simply because we named it a certain way - it’s very obviously targeting packages.

BrenBarn · July 17, 2023, 1:55am

I wouldn’t say I particularly have any issues with that, but I’m not sure it’s something we should recommend or condone. As @pf_moore said, there are some projects that can’t fit the pyproject.toml model (e.g., because they aren’t associated with a single directory). I wouldn’t want a bunch of people getting the idea that they can use pyproject.toml for this stuff and then only gradually realizing it won’t actually work (and then probably complaining about that).

As with some of the other packaging discussions, I think it would be good to try to get a sense of what kinds of projects these are and what kinds of things people want to do with them and what is the best way to meet those needs. And it’s better to do nothing than to try to shoehorn a solution into a system that we know can’t actually handle it.

steve.dower · July 17, 2023, 4:21pm

Or possibly [dev] with a dependencies key for dev dependencies? And maybe even a package name to indicate which dev tool is preferred for creating and setting up the local environment?

brettcannon · July 18, 2023, 12:39am

We would probably populate the name with the name of the directory and use a fake version number of 0 or something.

Yes, although I will ask that if we do come up with a separate file which has flexibility in the name that the flexibility be limited for easier discoverability (*requirements*.txt is entirely convention and where people put an e.g. docs marker varies, if at all as they might use a subdirectory to namespace instead).

Yep, like Poetry, pipenv, and PDM already do (it’s covered by the table @courtneywebster has in the blog post).

This was brought up in Adding a non-metadata installer-only `dev-dependencies` table to pyproject.toml.

What I’m hearing in this discussion is two orthogonal questions:

1. Use pyproject.toml or a new file?
2. Should there be a way to provide data that even project.dependencies can’t represent, and if so what is that data?

The general trend seems to be that using pyproject.toml as-is isn’t a way most people want to go (maybe lukewarm, but no one seems to be screaming it’s the best solution and we should lean into it compared to trying to come up with a “proper” solution). It also seems like this would feed into whatever lock file format we eventually end up with by being what requirements.in is today for pip-tools (although I’ve been using its new pyproject.toml support in a project and liking it).

So, do we want to try and come up with a top-level dependency-specifying solution for apps? If so, maybe the first step is trying to answer question 2 with what information we would want it to capture for app dependencies?

sbidoul · July 18, 2023, 8:13am

Not screaming but… I personally use and recommend to use project.dependencies to declare the top level (and not pinned) dependencies of any application, whether or not building a wheel happens in the deployment pipeline.

This feels quite natural and intuitive, and it actually did not even occur to me that it could be controversial until reading this thread.

Adding a name and version has never been a show stopper nor considered burdensome.

Should that prevent users (or VS Code) from using project.dependencies today when that works for them? Pip requirements files are here to stay anyway.

That use case resonates with me. But for independent scripts (and, I assume, single-file ones) with their own dependencies, perhaps we need some standard to declare dependencies in the scripts, together with some pipx run script.py that interprets that.

Because having a separate file to declare dependencies of each script would make it more difficult to move the script around.
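To illustrate the idea of dependencies travelling inside the script, here is a sketch using an invented comment convention. The `# requires:` marker and the `script_dependencies` helper are assumptions made up for this example. They do not describe any existing standard or what pipx does.

```python
import re

# Example script text. The "# requires:" marker is hypothetical.
SCRIPT = '''\
# Hypothetical in-script declaration; the "requires:" marker is made up here.
# requires: requests
# requires: rich>=13
import sys
print(sys.argv)
'''

_REQUIRES = re.compile(r"^#\s*requires:\s*(\S.*?)\s*$")


def script_dependencies(source: str) -> list[str]:
    """Collect '# requires:' lines from the leading comment block of a script."""
    dependencies = []
    for line in source.splitlines():
        match = _REQUIRES.match(line)
        if match:
            dependencies.append(match.group(1))
        elif line.strip() and not line.startswith("#"):
            break  # the first line of real code ends the header block
    return dependencies


if __name__ == "__main__":
    print(script_dependencies(SCRIPT))
```

Since the declaration lives in the file itself, moving or copying the script keeps its dependency list with it, which is the property a separate per-script file would lose.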

So I tend to think this use case alone does not go against the use of project.dependencies for declaring top level dependencies of apps.

So far that was not my understanding of the specs so the debate here comes as a surprise to me. However I note that, today, the simplest pyproject.toml with only project.name, project.version and project.dependencies leads to either a reasonable wheel (or editable install) or a helpful error message from setuptools that guides the user towards making their project package-able (kudos to the setuptools folks for that, BTW).