General discussion of some proposals I have for pyproject.toml extensions

kknechtel · November 14, 2023, 11:58pm

Posting an update to summarize what I’ve determined so far. (I also renamed the thread to better reflect how things ended up.)

The overall effect is that I’m going to start a new discussion thread to try and figure out some things that were left behind in the “projects not intended to build a wheel” thread, with clearer focus this time. Later, I expect to skip ahead to idea 3 and propose something really simple there. I can’t see a reason to pursue any of the rest any further, but hopefully this record will be useful to others.

On lockfiles

tl;dr Good luck to everyone else; you’ll need it.

I now understand @ofek 's concerns generally about lockfile data: many clients could want to store the result of a complex solve, including complex dependencies. Putting that into pyproject.toml could cause huge amounts of bloat, and it is information that’s impractical to write by hand. (While tools already commonly edit and maintain pyproject.toml, this should probably be discouraged generally - the ability to maintain it by hand is one of the reasons PEP 518 chose the TOML format, and users are likely to want to preserve comments etc.)

It appears likely that there will be a need for a standardized lockfile format (or more than one, according to varying needs for “levels of reproducibility”) regardless of any attempt to standardize requirement lists. It also appears that efforts to produce such standards are well under way.

My own ideas about how to do lockfiles now appear to be entirely irrelevant too, as a result of both this and other points below.

Re “Task/target descriptions in pyproject.toml”

tl;dr I’m not working on this further but I’m hoping others will pick up the torch.

That was a bad name - it became Storing requirements for tasks in pyproject.toml. This was really more my (mis)interpretation of @pf_moore 's idea. The key point of contention, from my perspective, is: there is considerable resistance to the idea of reserving specific names for requirement lists and defining semantics for what to do with specific named lists. This is the opposite of what I expected, but it is what it is.

Such definitions are the root cause of all the complexity that caused the initial proposal linked above to be poorly received. Without them, the remaining idea is so simple that I think there are no real decisions to be made aside from bikeshedding about names. @pf_moore effectively already laid it out, twice (notwithstanding my desire to complicate things); and @sirosen also expressed interest in writing it up (I can’t really fathom what “alternatives” might come up here, but I’d be happy to see them).

This leaves the question of whether the time for such an idea has actually come yet. There seem to be two possible blockers:

Shall we use `pyproject.toml` for non-wheel-related information?

I think the answer is clearly “yes”, and I don’t see serious objection. However, the existing design may make that awkward. I had thought that the “projects not intended to build a wheel” thread was supposed to tackle that question explicitly, but it has gotten out of control and does not seem to have clear direction. I will start a new thread specifically about this question.

I think that a discussion like this is also necessary so that we can identify both a) meaningful types of non-wheel-related information, and b) meaningful types of “project” which might want idiosyncratic types of information.

I also think that this discussion will be relevant to figuring out a potential [run] table (or alternative), and the implications for PEP 723.

Can we actually end reliance on `requirements.txt`?

The general impression I now have is that requirements.txt files (metonymically: files created by pip freeze in an informal “requirements.txt format”, which basically function as command-line options for Pip to install dependencies) are used for two more-or-less orthogonal purposes:

either as a pseudo-lockfile, or input explicitly for a solver that will generate a (proprietary) lockfile;
to give “alternate” lists of dependencies in contexts other than wheel-building.

This idea addresses only the second case, and @pf_moore seemed concerned that it would not be worth proposing if there is no lockfile standard - presumably, people would still be dependent on the requirements.txt approach to lock dependencies, and then it seems like they would keep using it for other dependency lists as well.

However, I think it’s clearly a good idea to push forward with something here. After all, I was told to scale back a massive, overarching vision, cut it in pieces and take things one step at a time - surely we aren’t now proposing to suppress a good idea just because it doesn’t completely solve an existing problem (which could be argued to be really two orthogonal problems). “Now is better than never”; if people are working on a lockfile idea and a requirement-list idea, it makes far more sense to release each PEP when it’s ready, rather than have them wait on each other indefinitely.

Re “Abstract dependency symbols for pyproject.toml”

tl;dr I’m shelving this, and don’t plan to explain further about the original idea.

@pradyunsg told me in the PEP 725 thread that it’s explicitly desirable to keep external (non-Python) dependencies physically separate, within pyproject.toml, from those that are available on PyPI, and that it’s an intentional aspect of the design. It comes across that developers who need to work with those dependencies will be satisfied with having a separate syntax for the individual dependencies in that section, according to their particular needs; and that attempts to unify this with PEP 508 dependency specifiers, or come up with something more general that can express both, don’t serve a purpose.

That shoots down one motivation for this idea. I have two more:

Have extensibility for information not covered by PEP 508;
Make it possible to “alias” a complex dependency, so it could be easily specified in multiple dependency lists.

The latter depends on (pun intended) the acceptance of at least some scheme that would put multiple dependency lists in the same file, whether or not that’s pyproject.toml. Even then it seems somewhat marginal.

The former might not have much use either. There aren’t a lot of things that could be added. Solved dependencies in a lockfile and non-pinned dependencies for a library have some overlap (name, Python version requirement) but also things that are unique to one or the other (version exclusions, hashes). The only thing that I think is common across the board, that can’t already be described with PEP 508, is an index url; current wheels wouldn’t be able to represent that, and it’s not at all clear that doing so would be desirable.

@pradyunsg also pointed out that PEP 633 was a similar, rejected idea. That was written as an alternative to PEP 631 (later merged into PEP 621), and strictly covers the same information about requirements. Neither of those is the case for my idea (which would supplement the existing behaviour rather than replacing it), but I don’t think any of this makes it worthwhile to pursue the design.

I could try to design a table specifically for “exploded” PEP 725 dependencies, but it doesn’t seem like that would be well received either.

Re “Allowing ancillary config files in pyproject.toml style”

tl;dr I’m still interested in this, but it needs to wait.

I don’t want to propose using these to cover alternate use cases for the same set of code files. I agree with @sirosen that this will get messy and redundant, and I’m hoping that the “Shall we use pyproject.toml for non-wheel-related information?” discussion leads to a conclusion like “yes, we shall use it for every abstract “task” that relates to a given set of code files”, and also that people will accept ideas like “[project.requires-python], despite its original purpose in the context of wheel building, has clear and obvious meaning for many other tools”.

Instead, my primary goal here is to support cases like monorepos, and other situations where multiple distinct “projects” (whatever that means) sit in the same parent directory. It seems like some monorepos are able to get away with just putting pyproject.toml files in subfolders and navigating to the right place before running tools. But this covers only relatively limited cases (in fact, I brought it up in that thread and got a few likes for it); and it seems like people sometimes do some really impressive (read: far more work than ought to be necessary) things to deal with more complex cases.
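For reference, the “pyproject.toml in subfolders” approach can be sketched as follows. This assumes a tool that finds its configuration by searching upward from the working directory; that lookup rule is an assumption for illustration (individual tools differ), and `find_pyproject` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path


def find_pyproject(start: Path) -> Path | None:
    """Return the nearest pyproject.toml at or above `start`, or None."""
    start = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (start, *start.parents):
        candidate = directory / "pyproject.toml"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_pyproject(Path.cwd()))
```

Under this rule, each subfolder's file shadows anything higher up, which is why it only helps when every “project” lives in its own directory and the tool is run from inside it.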

Some other prior related discussion:

Re “Embedded config data for single-file “projects””

This is largely obviated now. The plan here was that if my other ideas were accepted, PEP 723 wouldn’t make a lot of sense, and I’d want to write something to describe analogous functionality that did make sense in the new framework. The only remaining aspect that makes sense, I think, is the idea of a dedicated tool to extract PEP 723 data and write a separate file from it. Whether even that makes sense will also depend on further discussion.