General discussion of some proposals I have for pyproject.toml extensions

kknechtel · November 10, 2023, 5:57pm

After extensive deliberation over the past couple of weeks, I’ve come up with multiple ideas about ways to extend the underlying idea behind pyproject.toml to Cover Everyone’s Use Case and Make Everyone Happy. This follows on from discussion in Projects that aren’t meant to generate a wheel and pyproject.toml, PEP 725: Specifying external dependencies in pyproject.toml, PEP 722/723 decision (and all the related threads), User experience with porting off setup.py, Wanting a singular packaging tool/vision, The 10+ year view on Python packaging (what’s yours?), and quite possibly more I’m forgetting.

The rough outline is something like:

1. Task/target descriptions in pyproject.toml
   - This is a new idea I worked out after my discussion with @pf_moore about the motivations for a [run] table and abstracting the idea of a requirement group; and with @sirosen about the limitations of “extras”. (I also appreciated the discussion about the existing semantics of [project.requires-python], but I don’t think it changes anything about my thought process here.)
2. Abstract dependency symbols for pyproject.toml
   - The idea is that instead of only using PEP 508 dependency specifiers, one could use a string that refers to an entry in a [dependency] table with more detailed information. This would allow for specifying hashes and provisioning for native packages, as well as describing the kinds of non-native dependencies currently mentioned in PEP 725. This is a new idea that I’m working on - based on the desire to be able to record the information that PEP 725 discusses, in a way that makes sense in the context of (1). Doing things this way also has the advantage that I don’t actually have to expand the specification for dependency specifiers (which would be hard to design and possibly have ripple effects).
   - I think there is a slight dependency between (1) and (2) when it comes to understanding the details of how native dependencies would work when cross-compiling.
3. Allowing ancillary config files in pyproject.toml style
   - This is the core of the idea I was trying to present originally, modified a bit after feedback. I was going back and forth on the idea of an inheritance mechanism and how to design it. I became strongly convinced it would be necessary to support inheritance if separate configs would ever be needed for separate “tasks” over the same set of files; but given the new idea in (1) I now think that separate configs would only ever be needed to represent separate (possibly overlapping) sets of files in a given repository - so I’ve dropped thinking about that, at least for now.
4. Embedded config data for single-file “projects”
   - This would essentially be my replacement for PEP 723, in the context of the other proposals. Originally there seems to have been considerable resistance to the idea of opening up the full power of pyproject.toml in this context - despite the decision to refer to pyproject in the spec, even while describing a [run] table that isn’t yet approved for pyproject.toml. But in the context of my other ideas, especially (3), I think it makes a lot more sense to unify the spec with the config data seen elsewhere. The full set of things I want to propose here - especially the extraction tool - are probably dependent on (3).
5. I’m not sure if it needs to be written out separately, but in my mind, (2) and (3) taken together solve the problem of lockfiles, more or less automatically.
   - Essentially, (3) allows the lockfile to exist as a separate file in the common format, and (2) allows that format to be powerful enough to express the things a lockfile needs to express. AFAICT, anyway.

I’m pretty sure that I have all of these ideas well enough formed in my mind to be able to write out appropriately detailed specifications. But before I try to do that, I’d like to decide whether I should:

Post those write-ups here separately, as I finish them, without heed to intervening discussion?
Make a separate thread for each?
Make a thread for the suite of proposals and “reserve” posts at the top to edit in the write-ups as I complete them?
Write everything up first and post it all at once?
Something else I haven’t thought of?

pf_moore · November 10, 2023, 6:19pm

Ultimately you will need to produce one or more PEPs, each of which will have to stand on its own. Each PEP will need to propose a specific, well-defined feature, argue for the benefit of having that feature, and explain how it would be used and taught, and how we would transition from the current situation to using the new feature. Basically just what every PEP has to do.

If you try to do too much at once, everyone will get overwhelmed and discussions will go nowhere. And speaking from experience, the PEP(s) will never get approved because you won’t get any sort of consensus, or view of the community’s feeling on the proposal.

If you break everything down, I appreciate that not all of the pieces might make sense on their own. You’ll have to work out what sort of split makes most sense. Remember that if you’re making a proposal, it’s up to you to persuade people it’s a good idea. You can’t just say “let’s do such-and-such” and expect other people to do the job of making a case for your proposal. So if you can’t work out how to “sell” your proposal, you’re going to have to address that before publishing it.

You also need to demonstrate that you’re responding to community feedback, so don’t take too much on at once. If you’re posting 4 different proposals at once, you’ll need to manage 4 discussions, keep the 4 proposals up to date, notify the threads of changes so that people are aware of what’s happening, etc. Are you confident you can do that? Personally, I doubt I could manage to do a good job of handling more than one proposal at a time, but you’ll have to judge for yourself what you can manage (and most importantly, have the time for).

One specific point:

You cannot just replace PEP 723. It’s been accepted^[1], and as such it is the solution for defining script dependencies. Yes, it might still get withdrawn or modified based on what happens in this discussion, but you have to work on the basis that it’s an accepted standard - Brett could choose to simply waive the condition he put on acceptance and say that PEP 723 is final.

So, tl;dr - I’d suggest picking one of your proposals, make sure it stands alone, and publish that as a draft PEP. By all means mention the others for background, but otherwise leave them until the first one is completed. Be prepared for a lot of work, as even one PEP is a big undertaking…

[1] provisional acceptance is still acceptance!

kknechtel · November 10, 2023, 6:49pm

I had hoped to discuss the ideas, or at least lay them out more informally, before getting to PEP writing, but maybe that’s a waste of time.

If I tackle these ideas individually, I think the order I’ve written them out is probably most natural; if something isn’t well liked I can reconsider the implications for the subsequent proposals on the fly. Given the need for such consideration, it definitely sounds like writing the ideas out one at a time makes more sense.

In particular, if (1) is well liked then I think it kills the idea of a [run] table. I had assumed that this would necessarily cause PEP 723 to be rejected; but whatever happens, happens. If PEP 723 is accepted, whether or not it describes a format compatible with pyproject.toml, I would have to either just drop (4) or greatly reduce its scope. (Similarly, (2) in a sense competes with PEP 725.) While (3) is in a sense my “darling”, (1) is more or less a blocker, and overall certainly more urgent in my mind.

So I guess the best thing is to just move straight to drafting a PEP for that. I haven’t tried this before. I should start by checking out GitHub - python/peps: Python Enhancement Proposals and reading PEPs 1 and 12, correct?

pf_moore · November 10, 2023, 8:13pm

What I said is just my opinion, if you feel that you need to have a bit more discussion before going to a PEP, then by all means go ahead. But I know I’ve already stopped commenting on your posts on the previous thread because I feel I need more detail to understand what you’re proposing, and I suspect that if you start another discussion thread, I’ll simply end up doing the same. If you’re getting good feedback from others, then that’s fine, though.

Only @brettcannon can really say if that’s the case. The wording of the acceptance suggests that if we don’t agree to a [run] section^[1], PEP 723 is no longer accepted. But the acceptance has been marked as “provisional”, and in my view, provisional acceptance allows for backward-incompatible changes to be made. Which might mean that it’s acceptable to modify PEP 723 to omit the problematic clause that puts constraints on future standards around a [run] section^[2].

The truth is that no-one knows at the moment.

Because I want script dependencies to be sorted, I prefer to assume that PEP 723 is the official solution, and the “provisional” status means that we’ll adjust it if possible to make it work. But I may be alone in this view. For me, the two “disaster” scenarios are (1) PEP 723 gets rejected and we’re back to square one, or (2) the requirements PEP 723 places on pyproject.toml make it impossible for us to come up with the right solution for pyproject.toml without triggering rejection of PEP 723. My current fear is that (2) is happening, and we need something that simply isn’t [run] as described by PEP 723.

And finally, as the author of PEP 722, if PEP 723 does get rejected, I’ll lobby extremely hard for either PEP 722, or a variant of PEP 722 which uses TOML (which I’d be willing to write, given that the acceptance of PEP 723 implicitly invalidates the arguments PEP 722 made against TOML) to be accepted in its place. I am absolutely not interested in going back through the whole “script dependencies” debate with another new proposal. So sorry, but that makes me a very strong -1 on your (4), regardless of what it says.

[1] Which, according to PEP 723, must have a certain specific form.
[2] Although Brett said “I don’t want /// pyproject to have anything that’s invalid in pyproject.toml; it should at best be a subset of what’s available, not a superset” which suggests he wouldn’t allow that.

kknechtel · November 10, 2023, 9:27pm

My current feeling is that I know enough to start writing (1). For (2) I want to talk to the PEP 725 people more, and especially see how they respond to (1). That means I should probably be trying to put (3) and (4) out of my head for a while, and I’ll see what other background discussion (and proposals) come up in the meantime.

… Ah. Fair enough. You’re entirely justified in that; in truth I was heavily leaning towards PEP 722 anyway. I just figured, given the user study and the shape of Brett’s decision, that the consensus had come down in favour of “it’s fine and preferable to use TOML for this kind of metadata, and we’ll make it work somehow”.

ofek · November 11, 2023, 12:09am

What is idea number 1 exactly? I couldn’t understand from the text on top.

kknechtel · November 11, 2023, 12:24am

Mm, I thought about it some more and that name is not good. I will hopefully have something better when I start drafting. But the basic idea is to have tables like [required-to.build-wheel], [required-to.run] etc. that store separate lists of dependencies that the code might have depending on what you are doing with it. Here, [required-to.build-wheel] is effectively a replacement for [project.dependencies], a name which I think is misleading and unnecessarily focuses the contents of pyproject.toml on wheel-building (and the design imposes a limit that I want to remove). Then, dependencies related to extras go in places like [required-for.extra-name.build-wheel] etc.

The needed Python version will still default to [project.requires-python], because this should be the same no matter what you’re doing with the code. The code was written to some version of the Python language spec, and that doesn’t change because you’re testing it vs. building etc. Maybe there is some pathological case where someone needs to use a different version of Python to run e.g. Sphinx vs. what the code itself should use; but I consider that to be out of scope (an environment task, rather than a code-usage task).

ofek · November 11, 2023, 1:25pm

I see, thanks for explaining!

For projects that are meant to be distributed as a wheel, anything that is not project.dependencies is poor UX IMO.

pf_moore · November 11, 2023, 1:31pm

I don’t think a proposal that tries to replace the existing [project] table is going to get very far. There’s been too much invested in transitioning to PEP 621, and expecting users to move to something else, just as they have gone through the work of moving to PEP 621, isn’t going to get accepted.

sirosen · November 11, 2023, 4:47pm

My main feedback is at a meta-level, not specific to the various ideas here. I would recommend trying to pursue an incremental path, in which you are only advocating for one change or feature at a time.

I think incremental change works better for discussion, as already mentioned here, but also for implementers and users.
Incremental changes are easier to implement more quickly, and have a clearer scope of work to be considered finished. By contrast, implementing “partial support” for a spec is always rife with judgement calls about which parts can safely be omitted.
And for users, the benefits of faster implementation are that all of their tools will get the new features more quickly. That has the obvious effect that they can start using the new feature, but also the more subtle effect that all of their tools stay in sync and they can just learn about the feature (rather than learning it and learning which tools don’t support it).

Non-incremental changes can be the right choice, but usually when incremental improvements cannot (or cannot in reasonable time) achieve the same results.

tl;dr: Incremental change is good for a variety of reasons.

Since I’m a proud, card carrying member of the Incrementally Improve Packaging club, I’m only going to give focused feedback on a couple of pieces here.

(1) sounds interesting, maybe similar to allowing named sections for dependencies in the proposed run table. But it’s not clear enough what it means. “Task definitions” sound like tox and hatch envs. And although I think some of that data (mainly dependencies) belongs in pyproject.toml, I don’t think that all of it does. Not only do I think this needs more definition for us to have a fruitful discussion, but I think you need to make sure you note how it’s different from other proposals. Perhaps it will be more extensible in the future?

(2) sounds again interesting, but it also sounds like new syntax. I have some cautious feelings about new syntax for dependencies. Based on my experience with ^1.0.2 as a version specifier (poetry allows this, and I’ve seen it confuse people), I think new syntax has a high bar to pass – will it be valid everywhere that PEP 508 strings are valid?
However, poetry is my best frenemy, and has a good solution for this sort of thing: allow package items in dependency lists to be strings or objects. Objects can have a variety of fields like name, version, etc. and are very easy to extend in the future. Would that be an option here to consider?

(Aside: I may be slow to reply for ~1 week.)

kknechtel · November 11, 2023, 11:17pm

Synonymize (and just the one key), not replace. In my own projects I show an almost reckless disregard for backwards compatibility, but I do like to think I at least understand the concept.

Extensibility (and parallelism) is central to the idea, yes. I will try to define a few semantics that I think are the low-hanging fruit, and try to figure out what makes sense for going forward from there.

My original plan was to enhance the actual PEP 508 string syntax, but now the plan is to have more tables, and one of the entries might be a PEP 508 string with the existing syntax and semantics. It’s meant (going forward) to be able to handle non-native dependencies, so it inherently can’t be valid everywhere PEP 508 strings are.

There are inherently a lot of problems people want to be able to solve, so it just isn’t going to be possible with a small set of minor tweaks. Any putative [run] table wouldn’t be that small of a change, after all, either; and it would (to my understanding) address exactly one new use case.

If I understand it correctly (and in theory I use Poetry, so I should be able to check this…) that is not much different from what I am proposing. I just add a layer of indirection, so to speak, so that the dependency list doesn’t mix strings and objects, but instead some strings are table names (TOML tables == JSON objects, really). That also allows for reusing a complex dependency description in multiple lists.

kknechtel · November 14, 2023, 11:58pm

Posting an update to summarize what I’ve determined so far. (I also renamed the thread to better reflect how things ended up.)

The overall effect is that I’m going to start a new discussion thread to try and figure out some things that were left behind in the “projects not intended to build a wheel” thread, with clearer focus this time. Later, I expect to skip ahead to idea 3 and propose something really simple there. I can’t see a reason to pursue any of the rest any further, but hopefully this record will be useful to others.

On lockfiles

tl;dr Good luck to everyone else; you’ll need it.

I now understand @ofek’s concerns generally about lockfile data: many clients could want to store the result of a complex solve, including complex dependencies. Trying to put that into pyproject.toml could cause huge amounts of bloat, and represents information that’s impractical to write by hand. (While tools already commonly edit and maintain pyproject.toml, this should probably be discouraged generally - the ability to maintain it by hand is one of the reasons PEP 518 chose the TOML format, and users are likely to want to preserve comments etc.)

It appears likely that there will be a need for a standardized lockfile format (or more than one, according to varying needs for “levels of reproducibility”) regardless of any attempt to standardize requirement lists. It also appears that efforts to produce such standards are well under way.

My own ideas about how to do lockfiles also appear to be entirely irrelevant now, as a result of both this and other points below.

Re “Task/target descriptions in pyproject.toml”

tl;dr I’m not working on this further but I’m hoping others will pick up the torch.

That was a bad name - it became Storing requirements for tasks in pyproject.toml. This was really more my (mis)interpretation of @pf_moore’s idea. The key point of contention, from my perspective, is: there is considerable resistance to the idea of reserving specific names for requirement lists and defining semantics for what to do with specific named lists. This is the opposite of what I expected, but it is what it is.

Such definitions are the root cause of all the complexity that caused the initial proposal linked above to be poorly received. Without them, the remaining idea is so simple that I think there are no real decisions to be made aside from bikeshedding about names. @pf_moore effectively already laid it out, twice (notwithstanding my desire to complicate things); and @sirosen also expressed interest in writing it up (I can’t really fathom what “alternatives” might come up here, but I’d be happy to see them).

This leaves the question of whether the time for such an idea has actually come yet. There seem to be two possible blockers:

Shall we use `pyproject.toml` for non-wheel-related information?

I think the answer is clearly “yes”, and I don’t see serious objection. However, the existing design may make that awkward. I had thought that the “projects not intended to build a wheel” thread was supposed to tackle that question explicitly, but it has gotten out of control and does not seem to have clear direction. I will start a new thread specifically about this question.

I think that a discussion like this is also necessary so that we can identify both a) meaningful types of non-wheel-related information, and b) meaningful types of “project” which might want idiosyncratic types of information.

I also think that this discussion will be relevant to figuring out a potential [run] table (or alternative), and the implications for PEP 723.

Can we actually end reliance on `requirements.txt`?

The general impression I now have is that requirements.txt files (metonymically: files created by pip freeze in an informal “requirements.txt format”, which basically function as command-line options for Pip to install dependencies) are used for two more-or-less orthogonal purposes:

either as a pseudo-lockfile, or input explicitly for a solver that will generate a (proprietary) lockfile;
to give “alternate” lists of dependencies in contexts other than wheel-building.

This idea addresses only the second case, and @pf_moore seemed concerned that it would not be worth proposing if there is no lockfile standard - presumably, people would still be dependent on the requirements.txt approach to lock dependencies, and then it seems like they would keep using it for other dependency lists as well.

However, I think it’s clearly a good idea to push forward with something here. After all, I was told to scale back a massive, overarching vision, cut it in pieces and take things one step at a time - surely we aren’t now proposing to suppress a good idea just because it doesn’t completely solve an existing problem (which could be argued to be really two orthogonal problems). “Now is better than never”; if people are working on a lockfile idea and a requirement-list idea, it makes far more sense to release each PEP when they’re ready, rather than have them wait on each other indefinitely.

Re “Abstract dependency symbols for pyproject.toml”

tl;dr I’m shelving this, and don’t plan to explain further about the original idea.

@pradyunsg told me in the PEP 725 thread that it’s explicitly desirable to keep external (non-Python) dependencies physically separate, within pyproject.toml, from those that are available on PyPI, and that it’s an intentional aspect of the design. It comes across that developers who need to work with those dependencies will be satisfied with having a separate syntax for the individual dependencies in that section, according to their particular needs; and that attempts to unify this with PEP 508 dependency specifiers, or come up with something more general that can express both, don’t serve a purpose.

That shoots down one motivation for this idea. I have two more:

Have extensibility for information not covered by PEP 508;
Make it possible to “alias” a complex dependency, so it could be easily specified in multiple dependency lists.

The latter depends on (pun intended) the acceptance of at least some scheme that would put multiple dependency lists in the same file, whether or not that’s pyproject.toml. Even then it seems somewhat marginal.

The former might not have much use either. There aren’t a lot of things that could be added. Solved dependencies in a lockfile and non-pinned dependencies for a library have some overlap (name, Python version requirement) but also things that are unique to one or the other (version exclusions, hashes). The only thing that I think is common across the board, that can’t already be described with PEP 508, is an index url; current wheels wouldn’t be able to represent that, and it’s not at all clear that doing so would be desirable.

@pradyunsg also pointed out that PEP 633 was a similar, rejected idea. That was written as an alternative to PEP 631 (later merged into PEP 621), and strictly covers the same information about requirements. Neither of those is the case for my idea (which would supplement the existing behaviour rather than replacing it), but I don’t think any of this makes it worthwhile to pursue the design.

I could try to design a table specifically for “exploded” PEP 725 dependencies, but it doesn’t seem like that would be well received either.

Re “Allowing ancillary config files in pyproject.toml style”

tl;dr I’m still interested in this, but it needs to wait.

I don’t want to propose using these to cover alternate use cases for the same set of code files. I agree with @sirosen that this will get messy and redundant, and I’m hoping that the “Shall we use pyproject.toml for non-wheel-related information?” discussion leads to a conclusion like “yes, we shall use it for every abstract “task” that relates to a given set of code files”, and also that people will accept ideas like “[project.requires-python], despite its original purpose in the context of wheel building, has clear and obvious meaning for many other tools”.

Instead, my primary goal here is to support cases like monorepos, and other situations where multiple distinct “projects” (whatever that means) sit in the same parent directory. It seems like some monorepos are able to get away with just putting pyproject.toml files in subfolders and navigating to the right place before running tools. But this covers only relatively limited cases (in fact, I brought it up in that thread and got a few likes for it); and it seems like people sometimes do some really impressive (read: far more work than ought to be necessary) things to deal with more complex cases.

Some other prior related discussion:

Re “Embedded config data for single-file “projects””

This is largely obviated now. The plan here was that if my other ideas were accepted, PEP 723 wouldn’t make a lot of sense, and I’d want to write something to describe analogous functionality that did make sense in the new framework. The only remaining aspect that makes sense, I think, is the idea of a dedicated tool to extract PEP 723 data and write a separate file from it. Whether even that makes sense will also depend on further discussion.

kknechtel · November 15, 2023, 1:28am

It’s nice to be proven wrong so quickly

@brettcannon came up with the idea of an explicit mechanism for dependency lists to reference each other for inclusion / extension (rather than having any pre-defined relationships between them, as in my proposal).

I should clarify here (from the discussion of extant uses of requirements.txt) that, while locking installers might see a requirements.txt with only “root” dependencies as input that’s specifically oriented towards making a lockfile, that’s still fundamentally just an “alternate list of dependencies”. In principle, any list of dependencies can be solved for, and any solution can be locked.

ofek · November 15, 2023, 1:42am

This is the most succinct summary of the state of lock files for Python that I’ve seen

sirosen · November 15, 2023, 7:12am

Just to offer a sneak preview about what the variations are here, in my mind, it’s a question of which things should be defined in a flat/simple way, and which should be tables/dictionaries from the outset.

For example, should each list of dependencies be its own table so that it can have extra fields in the future? I’m not sure. I can imagine useful future fields, in theory, but it also makes things more complex.

I’ve been trying to keep up with reading everything that comes through my inbox, even though I haven’t been replying, and I wanted to explicitly call something out:

Thanks for starting discussion around reserved names for dependency groups!

I was puzzling over whether or not there should be any system of reserved or special names, and the discussion around that has pushed me away from it. There are only a few cases which would benefit, and it’s a messy space to wade into. The conversation here crystalized that for me into a pretty firm conviction that we should not try to enter that territory right now.

sinoroc · November 15, 2023, 6:44pm

Sorry, I do not have the capacity right now to read all posts completely, so I skim. Please forgive me if I misunderstood things or if I react to things that have been addressed already (here or in another thread that I have not read yet).

Need to be careful here, in particular 2 things come to mind:

We still have the issue of abstract vs. concrete dependencies for example. Not sure it is relevant here, but more generally we need to pay attention to not put things in the pyproject.toml file that are not relevant to all contributors to the project. Typically I do not need to know which PyPI mirror someone else is using.
This is a bit of a pet peeve of mine and might be only tangential here again I guess… I do not want to have a file hundreds of lines long containing all kinds of settings from all kinds of tools and all kinds of stages of the development and deployment workflows. This would need a separate thread and I would be glad if someone could pick this up, but in short I would like something like pyproject.toml.d/*.toml, a directory where I can split pyproject.toml into multiple files.

Otherwise I am glad @kknechtel you are trying to shake things up, and in a friendly manner. Good that you try to split things into smaller pieces, but I encourage you to keep your big picture in mind nonetheless.