Seeking a consensus about the purpose and future of `pyproject.toml`

kknechtel · November 28, 2023, 6:04am

First off, sorry I took so long getting around to making this post - the one I apparently planned almost two weeks ago. (If you happened to see a deleted thread with a similar title, that’s because I had started an empty draft and then accidentally submitted a different post using it.)

In truth, I mentally went back and forth several times about it, half-convincing myself that a separate thread wouldn’t be necessary because other existing threads seemed to be coming back around and finding better focus. In particular, the “projects not intended to build a wheel” thread saw another flurry of interesting ideas that seem better connected to the central topic, and the PEP 735 discussion thread has come up with quite a few thoughts about how existing projects use dependency groups and what would make the concept useful.

However, after trying to read through it all, I get the general impression that we’re coming up with more good questions than answers with a solid consensus, and that the discussion is still fragmented in a sense. There are also a lot of things that seem important to me that aren’t being addressed.

Therefore, I want to try to focus and centralize discussion around pyproject.toml - both the interpretation of its current form, and plans for the future - by writing down what I see as the agenda for the topic.

The best way I can think of is to write out a bullet-point tree of the questions that occur to me, and thus try to give that structure to an overall discussion, even if it feels awkward. Note that I don’t think that every question here necessarily needs an answer in order to move forward; but I have tried to choose questions that I think will be useful to try and get everyone on the same page and seek a common vision. Sorry if the list seems overwhelming - but I think that goes a long way towards explaining the slow progress on these issues so far.

Without further ado:

How shall we handle compatibility issues related to changes in the pyproject.toml specification?
- In particular, is there any interest in some kind of versioning scheme, like how the core metadata has a metadata-version?
  - If not, why not - what harms could it cause? If so, how might it work, and what are the implications for existing pyproject.toml files?
  - Since the [project] table reflects values that will be used to create core metadata in a wheel, don’t changes to the metadata spec imply a pseudo-version for pyproject.toml anyway? Is there value in recording the version of core metadata that will be created? Could that vary independently of other changes?
- How can we expand the options for specifying a value that wasn’t originally designed for such expansion? For example, if a value in pyproject.toml is currently specified to be a PEP 508 string, but other types or string formats could be useful, how can we expand that functionality without breaking existing tools? Or else, what kinds of breakage do we consider acceptable?
- Should there be a standard, perhaps even built-in, validator for the pyproject.toml schema? How strict should such validation be?
  - Should tools, build back-ends etc. care if a part of the file that isn’t necessary for their task is invalid? (Existing PEPs don’t seem to address this topic area at all - but maybe I overlooked something.)
- Is it intended that third-party tools should be able to collaborate and try to advance new standards proposals?
  - In particular, what if two such tools want to use a common format because they both do something useful with metadata/config data that has the same meaning? There doesn’t seem to be a provision for “sharing” such data, because each tool is supposed to have a separate namespace.
How do we feel about existing tools updating the contents of pyproject.toml programmatically? (TOML is designed to be both human- and machine-editable, and indeed this was part of the motivation for choosing it in the first place. However, edits by tools might destroy the user’s comments, normalize formatting, etc.)
- How do we feel about existing tools dynamically generating pyproject.toml - e.g., by creating a new temporary such file for a one-off build, based off a tool-specific configuration? (I could imagine a CI tool that works this way, although I’m not at all familiar with CI on a practical level.)
Exactly what is an “editable install”, and what support do we need for them?
- How much of what pip does is essential to the concept, and how much is an implementation detail?
  - If it’s supposed to mean specifically the thing whereby the current code becomes importable as if it were installed normally, but no code is copied or moved and no build step occurs - maybe the name “editable install” is suboptimal?
- Does the concept “editable install within an isolated environment” make any sense?
  - If so, how might it work?
  - If not, what are the implications for tools that might conditionally create an isolated environment for a task?
  - Actually, am I jumping the gun here - does “isolated environment” only mean something temporarily created for the task, or could it be some kind of cached environment that isn’t normally used but still persists after the task? Or does that matter - maybe an “editable install” can only use the current environment?
What is a “project”, anyway?
- Do we actually need to define the concept in order to talk about it meaningfully? Would it be useful for future PEPs to refer to a formal definition?
- What fundamental kinds of project exist? (Can they actually be listed exhaustively? Is there a purpose to trying?)
  - For each kind of project, what common tasks are they likely to need to implement?
    - Actually, maybe we should be thinking directly in terms of tasks rather than project types?
    - Overall, what tasks seem like the most common and natural? Should our designs privilege these in some way?
    - For each task, what might we configure about it? Should we offer explicit and task-specific support for it?
    - Or maybe there’s some kind of general way to help support configuration for tasks?
Setting aside the definition of “project”, but supposing that we acknowledge non-wheel-building purposes for pyproject.toml: what does it mean to say that a “project” is “intended” to build a wheel?
- Do tools need to be able to determine this intent?
- Can intent be unambiguously inferred - from the current pyproject.toml, from hypothetical future standards for such, etc.? Or might there be a need to record such intent explicitly?
Shall we accept that pyproject.toml has the purpose of configuring things (technical term, I know) that are not related to building a wheel?
- Does it matter whether the corresponding code will also be used to build a wheel, aside from whatever other tasks?
- In particular, does preparing sdists merit special consideration, seeing as they use the same metadata? (Keep in mind that a given sdist could be intended either to create a temporary wheel on the client side for installation, or just to distribute an “in-place application”.)
- If it’s only about building wheels (and possibly sdists):
  - Can we be more precise about the intended purpose of the [tool] table?
  - How do we feel about tools reinventing the wheel (pun intended) by duplicating information specified in the standard (or coming up with an implementation that is later overlapped or obviated by a new standard) using custom [tool] data?
- If it can be used for other purposes:
  - What other tasks make sense here?
  - Is it incumbent upon us to enumerate specific tasks, or does it make more sense to try and come up with a general design? Or maybe we favour specific “standard” ways to use the code while trying to support others?
  - Does it matter that the current structure of the [project] table closely mirrors core metadata - is it okay to break that relationship? Is that a core aspect of the design intent?
What kinds of purposes for configuration data are we trying to standardize, anyway? And why?
- If we standardize formats for configuration data in files that aren’t pyproject.toml, should they use TOML just so that all the “standardized-format configuration files” use TOML? Or should that be judged individually?
- Should it be our responsibility to define a standard format for environment configuration, even if that uses a separate file (or files) rather than pyproject.toml?
- Should it be our responsibility to define a standard format for lockfile data, even if that uses a separate file (or files) rather than pyproject.toml?
- Regardless of which other tasks we might consider configuring, there seems to be a clearly identified “running a script” task that we want to support explicitly (thus PEPs 722 and 723). But how do we intend for PEP 723 inline script metadata to be used in the long run?
  - Is it only in TOML format so that people get used to the idea?
  - Or do we want tools to (be able to) extract that information to a separate file (i.e. have a standard for using such files, and then describe “inline” data with respect to that format)?
  - Or should they even update pyproject.toml with that information (depends on compatible answers to previous questions)?
- Does it make sense for projects to use configuration files that aren’t pyproject.toml, but leverage its specification (or a subset thereof)?
  - Could users benefit from having “alternate” configuration files in the same format but different filenames, and tool support for choosing a different file to use (with pyproject.toml simply being a default rather than a specifically required name)? For example, to implement monorepos? (This approach seems to have worked well for requirements.txt files - notwithstanding their lack of standardization).

kknechtel · November 28, 2023, 6:15am

I’m sure I’ve forgotten important issues in the above, BTW - already quickly edited in one minor point, and then there’s the whole issue of potential redundancy in pyproject.toml resulting from seeking to reorganize and generalize dependency groups (we’ve already hit our heads against this a few times trying to come up with new designs). But I can’t even really come up with the questions for that topic area.

So, please feel free to add your own questions or concerns to the pile. I think it’s important that we have a proper picture of how much there is to think about.

Another meta-level issue that occurs to me: as things stand, it feels like the discussion is dominated by core devs, existing tool developers and maintainers, and visionaries (possibly would-be tool developers or maintainers) like myself. But we’re trying to design something to satisfy the use cases of general users. Especially when it comes to unconventional “projects” or other ways to organize the code, how can we pull more of the relevant users into the loop? For example, one user story I’ve seen (and I think it was covered by @pf_moore’s list) is that the code for an “application” sits in a GitHub repository and not on PyPI in any form, and users are expected to “install” it by cloning the repository and “run” it by e.g. python driver.py or python -m src.main. There seem to be some very large and/or popular programs out there like this, such as CorentinJ/Real-Time-Voice-Cloning on GitHub (nearly 49 thousand “stars”; “only” about 5k LOC in the repository, but several heavyweight dependencies listed in the requirements.txt). It seems pointless to try to have the discussion without hearing from those sorts of authors.

sinoroc · November 28, 2023, 9:47am

Some interesting questions.

Some random points from my side…

Not sure if it is in scope for where this thread is meant to go, but something I would like to see is splitting pyproject.toml into multiple files, something like this:

.
└── pyproject.d
    ├── linting.toml
    ├── packaging.toml
    └── testing.toml

And for example I would have [build-system] and [project] in pyproject.d/packaging.toml.

That would be a breaking change, wouldn’t it?

Regarding versioning pyproject.toml, this is the closest I know of: PEP 518, “Specifying Minimum Build System Requirements for Python Projects”.

Anyway, my feeling is that each table/section could potentially have a different version number, since they can evolve at different paces.

I guess the points about “editable” are independent from pyproject.toml. Not sure why they are here.

Poetry’s tomlkit library lets you make changes to a TOML file while preserving things like formatting, comments and so on.

kknechtel · November 28, 2023, 9:53am

It is within the scope I had in mind, yes. And I’m sure it would break lots of things, which is part of why I included the questions about how to handle breakages.

… But actually, that inspired me. What if we consider the problem the other way around: instead of trying to fit other things into pyproject.toml, we try to fit pyproject.toml into a larger framework of some sort? That is, design some kind of omnibus config format, with relatively little detail, but making sure that it a) is extensible in a clear way and b) supports factoring out parts into smaller files. Then we can set it up such that existing pyproject.toml files can be a valid component of that system.

As for why editable installs are on the list: mainly because of ongoing discussion around PEP 735. One of the key reasons for wanting to expand the scope and functionality of pyproject.toml is to track lists of dependencies (whatever that means, and whatever the consequences might be); and this has motivated a ton of discussion around the idea of doing “editable” installs of the dependencies in such a list.

pf_moore · November 28, 2023, 9:55am

Way too much for me to think about, much less respond to, right now. But on one specific point, my view is that tools writing to an existing pyproject.toml should be pretty strongly discouraged, if not prohibited, and if it does happen, it should be required that all existing comments, formatting and layout are maintained unchanged.

In my view, pyproject.toml is a human-edited file, and that should always be its core purpose. Tools that want machine-writeable files should use their own file and format.

I’m frankly sick of tools that make “helpful” changes to my personal config files, and doing so in a way that impacts the readability/structure of what I wrote. “# Don't change anything below this line”, I’m looking at you!
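Part of that requirement can be turned into something a tool author could test. Here is a minimal sketch, assuming the check runs on the raw text before and after a tool’s edit; the function names are hypothetical. It only verifies that every full-line comment survives in its original order, so it is at best a necessary condition: it won’t notice reformatted values, reordered tables, or lost inline comments, and as a simple heuristic it would also count `#` lines inside multi-line strings.

```python
def full_line_comments(text: str) -> list[str]:
    """Return every line that is only a TOML comment, with surrounding whitespace stripped."""
    return [line.strip() for line in text.splitlines() if line.strip().startswith("#")]


def comments_preserved(before: str, after: str) -> bool:
    """True if each full-line comment in `before` still appears in `after`, in the same order.

    Consuming a single iterator makes this an ordered-subsequence test: an edit
    may add lines or comments, but may not drop or reorder existing comments.
    """
    remaining = iter(full_line_comments(after))
    return all(comment in remaining for comment in full_line_comments(before))


before = "# Maintained by hand\n[project]\nname = \"demo\"\n# Don't change anything below this line\ndependencies = []\n"
edited = before.replace("dependencies = []", "dependencies = [\"attrs\"]")
print(comments_preserved(before, edited))
```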

jeanas · November 28, 2023, 9:59am

While I appreciate the energy put into this, I honestly doubt that discussing so many different issues all at once will end up productive.

pradyunsg · November 28, 2023, 10:13am

Yea. That is a large set of questions and it’s extremely unlikely that we’d be able to engage with that meaningfully in a Discourse topic.

@kknechtel is there anything specific you’re looking to get out of this?

kknechtel · November 28, 2023, 9:26pm

I recognize that there’s a lot. My primary purpose is to set the agenda.

The thing is, I’ve been getting the sense that a lot of existing threads are either about one or more of the topics listed, or at least dancing around them; but none of those attempts seem to pull out the questions explicitly and focus on any of them, and there’s a constant sense that trying to figure out an answer is blocked by some other thing on the list, with no clear starting point in sight.

So, I tried to list all the relevant questions I could think of, as explicitly as I could, organized in a way that matches my current thought process. I’m hoping this can at least contribute on a meta level, to direct further discussions.

To be honest, it’s consistently felt frustrating trying to discuss the entire topic of packaging, simply because of the sense of contradictory needs and wants. In particular, the sense that needs aren’t being met and changes need to be made to improve the system so that needs can be met, but at the same time everyone seems to be allergic to the idea of “everything” changing again “already” to add a bit to the pyproject.toml etc. specification (especially if there’s even a hint of risk of backwards incompatibility).

pf_moore · November 28, 2023, 10:26pm

But the agenda for what exactly? Why do you even feel there’s a need to get a “consensus about the purpose and future of pyproject.toml”? Personally, I don’t care much about pyproject.toml. It’s a file, with some configuration in it. What matters to me is tools. They can put their configuration wherever they like. Certainly, some level of consistency and interoperability is important^[1], but what matters more is the user experience for someone trying to write and share code in Python.

I’d rather see a discussion about tools than about pyproject.toml. But much better than either, I’d rather see people developing tools instead of talking about them. We can standardise later, once the tools have established there’s a benefit to be had from a feature, and there’s something worth defining a common structure for.

[1] As the standing delegate for interoperability PEPs, I’d better think interoperability is important.

sirosen · November 28, 2023, 11:40pm

It’s extremely hard to engage with a thread which starts with such broad scope. As someone who is finding that this forum is eating a disproportionate chunk of his schedule (bad habits, etc), it basically means I can’t comment much on such a thread.

I think this has come up elsewhere, but regarding the use of pyproject.toml for non-packaging (by which I mean building distributions) standards, I think we just need to “start doing it”. The [tool] table already sets the precedent here.

Basically, I don’t think it’s self-consistent to think that pyproject.toml is “for packaging only” and that the inclusion of the [tool] table was okay.

You can think that the inclusion of [tool] was a mistake, and that pyproject.toml should have been scoped to packaging only, but that’s not what happened.

Perhaps as a leading and useful question, what is the question behind these questions?
You’ve said you want to set the agenda for a discussion about pyproject.toml. Okay, but why? What do you want to see change about the way standards are approached or pyproject.toml is discussed?

(For my own part, I want to find a way to have much more focused conversations, so that fewer important interchanges are lost or buried in unreadable megathreads!)

brettcannon · November 28, 2023, 11:56pm

Much like others, I find the list too long to dive into entirely.

On versioning: nope. We purposefully didn’t version the file because we felt adding a version would encourage people to break existing tooling by changing things in backwards-incompatible ways (it looks like we left that part of the discussion out of the PEP by accident).

As for tools updating the file: either they do it without messing with the formatting of things that are already there (as @pf_moore mentioned), or they don’t do it at all.

The one use case I can think of is a tool that installs/manages dependencies. I’m of the opinion that anything you install should be written down in a file so you can reproduce your environment later (if you want to; and if you don’t care, then ignore the file). That means either you write down what you want installed yourself and then run your installer to make your environment match the things listed, or you tell your installer what to install and it writes it down for you. The former means you update pyproject.toml yourself and then run your installer, while the latter has your tool update the file on your behalf. Different UIs with the same outcome.

bernatgabor · November 29, 2023, 12:22am

This sounds to me like some kind of lock file, which I think should not be part of pyproject.toml. Perhaps you want to make it pylock.toml instead, but I’m not personally 100% sure that this file needs to be human-readable, so I’m not sure about the .toml extension either, because that file can get quite big.

bernatgabor · November 29, 2023, 12:25am

This has been answered in PEP 660, “Editable installs for pyproject.toml based builds (wheel based)”.

willingc · November 29, 2023, 12:42am

Thanks, @kknechtel, for caring about packaging and putting together your thoughts.

In general, Discourse, Slack, and Discord are very difficult places to have an in-depth discussion to build consensus. The cognitive load of long threads and fragmented thoughts prevents exactly what you are trying to achieve: clarity and direction.

The ideas/questions that you are sharing are something more along the lines of a skeleton for an Informational PEP.

brettcannon · November 29, 2023, 12:55am

The difference to me between a lock file and “recording what you asked to have installed” is that the former includes transitive dependencies while the latter does not. So I think of what you use as input to pip-compile as the record of what you want to require to be installed, and what pip-tools writes out as necessary to meet those requirements as the lock file.

plannigan · November 29, 2023, 1:26am

To me, there are two tiers of dependencies:

- Direct dependencies: packages that expose a Python API that code in my project directly calls.
- Indirect dependencies: packages that are direct or indirect dependencies of one of the project’s direct dependencies.

Direct dependencies are the ones whose pinned versions I want to actively update, either manually or with a tool. When these change, I may want to or have to alter the code of my project. I want this set of dependencies managed in one clean, human- and machine-readable place. I believe this was the type of thing Brett was referring to.

This is different from a lock file that specifies all of the direct and indirect dependencies with pinned versions^[1], which is also important. I view the versions of the direct dependencies I manage as the inputs to “some tool” that produces the lock files.
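The relationship between the direct dependencies you record and the lock file a tool derives from them can be sketched as a transitive closure. This is not a resolver. It assumes a hypothetical, already-resolved index that maps each package name to one chosen version and the names it depends on, and all package names and versions below are made up. A real tool such as pip-compile also has to pick versions that satisfy every specifier.

```python
from collections import deque

# Hypothetical, pre-resolved index: name -> (chosen version, names it depends on).
INDEX: dict[str, tuple[str, list[str]]] = {
    "webapp-framework": ("2.0.1", ["templating", "http-utils"]),
    "templating": ("3.1.0", ["markup-safe"]),
    "markup-safe": ("2.1.3", []),
    "http-utils": ("1.4.0", ["markup-safe"]),
    "db-driver": ("0.9.2", []),
}


def lock(direct: list[str], index: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Pin every package reachable from the direct dependencies (breadth-first)."""
    pinned: dict[str, str] = {}
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in pinned:
            continue  # already visited, e.g. a shared indirect dependency
        version, deps = index[name]
        pinned[name] = version
        queue.extend(deps)
    return dict(sorted(pinned.items()))


# What the user records: direct dependencies only...
DIRECT = ["webapp-framework", "db-driver"]
# ...versus what a tool would write out to a lock file.
for name, version in lock(DIRECT, INDEX).items():
    print(f"{name}=={version}")
```

Here two recorded entries become five pinned lines, which is why the human-edited list and the machine-written lock file naturally end up as separate artifacts.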

Since Brett wrote a proposal for a lock file format (PEP 665), which did not use pyproject.toml, I suspect that he agrees that it should be something separate.

[1] And possibly more details, based on your specific use case.

plannigan · November 29, 2023, 1:27am

I guess Brett types faster than I do.

BrenBarn · November 29, 2023, 5:49am

There are some important questions and issues raised here, but I agree with others that it’s hard to see how a discussion can get traction with so many things on the table at once.

I have been wondering if for some of these thornier issues some kind of collaborative document or wiki-like collection of linked documents might be helpful. I do think it’s important to hash out what we think about some of these things, but a forum like this has a “multiple linear threads of discussion” format that can cause good ideas to get lost in the shuffle.

Also, I’ve said this before, but I’ll say it again: I think caution is warranted with trying to move forward and “solve a piece of the problem” when we have so many unresolved questions like those shown in this post. It’s not to say that no incremental progress is possible, but we really want to avoid a situation where we make some changes and the result is that X years later we make another bullet-point list like this and it has even more open questions and loose ends than we had before.

kknechtel · November 29, 2023, 10:17am

Well, actually you already give an object example

See, I agree with this. But I get the impression that some people don’t. And I want to be able to clear the air about that: if my impression is false, then it will be better that this is common knowledge, so that people don’t worry about it. And if people do think that way, then it’s better that everyone a) understands why; b) actually decides whether this dissenting view should obstruct potential future changes.

Which is why I put it on my list:

Can an informational PEP really just ask a bunch of unresolved questions? I hardly think I’m in a position to impose my opinion about all of these questions on everyone else. I’m not even sure that I have a solid opinion on all of them.

(Not to mention, my understanding is that a PEP would require a discussion thread anyway!)

I think you may be right, except I don’t really know what the discussion norms are like in that kind of editing environment. (Although it does eerily remind me of a major project idea I shelved back around February, as something that would require way more initiative and free time than I actually have…)

Yes, you understand my motivation exactly.

rgommers · November 29, 2023, 11:11am

I’ll try to phrase this as nicely as I can, but: it seems to me that you are breaking a few norms with your activity. It looks like you’re genuinely trying to help and have a lot of energy for improving Python packaging - but you are lacking a lot of context, and your many replies and threads are eating up a ton of energy from many of the most active/senior devs across the packaging landscape.

The vast majority of contributors to discussions here are maintainers of packaging tools, or develop open source projects in which they run into limitations of current packaging tools and standards. As a result, they come to this forum to discuss those limitations or, typically after exploring solutions elsewhere, even to draft and discuss a PEP. In contrast, this forum is not much like a regular open source project, where you show up as a newcomer without much context and try to push something forward.

I’d encourage you to work on some kind of packaging tool - even starting with simply contributing bug fixes to your tool of preference, and expanding your scope from there once your understanding of the project and how to contribute to it increases - and spending no more than ~10% of the time you invest in Python packaging as a whole on interactions on this forum.