Projects that aren't meant to generate a wheel and `pyproject.toml`

Yep, that’s what I meant.

Very true, but you can have top-level keys in TOML, so requires-python can actually be all by itself and not nested within a table. And since we control the top-level namespace in pyproject.toml we can choose to have such a top-level key.
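For illustration, a hedged sketch of what that could look like (this top-level key is purely hypothetical; no current spec defines it, and note that TOML requires bare top-level keys to appear before the first table header):

# hypothetical top-level key, not defined by any current spec
requires-python = ">=3.11"

[project]
name = "spam"
version = "2020.0.0"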

I agree that this discussion seems to be taking us in the direction of needing to decide on this matter.

Yeah, I think we are starting to focus in on the use cases that we want represented, and then on solutions flexible enough to accommodate what people are already doing. So far we have yours and mine, but I’m not sure if others have ones they want to present. It may also be that your use cases and mine are disparate enough that they cover a good gamut of possibilities, if we can make them both work comfortably.

As do I. Personally, I think pyproject.toml is more important than PEP 723 and so I would not want to see a compromise here just to get PEP 723 accepted. If the acceptance condition for PEP 723 doesn’t occur because it turns out [run] as proposed isn’t a good fit for what we ultimately want then I’m fine with that (and hopefully Ofek is as well :sweat_smile:).

1 Like

In my view almost nothing actually needs those, if we’re talking about informal “distribution” channels like emailing someone a zip file. It’s not even clear to me that wheels really need them, in the sense that it might sometimes be convenient to build a wheel to email to someone without specifying certain required metadata. I’m not saying that means we need to loosen requirements for wheels, but just that we might want something that’s like a wheel but a bit looser.

Me too, but I do think it’s useful to think about “why would someone not want to generate a wheel from their project”, because it lets us maybe see the barriers the wheel system puts up. In the end we want to understand what they do want to do, but to understand that we may need to start by looking at places where people are thinking “this isn’t working”. And, to continue my thought from above, one reason may be that they don’t want to have to even think about specifying things like a name or maintainer.

Not too radical for me! (But then again what is? :slight_smile: )

As I mentioned earlier in the thread, I think that some kind of dependency information is conceptually anterior to any distinctions between running, building a wheel, or anything like that. If your code has import blah in it then there is not really any universe in which you can run, import, test, or really do anything with that code without blah installed; and if your code does blah.do_stuff() and do_stuff() was added in version 2.0 of blah, then similarly there is no universe in which you can do anything without blah >= 2.0. Different contexts like wheel-building or running or app deployment may add wrinkles on top of that[1], but the baseline requirements are defined by the code itself and not any metadata. It seems to me that what we want is a way for people to conveniently specify different elaborations of those baseline requirements.

It does if we can get to a place where we can consider Python as just another dependency. Just sayin’. . . :slight_smile:

Well, true to some extent, but I think (or hope) that we’re not totally talking past each other but maybe gathering some different perspectives even if each person doesn’t fully grok other people’s perspectives. In particular. . .

This sounds very similar to a lot of things I do. When I do this I often kind of get “in the zone”, creating a bunch of random stuff that I gradually boil down to a more coherent form. And, related to what I mentioned earlier about wheel metadata, I think one of the obstacles the current system throws up is that there’s a requirement to break focus and think about stuff like “what’s the name and version”. I think that’s probably a good gate to have in place for publishing code, but not necessarily for every kind of copying/emailing/etc. that someone might want to do with a “project”.

Yes. To be honest I find that to be somewhat the case even now, although part of that has to do with the Python import system itself (specifically the problems with relative imports from __main__) rather than any packaging tools.

But from a packaging perspective, I think one way to reduce those barriers is to lean into the implicit conventions that people use when doing this kind of exploratory programming. I say “people” even though for now I really just mean me :-), but I’m curious whether you (or others) also use some of these shortcuts:

  • There are no explicitly specified metadata names for anything. The project name is the filename or directory name. If you want to do imports you put them in a folder and the folder name is the package name. That is, the package structure is derived directly from the filesystem information.
  • There is no “installation”. You want to have some directories and files within them and have them work with each other. In particular, although the project may need to import external libraries, there is little or no consideration of whether anything external needs to import the project (e.g., whether the project files are on sys.path).
  • There are not necessarily any versions of anything. As you noted, it’s a bit of a mess, but that’s okay for this kind of work.
  • There is no thought given to things like maintainer contact info, licenses, or any of that. That’s for when the dust has settled.

One thing that I think would help in this context is poetry-style tooling which stitches together the tasks of installing a dependency and updating a metadata file. I don’t really use poetry but I played around with it a bit recently and I can see why this feature makes people gravitate toward it. It can mean that you really don’t have to “break the flow” at all. If you’re doing exploratory programming and you want pandas, there’s no avoiding typing “install pandas” at some point; it’s just that with a poetry-like tool, when you do that, you also register that pandas is now a dependency of your nascent project.
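For concreteness, the kind of thing I mean: after something like poetry add pandas, poetry both installs the package and records it in pyproject.toml, ending up with roughly this (the exact constraint it writes depends on the current release; this is just a sketch):

[tool.poetry.dependencies]
python = "^3.11"
pandas = "^2.1"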

This tooling can’t actually be poetry as it is now though, because (as far as I can see) poetry is closely tied to the directory-bound project structure and wouldn’t support something like a directory full of files with different dependency requirements. (To be fair, that situation is hard for me to keep straight in my own head, so maybe it falls under the category of “the tool can’t magically handle everything”.)

Sounds great to me! :slight_smile:

I agree with that as well. Although I think something like PEP 723 is useful, as I mentioned in other threads, I don’t see any urgency for such things. Exploring the overall problem space is time well spent.


  1. and there are other wrinkles, like the fact that the distribution that provides importable blah may not itself be called blah ↩︎

A wheel without required metadata wouldn’t be installable, so this isn’t realistic. By the time you changed every place that tools currently assume the current rules hold, what you’d have wouldn’t be a wheel any more (you can’t even form a wheel filename without a project name and version).

All of those are very much how I work. Even the directory name isn’t in any practical sense the “project name” - I have even been known to have (short-lived!) projects in directories called xyzzyx.

I like to think of the individual “bits” of such a project as “tasks”. They generally aren’t formal packages, but they may well be more than just a single script. Some will be runnable, others will be intended to be imported from a REPL (command line, or Jupyter). I’ve had a PyPI analysis project where the “data download” task contains a full-fledged parallel job framework, complete with progress reporting and logging. A big task like that will almost certainly have its own set of dependencies. And sometimes those will be a mess (such as 3 different logging frameworks, as I try to work out which one is best for my use case).

PEP 722/723 style “script dependencies” are often too fine grained for this sort of thing. So having some way of collecting the 3rd party libraries a task needs into one place is definitely useful. But I don’t want to combine all sets of dependencies into one - the “data cleansing” task could have its own logging needs, and those could clash with the download task’s, for example.

Another good example here is that I might well have a Jupyter notebook or two, for analysing the data. Jupyter is a big application, with its own complex set of dependencies, and I don’t want to have to ensure that the dependencies of my custom code are compatible with Jupyter. So that would nearly always be a separate “task” in my project. And I’d like to keep the Jupyter dependency separate from the dependencies of my analysis - so I want to have a set of requirements that says “numpy, pandas, matplotlib, sympy, packaging, …” and a separate one that says “Jupyter”. Then I might create an environment containing both for interactive analysis, and one containing just the first for finalised, standalone report generation.
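To make that concrete, a purely hypothetical sketch of how those two requirement sets might be written down if pyproject.toml-style TOML grew some notion of named groups (the table name is invented for illustration; nothing standard defines it):

[requirement-sets]  # hypothetical table name
analysis = ["numpy", "pandas", "matplotlib", "sympy", "packaging"]
notebooks = ["jupyter"]

An environment for interactive analysis would then be built from both sets, and the standalone report-generation environment from analysis alone.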

The key here is that the starting point is a complete mess - and that’s to a good extent by design, because I don’t know yet where I’m going to end up. But as things progress, and I start to get organised, I want to be able to factor out and isolate individual tasks, and make them more “production ready”, but do so while still keeping them within the project - other parts of which may still be a mess. So having ways of factoring out groups of dependencies, of running scripts or workflows in their own isolated environment, of installing helper tools (such as snakemake, huey, or doit) in a dedicated environment for a workflow, are all important for me when it comes to fighting back the chaos :wink:

Most of this is not so much about packaging at all, to be honest. It’s about (virtual) environment management. And to that extent, pyproject.toml is really rather peripheral to the whole issue - nothing that I’ve described above needs to go in pyproject.toml. But if the community is moving towards “a Python project is a directory containing a pyproject.toml with the project configuration in it”, then tools like PDM, Poetry and Hatch, which are starting to try to support more general project workflow models[1], will likely follow that standard. And IDEs like VS Code and PyCharm will also likely make that model their primary focus[2]. So making sure pyproject.toml supports these sorts of extremely non-packaging workflows becomes far more pressing, if we don’t want to sideline a whole class of Python developers.


  1. i.e., beyond “I wrote a library and I want to publish it as a wheel” ↩︎

  2. I know @ofek is on record as having said Hatch intends to follow the lead of packaging standards, and @brettcannon has said the same for VS Code. ↩︎

3 Likes

I concur with others that this seems like the most principled solution. I could imagine something like this:

[project]
name = "spam"
version = "2020.0.0"
description = "Lovely Spam! Wonderful Spam!"
readme = "README.rst"
license = {file = "LICENSE.txt"}

[setup]  # or [install]? or [run]?
requires-python = ">=3.8"
dependencies = [
  "httpx",
  "gidgethub[httpx]>4.0.0",
  "django>2.1; os_name != 'nt'",
  "django>2.0; os_name == 'nt'"
]
[setup.optional-dependencies]
test = [
  "pytest < 5.0.0",
  "pytest-cov[all]"
]

and the tools then know that if requires-python and dependencies can’t be found in [project], they should also check in [setup].

And while we’re introducing breaking changes to PEP 621 anyway, I would love to see official support for something like the environments from hatch or the dependency groups from poetry, which is not the same as extras because they can refer to a set of dependencies that’s completely disjoint from the set of normal dependencies (for example for linting).

Adapting hatch’s syntax, it would look like this:

[setup.envs.docs]
dependencies = [
  "sphinx",
  "furo"
]

(If you really want to make me happy, you could consider PEP 633 syntax for the new [setup] table – to make a clean start, sort of, while we’re not in a rush to get something out there quickly, design something nice for “the next 10 years” – but that’s probably pushing it too far…)

3 Likes

I’m starting to think dependency groups/environments feels like a requirement based on the breadth of use cases that have been discussed. The “projects” I’ve worked on generally fall into 3 categories:

  • library distributed as a wheel
  • CLI applications distributed as a wheel (because it is easiest)
  • web applications where files are just copied into place

All of these have at least 2 requirements.txt files and some have more[1].

In my experience, there is always one set of dependencies that represents what is needed to “run” the “project”. There is usually, though not necessarily, an additional set of dependencies used to perform automated testing of the “project”; by its nature this set is additive to the previous one. From there a project could have linting or static analysis tools, which may or may not want to be additive to the first set. Another group I frequently have is documentation generation.
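As a sketch of those categories (the table and group names are hypothetical, only meant to show the shape):

[groups]  # hypothetical; "test" is additive to "run", the others stand alone
run = ["httpx", "gidgethub[httpx]>4.0.0"]
test = ["pytest", "pytest-cov"]
lint = ["ruff", "mypy"]
docs = ["sphinx", "furo"]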

I don’t have direct experience with Paul’s use case (directory of conceptually related tasks), but it does feel like being able to specify N groups would support N tasks. The only wrinkle for this would be that there might not be an obvious “run” group since each task is a thing that runs[2]. But it does seem flexible enough to support the transition from “I’m just experimenting with things” to “I understand the distinct task groups”.

I’m also hesitant to follow the extras model of always being additive. It generally works well for wheels, but wouldn’t work for a directory of tasks if a secondary task had a dependency that conflicted with the primary task’s. Or something as simple as: documentation generation doesn’t need the runtime dependencies, and I don’t want to install those, so that the CI pipeline step stays faster.

Being able to clearly specify these dependency requirements, and the logical groups they fall into, feels important. The hard part is choosing names for these concepts and deciding whether they play nicely with [project].


PS: +1 to the above statement about a lock file being a distinct thing that is generated from a dependency group[3] discussed here.


  1. I’m actually in the process of migrating to hatch to have more control without needing to add more files, and to replace some “setup” scripts. ↩︎

  2. I hope that “just pick one” is acceptable if there isn’t a clear primary task. ↩︎

  3. maybe multiple ↩︎

2 Likes

When designing a new thing, consider that the current install_requires is a misnomer. These are run-time dependencies, not install-time ones.
An install-time dependency would be something that can be removed after installation – for example, something that can add a GUI app’s entry to the system menu. Alas, we don’t really have those in the Python ecosystem yet.

2 Likes

I think we should be very cautious about following the extras model. It has a lot of problems that make handling extras in pip and resolvelib tricky. And while most of those are unique to dependency resolution, I think we’d do much better viewing requirement sets as a completely new idea, and working out a model that matches the actual use cases, rather than assuming extras are a good starting point.

1 Like

Introduction

I have a pretty simple idea for solving all of this that has even been approached several times in the thread already… I’d like to lay out a bunch of quotes first, give my observations / reasoning and finally lay out a full proposal.

So, first: these quotes (click to expand) lay out what I think are the essential ideas to consider.

I also specifically want to highlight the use cases @pf_moore identified in post 92:

Observations and proposals

On pyproject.toml itself

(Click to expand: thesis and antithesis)
  1. The pyproject.toml design has overall been quite successful - so successful that there is a large demand for using the file in a wide variety of ways outside of the original design intent.

  2. Fundamentally, a pyproject.toml file is meant to convey three categories of information: what the local code fundamentally “is”; what things are required in order to create a wheel from the code; and what things are required in order to install the wheel (and thus make it usable for others).

  3. But this is really a special case of a broader view, in which a file formatted like pyproject.toml conveys two categories of information: what the local code fundamentally “is”; and what things are required to use the code in a specific way.

  4. Sometimes, various subsets of files (overlapping or not; possibly all the files; etc.) in the “project directory” can have distinct, meaningful use cases. It would be great to be able to describe what those subsets are (which means both: which files are in this subset? and: what is this subset called, what is its core purpose etc.?) and what things are required for the use case. It is, obviously, not so great to have a design that assumes that every use case involves building a wheel. It is also not so great to try to jam multiple use-case-descriptions into the same file.

  5. On the other hand, it’s nice that Pip specifically can make assumptions based on the fact that its core purpose is to install something - and therefore, it should be expecting to find metadata that explains, if not directly how to build a wheel, then at least what is needed to install the code and for that installed code to be useful. (And at least for now, if it doesn’t find that metadata, we accept that it can fall back on a legacy system that runs some code in setup.py and leaves it up to the author to orchestrate the whole thing.)

  6. As a meta point, it feels to me through the previous discussion that certain ideas kept coming up again even though they’d nominally been addressed or even dismissed. This suggests to me that there’s a certain amount of power to them.

Synthesis: When I read through the prior discussion, and over these points, I keep coming back to the same conclusion. The fundamental problem is the assumption that there will be a single such configuration file named pyproject.toml. Of course, it works great having Pip be able to rely on that name, so I think we should leave that alone. But the natural way to solve all the other problems is to just have people use separate TOML config files.

But that is clearly not enough guidance by itself, so here is a more concrete proposal:
  • The PEPs that describe pyproject.toml now describe a more general TOML configuration file format.

  • pyproject.toml (in the current working directory, or at the root of a freshly downloaded sdist) is the “default” such file, in the sense that it’s the only one that Pip will care about (and thus, published sdists need to include it).

  • There can be any number of TOML config files, named as the user likes, with no correspondence between those filenames and anything else. Third-party tools will expect to be told which TOML file to use via a GUI selection or command-line option, and otherwise default to pyproject.toml. Pip should not need such an option, although I suppose it might facilitate certain tasks with certain folder layouts in monorepos. There can be multiple pyproject.toml files, in different folders, for a local repository; Pip will use the one in the current working directory. It is not required for any of the TOML config files to be named pyproject.toml, locally.

  • When present, it is understood that pyproject.toml describes a use case that involves building a wheel, or at least supports building a wheel. It will retain the special treatment that it currently has: a default backend will be assumed if none is specified, and project.name and project.version are mandatory. When tools create an sdist based on information from some other config file, they need to write an appropriate/compatible pyproject.toml for it (so that it can be publishable on PyPI in such a way that Pip can then build the sdist into a wheel).

  • Other config files can advertise a use case that involves building a wheel, by including the [build-system] table. If this table is present, then project.name and project.version are also mandatory; otherwise they are optional.

  • An “editable install” should just ensure that all runtime dependencies and all dev dependencies are available, and also write the .pth file needed to make sys.path setup work. That’s a perfectly well-defined semantic that other tools should be able to implement as well as Pip does. I guess it should be possible to do this without actually building a wheel, since there isn’t (AFAIK) a need to put any more than the .pth into site-packages. So maybe pip install -e could be modified to accept a custom config file name, while “real” installs still expect pyproject.toml.

  • Later, we can think about mechanisms to inherit content from one config file to another. Also, maybe we want to be able to group dependencies in project.optional-dependencies in more complex ways.

I think this setup can meet everyone’s needs. Single-wheel directories can keep using pyproject.toml as they already would. Standalone applications can potentially build a wheel with an entry point via pyproject.toml, or build something else (maybe even a standalone executable!) via a different config file. In-place applications can use a differently named config file for an “editable install”; or if they’re “distributed” purely as e.g. a Github repo then maybe they include a wrapper script or instructions to use PipX. A “project” of independent tasks performed by utility scripts (whether or not related to each other) could have separate config files for any given task that needs one. A monorepo could use separate files for separate sub-projects, and might even be able to name them all pyproject.toml, depending. We automatically avoid the problem of “mandating a single-venv structure”, because we can associate separate TOML files with different venvs (leaving it up to tooling to implement this). Or rather, each project only uses one venv, but the “project” concept is now much more lightweight such that this doesn’t matter (projects might even share a venv!).
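As a sketch of what one of those extra, non-wheel config files might look like under this proposal (the filename and contents are illustrative; because there is no [build-system] table, name and version would be optional):

# tasks/download.toml -- hypothetical name; not pyproject.toml, so no wheel is implied
[project]
requires-python = ">=3.11"
dependencies = [
  "httpx",
  "rich",
]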

On the [run] table proposal and PEP 722/723

(Click to expand: thesis and antithesis)
  1. These PEPs were intended to address two entirely separate concerns: the desire to be able to “distribute” single-file projects informally without needing a separate file to describe requirements, and the desire to simplify the process of specifying requirements as well as ease the transition to using pyproject.toml fully.

  2. After considerable discussion, we came to the conclusion that the TOML format itself is not onerous, and that it would be better not to have to migrate the data to a different format when creating pyproject.toml.

  3. Of the possible contents of [project] currently, there’s IMO nothing really that would be out of place to include as metadata in a PEP 723 way, and a lot of it would be useful. Certainly it would not be actively harmful. For a particular example: nobody that I saw mentioned project.license in that discussion, but I can easily imagine a script runner that looks for that metadata, then downloads and displays that license before proceeding to the program’s entry point. And sure, we could prevent people from supplying, say, trove classifiers to a script runner that doesn’t care about them; but since they’re already optional, what purpose does that serve?

  4. It’s hard for me to think of anything else that a theoretical [run] table might include that isn’t already covered by [project] - except for information used to configure an environment (create or choose a venv, set environment variables, etc.). But it’s easy enough to imagine wanting that information for other use cases, too.

  5. The [run] table @brettcannon described as possibly useful for a dependency resolver, meanwhile, would contain stuff that also theoretically fits in the existing structure. Maybe we want to add project.dev-dependencies explicitly. Or maybe we privilege the name dev inside project.optional-dependencies.

Overall, I just don’t think the “ensure requirements are met and then run the code” use case is all that special, even though supporting it is clearly valuable. Setting up or managing isolated environments is already a problem that needs to be solved for building, after all.

Further: given the previous proposal for using separate TOML config files, it would make perfect sense to implement PEP 723 by simply extracting the appropriate text into a temporary such file, and then using it normally.

Finally, I disagree that it really matters that this data is semantically different in the case of “running a project” versus building a wheel. Or rather, since I’m already trying to break the association between the [project] table and building a wheel, I don’t see a reason to use a separate table. In my mind, either way one has a project, and the use cases for that project are a separate matter. If we’re supposed to name the table either [project] or [run], and not have both, then to me that feels like meaningless book-keeping. Besides, some wheels are not meant to be used as libraries at all, only by their entry points. It would feel strange to be told that there’s a separate [run] table for applications, but you should still use [project] because you want to build a wheel.

Therefore, I propose:

  • Don’t describe a [run] table, at least not now. Later, we can think about what would make sense in an [env] table.

  • Update PEP 723 so that it says to use project.requires-python to specify the Python version requirements. That’s the Way To Do It that we already defined, after all.

  • Update PEP 723 so that the block header does not have the pyproject text. There is no reason why it should say this, given the previous proposal - since the data will represent an “ordinary” config file not intended for building a wheel. This also makes the block header and footer match, which then means we no longer have to define semantics for nested such blocks (as this is no longer possible). (A sketch of the resulting embedded TOML follows this list.)

  • Offer a standard tool that parses out PEP 723 config information and writes it out to a correspondingly named TOML file.

  • If a script runner encounters a request for dynamic data that it has no way to calculate, it should just report an appropriate error message. If someone wants to try to create a script runner that figures out dependencies dynamically from import statements, we shouldn’t forbid that, however difficult and doomed-to-flakiness it might sound.
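Under these changes, the TOML embedded in a script’s # /// … # /// block would just be ordinary [project] data, for example (illustrative only):

[project]
requires-python = ">=3.11"
dependencies = [
  "rich",
  "httpx",
]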

On lockfiles

Two quick observations:

  1. Third-party tooling already expects to write and/or update pyproject.toml files.

  2. The primary purpose of a lockfile is to pin dependencies, i.e., specify exact versions. In most cases, for most dependencies, this will be enough information.

I don’t think there is a good reason why “solved” and thus “locked” dependencies need a separate format versus broader dependency specifications. In the same way that “x = 1, y = 2, z = 3” is technically a system of three equations in three unknowns (which, when “solved”, gives the same thing back), foobar==1.3 is a solution to foobar>1.1, <2.0, and is also a solution to foobar==1.3. Yes, we already established the need to draw a distinction between runtime dependencies and wheel-metadata dependencies, between pins for an application and ranges for a library. But the previous proposal sidesteps the problem: just use separate config files that happen to have the same format (because they can, and there’s no real reason not to).

In some remaining fraction of cases, people will care about hashes, specific supply chains, etc. (Of course, there are arguments not to specify these in an open ecosystem, and there are also arguments to specify them in more closed contexts.) I think the right way to solve the problem is to expand the format for requirements specifications to cover hashes, source URIs etc. as needed, and then not worry about explicitly creating lockfiles.

And that's about where I stop being able to offer *informed* ideas and start speculating wildly…

If the complexity is really needed, instead of expanding that string format, maybe we could allow for requirements listed in a TOML file to be TOML sub-tables instead of strings, and define what those tables should contain. But honestly, I suspect that most of the complexity currently seen in the requirements.txt pseudo-format, in poetry.lock etc. is only there because tooling makes it possible for it to be there. I think we should prove real use cases before designing more than we need.

I don’t think that requirements.txt should be thought of seriously as an existing “way of doing things”, because it’s really more of a hack to automate custom Pip usage. If we’re seriously entertaining the idea that we’ve consciously designed a system that allows for replacing Pip as a frontend, then we can’t take What Pip Does as inherently imposing requirements for everyone else. The whole point is that different frontends can have different feature sets. So maybe some of this information really belongs in the tool table (yes, this does mean that some users might want to write [tool.pip]).

Like I said before:

So. Instead of having to standardize a separate lockfile format, then, we just expect tools to take in one TOML file with unpinned dependencies, solve them, and output another TOML file that is essentially the same thing with the dependencies pinned (and perhaps hashed, made to refer to a local cache that was populated during the solve, whatever). Currently, poetry.lock output for example is also TOML, but structured very differently… and seemingly recording a lot of information that is just not actually needed for the purpose of securing exact dependencies. I would be happier in the world where the output I get is instead just like pyproject.toml but with a bit more detail on specific dependencies.
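A sketch of what I mean, with invented file names; the developer writes the first file, and the tool solves it and writes the second in the same format:

# project.toml (input, hypothetical name): ranges, as written by the developer
[project]
dependencies = ["foobar>1.1, <2.0"]

# project.lock.toml (output, hypothetical name): same shape, but pinned by the solver
[project]
dependencies = ["foobar==1.3"]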

3 Likes

Thank you for that synthesis of major points in this thread. As someone trying to read and catch up, it is immensely helpful.


There’s a lot of use for the feature which poetry calls dependency groups, and which today can be handled with extras. It is also the problem solved by a deps config in tox (with no factors). It would be very good to have a solution for this within the pyproject.toml format.

Separate files would work, but I’d really just like to move my pytest and mypy deps into “test” and “typing” groups.

tox config is relevant here, as is hatch environment specification. There’s a lot of drive to consolidate things into a small number of files. I think failing to do this fails to offer any benefit over requirements files – or at least, it starts to become unclear why such a change is superior to the status quo.

It may be that some of the other user stories, like data science usage, are better suited by multiple files. But multiple files are a poor fit for moving dependency specs out of tox.ini.


A thought on run.dependencies:

Is it viable to define it as a union, either a list of dependency specs or a dict (table) of names mapped to dependency specs?

I’m imagining a possibility, following the idea of having a default group named “default”, in which the following two configs would be considered semantically identical:

[run]
dependencies = ["foo"]

[run]
dependencies.default = ["foo"]

I have more than 15 years of Python experience, but effectively none with modern CI and such; the most sophisticated “dev” things I do normally are to use version control and run Pytest from the command line. So I would very much appreciate some more detailed user stories about how people use tox (and what problems it solves), the benefits they get out of third-party packaging toolchains (I’ve only tried Poetry and I barely actually use it for anything) etc. In particular I’m not clear on the need to track the dependencies for various development tools (pytest, mypy etc.) separately; and it’s not clear to me why they would have dependencies that vary per-project. If they don’t, then it comes across to me that things like “make sure Pytest can launch in the new venv” are more like environment-management tasks than dev-dependency listing.

Would it be valuable to have an “inheritance” mechanism designed up front, so that config files could e.g. share common metadata while describing separate dependency listings? Alternately, what could be improved with how project.optional-dependencies works? For example, do we really need a one-to-one correspondence between groups of dependencies that are named and recognized in a config file, and “extras” that are explicit installation targets for Pip?

1 Like

There are X projects with Y contributors in a many-to-many relationship. Each project wants to pin their version of these tools so that contributions are developed consistently.

If each contributor had a single installation of each developer tool, everyone would have to keep those versions in lockstep across the whole ecosystem.

So then how are people to specify what they want to see available in the virtual environment their script is to be run in? The hypothetical [env] table will somehow solve this? That does mean we are postponing solving the problem, which could very well mean PEP 723 gets rejected.

So it’s still TOML as available in pyproject.toml, but you just want to change # /// pyproject to # ///?

From what you’ve written I’m not piecing together how you are proposing script runners figure out what to put into the virtual environment.

I will do my best to provide some useful background, but this is of course a broad topic.

Tracking dependency sets for different tools independently has some simple and subtle implications.
Simple: if members of my team add a dependency with which I am unfamiliar, I can instantly contextualize it upon reading where it goes. (e.g., adding freezegun to the test requirements)
Subtle: I may want to use conflicting or non-overlapping dependencies for different purposes. (e.g. A run of my testsuite which ensures that it passes without typing-extensions installed, while that package is used for my type checking run.)
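As a sketch of that subtle case (hypothetical group names, purely illustrative): the point is that these two sets are intentionally disjoint and must not be merged into a single environment.

[groups]  # hypothetical table
test-bare = ["pytest"]                     # test run that must pass without typing-extensions
typecheck = ["mypy", "typing-extensions"]  # type-checking run that does use it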

As for having independent versions between projects… Someone has already mentioned reproducibility, which makes this desirable. But it’s not just desirable – it’s crucial. If I maintain two packages, and one of them is incompatible with the latest major release of pytest (for example), I must be able to pin pytest back a version, at least temporarily, until a fix is possible. Holding back the version of a dev package for all of the packages I work on is not a reasonable solve.
I’ll also offer a related case: I test my libraries against both the latest and the minimum supported versions of their dependencies. Although not very prevalent, I believe that such testing is a best practice and ought to become more commonplace.

Regarding poetry, the main thing I gain from it is the lockfile for reproducibility. Others may like the dependency groups feature, and I use it in applications which need the lockfile, but I find it inferior to extras in most ways. Notably, tox has no way to natively reference these dependency groups, which makes integrating these two tools needlessly fussy/tricky – if the poetry groups were part of a standard behavior, tox could use them directly. However, dependency groups do not get published, a characteristic I cannot get from extras, and which is best supported by requirements files in a more “traditional” setuptools project.

I’ll also note that poetry guarantees in its lockfile that all dependency groups are mutually compatible. I find this behavior mostly destructive, as layering of dependency groups is not important or useful to me, and it directly conflicts with my “min deps” testing need.

With the [project] table, because the “project” that is naturally and directly associated with a single file is one where the use case is running that file (i.e., treating its top-level code as an entry point).

If an [env] table is needed / later designed, I imagine it being used to specify things like the path where a virtual environment should be created for the project (or an existing one should be reused), environment variables to set temporarily for the (run | build process | whatever else is the objective of the “project”), and so on.
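To be concrete about the flavor of thing I imagine (every key name here is invented; this is speculation about a possible future table, not a proposal):

[env]  # hypothetical table
venv-path = ".venv"                  # where to create or reuse the virtual environment
variables = { MY_APP_DEBUG = "1" }   # environment variables set temporarily for the run/build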

Yes. In the context of my other suggestions, it’s the only thing that makes sense to me.

Supposing a typical script-runner’s default flow is:

  • Identify and show a license, if applicable, and prompt to continue.
  • Set up a temporary virtual environment for the run.
  • Install dependencies for the code in that virtual environment and activate it.
  • Invoke the virtual environment’s Python to run the code.
  • Clean up afterwards.

then “what to put in the virtual environment”, by my understanding, means exactly the code’s dependencies; and those are specified in project.dependencies, exactly as they would be for a different project that intended to build a wheel from the code.

Under my proposal it would also be possible to have a PEP 723 header in a “driver” script for a more conventional project, wherein most of the code is a “library” that gets built into a wheel and then there is a simple CLI implementation in the driver. But in this case I see much less value in packing the TOML data into the same file. This would be a primary use case for the “unpack PEP 723 data to a separate file” tool, since that file would then be usable by standard build toolchains without them needing to implement any kind of PEP 723 support - just support for choosing a config file rather than defaulting to pyproject.toml.

So are you then suggesting we loosen the requirements on the [project] table to not require a version and name? I guess I’m not seeing how this ties back to PEP 723 beyond "just use [project]", which was already deemed a bit burdensome based on how [project] is currently defined.

Okay, I understand then - it’s primarily about the actual dev tools, then. I was imagining that big projects have some justification for importing something different while the code is being tested, outside of the actual software implementing the testing (sounds strange to me, but I didn’t want to rule it out). I think maybe I just underestimated the risk that a new Pytest version might actually break the validity of existing tests, but it does make sense that people would need to be able to deal with that.

That said, it sounds to me like more of a reason to use separate files - then there’s an obvious place to specify a different version of Pytest for the exceptional project (supposing for example we have a monorepo setup or some other “collection of related scripts” setup) that needs it.

I described rules that determine whether a given config file “intends to build a wheel”; and then when that intent is not present, requirements that are motivated by wheel-building would be skipped or relaxed, yes. In particular, a file named pyproject.toml would always have the existing restrictions for compatibility reasons; other files, as well as PEP 723 sections, would be able to opt out.

Sorry for the length of the post; I tried to make it less intimidating using the folds but that may have been counterproductive.

The overall “how to teach this” that I envision looks like: first people learn to use PEP 723 to their advantage for the most common use cases (i.e. script runners); then they have a tool that can split that to a separate file, and learn about the more general functionality; then they can learn about wheel-building in general and just pick up the knowledge that certain keys are required in this case because a wheel can’t be a wheel without them (i.e. the previous points about how name and version will be used to construct the filename, put the thing in the right place on PyPI etc.). It probably wouldn’t ever be necessary to create exactly pyproject.toml by hand, since when tools make an sdist in this future, they would normalize and rename a copy of whatever config they were told to use (and then when Pip downloaded an sdist, it would still be assured of having pyproject.toml to work from).

In the far future I imagine a world where install tasks that can’t be handled this way (e.g. locating a C compiler and properly configuring to use it) have a more intelligent (and dedicated) interface than setup.py; but that’s a separate topic :wink:

I’m confused by this – it sounds like we’re potentially talking past one another a little.
Are you talking about “projects” as the various components being managed as a common codebase? I took that word to mean an application or library. I would refer to the environments and executables under a single codebase as either “environments” or perhaps “tasks”.

i.e. I call tox a “task runner which is also an environment manager”

This potential misunderstanding aside, I think it’s still important to support conflicting dependency sets. You could think of “min deps” testing as one of a category of “adverse” environments in which I want to run my test suite. Perhaps there are others, like an environment which has a known-bad version of a dependency which my library patches over.

I mean something like “a subset of the mostly-.py files in a directory that have a cohesive function”, basically including any of the possibilities that @pf_moore laid out. If the directory will be built into a single wheel (or multiple wheels that only differ in terms of target platform, target Python version etc.), that’s one “project” (whether it’s a library that will become someone else’s dependency; an “application” primarily intended to be used from the command line; or a library that also has clearly defined command-line entry points). If it will also be built into a standalone application that nominally does the same thing, that could potentially be a separate “project”. If someone wants to publish the entire package on PyPI as one installable dependency, but also separately publish a subpackage for people who only want that subpackage, the subpackage would be a separate “project”. Similarly for monorepos used to develop several packages that make sense to publish separately; each… publishable would be a separate “project”.

I think I would need to look at a MRE (say, as a Github repository) to properly understand the situation you’re worried about. But it sounds like you want to treat different test configurations as part of the same “project” as long as the tests are oriented towards developing the same deliverable end product?

… Does pyproject.toml not already meet your needs? Or perhaps there’s something you find inelegant about it? Again, a concrete example would really help.

The entire project table is non mandatory. project.name is only mandatory if project is present. project.version can be omitted if project.dynamic is properly configured.
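For example, this is already a valid [project] table today, assuming the build backend is configured to supply the version (from a tag, a module attribute, etc.):

[project]
name = "spam"
dynamic = ["version"]  # the backend provides the version at build time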