Projects that aren't meant to generate a wheel and `pyproject.toml`

sirosen · November 10, 2023, 7:40am

This is fair. Maybe I’m mistaken in thinking that the two are different in meaning.

I accept the idea that the field in project.toml could be understood to be the same in both contexts. But the meaning of the field this translates to in a built package is specific and distinct.

The trouble is that there has historically been no other meaning for the field. So assigning it a new meaning – one associated with the config field but not the metadata field – is a valid way to introduce an abstraction here which separates the meanings of the two.
Because the config field has been the verbatim content of the matching metadata field, there effectively has been no abstraction here.

Maybe I’m harping on a bit of minutiae here. We’re discussing specs, so it becomes hard to tell. But as far as our past experience goes with python_requires, it has only had one relatively narrow definition.

And my expectation is that abstractions are leaky, so the underlying behaviors expressed by tools will be reflected in this field. I’d dislike a future in which I can’t configure two different kinds of tools to behave differently in a single project because they’re reading the same field but interpreting it in different ways. (e.g. wanting an exact pin for one tool and a lower bound for another)

pf_moore · November 10, 2023, 7:42am

One thing I would specifically caution people about is that “unblock PEP 723” is explicitly not a requirement here. I very much want to see script dependency metadata standardised, but I won’t let that desire get in the way of making the right decision here. For example, if [run] turns out not to be the appropriate section name, then sorry PEP 723, you made the wrong guess and you need to change. Conversely, arguments that [run] should be used “because it’s what PEP 723 specified” aren’t valid - justify the choice on its own merits please.

sirosen · November 10, 2023, 8:24am

I agree that we shouldn’t compromise the future in service of script requirements (723). At the same time,

we should capitalize on any beneficial momentum which we can pick up from it (I’m getting involved because I’m somewhat concerned about what happens if this effort loses energy and 723 goes into deferred/rejected)
PEP 723 seems like a valid user story/use case to consider

Not that the fields need to stay identical, but the idea of it is important – having a format which can be embedded in scripts is useful.

I know I’ve tuned in quite late here. Are there proposals other than the ideas Karl is discussing (alternative files with pyproject-like contents) and the run table which we should be evaluating?

run.dependencies.default is a very clear proposal in my mind – an extra-like name to requirement mapping, with various details to hammer out.
But are there others? Do some oppose the idea of this being a mapping, rather than a list of requirements (singular)? If so, I’d like to discuss that and see if we can reach consensus.

pf_moore · November 10, 2023, 10:01am

In my personal view (as opposed to PEP delegate, which was more the perspective of my previous post):

The momentum is absolutely a good thing, getting people motivated to be involved is crucial, we want as many perspectives as we can get.
PEP 723 is provisional, so IMO^[1] it’s OK for us to require changes to that PEP if this discussion goes in a different direction.
The script use case is similar, but separate, from the case of a “project directory”. In particular, having PEP 723 enabled scripts inside a project is entirely reasonable, so if the two ideas overlap (which may or may not be the case) then we need to define the rules for how they interact.

As far as proposals are concerned, the ones I’m aware of are basically Karl’s, plus a number of variants of a section in pyproject.toml. It may be called run, but other names have also been suggested (for example, application). It may be of the form of run.dependencies.<name>, or run.<name>.dependencies. Other keys may be allowed, or not. Personally, I’m using [run] as a convenient shorthand for “put the data in pyproject.toml”. I don’t consider anything about how it gets stored in there as having been agreed on, PEP 723 notwithstanding.

And most importantly of all, no-one has yet done any significant work on describing actual use cases and how they would be handled with any of the proposals. There’s been a lot of talk about syntax, but very little about semantics.

For example:

When building a standalone application, how much of [project] makes sense, and where do we need a new key like [run]? So do we need to allow both? Do we need to look at what tools like pyInstaller do? If they infer dependencies from imports, how does that fit with the model of putting dependencies in pyproject.toml? Do we count that as out of scope, but if so, how do people transition from a “private” application to one they want to distribute?
How do we handle “run in place” projects with many ways of being run? For example, a webapp may have a debug version, or versions that run async or multi-process.
How do tools like tox get their configuration data from this new metadata? Not just requirements, but other information (like python version) needed to set up a run environment.
How do projects with multiple independent tasks structure their dependency data? Do we need some sort of inheritance (core data plus task-specific data)?
Requirements files are often used in projects that do build wheels (for test dependencies, or doc builds). How do we allow that use case (it would mean that [run] and [project] can both be present in a single pyproject.toml under the [run] model)?

I do not think that the argument “we define syntax here, that says what data is available, but it’s up to tools what they choose to do with it” is sufficient here. Like it or not, we have to consider semantics. And yes, it will be hard to do so, but going back to the survey, users want that level of well-defined behaviour from the packaging ecosystem.

To be honest, though, I think we would be better not trying to aim for the big picture here - I know other people want to think longer term, so I’m trying not to block that approach, but it does lead to long, complex and rambling debates where we find it hard to get consensus (the way this discussion is going!) So all of the above is not actually what I think we should be doing. I’d much prefer it if we simply tried to standardise requirements files:

Forget the whole “run a project” terminology and semantics debate. Actually, ignore the question of “what is a project” altogether.
Remove workflow and project types from the discussion, and focus solely on a scope of “if you currently use a requirements file, this is the replacement”.
Focus solely on lists of requirements. Introduce equivalents of pip-specific options cautiously, if at all.
Define a section in pyproject.toml that contains a series of key = [req, req, req, ...] entries.
Don’t worry about the idea of a “default” set of requirements for now, make all sets be named.
See how far that gets us.

Which is more or less what you suggested above. I’m sure someone will object to this on the basis that we need to think longer term, but

although Brett is PEP delegate on that one, not me ↩︎

kknechtel · November 10, 2023, 2:15pm

I don’t think I understand the distinction you’re trying to draw. I assume it has something to do with the spec for METADATA (inside a wheel) itself, so I guess PEP 427 or one of its successors?

Perhaps unsurprisingly, I agree with your reasoning

My original proposal is explicitly built around using things that look like the existing pyproject.toml in order to define requirements, and predicated on the assumption that different subsets of the Python source in a repository could represent cohesive units that have distinct requirements. So I can’t really ignore it.

Reading the rest of what you suggest here, it sounds completely workable, and simple - as long as we ignore half the possible conceptions of a “project” that you identified in the first place. Which you now seem to be advocating for, which throws me a bit because you were the one who brought it up. However, I think I can incorporate this into my ideas, and it would then mean only creating separate configs in order to support cases like monorepos or data-science “projects” with severable components. I was already planning to re-present the concept as multiple layered proposals, so this is just one more.

pf_moore · November 10, 2023, 5:57pm

Given that all of those forms are being handled today using requirements files (and/or pyproject.toml) I’m not clear how you can say that?

To be clear here, I’m not making any sort of proposal, and I won’t be turning anything that I suggest into a PEP. I’m simply offering ideas and pointing out places where I either don’t understand other people’s proposals, or I think there’s things that they don’t cover.

With regard to your proposal specifically I’m still very unclear how it will work in practice, so I’m looking forward to more details.

jamestwebber · November 10, 2023, 6:05pm

I thought the issue is that they aren’t all being handled, at least not optimally?

kknechtel · November 10, 2023, 6:08pm

I kinda have to agree with @jamestwebber here. They’re handled in the sense that people are building their code, but it doesn’t seem very elegant, especially for large monorepos. I’ve already seen at least one example of a custom system whereby a tool (itself part of the monorepo) dynamically generates pyproject.toml files based on importing a module and checking for an attribute - implementing something like a precursor to PEP 723 but for building instead of running. And I thought part of the goal was to standardize requirements files.

Coincidentally, you posted just as I was submitting my outline and intent to produce “more details”, and in the few hours since my previous post ITT, I spent most of my thinking time on the point (1) described there which comes primarily out of your feedback here. Your insight is greatly appreciated.

pf_moore · November 10, 2023, 8:16pm

Possibly. I don’t know the workflows for the problematic cases - there’s a lot of speculation going on here^[1], and my list was more of a call for people to describe the needs of those use cases, than being a description itself…

by me, at least ↩︎

ofek · November 10, 2023, 9:27pm

They would not interact at all and any such proposal I would push back on heavily. As an example of what you mean, you can take a look at scripts for Hatch and Hatchling. I have to maintain dedicated environments for them (sometimes wastefully requiring all dependencies) whereas in the new way I can just point to the scripts and run them.

I’m not closely following this discussion since the scope has expanded and demotivated me so I can’t comment on anything else.

sirosen · November 14, 2023, 7:24pm

I was pretty concerned about this and I suspect you aren’t the only one who tuned out a bit. I know I had trouble just reading everything between this and related threads.

I intend to refrain from commenting more here, for the most part, to turn the volume back down.

After my vacation (1 week from now), or maybe during travel if I feel up to it, I will try to write a very short concrete proposal with enumerated use-cases. My goal will be to (re)define a subset of the use cases discussed here and describe a solution using a new section for dependency data.

My goal is to provide a standard solution for writing dependency data outside of what is published, with the expectation that a success would unblock PEP 723. I do not intend to call the table [run], but am not ready to commit to a name yet ([package_groups] appeals though).

There’s an obvious path here for something exactly like what Paul mentioned – names mapped to lists of requirements – but I would like to think more about alternatives. It’s possible/likely that I’ll return to that as best, since it’s easy to understand.

jeanas · November 14, 2023, 7:40pm

I know I did.

brettcannon · November 15, 2023, 1:17am

Sorry, I thought I had for the Django case I outlined earlier.

OK, then here’s a strawman proposal…

We want something easy to write inside of single-file scripts; not too verbose while being self-explanatory to anyone reading it… We also want something that can substitute for requirements files which are used for a similar purpose for a larger grouping of Python code to cover the same use case, just at a different scale. You also want something that can help install a group of things to be used for some purpose with the code which may not directly require the code itself to be installed (e.g. linters, building the documentation).

Simple case

Top-level key named requirements which holds an array of strings representing distribution requirements to be installed (e.g. project.dependencies, but at the top-level for the file).

We can do the same for requires-python.

This covers the PEP 723 case (although it isn’t meant to be exclusive to single-file scripts). It’s short and to the point if you needs are simple.

requires-python = ">=3.10"
requirements = ["requests", "rich"]

Complex case

A [requirements] table of arrays of strings (e.g. project.optional-dependencies).

To refer to another key, use either .[key] or simply [key] (I personally don’t care which format, but the former has some precedence thanks to pip install -e). That gets you the equivalent of -r from requirements files.

This covers the requirements file case where you have multiple configurations which may or may not be related. I’m personally not concerned about specifying different requirements for different Python versions as markers handle that along with requires-python.

Since you already need to use -r with pip install to read in requirements files I don’t think it’s too much of a burden to skip the default requirement group name for now (although I bet it becomes “default” by convention really quickly), but I do think it will eventually come up.

requires-python = ">=3.10"

[requirements]
default = ["requests", "rich"]
test = [".[default]", "pytest"]  # Notice how it requires "default".
lint = ["ruff"]
docs = ["sphinx", "Furo"]

This still leaves the door open for external lock files since you would be able to specify what requirement group(s) you wanted to be locked.

kknechtel · November 15, 2023, 1:35am

This seems brilliant. Some thoughts:

Implementations would need to guard against recursion (cycles), such as:
```
[requirements]
chicken = [ ".[egg]" ]
egg = [ ".[chicken]" ]
```
or even
```
[requirements]
meal = [ ".[meal]", "spam" ]
```
One possibility is to require that such references can only be to previous keys in the table (since “graph has a cycle” <=> “graph cannot be topologically sorted”). But that requires the TOML parser to preserve key order.
(Actually, since you mention the -r flag, I guess Pip already knows how to do this.)
It’s not clear to me whether this is meant to imply an extension to PEP 508 syntax itself, or an extension to the requirement-list syntax used by e.g. [project.dependencies] (i.e. “a list of strings that are either PEP 508 format or this new thing, using the first character to discriminate”).
Speaking of which, why not view [project.dependencies] as the “default” list, and make it possible to reference it in the [requirements] section?

jamestwebber · November 15, 2023, 1:43am

I don’t know that they need to guard against it (this is a solvable set of requirements, after all), but they need to be aware of it.

Circular dependencies between packages are currently allowed, so I imagine this situation would be too.

kknechtel · November 15, 2023, 1:44am

I really only meant that they need to not get in an infinite loop / stack overflow while enumerating the dependencies

jamestwebber · November 15, 2023, 1:45am

Sure, good advice for anyone writing any code

pf_moore · November 15, 2023, 9:14am

Apologies, I forgot about that example in the flurry of discussion prompted by @kknechtel’s idea.

That case is definitely something we should support. Web apps are one of the common (and possibly the most well known) examples of “not a wheel” projects and covering them gets us a long way.

brettcannon · November 16, 2023, 12:48am

It’s not a concern as that’s up to the resolver to take care of and they are designed for this sort of thing. Basically the resolver gets a list of requirements for candidate distributions to install, and if it hits a steady-state of candidates in a loop of requirements then it considers its work done.

brettcannon · November 16, 2023, 1:04am

No worries! I figured that’s what happened.

I think my strawman proposal above covers it (or at least I tried to write it out based on how I would use it.

Once thing I did neglect to mention in my strawman proposal is how it might interact with [project]? I.e. does . implicitly include project.dependencies and does .[...] take project.optional-dependencies into consideration? If it does then I think this nicely solves the “dev dependencies” situation when doing development for something that is expected to end up as a wheel (if you don’t like using e.g. “test” as an extra for that sort of thing). But this can also be defined later if people like my strawman proposal and would rather keep it as small as possible for now.

In the name of simplicity I’m not touching . in use within project.optional-dependencies, although I know the idea of having extras being able to refer to each other has come up before. This would then keep the whole . idea scoped to just what would be acceptable in requirements.

And in case it wasn’t obvious, if people like my strawman proposal I am willing to write a PEP for it.