Projects that aren't meant to generate a wheel and `pyproject.toml`

But I’ve always understood that requirements.txt isn’t expected to specify exact versions. :confused: Of course it does when you generate it from pip freeze, but that’s only because pip already knows the exact versions that are in the current environment.

I like this approach. Defining dependencies for various environments is a common need, one that is covered by tools such as Poetry but lacks standardization. Having this in place would also make it easier for, say, PEP 723 to either reuse run.dependencies or maybe even add a new run.self-dependencies to define the runtime dependencies of the script.
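
For illustration, a sketch of what that could enable (the [run] table and the self-dependencies key here are this discussion’s strawman, not anything standardized):

[run]
# abstract dependencies a resolver would use to set up the environment
dependencies = ["requests>=2.28"]
# hypothetical key from the suggestion above: the script’s own runtime deps
self-dependencies = ["rich"]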

1 Like

Using the project table or not

Isn’t the problem here that we’re confusing structural similarity with behavioural similarity? This is more commonly seen with methods - you draw a picture, and you draw a gun, but the two actions have nothing in common. But it’s still a problem with data. In this case, we’re thinking about the data needed to run a project, and because it looks like the [project] data used to build a wheel, we’re assuming a semantic similarity. But that similarity is an illusion, and keeps getting broken:

  • Name and version are mandatory when building a wheel, but optional when running a project.
  • Dependencies can be dynamic when building a wheel, but not when running a project.
  • In fact, “dynamic” data makes no sense in any context when running a project.

Using a different key acknowledges that this is semantically different data. There’s a certain amount of overlap, which in an object oriented context would be refactored out into a base class, but I don’t think we’re likely to do that here for a number of reasons, not least of which is the backward compatibility issues such a change to [project] would cause.
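
A side-by-side sketch of the contrast (the [run] table is hypothetical; the [project] rules are those of the pyproject.toml specification):

[project]
name = "example"            # mandatory when building a wheel
version = "1.0"             # mandatory, unless listed in dynamic
dynamic = ["dependencies"]  # allowed: the build backend computes them

[run]  # hypothetical
# no name or version needed just to run the code, and nothing may be
# dynamic: there is no build backend around to compute it
dependencies = ["requests>=2.28"]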

Also, use cases

On a related note, I think the discussion here is made more difficult because the motivation was originally framed negatively, as “projects that aren’t meant to generate a wheel”. By putting it like that, we both risk confusing multiple use cases, and at the same time not having a specific use case in mind at all.

Most of the discussion around “running a project” sounds to me like the use case of “projects that build an application”[1]. In reality, people tend to just run such projects, rather than building a standalone application. But that’s only like doing pip install . rather than building a wheel and installing it. It’s just that there is no real solution within the packaging ecosystem for “build an application”[2] and so the “build and run in one step” workflow is what people here think of.

For building an application, dependencies are typically the input into a locking process, and the lockfile is the output data (or maybe the actual locked dependencies if building a standalone app). But of course, we don’t have a lockfile standard yet, so we end up trying to make the dependency input do double duty, as the abstract requirements that are the input to the locker, and the resolved requirements that are what the runtime needs to make available to execute the application.
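
To make the two roles concrete (a sketch; there is no lockfile standard, so the output shape is invented):

[run]
# abstract requirements: the input to the locker
dependencies = ["flask>=2.0"]

# the locker’s output would be the resolved set, e.g.
# flask==2.3.3, werkzeug==3.0.1, jinja2==3.1.3, click==8.1.7, ...
# recorded in whatever form a lockfile standard eventually takes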

Maybe rather than [run] as a key for running a project, the key should be [application], for application metadata. Then, one of the things you can do with an application (maybe the only one, for now) is run it from source, and we have the “run my project” workflow.

For extra clarity, we could rename [project] as [library]. But the backward compatibility impact makes that almost certainly impossible.


  1. There may be other use cases, but I don’t think anyone has called them out explicitly yet. ↩︎

  2. There are solutions for this, like PyInstaller, but they don’t really interact with the packaging community - which IMO is a shame, because it contributes significantly to our blindness towards this area of packaging your code. Honestly, we should probably get input from projects like PyInstaller if we want to standardise this area. ↩︎

6 Likes

I like this proposal because

  • it’s very concrete (this thread is hard to follow, and I’m not the first to say it!)
  • it addresses the possibility of multiple separate or even conflicting dependency lists

For libraries, I typically have the following distinct dependency sets to manage:

  • package metadata ([project])
  • testsuite
  • testsuite with the minimum supported dependency versions pinned
  • style checkers (maybe delegated to pre-commit, maybe not)
  • type checker
  • doc builds

My “test-mindeps” use case breaks the idea of throwing all of your dev dependencies into a common bucket. I know that’s a common usage pattern, but I’d personally be unhappy being left out in the cold if a new standard only gave me room for a single dependency list.
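
For concreteness, a sketch of what room for multiple lists might look like (the table and key names are invented for illustration; nothing like this is standardized):

[project]
dependencies = ["attrs>=21.3"]

# hypothetical: one requirement list per environment
[run.extra-requirements]
test = ["pytest"]
test-mindeps = ["attrs==21.3.0", "pytest"]  # minimum supported versions pinned
lint = ["flake8"]
typecheck = ["mypy"]
docs = ["sphinx"]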

I think part of the goal should be to provide a better solution for the common dev and test extras which are published to PyPI. Brett’s proposal does that.

4 Likes

… eventually.

As an example, you should never set an upper-bound version requirement for a wheel, but that’s totally reasonable for your own code.

If this is in reply to me (there wasn’t a quote to tie this to anything), then notice I didn’t say “lock” anywhere, and that’s on purpose. Consider what I suggested as input to a resolver which will calculate what to install which could also be written out to some lock file. But any lock file proposal is entirely separate.

I’m not aware of any expectations in either direction around requirements.txt, since it isn’t tightly specified; it’s basically just a file that writes down flags passed to pip. A requirements file can specify just top-level dependencies, or it can pin to exact versions, like what pip-compile produces. There’s no rule or expectation as to what a requirements.txt file will contain.
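
For example, both of these are legitimate requirements-file contents; the specifier syntax is the same either way (shown as TOML arrays with invented contents, to match the rest of the thread):

# just top-level dependencies:
dependencies = ["requests>=2.28", "rich"]
# or pinned to exact versions, as pip-compile would emit:
# dependencies = ["requests==2.31.0", "rich==13.7.1", "certifi==2024.2.2", ...]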

Depending on who you’re speaking to, yes, hence my suggestion of another table. But maybe you’re hinting more at not even using pyproject.toml in this instance? Or maybe by “using a different key” you’re suggesting not even using the word “dependencies”? If that’s the case then I’m happy with “requirements” as a key name; as I said, “strawman”. :slightly_smiling_face:

I’m good with that suggestion. Then we could change the keys to:

  • application.requires-python
  • application.requirements
  • application.dev-requirements
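
Spelled out in TOML, that would be something like (values invented; none of these keys are standardized):

[application]
requires-python = ">=3.9"
requirements = ["requests>=2.28"]
dev-requirements = ["pytest"]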

We could ask that both table names be supported, but I’m not sure it’s worth asking tools to do that.

I think [run] is actually the easiest to grasp and most generally applicable name so far. Everything (script, application, even libraries in some sense) needs to “run” in some form to be used, so I think it’s a pretty good candidate!

For example, it took me quite a while as a beginner to grasp the difference between library and application, not least because many projects (both open and closed) mix the two in various ways. But it was trivially clear from the very start that something (*gesticulates*) runs.

It also matches more advanced parlance in terms of separating build from runtime dependencies, for example. Plus it’s nice and short.

In short: I think it’s worth pulling on this string a bit more. For example, perhaps [run] could become its own section next to [project], and take over certain keys from there (like dependencies and requires-python), with some transition?

8 Likes

I think I get what you’re saying, but it seems a bit confusing to word it as “dependencies for packages” and “dependencies for packages themselves”. Is what you’re describing just the difference between a package’s direct dependencies and the transitive closure of that? At one point in the post you refer to this as “the full dependency graph” which I think is also clearer. (If that’s not what you mean then maybe I don’t understand what you’re saying after all. :upside_down_face: )

It’s easier for me to understand these things if they’re talked about in terms of what their use or function is, rather than something like “they are a way to write down core metadata”, because writing down core metadata is itself just something we do to carry out some eventual purpose.

That said, with regard to the topic of this thread, I’m not sure I see that what we are talking about is actually “runtime dependencies” aka “full dependency graph”. It seems that a lot of what we’re talking about when we talk about projects that aren’t meant to generate a wheel is “applications”, and that when people want to write/distribute applications, they still do want to install or distribute them, they just don’t want to do so via a wheel. If I’m writing an application, I still want to just write down the direct dependencies of my code, and I still want some tool to figure out the transitive closure of that without me having to do it myself, and I still want some tool to eventually get that transitive closure (plus my app code) set up on someone else’s machine. The questions for me are more about which tools those are, which ones are for me to run vs. for the end user to run, when each tool will be run, and so on.
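
In other words, something like this (illustrative contents; [run] is this thread’s strawman name):

[run]
# I write down only what my code imports directly...
dependencies = ["pandas>=2.0"]
# ...and some tool computes the transitive closure at install time,
# e.g. pandas plus numpy, python-dateutil, pytz, tzdata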

I’ll just take this point to reiterate that I think it would be a better world for all Python users if this separation were minimized as much as possible. The code has requirements; the Python version is just one among those requirements.

I guess my question here is, who fills this in? If you’re saying that these run dependencies would be generated from the project ones, that sounds okay to me, although I have some worries about them getting out of sync. If you’re saying that the author would need to specify these runtime dependencies explicitly, that seems not so great.

So by “run a project” are you envisioning the “I email you a zip file” situation where you want to just take an arbitrary directory tree and run the code in it directly? I think that’s an important use case, but I guess I’ve kind of lost track of how it relates to others that may fall under the “project but not a wheel” heading.

Beyond that, I’m not entirely sure I agree that dynamic data makes no sense when running a project. If the project is run via some tool that autocreates an environment, then that tool may well be able to make use of dynamic data.

2 Likes

Yeah, if only because “dynamic data” in the context of pyproject.toml includes stuff like “extract the version from the source code” and “read dependencies from a text file”. Those can be useful features to have, regardless of the project’s purpose.
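
For instance, setuptools supports exactly those two patterns today (real setuptools syntax; the package name is invented):

[project]
name = "example"
dynamic = ["version", "dependencies"]

[tool.setuptools.dynamic]
version = {attr = "example.__version__"}
dependencies = {file = ["requirements.txt"]}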

Here I think we need to be careful. Indeed both applications and libraries need runtime dependencies, and I agree that it nicely distinguishes between build and runtime dependencies.

But it also matters how the runtime dependencies get used, whether for an application or a library. For an application you may want to lock your dependencies; for a library you should not. A project (like we have in pyproject.toml) can provide a library, an application, or both. The main aspect is just that it is distributed as a wheel.

As for dynamic attributes: when building a wheel it is fine for the build system to supply those, but when we’re not building a wheel and have no build system, we cannot have dynamic dependencies (as also pointed out by @pf_moore). Hence it seems we want to be able to define applications that can be run directly (no dynamic deps) and distributed as wheels (potentially with dynamic deps). That same project could also offer a library, which must be distributed as a wheel (and potentially uses dynamic deps).

A distribution (wheel) can also contain applications. Sometimes all that matters is the application.

1 Like

I think the locking discussion is orthogonal here. Even a pure application has a set of dependency constraints based on the features it uses (if you use features from foobar version X, you need foobar >=X[1]), which would be well described by [run] IMO. If/how you choose to lock that down (including all transitive dependencies) does not – and IMO should not – have anything to do with that table.

In other words: a lockfile (while clearly worth recording) represents information derived from [run] at a given point in time, so it’s not on the same level.


  1. and potentially foobar >=X,<Y, depending on foobar’s API history and promises ↩︎

2 Likes

The point of this proposal is for something to be distinct from [project], so that’s implied (sorry if that wasn’t clear).

But my proposal is to not move any keys, since a key point of this topic is that what’s in the [project] table is meant for making wheels, which is distinct from making an application run.

I think you’re misunderstanding the purpose of this proposal. You would either have a [project] table or a [run] table, not both. So if you don’t write this information then there’s no way for it to be “generated from the project ones”.

That’s still a wheel, it just happens to have an entry point.

3 Likes

Okay, I guess I am.

The way I see it, it’s like, suppose I have some kind of “project”. Whatever that may mean, at a minimum it means I wrote some files that contain Python source code. That code has some import statements. I see the purpose of dependency specification as essentially listing the libraries that I import. That’s it.[1]

So if my code has import numpy, I’m going to list numpy as a dependency. Why would I want to list that in a different way if it’s going to be run than if it’s going to be imported as a library? Either way, numpy needs to be installed before my file that says import numpy gets executed.

Why not? If my code depends on a library and I know that library made a breaking API change in moving from v2.9 to v3.0, so my code can’t run with version 3 of that library, why not specify "<3" in my dependency listing for that library?
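
That is, something like (invented package name, using the strawman [run] table):

[run]
dependencies = [
    "numpy",            # I import it, so I list it
    "somelib>=2.9,<3",  # known breaking API change in 3.0
]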


  1. As has been noted, this could kind of almost be done automatically by scraping imports, except that we can’t link the import name to the distribution name, and (more important) the import doesn’t list the required versions. ↩︎

1 Like

If you know, sure. Realistically, you generally don’t know. People often preemptively add this kind of restriction because “omg a new major version will break everything, that’s what they’re for after all”. However, in general this is just forcing things to break when they might not have to, and it messes with dependency solutions for more complex projects that include your package.

Reference:

2 Likes

Because if it’s being run, you might want to pin a version for reproducibility. If you’re building a library, you want to avoid pinning, so your users aren’t over-constrained. You need to read one of the articles around on application vs library dependencies - this one is quite old, but still very relevant.

4 Likes

Note that the referenced blog post is in the context of overzealous pins in libraries (i.e. something meant to be reused and installed as a Python package), not applications which are the typical target use case that is being discussed here.

3 Likes

Okay, I’m familiar with those arguments, but having a whole separate key just in case you might want to pin in one situation but not in the other seems a bit extreme to me. Is there any other difference envisioned between these two keys other than “one of them might use pinned versions and the other one shouldn’t”? And is “distributing an application (where you might want to pin versions)” the only non-wheel case that is being discussed here?

Also, even in the pinning case, still what I usually do (and maybe this is bad?) is try to derive the versions to pin from a list of unpinned versions, i.e., let the resolver try to find what it thinks is a working set of versions, and then just pin what it comes up with (or back off if it doesn’t). That’s why I was asking about whether the [run] list would be derived from a [project] list.[1]


  1. In either case (although I’m aware some might regard this as broadening the scope too much :slight_smile: ), I’d say version upper bounds would not be such a problem if PyPI metadata were mutable, as discussed on pypacking-native. ↩︎

2 Likes

Semantically they mean different things. If you look at all the metadata recorded in the [project] table you will notice it’s very much about the metadata you write down for a wheel (by design). That does not align with what you need to run your application (e.g., do you really need keywords?). That’s the point my blog post was trying to convey.
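
Compare what a wheel needs with what running needs (contents invented for illustration):

[project]
name = "my-app"                  # needed to publish a wheel
version = "1.0"
keywords = ["cli", "demo"]       # useful on PyPI, irrelevant at run time
classifiers = ["Programming Language :: Python :: 3"]
requires-python = ">=3.9"        # a runner does need this...
dependencies = ["click"]         # ...and this, but little else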

I view this as being about “running an application”, not distribution specifically. While this could help tools that build something to let you distribute your app, I don’t view that as the design goal here.

Nope, that’s a totally legitimate thing, but a lock file is a separate thing in this discussion. I personally view this whole [run] table as writing down what is statically known about what an app needs, to give to a resolver to calculate what needs to be installed.

Using the same key for two fundamentally different purposes seems incredibly dangerous to me. What if there are situations where the two usages overlap?

You seem to be making the “structural similarity vs semantic similarity” mistake that I’ve already commented on in at least one of these threads, in response to basically this same point.

4 Likes

That’s a fine distinction of course. Though it would remain possible to “copy keys” instead of “move keys”:

[project]
# requirements recorded when building a wheel
requires-python = ...
dependencies = [...]

[run]
# runtime requirements
requires-python = ...
dependencies = [...]

I can’t think of many scenarios where requires-python would diverge between the two, but conceptually they’re different, and I think it would help consistency (e.g. for scripts that have [run] without the [project]).

I think another question is: what scenario are you thinking of where you would want both a [project] and a [run] table?