PEP 751: one last time

There weren’t strong objections, but there wasn’t strong support either. In PEP 751: one last time - #16 by radoering, @radoering said they may use it in Poetry if we came up with a solution. uv has already said they don’t have a need for this feature. You asked a clarifying question in PEP 751: one last time - #22 by frostming, which I answered in PEP 751: one last time - #26 by brettcannon, but then I never heard back from you. So that’s at best a -1, a +0, and a +0 for a major feature, which wasn’t enough to push me into trying to implement it.

But based on what you’re saying here, I would consider your vote for the feature a +1. As such, here is what I’m willing to do:

  • If you, @frostming, are willing to say PDM will use such a feature for supporting extras and dependency groups, so that PDM will standardize on this format for its lock file, and
  • You, @radoering, and I can come up with an approach we are all happy with

Then I will try to bring back in extras and dependency group support in some way (I think I know how I would want to do it; see below).

I know, and that’s why rejecting this part is the saddest rejection for me.

To be clear, the PEP has the file name be pylock.toml. If you want a file name change that’s a separate discussion.

:+1: I can plan to add something.

Thanks so much!

I didn’t see anything in the PR notes that suggests I need to change anything in the PEP, but if I overlooked anything then please let me know!

I would agree with that if the direct_url.json spec didn’t say wheels are possible:

When url refers to a source archive or a wheel

So that tells me that a wheel could be a direct install. That’s why I left direct as a top-level key in packages.

(It also is a bit ambiguous when it comes to sdists compared to a source tree, but maybe “source” is meant to cover both cases in that sentence? It doesn’t affect the PEP as-is based on how it’s currently written.)


Going with the assumption that @frostming replies with what’s necessary to have a conversation about how to support extras and dependency groups, I’m going to start the conversation here.

I think the best solution is to introduce extras in packages.marker, treat it as a container and not a single value/string, and introduce dependency_groups which would also be treated as a container.

Having extras instead of extra is because extra already has semantics where it is expected to be used with just a single string.

The container bit is specifically so that you can use in and not in in packages.marker since you need to be able to handle the full Boolean logic of extras and/or dependency groups; I think it makes more sense to say e.g. 'extra-1' in extras and 'extra-2' not in extras than try to magically make 'extra-1' == extras and 'extra-2' != extras mean something when you have multiple extras specified. This does require a change to ‘packaging’ since it only expects to work with string values for markers.

The introduction of dependency_groups is because we simply have no way to express that concept in a marker right now.
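
As a rough illustration of the container semantics proposed above, here is a hand translation of one such marker into plain Python. It deliberately does not use 'packaging', since (as noted) that library only expects string values in the marker environment. The function name `marker_matches`, the extra names, and the `test` group are all made up for the example.

```python
def marker_matches(extras: frozenset, dependency_groups: frozenset) -> bool:
    """Hand translation of the hypothetical marker

        ('extra-1' in extras and 'extra-2' not in extras)
            or 'test' in dependency_groups

    where `extras` and `dependency_groups` are containers, not single strings.
    """
    return (
        "extra-1" in extras and "extra-2" not in extras
    ) or "test" in dependency_groups


# Only extra-1 requested: the package entry applies.
print(marker_matches(frozenset({"extra-1"}), frozenset()))
# Both extras requested: the `not in` clause rules the entry out.
print(marker_matches(frozenset({"extra-1", "extra-2"}), frozenset()))
# No extras, but the hypothetical "test" dependency group is requested.
print(marker_matches(frozenset(), frozenset({"test"})))
```

With a container, `in` and `not in` read naturally when several extras are active at once, which is the case `==` and `!=` against a single string cannot express cleanly.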

As you can see, this expands the PEP to start messing with dependency specifiers and that’s not a small thing and thus not something I want to do unless it’s going to truly have an impact.

As a “better requirements.txt” workflow, pip-compile crew have been awfully quiet.

Anyone know how to reach out to them?

From a rules_python (Bazel) POV, we would rely on extras in a lockfile format.

No, I just forgot the name, sorry.

Yes a direct URL install can definitely be from a wheel: pip install /path/to/some.whl will generate a direct_url.json.

In the Direct URL spec, archive is meant to cover wheels and standardized sdists, but also other installable archives (tarballs). I’ll make a note to submit a clarification to p.p.o for this.

So with the current PEP, to represent a direct URL “archive”, one could choose one of archive, sdist and [wheels]. archive could therefore seem superfluous, unless it is used only when subdirectory is needed, or for non-standard tarballs? Additionally direct=true with a single-item [wheels] list sounds a little awkward to me.

I feel some clarification in that area would help.

Another topic: should the Installation section of the PEP mention something about generating direct_url.json? I see a case for the PEP to mandate the generation of direct_url.json if direct=true (and forbid it otherwise). This would allow for more fidelity in the environment reproduction use case.

I thought we had a while back.

Regardless, I have kept them in the back of my head this whole time and I don’t think there’s anything missing that they would need to make this work for them.

Good to know! I still need to hear from @radoering and @frostming before I consider resurrecting extras and dependency group support.

Correct, it’s meant for the latter case. I chose to make sdists and wheels distinct to make the data easier to audit, and so that meant file archives for source trees are also distinct.

If you can think of a better solution I’m happy to consider it, but the options I see are: make all files generic to get rid of the awkwardness (which I don’t want to do), assume this is not a common enough case to worry about, or make direct a key on packages.sdist and packages.wheels and drop packages.direct (all the other types of source are implicitly a direct install). Do people have a preference between the latter two options?

It doesn’t explicitly call out how to install anything, and I view direct_url.json as part of doing an install correctly. So I could add such detail if people feel it’s important, but I also feel it’s appropriately implied.

I think it’s probably implied by the fact that a lockfile essentially constitutes a “direct URL specification” (as opposed to name+version) but it might be worth being explicit as it’s not precisely a “direct URL” in the sense of a name @ URL specifier. It’s not a big deal though, IMO.

I agree with that design. Should we update PEP 508 accordingly to cover this kind of expression for the extra key?

That’s great!

I mention pip-compile because they have a long history with “--strip-extras” (see Add `--no-strip-extras` and warn about strip extras by default by ryanhiebert · Pull Request #1954 · jazzband/pip-tools · GitHub and Always remove extras in compiled files · Issue #1613 · jazzband/pip-tools · GitHub) and entrenched usage of extras in the lockfile and/or in comments. So unless this feature is removed, they possibly wouldn’t be able to switch wholesale to the new pylock.toml format.

Without support for extras[1] and dependency groups, there is no chance that we can replace poetry.lock with pylock.toml. It will just be an export format for Poetry. (As already mentioned, this outcome would also be fine for us.)

Even with support for extras and groups, I cannot promise that we will replace our lockfile, I can only promise that we will evaluate. With the decision that the tool section can only contain disposable information, there is still the risk that we try to migrate to pylock.toml and notice that we would have to add non-disposable information to the tool section (and since that is not allowed just abort the migration).

Poetry can also handle the latter[2], but I agree it is more difficult to understand than the proposed new syntax. I think, extra == "extra-1" always had the meaning of "extra-1" in extras. Probably, your proposal would have been an easier-to-understand syntax for the extra/extras marker from the start.

Do you want to introduce a dependency_groups marker or just add packages.dependency_groups? (I think the latter would be sufficient.)


  1. Actually, I think extras are implicitly supported via markers, so from Poetry’s point of view there is no need for change on this topic, but I understand that Poetry’s handling of the extra marker may be in a gray area of the spec and other tools may not understand it. ↩︎

  2. When trying to implement a marker logic with intersections, unions and inversions, the meaning of such a marker is only a logical consequence. ↩︎

While uv is further away from being able to use the format as our default lockfile (as compared to PDM and Poetry), I’ll try to follow along closely here and will likely have some opinions since this would be a significant change if pursued (one that extends beyond the lockfile format itself, since it now intersects with the marker algebra).

If I’m understanding you correctly then it makes sense to say, “if a package entry is marked as direct then write out a direct_url.json as appropriate” and leave it at that.
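
A minimal sketch of what that rule could look like in an installer, assuming a simplified, hypothetical view of a package entry with `direct`, `url`, and `hashes` keys (the PEP’s real table layout is richer than this). The `url`, `archive_info`, and `hashes` names in the output come from the Direct URL spec; the function name `write_direct_url` and everything else are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def write_direct_url(dist_info: Path, entry: dict) -> bool:
    """Write direct_url.json into a .dist-info directory for a direct entry.

    `entry` is a hypothetical, flattened stand-in for a [[packages]] table.
    Returns True if the file was written, False if the entry is not direct.
    """
    if not entry.get("direct", False):
        return False  # not a direct install: no direct_url.json
    data = {"url": entry["url"], "archive_info": {}}
    if "hashes" in entry:
        data["archive_info"]["hashes"] = dict(entry["hashes"])
    target = dist_info / "direct_url.json"
    target.write_text(json.dumps(data, sort_keys=True), encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder URL and hash value, purely for demonstration.
    entry = {
        "direct": True,
        "url": "file:///path/to/some.whl",
        "hashes": {"sha256": "0" * 64},
    }
    print(write_direct_url(Path(tmp), entry))
    print((Path(tmp) / "direct_url.json").read_text(encoding="utf-8"))
```

This only covers the archive case; VCS and local directory sources would need `vcs_info` or `dir_info` instead of `archive_info`.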

Possibly, which is why I haven’t tried to define any of this as that’s not a small ask. The other option is to keep this all separate from dependency specifiers in separate keys. I discuss some ideas below.

That’s all I’m asking.

Or propose an update to the spec to add whatever support you found was missing; it’s versioned for a reason. :wink:

I’m actually not sure that it did. For instance, ‘packaging’ only allows for string values in the dict representing the environment for marker evaluation. The meaning of in and not in are also not defined in Dependency specifiers - Python Packaging User Guide.

I don’t quite follow that comment.

I was thinking the former, but now I’m not sure. If this tried to expand the dependency specifier spec then it simplifies things like defining the grammar and such. But then there’s suddenly this stuff in that spec that simply doesn’t (and shouldn’t) apply to Requires-Dist in core metadata (which I think concerns @charliermarsh as discussed below). If I do it as another DSL it’s a lot more work on my part as it does require defining the grammar, the operations, etc. and I would be worried about messing that up.

But upon reflection, what do we really need here? Extras and dependency groups are additive, so at worst using a specific extra or dependency group makes the constraint on using a package more specific at the exclusion of a version, but not the package itself (i.e. it never causes a package to be removed from consideration). As such, I think the only Boolean logic statement you can end up with is (... or ...) and not (... or ...) to express what extras and/or dependency groups require. I can’t think of a scenario with extras or dependency groups where you might have ... or not ... or ... and ... as operations.

Assuming that’s correct (and PLEASE tell me if I’m wrong), we might be able to simplify this greatly. If we added an extras key and a dependency-groups key to packages, they each could take a table. That table would have include and optionally exclude keys that stored arrays. You would then evaluate whether to include the package based on whether a requested extra or dependency group was in include and none of the requested extras or dependency groups were in exclude.

E.g.:

extras = {include = ['extra-1'], exclude = ['extra-2']}
...
extras = {include = ['extra-2']}

(If that verbosity for what I assume is the common case of just include bothers people, we could consider saying that if the extras or dependency-groups keys were assigned an array, it’s implicitly include.)

This avoids touching dependency specifier syntax and alleviates needing a DSL for the Boolean logic. But this only works if I’m right about what the restrictive Boolean logic we need to support is.
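
To make the include/exclude rule concrete, here is a small sketch of the evaluation as described above. The function name `is_selected` is hypothetical, and the same check would apply to the dependency-groups table.

```python
def is_selected(requested, include, exclude=()):
    """Apply the include/exclude rule to one package entry.

    The entry is used when at least one requested name is in `include`
    and no requested name is in `exclude`.
    """
    requested = set(requested)
    return bool(requested & set(include)) and not (requested & set(exclude))


# extras = {include = ['extra-1'], exclude = ['extra-2']}
print(is_selected({"extra-1"}, ["extra-1"], ["extra-2"]))             # True
print(is_selected({"extra-1", "extra-2"}, ["extra-1"], ["extra-2"]))  # False
# extras = {include = ['extra-2']}
print(is_selected({"extra-2"}, ["extra-2"]))                          # True
```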

I would also rename packages.marker to packages.environment. I would also probably list all acceptable extra and dependency group names separately at the top-level for easy inspection by users.

Makes sense. Also, does making it separate from dependency specifiers and thus not affecting the marker algebra alleviate that concern?

The default extras proposal invalidates this, as foo[extra] would remove the default extra from the list to be installed.

But that’s not a standard yet, and I’m OK if you want to say that’s a question for the default extras PEP to solve. The fact that default extras break the additive property has already been mentioned in the discussion there.

From a tool UX perspective that’s true, but isn’t the default extra mostly just syntactic sugar to replace package_name with package_name[name_of_default_extra]?

In other words, by the time an installer is resolving a pylock file, wouldn’t it definitely know which extras were active or not active?

I’ve been following that thread too, but it’s possible I’ve missed some edge case, so please correct me if I’m wrong.

Sorry for the confusion. I just meant that the proposed extras marker is better than the existing extra marker defined years ago in some PEP - not only for lock files but in general. (Not relevant for this discussion.)

For real-world use cases that might be correct. In theory, Poetry allows each combination because extra is just a marker, but I do not know if anyone makes use of it. We would probably have to go through a deprecation phase to be able to switch to the restricted logic.

The edge case I’m thinking of is if two extras have different constraints on the same package such that the resolved version is different when both extras are selected than when either one is selected on its own.

I think the only way to make this happen is to have at least one extra exclude specific versions. For example, imagine pkg_a has versions 1.0, 1.5, and 2.0. extra-1 depends on pkg_a with the version constraint >=1.0,!=1.5 while extra-2 depends on pkg_a with the version constraint <2.0. Further assume the locker allows for incompatible extras and favors selecting the highest allowed version of all dependencies.

In that scenario, extra-1 on its own requires pkg_a version 2.0, extra-2 on its own requires pkg_a version 1.5, but activating both extra-1 and extra-2 would require pkg_a version 1.0. How would that be expressed without an … and … construction?
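
The arithmetic of that scenario can be checked with a few lines of Python. This is only a toy model of “pick the highest allowed version”: versions are plain tuples, the constraints are hand-written predicates, and the names `VERSIONS`, `CONSTRAINTS`, and `resolve` are made up for the example.

```python
# pkg_a releases from the scenario, as (major, minor) tuples.
VERSIONS = [(1, 0), (1, 5), (2, 0)]

# Hand-written stand-ins for the two version constraints.
CONSTRAINTS = {
    "extra-1": lambda v: v >= (1, 0) and v != (1, 5),  # >=1.0,!=1.5
    "extra-2": lambda v: v < (2, 0),                   # <2.0
}


def resolve(extras):
    """Return the highest pkg_a version allowed by every selected extra."""
    allowed = [v for v in VERSIONS if all(CONSTRAINTS[e](v) for e in extras)]
    return max(allowed)


print(resolve({"extra-1"}))             # (2, 0)
print(resolve({"extra-2"}))             # (1, 5)
print(resolve({"extra-1", "extra-2"}))  # (1, 0)
```

Three different selections give three different versions, so the lock file needs a way to say “only when both extras are active”.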

[[packages]]
name = "pkg_a"
version = "2.0"
extras = {include = ["extra-1"], exclude = ["extra-2"]}

[[packages]]
name = "pkg_a"
version = "1.5"
extras = {include = ["extra-2"], exclude = ["extra-1"]}

[[packages]]
name = "pkg_a"
version = "1.0"
extras = {include = ["extra-1", "extra-2"]}

Well, @kapinga is right, so that kills the idea as-is. One possible way to tweak this is to make it three keys: any, all, and exclude:

[[packages]]
name = "pkg_a"
version = "2.0"
extras = {any = ["extra-1"], exclude = ["extra-2"]}

[[packages]]
name = "pkg_a"
version = "1.5"
extras = {any = ["extra-2"], exclude = ["extra-1"]}

[[packages]]
name = "pkg_a"
version = "1.0"
extras = {all = ["extra-1", "extra-2"]}

Does that cover all necessary cases? Another option is to have only and simply be more verbose.
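
Here is a rough sketch of how an installer might evaluate the any/all/exclude table against the three entries above. The semantics when several keys appear together are not settled in this thread; the sketch assumes every key that is present must be satisfied and a missing key imposes no constraint. `entry_matches`, `ENTRIES`, and `select` are hypothetical names.

```python
def entry_matches(cond, requested):
    """Check one entry's extras table against the requested extras.

    Assumption: all present keys must hold; absent keys are ignored.
    """
    if "any" in cond and not requested & set(cond["any"]):
        return False  # none of the `any` extras was requested
    if "all" in cond and not set(cond["all"]) <= requested:
        return False  # some `all` extra is missing from the request
    if requested & set(cond.get("exclude", [])):
        return False  # an excluded extra was requested
    return True


# The three pkg_a entries from the example, reduced to version + extras.
ENTRIES = [
    {"version": "2.0", "extras": {"any": ["extra-1"], "exclude": ["extra-2"]}},
    {"version": "1.5", "extras": {"any": ["extra-2"], "exclude": ["extra-1"]}},
    {"version": "1.0", "extras": {"all": ["extra-1", "extra-2"]}},
]


def select(requested):
    """Return the versions of every entry that matches the request."""
    requested = set(requested)
    return [e["version"] for e in ENTRIES if entry_matches(e["extras"], requested)]


print(select({"extra-1"}))             # ['2.0']
print(select({"extra-2"}))             # ['1.5']
print(select({"extra-1", "extra-2"}))  # ['1.0']
```

Under these assumed semantics each selection narrows to exactly one entry, which is what the scenario needs.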

To be specific, we could iterate the allowed, exact combinations that apply to a package. It’s more verbose, but it’s also very accurate.

We could also allow for only, any, or all, since one of those should apply in any potential case.

An escape hatch that allows for a precise listing of the combinations of extras that would select a specific [[packages]] entry seems appealing. That makes sure that any edge case combinations we don’t think of in this thread can still be described.

any seems important to keep as an option, otherwise a pylock file with many orthogonal extras would have a combinatorial explosion to simply describe each package that’s tied to one extra. all seems less important, but is easy enough to include as an option.

To paint the bike shed, I’d propose exactly instead of only (or more verbosely, exact_combinations).

Are any and all in the suggestion meant to be exclusive with one another?
Each of these in isolation works fine, but I don’t know what that combination would mean.

I’m also not sure how an installer should select from these when multiple different options match the requested extras.

For example, if I remove the excludes a bit:

[[packages]]
name = "pkg_a"
version = "2.0"
extras = {any = ["extra-1"]}

[[packages]]
name = "pkg_a"
version = "1.5"
extras = {any = ["extra-2"]}

What if extra-1 and extra-2 are requested? Since the combination isn’t declared and there are no excludes, the data are ambiguous.

Is this an invalid lock? (How do we define validity?)

Unless we take steps to explicitly prevent it, ambiguous locks become possible. But that doesn’t mean we have to define and reject ambiguous locks as invalid.

One option is for installers to be allowed to reject the file at install time. It might mean some growing pains for users, as pip might reject a lock which poetry generated and uv accepts, but it would leave the ambiguities to the implementations. And nobody would be served by an ambiguous lock, so implementations would try to avoid it.

Yes. Any lock file that results in more than one version of the same package being selected is invalid. From the PEP:

[[packages]]

  • Type: array of tables
  • Required?: yes
  • Inspiration: PDM, Poetry, uv
  • An array containing all packages that may be installed.
  • Packages MAY be listed multiple times with varying data, but all packages to be installed MUST narrow down to a single entry at install time.

Thus it would be on the locker to ensure that any is used only when the same package version is selected whenever any of the listed extras is selected.
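
A locker (or a cautious installer) could check that rule by brute force over every combination of extras. The sketch below is only an illustration, with assumed matching semantics (every present key of any/all/exclude must hold) and hypothetical names `entry_matches` and `ambiguous_combinations`. It is exponential in the number of extras, so it is meant for reasoning about small examples, not as a real validator.

```python
from itertools import combinations


def entry_matches(cond, requested):
    """Assumed semantics: every present key of the extras table must hold."""
    if "any" in cond and not requested & set(cond["any"]):
        return False
    if "all" in cond and not set(cond["all"]) <= requested:
        return False
    return not (requested & set(cond.get("exclude", [])))


def ambiguous_combinations(entries, extras):
    """List every combination of extras that selects a package more than once."""
    problems = []
    names = sorted(extras)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            counts = {}
            for entry in entries:
                if entry_matches(entry.get("extras", {}), set(combo)):
                    counts[entry["name"]] = counts.get(entry["name"], 0) + 1
            if any(count > 1 for count in counts.values()):
                problems.append(combo)
    return problems


# The two entries from the example with the excludes removed.
entries = [
    {"name": "pkg_a", "version": "2.0", "extras": {"any": ["extra-1"]}},
    {"name": "pkg_a", "version": "1.5", "extras": {"any": ["extra-2"]}},
]
print(ambiguous_combinations(entries, {"extra-1", "extra-2"}))
# [('extra-1', 'extra-2')]
```

Requesting both extras matches two pkg_a entries, so this lock does not narrow down to a single entry and would be invalid under the rule quoted above.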
