PEP 751: one last time

How would such a proposal capture cases in which a package is included when an extra is enabled, modulo an additional marker? E.g.:

[project.optional-dependencies]
foo = ["pkg"]
bar = ["pkg ; sys_platform == 'darwin'"]

In this case, don’t you need to mix the environment and extra markers in order to express the inclusion? (E.g., the package should be included when foo is enabled or bar is enabled and the user is on macOS.)

The name works for me (if this proposal happens at all)!

Yes.

Inasmuch as they both need to be set, but I don’t think they need to be in the same expression (unless I’m too tired from a rough night with the baby to understand the question properly :sweat_smile:).

[[packages]]
name = "pkg"
extras = {exactly = ["foo"]}

[[packages]]
name = "pkg"
extras = {exactly = ["bar"]}
# FYI 'marker' will probably get renamed to 'environment'.
marker = "sys_platform == 'darwin'"

If you look at the outlined installation steps, step 4 is probably where the check against extras and dependency-groups would go. Within that step, it could go before or after step 4.1, where marker is evaluated to see whether the package entry applies to the environment (I would most likely insert it at the front).

I see, so the intent is that you’d include multiple entries for the same package-version-source, just with different combinations of extras and/or dependency groups. Is that correct?

One issue with the DSL approach is that we need to evaluate how we’d actually generate these expressions after resolving (and not just whether they can capture all valid scenarios). That requires some thought…


I’m not strongly opposed to using a custom extras / groups DSL, I think it could work. But ignoring how hard it would be to design and implement, I would probably prefer something like:

  • Augment the PEP 508 marker grammar to allow evaluating these extras and groups container types in this context.
  • Use a single markers field that combines the environment and extras / groups markers into a single expression.

(I know this isn’t a new idea, just re-stating it from above.)

That would allow us to retain a single [[packages]] entry per package-version-source and would avoid introducing an additional DSL with similar Boolean operations. We could define this as an extension of the PEP 508 grammar such that these expressions are allowed in pylock.toml, but not requires-dist, etc.

If it were the same amount of work (which it’s not :)), would folks prefer introducing these new markers…? Or does the DSL have other benefits?

Yep! I can’t remember how far back I made the change at the suggestion of @radoering to allow for multiple entries for the same thing and only consider it an error if there are package conflicts in the end.
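
To make the “multiple entries are fine, only conflicts are errors” rule concrete, here is a minimal sketch of how an installer might walk [[packages]] entries. It assumes the entries are already parsed from pylock.toml into dicts, uses the table-style extras = {any = [...]} selector from this thread (a hypothetical shape, not a settled key), and stubs out marker evaluation with a caller-supplied function. select_packages, marker_applies and the version-only conflict check are illustrative names, not part of the PEP.

```python
from typing import Callable


def select_packages(
    entries: list[dict],
    requested_extras: set[str],
    marker_applies: Callable[[str], bool],
) -> dict[str, str]:
    """Return {name: version} for the entries that apply (hypothetical)."""
    chosen: dict[str, str] = {}
    for entry in entries:
        # Check the extras selector first, then the environment marker.
        selector = entry.get("extras")
        if selector is not None and not set(selector.get("any", ())) & requested_extras:
            continue
        marker = entry.get("marker")
        if marker is not None and not marker_applies(marker):
            continue
        # Several matching entries for the same version are fine; only a
        # different version for the same name counts as a conflict.
        previous = chosen.setdefault(entry["name"], entry["version"])
        if previous != entry["version"]:
            raise ValueError(
                f"conflict for {entry['name']}: {previous} vs {entry['version']}"
            )
    return chosen


ENTRIES = [
    {"name": "pkg", "version": "1.0", "extras": {"any": ["foo"]}},
    {"name": "pkg", "version": "1.0", "extras": {"any": ["bar"]},
     "marker": "sys_platform == 'darwin'"},
]


def on_macos(marker: str) -> bool:
    # Stand-in for a real PEP 508 evaluator, hard-coded for this example.
    return marker == "sys_platform == 'darwin'"


print(select_packages(ENTRIES, {"bar"}, on_macos))  # {'pkg': '1.0'}
```

Requesting both foo and bar on macOS matches both entries, but since they agree on the version it is not treated as an error.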

I was thinking about what a potential DSL would be last night, and I think the syntax would be:

  • and
  • or
  • not
  • ( and ) for forced precedence (and you could even say that parentheses can’t be nested)
  • '...' for names

I don’t think anything else would be necessary.

As for evaluation, you could cheat a bit by substituting the names with True or False based on whether they were requested and then using eval() to see what the Boolean value ends up being (h/t to @ofek for teaching me that trick through packaging.licenses and how he did it in Hatch when he ported it over to ‘packaging’).
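
A minimal sketch of that substitution trick, assuming the DSL from the list above (quoted names, and, or, not, parentheses). Rather than handing the substituted string to eval(), this variant parses it with the standard library’s ast module and walks only the Boolean nodes, so anything outside the DSL raises instead of executing. evaluate_selector is an illustrative name.

```python
import ast
import re

# A name is anything between single quotes, e.g. 'extra-1'.
_QUOTED_NAME = re.compile(r"'([^']*)'")


def evaluate_selector(expression: str, requested: set[str]) -> bool:
    """Evaluate a DSL expression such as "'foo' and not 'bar'"."""
    # Swap every quoted name for True/False (padded with spaces so adjacent
    # tokens stay separate), then parse the result as a Python expression.
    substituted = _QUOTED_NAME.sub(
        lambda match: f" {match.group(1) in requested} ", expression
    ).strip()
    tree = ast.parse(substituted, mode="eval")
    return _walk(tree.body)


def _walk(node: ast.AST) -> bool:
    if isinstance(node, ast.Constant) and isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand)
    if isinstance(node, ast.BoolOp):
        values = [_walk(value) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Names, calls, attribute access, etc. are not part of the DSL.
    raise ValueError(f"unsupported syntax in selector: {ast.dump(node)}")
```

For example, evaluate_selector("'extra-1' and 'extra-2'", {"extra-1", "extra-2"}) returns True, while something like "__import__('os')" is rejected with a ValueError because it parses to a call.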

All correct.

The only benefit I can think of for a DSL over expanding the marker expression syntax is it makes things potentially easier to read. Let’s look at two examples to see how they look with the table approach, DSL, and expanding marker expressions.

Let’s start with the example in PEP 751: one last time - #55 by kapinga where a pkg 1.0 only applies when extra-1 and extra-2 are set:

# Table
extras = {exactly = ["extra-1", "extra-2"]}

# DSL
extras = "'extra-1' and 'extra-2'"

# Marker
marker = "'extra-1' in extras; 'extra-2' in extras"  # Could use `and` instead of `;`

Now let’s look at having an extra which contains a dependency that has a marker requirement already as in PEP 751: one last time - #61 by charliermarsh :

# Table; could require multiple [[packages]] entries.
extras = {any = ["bar"]}
marker = "sys_platform == 'darwin'"

# DSL; could require multiple [[packages]] entries.
extras = "'bar'"
marker = "sys_platform == 'darwin'"

# Marker
marker = "sys_platform == 'darwin'; 'bar' in extras"

I’m not sure what people’s preferences are between these options, so let’s find out. :grin:

Which format do you prefer (rank in order from most to least preferred)?
  • Table
  • DSL
  • Marker extension

So the results are in and it’s interesting!

  • The table approach had the most top votes (including PDM via @frostming), but Poetry via @radoering liked it the least.
  • No one put the DSL as the top pick, but it was the most widely selected second pick (as well as bottom pick).
  • Extending marker expressions had the inverse tool preference and an even split in vote counts across all rankings.

I have reached out to @charliermarsh to get an opinion from someone on how uv would have voted (but knowing my luck it will just make it all less clear :sweat_smile:).

@radoering how much do you not like the table approach as the sole person to rank it 3rd?


While I wait to hear from the people mentioned above, I’m going to think out loud about the table approach a bit more. I think there would be the following keys:

  • any triggers a selection if any of the listed groups are selected
  • all triggers if all of the listed groups are selected
  • excluding blocks the selection if anything listed is selected

any and all are mutually exclusive. Using excluding is equivalent to joining the any/all clause and the excluding clause with and (putting parentheses around the any clause, if used).
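
Here is a minimal sketch of those any/all/excluding semantics, assuming a table is just a mapping from key to a list of extra or group names. table_selects is a hypothetical helper, and the behaviour when only excluding is present (apply unless blocked) is my assumption, since the post doesn’t say.

```python
from collections.abc import Mapping, Sequence


def table_selects(table: Mapping[str, Sequence[str]], requested: set[str]) -> bool:
    """Decide whether a selector like {"any": [...], "excluding": [...]}
    matches the requested extras/groups (hypothetical semantics)."""
    if "any" in table and "all" in table:
        raise ValueError("'any' and 'all' are mutually exclusive")
    if "any" in table:
        selected = any(name in requested for name in table["any"])
    elif "all" in table:
        selected = all(name in requested for name in table["all"])
    else:
        # Assumption: with only 'excluding', the entry applies unless blocked.
        selected = True
    # 'excluding' behaves like "(any/all clause) and not (x or y ...)".
    blocked = any(name in requested for name in table.get("excluding", ()))
    return selected and not blocked
```

So {"any": ["foo"], "excluding": ["bar"]} reads as ('foo') and not ('bar'), which is the same expression the DSL would spell out explicitly.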

Am I missing anything here? Is there something that can’t be expressed?

Apologies for the late entry. My preference would be: (1) a marker extension (ideally, a single marker for each entry to capture the environment and extras), followed by (2) a table, followed by (3) a DSL. (2) and (3) are not that different for me, though; it’s kind of a toss-up. I suspect that the table would be a little easier to implement, at least for the installer phase, since we don’t need to write a parser for the DSL. I don’t know which one is easier to implement for the resolver phase (i.e., how we figure out what to write there in the first place).

(And I haven’t done the math on whether any of this helps break the tie…)

One issue for the DSL and the table approach is that we should think about how extras and groups interact (e.g., if you need an entry that’s included when a certain extra is enabled and a certain group is not enabled). Are all interactions solvable via multiple entries? Would we have separate extras and groups keys under [[packages]]? Etc.

As an aside: it does seem clear at this point that this functionality could be split out into a follow-up PEP. I don’t know what the pros and cons are of doing that.

No worries!

:+1:

It makes the marker expression more interesting and it does kill the DSL idea.

The pro of sticking with the PEP as-is is that it’s done (short of me updating my proof-of-concept and a couple of minor tweaks). All the tool authors have signaled they can and would implement the PEP as an export format, so there seems to be buy-in at this point, which is obviously a big plus. It also gives a bit of time to live with the PEP and continue to think about where the gaps are in using it as the only lock file format for folks. As the Zen of Python says, “never is often better than right now”, so I could write this idea down in a Deferred section and let it go.

The con of not trying to tackle this now is momentum. I have a bit of energy left on this topic to see if it can be resolved (if people even want me to). If I don’t tackle it now I don’t know if anyone will ever bother, and it does keep coming up. Luckily the file format is versioned so technically it could get resolved at a later date and it wouldn’t break the world. As the Zen of Python says, “Now is better than never” (yes, it can contradict itself).

I have no clue what opinions people have or if @pf_moore as PEP delegate has a preference (and yes, that’s an invitation for people to voice an opinion).

+1 for exploring the marker expression idea a little longer. Yes, it’s a lot to ask. It could be timeboxed, as was done earlier for the broader PEP. I think there’s a lot of context in people’s minds, and if it gets swapped out now, it might take longer to swap back in. I didn’t vote because I didn’t feel qualified, but I would have voted for 1) Marker expression, 2) Table, 3) DSL.

It seems like in general, one could end up locking a different entry (potentially different versions of the same package) for every possible subset of [extras disjoint union with dependency groups]. As you said, this would require tweaking the table or DSL format. I’m a bit worried about both the complexity this would add to the lockfile and the potential to exacerbate the “combinatorial explosion” issue.
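
To put a number on the worst case: with n extras and m dependency groups there are 2^(n+m) possible selections, each of which could in principle need its own entries. A short sketch (selection_scenarios is a hypothetical name) that enumerates them, tagging names so an extra and a group that share a name stay distinct:

```python
from itertools import chain, combinations


def selection_scenarios(extras: list[str], groups: list[str]) -> list[frozenset]:
    """Every subset of the disjoint union of extras and groups."""
    # Tagging each name keeps an extra and a group with the same name
    # distinct, which is what makes this a disjoint union.
    tagged = [("extra", name) for name in extras] + [("group", name) for name in groups]
    return [
        frozenset(subset)
        for subset in chain.from_iterable(
            combinations(tagged, size) for size in range(len(tagged) + 1)
        )
    ]


scenarios = selection_scenarios(["foo", "bar"], ["test", "foo"])
print(len(scenarios))  # 16, i.e. 2 ** 4
```

Real lockers presumably collapse most of these into shared entries, so this is an upper bound rather than what a lock file would actually contain.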

I assume I like the DSL more because it is more similar to markers and I like markers most. :person_shrugging:

You’ve got until the end of the month (i.e. 8 days).

And to be clear, people have until the end of the month to reach consensus that this is the way to go, else I’m leaving the PEP as-is with this idea out (I am willing to move it to a Deferred Idea section to make it clear it wasn’t outright rejected).


To be more specific about a potential marker expression extension:

  • extras is added as a container that holds all the requested extras
  • dependency_groups is added as a container that holds all the requested dependency groups
  • Both should always be set, defaulting to empty containers when nothing is requested
  • Both will ONLY be valid in lock files
  • extra will NOT be valid in lock files
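
A minimal sketch of how a single marker could then combine the environment and extras checks, applied to the earlier foo/bar-on-macOS example. It handles only a small subset of PEP 508 (and, or, parentheses, ==, !=, in, not in, string literals, variable names; no version comparisons) and leans on Python’s parser because that subset happens to be valid Python syntax. A real implementation would extend a proper PEP 508 parser such as the one in packaging; evaluate_lock_marker is a hypothetical name.

```python
import ast


def evaluate_lock_marker(marker: str, environment: dict[str, object]) -> bool:
    """Evaluate a small PEP 508-style subset against an environment dict."""
    result = _walk(ast.parse(marker.strip(), mode="eval").body, environment)
    if not isinstance(result, bool):
        raise ValueError("a marker must be a comparison or a combination of them")
    return result


def _walk(node: ast.AST, env: dict[str, object]):
    if isinstance(node, ast.BoolOp):
        values = [_walk(value, env) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = _walk(node.left, env)
        right = _walk(node.comparators[0], env)
        op = node.ops[0]
        if isinstance(op, ast.Eq):
            return left == right
        if isinstance(op, ast.NotEq):
            return left != right
        if isinstance(op, ast.In):
            return left in right
        if isinstance(op, ast.NotIn):
            return left not in right
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name):
        if node.id == "extra":
            raise ValueError("'extra' is not valid in lock files; use 'extras'")
        if node.id not in env:
            raise ValueError(f"unknown marker variable: {node.id}")
        return env[node.id]
    raise ValueError(f"unsupported marker syntax: {ast.dump(node)}")


env = {
    "sys_platform": "darwin",
    # Always present; empty containers when nothing was requested.
    "extras": frozenset({"bar"}),
    "dependency_groups": frozenset(),
}
marker = "'foo' in extras or ('bar' in extras and sys_platform == 'darwin')"
print(evaluate_lock_marker(marker, env))  # True
```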

I’ve been following this discussion since it first came up, it’s great that this appears to be approaching a positive outcome, thanks a lot to all involved, especially Brett! I just wanted to throw one use-case out there in case it’s of any interest, particularly around dependency groups. Take from it what you wish or ignore if unhelpful at this stage.

At $work we have an internal tool (let’s call it “splat”) that wraps linting/formatting tools to encourage best-practice and homogeneity by making setup as simple as possible (minimal config, hiding the venv setup step). Part of the intention is to have a consistent way of running these tools both manually and in CI, and of course this means there’s a need to lock the full set of dependencies.

Currently things are set up such that users (project authors) are responsible for pinning the dependencies in a requirements file, leaving it up to them whether they want one requirements.txt, or a requirements-dev.txt, or requirements-splat.txt specifically for this tool… This is an area we’re not particularly happy with - it’s an extra step for users, an extra config item (the file path), and it’s not unusual for people to overlook the importance of pinning all dependencies for stable CI runs.

A standard lock file that supports dependency groups would be a huge win here.


To illustrate more concretely, projects would be set up with a pyproject.toml something like this:

[project]
name = "myproject"
version = "0.1.0"
dependencies = ["lxml"]

[dependency-groups]
splat = ["black", "isort", "lxml-stubs", "mypy", "ruff"]

[tool.splat]
tools = ["black", "isort", "mypy", "ruff"]

The user would then generate and commit a lock file (perhaps with the guidance of the splat tool) such that splat can install the locked dependencies with a command like uv sync --group splat under the covers. We’d want this to work seamlessly for advanced users too, alongside other extras/dependency groups they may have.

I think all the pieces are pretty much there (dependency groups, uv’s lock file and sync command). The thing holding us back is a reluctance to use a tool-specific file such as uv.lock, for a couple of reasons: beginners may have no idea what uv is or what a uv.lock file is for; whereas intermediate/advanced users may want to use a different package management tool and not want to maintain multiple lock files. A standardised lock file would solve these concerns; it’d be great to be able to proudly tell beginners “this is the new standard way of doing things” while still keeping advanced users happy!

You’re welcome!

Thanks for the feedback!


As a reminder, the end of the month is Friday. People have until then to speak up as to whether they think it’s worth going for a marker expression change to support extras and dependency groups (since two tools preferred that approach the most and people seem the least worried if that approach won out), or I make supporting extras and dependency groups a deferred idea in the PEP.

From an ecosystem improvement point of view, finally replacing extra == 'name' with 'name' in extras would be lovely (since it aligns the surface syntax with the actual semantics). The current quirky spelling stems from the long ago PEP 345 requirement that all marker expressions are strings, and shoehorning extras support into that system. (I don’t recall any of us ever liking the current spelling, it was just so far down the list of problems to solve that nobody was inclined to try and fix it)

Updating the marker syntax would also open the door to some day allowing subset expressions as a shorthand for combining multiple containment checks.
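
A small plain-Python model of why the container spelling matches the semantics better. It assumes, as I understand common installer behaviour (not any specific tool’s code), that an extra == '...' marker is evaluated once per requested extra and the results are OR-ed; legacy_matches is a hypothetical helper, and the subset check at the end only illustrates the shorthand idea, not proposed syntax.

```python
def legacy_matches(predicate, requested: frozenset[str]) -> bool:
    """Model of `extra == ...` markers: evaluated once per requested extra."""
    return any(predicate(extra) for extra in requested)


requested = frozenset({"test", "docs"})

# Legacy spelling, one extra at a time.
print(legacy_matches(lambda extra: extra == "docs", requested))  # True
# `extra == 'test' and extra == 'docs'` can never hold for a single extra.
print(legacy_matches(lambda extra: extra == "test" and extra == "docs", requested))  # False

# Container spelling: one check against the whole set.
print("test" in requested and "docs" in requested)  # True
# Subset shorthand for the same pair of containment checks.
print({"test", "docs"} <= requested)  # True
```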

From an adoption point of view, I believe the current workflow tools either vendor their marker parsing library, implement their own, or could readily require a new minimum version, so updated markers shouldn’t hinder the rollout of standardised lock file support.

From a compatibility point of view, that problem would arise when a new metadata version proposed allowing the syntax (at least for extras) in other contexts, rather than being something that affects this PEP.

Most everyone has been pretty quiet, so I’ll make the first move. (Queen’s pawn to d4, obviously!)
I don’t think that the lockfile should include extras and groups. That idea should be deferred because it introduces many new unknowns.
I’ll share my reasoning, since I think that’s more valuable than my conclusion.

One thing I’ve been thinking about for the past week is that even with these values included, there will be use-cases which still demand multiple lock files for a single project.

The existing lockers I’ve tried allow packages included anywhere in the input to impact the entire resolution. That is, if I have project.dependencies = ["requests>2.0"] and dependency-groups.group1 = ["requests<=2.2.0"], then in my lockfile I’ll get requests == 2.2.0 as the locked version.[1]

This makes for fast “universal” locking behavior which supports most install scenarios. There’s a viable solution for requests which satisfies multiple different install scenarios, so lock that one. It’s how these tools guarantee that as many combinations of extras and groups as possible are mutually installable. But it’s also not what a package maintainer wants if they define a test matrix which tests their project “with and without group1 installed” – they really want to test on “the latest and 2.2.0, in separate tests”.

Expand this problem out as necessary to find more complex or more compelling examples as you see fit. The point is that the user wants distinct resolutions here, but the lockers are optimized (right now) for cases in which you want compatible resolution. Different goals.
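
A deliberately simplified model of the requests example above, showing how a single “universal” resolution differs from resolving each scenario on its own. The release list is illustrative (not fetched from an index), versions are compared as plain integer tuples rather than with full PEP 440 rules, and pick_latest is a hypothetical helper; real lockers also handle transitive dependencies.

```python
# Illustrative release list; not queried from an index.
AVAILABLE = ["2.0.0", "2.1.0", "2.2.0", "2.31.0", "2.32.3"]


def as_tuple(version: str) -> tuple[int, ...]:
    # Good enough for these plain release numbers; real tools use PEP 440.
    return tuple(int(part) for part in version.split("."))


def pick_latest(constraints) -> str:
    """Highest available version satisfying every constraint."""
    candidates = [v for v in AVAILABLE if all(check(v) for check in constraints)]
    return max(candidates, key=as_tuple)


project = [lambda v: as_tuple(v) > (2, 0, 0)]  # requests>2.0
group1 = [lambda v: as_tuple(v) <= (2, 2, 0)]  # requests<=2.2.0

print(pick_latest(project + group1))  # 2.2.0, the single "universal" answer
print(pick_latest(project))           # 2.32.3, what a test without group1 wants
```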

A question follows from this: “Can the lockers allow you to customize resolution to get different results?” And the answer is yes, at least somewhat. uv has a config for declaring dependency groups incompatible, which allows them to resolve separately, but also means they can’t be installed into the same environment.[2] That lets you customize resolution to say that “group1 and group2 are incompatible with one another, don’t try to lock them together”.
But that capability is a double-edged sword. If we push that idea down into the standardized lockfile format, then a request to install group1, group2 from a lockfile may succeed or fail depending on whether or not the two are compatible.
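
A sketch of the new failure mode an installer would have to handle. The conflict data structure here is hypothetical (it is not uv’s configuration format): each set lists groups that were resolved separately, and at most one member of each set may be selected for a single environment.

```python
# Hypothetical conflict declarations recorded by a locker.
CONFLICTS = [frozenset({"group1", "group2"})]


def check_selection(requested_groups: set[str], conflicts=CONFLICTS) -> None:
    """Raise if the requested groups cannot share one environment."""
    for conflict in conflicts:
        clash = conflict & requested_groups
        if len(clash) > 1:
            raise ValueError(f"groups cannot be installed together: {sorted(clash)}")


check_selection({"group1"})  # fine
# check_selection({"group1", "group2"}) would raise ValueError.
```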

If the lockfile spec only supports locking a single scenario, consumers won’t have to contend with these sorts of issues yet. But if it supports groups and extras, then installers will need to support selecting extras, selecting groups, and the possibility that the selection isn’t installable. And we will still need multiple locks in order to support scenarios like “resolve first-order dependencies to the minimum compatible versions, but second order dependencies and further to the latest versions” (for which we can expect that the user is required to do some work to express themselves).
So installers will become more complex, the lockfile spec becomes more complex, but multiple locks per project remain a requirement in order to express the most complex scenarios.

I say all of this even though I would like support to arrive. My own use-cases would benefit a lot from having a unified lock with dependency group support. But I think it’s risky to declare that part of the initial spec. If each file describes a single scenario, I can still create a requirements-lock/ directory and put a bunch of lockfiles in there, one for each test scenario I want. And I’ll still end up with many fewer files than I currently have with pip-compile, since lockfiles can handle markers and platform support more gracefully.


  1. This is definitely the case for poetry and uv. If another tool does it differently, please share that info! :smile:

  2. Other tools may have this too. I’m only aware of the uv one.

I don’t think this logic follows. When it comes to installers, I think we need to consider two categories:

  • deployment installers: these install the default dependencies defined in the lock file. Today, they’re the ones you would feed a transitively locked requirements.txt file. For these installers, the fact there may be optional dependencies declared in the lock file is irrelevant, since they only look at the default set.
  • development workflow installers: this is what we see when uv, poetry, pdm, etc are used to create environments for specific development scenarios (like “test the oldest supported versions of dependencies”, “test different mutually incompatible groups”, etc). These do care about the optional dependencies, since they’re part of using one file across multiple scenarios.

Just as it is today, feeding complex extra+group combinations to a deployment installer would be an export operation; only the output format would be different (pylock.toml instead of requirements.txt).

And while you’re correct that even development workflow installers will still require multiple lock files to handle some scenarios using the standardised lock file format, without some level of support for expressing extras and dependency groups, those tools will need to consider the standard format solely as a requirements.txt replacement for now, rather than potentially as their primary lock format.

First of all, thanks @sirosen for expressing the reservations I’ve been having about extra support better than I could have.

@ncoghlan your point is important. I’ve been mostly thinking in terms of “deployment installers”, because (a) that’s what pip is, and (b) that’s the only workflow I’m really familiar with. From what you’re saying, it sounds like it’s inevitable that providing a lockfile to a deployment installer will always be an export operation, precisely because of the reasons @sirosen stated - the presence of extras that a deployment installer won’t care about can affect the versions of packages that are locked in the default install.

Or, to put it another way, even if you have an installer that supports multiple scenarios, you wouldn’t want to feed that lockfile to a deployment, precisely because it’s a “lowest common denominator” resolution, constrained by all of the scenarios. So you’d export the scenario you want to a single-scenario lockfile.

I agree with @sirosen that longer-term, we could want to standardise multi-scenario lockfiles, but to be blunt, I don’t think there’s anything like a clear consensus yet on how that would work. As things stand, we don’t even have a terminology that allows us to discuss the problem in a way that avoids tool-specific assumptions. I don’t think that the key tools in this area (uv, Poetry and PDM) have a common model - they are still using the flexibility of this being a tool-specific feature to iterate on what’s the best design for users[1]. Because of all this, standardising feels premature - the last thing we want at this point is to lock in one tool’s approach as a standard.

So IMO, we should:

  1. Explicitly focus the PEP on deployment lockfiles.
  2. Defer multi-scenario lockfiles and support for extras and dependency groups until tools settle on a shared model.

That feels like the best solution for users, who in general don’t really care (yet?) about interoperability between workflow tools, but who very much do want a standard for deployment descriptions.


  1. My apologies if this is a mischaracterisation. As an outsider, the lack of a clear terminology that I mentioned makes it very difficult to understand where the various tools agree, and where they differ :slightly_frowning_face:

[I had this as a draft and forgot to post it]

That is true in a way. But an installer should not generate direct_url.json for non-direct entries in the lock file. Otherwise a freeze | install roundtrip could not be achieved.

I agree it can be considered a quality-of-implementation topic for installers, though.

What is the downside of my earlier proposal, i.e. disallowing sdist and [wheel] when direct is set? You mentioned audit reasons, but I don’t quite understand how it would be easier to audit [wheel] than archive when direct is set and we know there must be exactly one wheel for that package entry.

Also, allowing sdist and [wheel] when direct is set introduces an asymmetry with the existing direct URL standard, which has only archive to cover these cases. So it creates an ambiguity, or rather multiple ways to achieve the same result, which will make things more complex for implementers.

On a related note, it might be worth mentioning that direct and index cannot both be true.
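
A sketch of the two rules proposed in this post as a validation pass over a parsed [[packages]] entry. The key names (direct, index, sdist, wheels, archive) follow this thread’s discussion and are assumptions, not the final PEP 751 schema; entry_problems is a hypothetical helper.

```python
def entry_problems(entry: dict) -> list[str]:
    """Return the rule violations for one parsed [[packages]] entry."""
    problems = []
    if entry.get("direct"):
        # Proposed: a direct entry describes its single artifact via
        # `archive` (mirroring the direct URL data structure), not sdist/wheels.
        if "sdist" in entry or "wheels" in entry:
            problems.append("'sdist'/'wheels' are not allowed when 'direct' is set")
        if entry.get("index"):
            problems.append("'direct' and 'index' cannot both be true")
    return problems
```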