The 'extra' environment marker and its operators

Related: pypa/pip issue #8469, “Pip wheel install doesn't handle extra marker.”

Currently it is possible to supply the following in METADATA:

Name: a
Version: 1.0
Provides-Extra: new-b
Requires-Dist: b >= 2; extra == 'new-b'
Requires-Dist: b < 2; extra != 'new-b'

While this logically makes sense, it seems to go against the original intention of the extras feature, i.e. to add optional dependencies to a package. And indeed, none of the current installer implementations handle this correctly. I also could not find similar things in other packaging ecosystems—all equivalents to Python package extras only allow a package to include more specifications, not remove from the base requirements.

My proposal is to amend PEP 508 to explicitly special-case extra and disallow all operators other than ==. This would free installers from dealing with convoluted edge cases that hardly anyone has thought through, and would likely make the specification fit better with how extra was originally designed.


The only relevant description in PEP 508 is:

Comparisons in marker expressions are typed by the comparison operator. The <marker_op> operators that are not in <version_cmp> perform the same as they do for strings in Python.

I don’t even want to think about what extra >= should mean :stuck_out_tongue:
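For what it's worth, if the ordered operators fall back to ordinary Python string comparison for values that are not versions (which is how I read PEP 508), the result is plain lexicographic ordering. A small sketch in plain Python, with made-up extra names and no packaging library involved:

```python
# Plain Python string ordering, used here as a stand-in for what an
# ordered comparison on `extra` would fall back to. Extra names are made up.
requested = "new-b"

ge_docs = requested >= "docs"   # True: 'n' sorts after 'd'
ge_test = requested >= "test"   # False: 'n' sorts before 't'
case_matters = "Docs" < "docs"  # True: uppercase sorts before lowercase

print(ge_docs, ge_test, case_matters)
```

None of these results says anything meaningful about which optional dependencies were asked for.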

PEP 426 described extras in more detail (saying they’re for “optional dependencies”). But that PEP was withdrawn, and did not clarify what optional dependencies mean either.


Python did a lot of things first, I don’t find this line of reasoning to be relevant.

But the rest is. Though how is excluding requirements based on extras so much different from excluding based on platform or Python version?

String ordering is fraught with peril, so I’m not opposed to just ruling out those comparisons, but have we actually seen issues? Or are we just simplifying specs because we can’t imagine the uses right now?

(Allowing the extra marker to use extra in {set} seems interesting to me, though guessing you really didn’t want to expand the syntax right now :wink:)

The discussion here on “default extras” is probably relevant, too. But it’s essentially arguing for some sort of “extras that remove dependencies” feature, and one of the reasons I find the discussion confusing is precisely that “extras removing things” feels weird to me.

So +1 on this proposal specifically. Also I’d be in favour of a more general principle that “extras only add stuff”. People looking for a more general “manipulate dependencies at install time” facility should come up with a design that isn’t constrained by how extras currently work. Such a design may well (probably should) make extras obsolete. @steve.dower’s selector packages idea may be worth reviewing in that context.

The difference is that platform and Python version can’t change in a dependency graph, but extras do. This causes trouble for dependency resolution. For example:

  • A is defined as above
  • B has two versions 1.0 and 2.0, neither with dependencies
  • C depends on A

What should pip install C A[new-B] do? Select B >= 2 (since that’s the variant wanted by the user)? How does the resolver know if C depends on A without extras because it does not care about B, or specifically wants B < 2? The logic gets more delicate if there’s a D depending on A (no extras), and conditionals that combine multiple extra markers.
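To make the example concrete, here is a rough sketch that models A's two requirement lines as plain Python predicates and simply merges the constraints coming from C (plain A) and from the user (A[new-b]). The data structures and the merge rule are assumptions for illustration only; this is not how pip's resolver works.

```python
# Hypothetical model of A's metadata: (spec text, marker predicate, version predicate).
A_REQUIRES = [
    ("b >= 2", lambda extra: extra == "new-b", lambda v: v >= (2,)),
    ("b < 2", lambda extra: extra != "new-b", lambda v: v < (2,)),
]
B_VERSIONS = [(1, 0), (2, 0)]  # B 1.0 and B 2.0, neither with dependencies


def constraints_on_b(extra):
    """Constraints A places on B when its markers see a single extra value."""
    return [(spec, ok) for spec, marker, ok in A_REQUIRES if marker(extra)]


via_c = constraints_on_b("")          # C depends on A without extras
via_user = constraints_on_b("new-b")  # the user asked for A[new-b]

# Assumption: both sets of constraints must hold at the same time.
combined = via_c + via_user
candidates = [v for v in B_VERSIONS if all(ok(v) for _, ok in combined)]

print([spec for spec, _ in combined])  # ['b < 2', 'b >= 2']
print(candidates)                      # []
```

Under this reading no version of B satisfies both requests, so the install would have to fail; a different merge rule would give a different answer, which is exactly the ambiguity.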

I agree, and it’s essentially a good shorthand for extra == ... or extra == .... But this should be a separate discussion since PEP 508 does not support the set syntax on the right side anyway.

(This actually brings up another issue—should extra in 'foobarrex' evaluate to true on pip install package[rr]? Does it right now? Gosh.)
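If `in` really does behave as it does for Python strings, the answer follows from ordinary substring matching. A quick check in plain Python (this only shows what the string operator does, not what any installer currently does with such a marker):

```python
# Python's `in` on strings is a substring test, not a membership test
# over a list of names.
marker_rhs = "foobarrex"

results = {extra: extra in marker_rhs for extra in ("rr", "foo", "rex", "baz")}
print(results)
```

Here "rr", "foo" and "rex" all match and "baz" does not, so an extra named rr would satisfy the marker by accident.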

Personally I think about the default extra thing not as extras removing things, but as extra groups being activated automatically. So pip install package implies package[default], and package[other] implies package[default,other]. I have no idea whether this is also what others have in mind, and so far nobody has really clarified exactly how they want the thing to be designed, which is why I have largely stayed out of that topic (except pushing for a PEP draft so I can understand what exactly people are thinking).

I think this is almost right. It’s actually:

  • pip install package implies package[(install_requires),default] ,
  • pip install package[other] implies package[(install_requires),other] .
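The two readings of “default extras” discussed here differ only in what happens once an extra is requested explicitly. A minimal sketch of both, where the function names and the "(install_requires)" placeholder are hypothetical and no such feature exists in any tool today:

```python
BASE = "(install_requires)"  # placeholder for the unconditional requirements


def groups_replace_default(requested):
    """Reading in the two bullets above: 'default' applies only when nothing is requested."""
    return {BASE} | (set(requested) or {"default"})


def groups_always_default(requested):
    """The earlier reading: 'default' is always active and requested extras add to it."""
    return {BASE, "default"} | set(requested)


print(sorted(groups_replace_default([])))         # ['(install_requires)', 'default']
print(sorted(groups_replace_default(["other"])))  # ['(install_requires)', 'other']
print(sorted(groups_always_default(["other"])))   # ['(install_requires)', 'default', 'other']
```

Only the first reading lets package[other] drop what default would have pulled in, which is why it can cover the “slim install” use case.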

It makes sense as well, thanks for the clarification. In that case, the default extra proposal would also be a way to offer the same functionality asked for in the linked pip issue. Nice :slightly_smiling_face:

Um, whatever it would normally do…? That example isn’t about trying to omit a requirement, and the rules for resolution seem like they’d be the same as any other version conflict, with the added annoyance that you need to download the full metadata to know what the extra means.

And I’d assume that if A[extra] excludes a requirement, but B->A includes it, you’ll get the requirement. Because that’s essentially the same as when you require two different packages - there’s no conflict here because only one package has the requirement.

(Another case for selector packages: inter-package dependencies don’t have to transitively pass along options that can be queried from the OS, because they all just depend on the selector and it gets resolved once to provide extra requirements).

C depends on A (without extra new-B, so transitively depends on B < 2) but the user also requests A[new-B] which depends on B >= 2. So I guess fail with an error about inconsistent requirements?

But whether that’s what people writing something like this would expect, is something I’m much less sure of. And whether it’s what pip does, I’m also not sure of. Once again, the point here is that the semantics of extras are not very clear, and not written up anywhere. But I’ve said this before, so I won’t go on about it again.

I’m in favour of disallowing != comparisons for extras because they are potentially confusing, likely to result in people guessing wrongly about how they will behave, and there’s no obvious use case. Not because we can’t trace through the logic and claim a justification for a particular behaviour…

Any kind of “slim” option could use this (as was asked in another thread about “how do I default to shipping a command-line tool but optionally just ship the library?”). And since multiple extras are allowed, you’ll force people into even worse logic in some cases.

We can (and probably should) discourage version selection via extras, which is what the first example is actually doing wrong, but I don’t see any valid reason to be worried about the != comparison.

To be honest, I’m only against != in the broad sense of not liking extras because they are not properly standardised. The more people find “clever” ways to use them, the harder any future standardisation effort becomes, because backward compatibility gets harder.


If we were going to standardise them, wouldn’t it make the most sense to work it into the environment marker system anyway? Otherwise we’re inventing a brand new system, which would likely require deprecation and transition and all that stuff anyway…

What is normal? This is a serious question. As Paul has pointed out, extras are so under-specified that different package users and packagers have different, conflicting ideas of what exactly they are. They come complaining when PyPA tools do not do what they expect, and there’s no way to tell them what’s wrong or right because there are absolutely no rules. Hell, even PyPA tools do things differently among themselves. To untangle this, PyPA needs to write down the semantics of the extras feature (what extras are), and derive rules (what packagers and users should expect) from those semantics.

Stepping back a little, this makes me reflect on one of the most troubling difficulties I feel in Python packaging. Many times the response you get when trying to explain the rules to people is “eh, that’s way too convoluted, why don’t you just do it normally,” while those rules are exactly the necessary part to define “normal.”

The two main “schools” of interpreting extras define them either as additional dependency requirements added on to a package, or as a selector among multiple sets of dependency requirements. In the former interpretation, each package has a base set of requirements, and each extra adds a group of additional dependencies to the package. The latter, on the other hand, treats package and package[extra] as two different identities that share the same packaged code, and thus can have different sets of requirements.

Differences between the two schools emerge when you try to merge extras, i.e. when a dependency graph requires both package[extra-a] and package[extra-b]. The add-on school wants the resolver to perform a union (i.e. package[extra-a, extra-b]); the selector school wants them either to conflict, or to be resolved via additional rules (e.g. either package[extra-a] or package[extra-b] should be chosen to represent package, depending on which is specified by the user). And there’s no way to tell which of them is right or wrong since there’s no definition at all, and, again, even PyPA tools do different things among themselves.
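The difference between the two schools can be sketched as two merge functions over the extras requested by different parts of a dependency graph. Both functions are hypothetical illustrations; the selector variant shows only the “conflict” option and leaves out any tie-breaking rules.

```python
def merge_addon(requests):
    """Add-on school: requests for the same package are unioned."""
    merged = set()
    for extras in requests:
        merged |= set(extras)
    return merged


def merge_selector(requests):
    """Selector school (conflict variant): all requests must name the same extras."""
    distinct = {frozenset(extras) for extras in requests}
    if len(distinct) > 1:
        raise ValueError(f"conflicting selections: {sorted(map(sorted, distinct))}")
    return set(next(iter(distinct)))


requests = [{"extra-a"}, {"extra-b"}]  # two dependents asking for different extras

print(sorted(merge_addon(requests)))  # ['extra-a', 'extra-b']
try:
    merge_selector(requests)
except ValueError as exc:
    print("selector school:", exc)
```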

I am personally okay with either of the interpretations, as long as there is one. But we need to have that conversation, otherwise extras will continue to be this thing that everyone understands differently but no tools implement “correctly” (read: matching users’ expectation). The proposal in the top post is a way to resolve the interpretations by proposing a hard concrete rule, and let expectations be derived. The proposal can very well be rejected, but only if we agree on an alternative one.


Thanks for the extra rationale, that’s helpful.

I guess the understanding boils down to extra meaning “extra requirement” vs “extra marker”. It seems like the former is more likely to be the original intention, so there’s a solid case for setting that as the definition and breaking anyone who’s misused it since. Though it’s also easy to describe additional requirements in terms of extra environment markers without losing the more complex functionality.

It’s a general rule in language design that users will always find more complex ways to combine your simple primitives. In many ways, that’s the entire story of Python. It would be nice to go with that rather than resist it, but sometimes a line ought to be drawn for technical, educational or compatibility reasons.

I like the generality of environment markers (with the extra-requires section being defined as adding install requirements with those markers), and unless we drastically rework how packages handle machine-specific dependencies I imagine we’ll see more complexity here.

I don’t see any specification for how multiple extras are handled by markers, so that seems like fertile ground for an update. Assuming extras are passed through the checks separately (seems logical), the not equals check isn’t worth much, and if they’re passed all at once then string comparisons are useless.

I have a couple of ideas for how this could be defined, but they’re based on language design principles and what’s in PEP 508 already, so I’d rather hear from those of you directly impacted by the user feedback. (Though I do feel comfortable pushing back against “they’re using it wrong” kind of arguments, since we don’t have a clear idea of right or wrong here :wink:)

In practice, I strongly believe that simple primitives with well-defined semantics can be usefully and powerfully composed in ways that the original designers may not have thought of¹. And this is an incredibly good thing, and we should embrace it. The problem with extras is that we don’t have the “well-defined semantics” part of the equation. Adding that after the fact without invalidating something that someone has done, is likely to be hard - not least because we don’t have a good means of finding out “what people did with this”.

¹ My pure mathematician background is showing here :slightly_smiling_face:


The current implementation (InstallRequirement.match_markers(), which calls into packaging.markers) is to evaluate each extra against the marker separately and then combine the results. So the dependencies of package[extra1,extra2] would be the union of three marker evaluations:

  1. marker.evaluate({'extra': ''})
  2. marker.evaluate({'extra': 'extra1'})
  3. marker.evaluate({'extra': 'extra2'})

The != operator breaks this logic (all inequality operators do) since dependency; extra != 'extra1' would evaluate to true for evaluate({'extra': ''}) and evaluate({'extra': 'extra2'}) and appear as a dependency, even though extra1 is specified.
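The evaluate-separately-and-union behaviour can be imitated in a few lines of plain Python. In this sketch ordinary callables stand in for packaging markers and the requirement names are made up; it mirrors the description above rather than pip's actual code.

```python
def active_requirements(requirements, extras):
    """Keep a requirement if its marker is true for '' or for any single requested extra."""
    contexts = [""] + list(extras)
    return [
        name
        for name, marker in requirements
        if any(marker(extra) for extra in contexts)
    ]


# Stand-ins for "name; <marker>" lines in METADATA.
requirements = [
    ("always", lambda extra: True),
    ("only-with-extra1", lambda extra: extra == "extra1"),
    ("not-with-extra1", lambda extra: extra != "extra1"),
]

selected = active_requirements(requirements, ["extra1", "extra2"])
print(selected)  # ['always', 'only-with-extra1', 'not-with-extra1']
```

Because the empty-string evaluation always satisfies extra != 'extra1', that requirement is selected no matter which extras are requested.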

I am not aware of how pip implemented this prior to PEP 508, and it may well be that pip’s current implementation is wrong and should be fixed. I find it unlikely, however, since packaging.markers does not offer a reasonable interface to compare multiple extra values to a marker all at once.


I suspect you’re imagining that more design went into the current implementation than actually happened :wink:

The good news is that because it’s essentially useless, nobody is using it right now and it can be changed. So if we wanted to define semantics for handling multiple extras (in PEP 508), that’s not going to break the world.

There are about three discussions going on right now in this area, and to me it seems like there’s legitimate value in improving this, so I think we should see how those turn out before committing to removing or changing anything.

I was thinking about how the “pass extras all at once” logic could be implemented in packaging, and suddenly realised I’ve been feeling weird about the idea because the == expression does not make sense in the first place. Currently the extra marker (extra == 'some_extra') kind of implies that one extra is passed. So how should it be evaluated for package[some_extra, another_extra]? The only implementation that satisfies the expression I can think of would be something like

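from collections.abc import Collection


class Extra:
    """All extras requested for a package, compared as a single marker value."""

    def __init__(self, identifiers: Collection[str]):
        self._identifiers = identifiers

    def __eq__(self, value: object) -> bool:
        # "extra == 'name'" is true when 'name' is among the requested extras.
        return value in self._identifiers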

But that’s obviously a very unintuitive definition, and might cause more confusion down the road. I’m not sure what to make of this. Maybe it means we should go with the current approach of comparing each extra as a string separately and taking the union (in which the == operator makes sense), or something else.
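For illustration, here is that class exercised in plain Python. One side effect worth noting is that Python derives != from __eq__ by default, so the inequality case comes out consistent under this definition. The extra names are made up, and this is only a sketch of the idea, not how packaging evaluates markers today.

```python
from collections.abc import Collection


class Extra:
    """Same idea as above: equality means 'is among the requested extras'."""

    def __init__(self, identifiers: Collection[str]):
        self._identifiers = identifiers

    def __eq__(self, value: object) -> bool:
        return value in self._identifiers


requested = Extra({"some_extra", "another_extra"})

eq_requested = requested == "some_extra"  # True: it was requested
eq_missing = requested == "docs"          # False: it was not requested
ne_requested = requested != "some_extra"  # False: default __ne__ inverts __eq__

print(eq_requested, eq_missing, ne_requested)
```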


When I mentioned earlier I had a few ideas that could work, this was my favourite one (with the rest fleshed out). The nice part is that it just defines the interface of the extras value, which is currently undefined, and we can keep it compatible with most existing use.

Still not sure we’re ready to go for anything at all yet though, based on the other threads on this topic that have been active today.