Handling of Pre-Releases when backtracking?

Thanks for all the context! I would like to mention some real world use cases to be aware of (to be clear, I'm not saying any of them should block adopting something simpler to reason about):

I see pre-release specifiers used in the wild in three different ways that would not work well with the --pre flag Pip offers:

  1. A single dependency that doesn’t yet have a final release, or at least a final release for a given platform (came up a lot with the release of Apple Silicon)
  2. A single dependency that has released a new feature that is only in pre-release
  3. A hack to test a subset of dependencies for their pre-release versions (e.g. How do I exclude Pip selecting any pre-release? · Issue #12470 · pypa/pip · GitHub)

I’ve seen these in the wild, but I think it was mostly to handle legacy versions, which Pip still supports (but emits a deprecation warning).

There certainly are users who would take advantage of this. But, at least in my experience, they make dependency resolution significantly more difficult. Simpler algorithms, like the mostly-DFS one that resolvelib uses, might struggle in a world of these kinds of dependencies.

I would add that version or specifier features that include conditional or optional behavior have an outsized impact on either the complexity, or the ambiguity, of how dependency resolvers must handle them. For Python specifiers I'm aware of two features that have this issue in the way they are defined:

  • Pre-releases
  • Extras

Having some sort of per-project “allow prereleases” option (--allow-pre-for=numpy?) would be a perfectly reasonable thing to offer, and well within the spirit of what even PEP 440 says (i.e., it’s not a “versioning v2” type of change). I think such an option would address these three cases.

I can (and probably should) go and look this up myself, but I thought all resolvelib cared about was “does this version satisfy this specifier?” Can you point me at where anything deeper is needed? You’ve worked on resolvelib a lot more recently than I have, so my knowledge could be out of date, but I thought the API was so highly abstracted that pretty much any concept of versions satisfying requirements was sufficient.
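For anyone following along, the shape of that abstraction looks roughly like this. This is a toy sketch borrowing hook names from resolvelib’s AbstractProvider interface (the bodies, and the tuple-based data shapes, are my own illustrative assumptions, not pip’s or resolvelib’s real code; the real interface also has get_preference and get_dependencies hooks):

```python
# Toy provider mirroring the shape of resolvelib's AbstractProvider hooks.
# The resolver core never inspects versions itself; it only learns what
# "satisfies" means through answers to is_satisfied_by().

class ToyProvider:
    def __init__(self, index):
        self.index = index  # e.g. {"numpy": [(1, 0), (2, 0)]}

    def identify(self, requirement_or_candidate):
        # Both requirements and candidates are (name, ...) pairs here.
        return requirement_or_candidate[0]

    def is_satisfied_by(self, requirement, candidate):
        # requirement: (name, predicate); candidate: (name, version)
        name, predicate = requirement
        return name == candidate[0] and predicate(candidate[1])

    def find_matches(self, identifier, requirements, incompatibilities):
        banned = set(incompatibilities.get(identifier, []))
        return [
            (identifier, version)
            for version in sorted(self.index.get(identifier, []), reverse=True)
            if (identifier, version) not in banned
            and all(self.is_satisfied_by(r, (identifier, version))
                    for r in requirements.get(identifier, []))
        ]
```

The point being: any notion of “version satisfies requirement” can in principle be plugged in through these callbacks, which is what makes the question above worth asking.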

Pre-releases are a PITA (as we’re seeing here) because “what matches” is context-dependent in a nasty way. If ease of implementation is the criterion, that context dependency is a flaw in the spec - although I’m hesitant about prioritising ease of implementation over user friendliness[1]. This can certainly be solved in a versioning v2.
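To spell out the context dependency for anyone who hasn’t hit it: under PEP 440, pre-release candidates are excluded by default, but become eligible when the specifier itself pins a pre-release version. Here is a deliberately simplified toy model of that rule (versions as (release, pre) tuples; this is my own sketch, not packaging’s real code):

```python
# Toy model of PEP 440's context-dependent pre-release rule for a single
# ">" specifier. A version is (release_tuple, pre) where pre is None for
# a final release, e.g. ((1, 1, 0), ("a", 1)) means 1.1.0a1.

def sort_key(version):
    release, pre = version
    # Pre-releases sort just before the corresponding final release.
    return (release, 1, ()) if pre is None else (release, 0, pre)

def admits(spec_version, candidate, prereleases=None):
    """Does the specifier ">spec_version" admit candidate?"""
    if prereleases is None:
        # The nasty context dependency: whether pre-releases are allowed
        # is inferred from the pinned version, not stated explicitly.
        prereleases = spec_version[1] is not None
    if candidate[1] is not None and not prereleases:
        return False  # pre-release candidates excluded by default
    return sort_key(candidate) > sort_key(spec_version)
```

So `>1.0.0` matches 1.1.0 but silently skips 1.1.0a1, while `>1.0.0a1` matches 1.0.0a2 - the same candidate set changes meaning depending on how the bound happens to be written.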

Extras are an utter abomination, and the way they need to be bolted onto the resolver machinery is a horrific bodge that has no right to exist[2]. That’s a problem with the way the semantics of extras is defined, though, and unrelated to versioning. If I had my way, I’d remove the whole concept of extras - unfortunately they do solve real-world problems and they are very widely used, so even if we replaced them with something better designed, the transition issues would be insurmountable. So we’re stuck with them, unfortunately.
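For readers who haven’t seen the bodge in question: the usual workaround is to synthesize a virtual “name[extra]” candidate that depends on the base package locked to exactly the same version, plus the extra’s own dependencies. Sketched generically (made-up helper, not pip’s actual implementation):

```python
# The "virtual package" trick for extras: pandas[excel] becomes its own
# node in the graph, whose dependencies are (a) the base package pinned
# to the same version and (b) the extra's optional dependencies.

def extra_candidate_deps(name, version, extras, extra_deps):
    """Dependency list for the synthetic name[extras] candidate."""
    deps = [f"{name}=={version}"]  # lock-step with the base package
    for extra in extras:
        deps.extend(extra_deps.get(extra, []))
    return deps
```

For example, `extra_candidate_deps("pandas", "2.2.0", ["excel"], {"excel": ["openpyxl>=3.1.0"]})` yields `["pandas==2.2.0", "openpyxl>=3.1.0"]`. The `==` pin is what forces the resolver to keep the virtual node and the real package in sync, and is also where much of the machinery’s awkwardness comes from.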


  1. Luckily, I also think that the current spec isn’t particularly user-friendly in this regard :slightly_smiling_face: ↩︎

  2. apologies if anyone feels I sugar-coated this a bit too much :wink: ↩︎


This is true (excluding bugs), as long as you don’t care about performance.

However, to get resolvelib to be performant in the current Python ecosystem you have to make preference choices that make assumptions about how dependency graphs look (in some topological sense), and introducing Boolean logic beyond “and” (i.e., “or”/“not” in requirements) would break those assumptions. I speculate this would lead to needing advanced SAT solvers (probably written in a more performant language) to get performant resolution times.
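To illustrate the concern with a deliberately naive brute-force sketch (made-up data, not a claim about pip’s internals): with only “and”, each requirement independently prunes one package’s candidate list, whereas an “or” couples packages together and resolution starts to look like satisfiability over combinations:

```python
# Encoding requirements as clauses over (package, version) picks: each
# clause is a set of acceptable picks, and a valid resolution must
# intersect every clause. The "or" clause is what forces combinatorial
# search rather than per-package filtering.
from itertools import product

clauses = [
    {("a", 1)},                # root requires a==1
    {("b", 1), ("c", 1)},      # a requires (b==1 OR c==1)  <- the disjunction
    {("b", 2), ("c", 1)},      # another constraint touching both b and c
]

packages = {"a": [1], "b": [1, 2, None], "c": [1, None]}  # None = not installed

def solutions():
    names = sorted(packages)
    for picks in product(*(packages[n] for n in names)):
        chosen = {(n, v) for n, v in zip(names, picks) if v is not None}
        if all(clause & chosen for clause in clauses):
            yield dict(zip(names, picks))
```

Even in this tiny example, whether b==1 is acceptable depends on what was chosen for c, which is exactly the cross-package coupling that the current preference heuristics don’t have to deal with.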

From a dependency resolution perspective something that would make them much easier to reason about would be changing them from “optional extras” to “required extras”. E.g. if a user specifies pandas[excel] then the resolved version must include the excel extra.

It feels to me that this is a possible change. But maybe I’m underestimating the consequences.


FWIW, I regularly rely on using specs like numpy>1.0.0a1 (not specifically for numpy, but since it’s the current example…) as dependencies when I need to rely specifically on a prerelease version. Often in a pyproject.toml to use a prerelease backend, so there’s no other way to specify that it should be a prerelease right now.

I guess if numpy>{largest non-prerelease version} would find a later prerelease until there was a non-prerelease to find, that would also work, and arguably better.

But I don’t think a command-line option sufficiently covers the scenario, at least not without changing vastly more tooling than just pip.

You may find Poetry helpful (Handling of Pre-Releases when backtracking? - #11 by notatallshaw).

Afraid not, and in general, this kind of post is never going to be a useful contribution to a discussion. I’d suggest avoiding it, or at least saving it for the new users who are asking for help.

Sorry, I’m not sure how to have meta discussions on Discourse as I can’t split off or create sub-threads. I kept the message intentionally short so as not to pollute the discussion, but now I’m at a loss as to how to handle this.

The message did seem of value though, because I don’t think it’s widely known that Poetry implements this part of PEP 440 while Pip does not, which solves the specific use case you’re outlining.

So from my point of view it’s a value add to users who might come across this discussion, even if it’s not specifically helpful for you (presumably because using Poetry is a lot more involved?). But if that sort of value is too meta for the rules of this forum I apologize.


Yeah, that makes sense. Although it does trigger the “what about when there’s multiple specs” problem. If you have one dependency that says numpy > 1.0.0a1 and another that says numpy > 1.0.0, then in theory, 1.0.0a2 doesn’t satisfy the second constraint (because there’s nothing in it saying “allow prereleases”). So it’s arguable that you’re already depending on ill-defined behaviour. But it’s certainly good enough until we standardise something better.

I definitely agree it’s a use case that any “versioning 2.0” spec needs to address - whether by the “if you include a pre-release they are considered” mechanism or something different, I’m not sure. The difficulty will be how to define something that’s “global” and doesn’t change as evaluation (resolution) progresses - resolvers can discover new constraints as they progress, and having the discovery of a new constraint retroactively alter the set of versions under consideration is problematic at best…
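To make the “retroactively alter” problem concrete, here is an illustrative sketch (assumed semantics, not any tool’s real behavior) of a “globally enable pre-releases if any specifier mentions one” rule. Versions are simple comparable tuples, and each specifier is a lower bound plus a flag for whether it pins a pre-release:

```python
# Why a "global" pre-release switch is awkward mid-resolution: discovering
# one more requirement can retroactively grow a candidate set that was
# already computed and filtered.

def eligible(candidates, specifiers):
    # candidates: list of (version_key, is_prerelease)
    # specifiers: list of (lower_bound_key, pins_a_prerelease), meaning "> bound"
    allow_pre = any(pins_pre for _, pins_pre in specifiers)
    return [
        (version, is_pre)
        for version, is_pre in candidates
        if (allow_pre or not is_pre)
        and all(version > bound for bound, _ in specifiers)
    ]
```

With candidates 1.0.1 (final) and 1.1 (pre-release) and a single `> 1.0` constraint, only the final release is eligible; discovering a second constraint that happens to pin a pre-release suddenly makes the already-rejected pre-release eligible again, which is exactly the kind of non-monotonic behaviour a backtracking resolver struggles with.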


The dependency specifiers spec indicates this is already the case, so I think I must have misunderstood you.