PEP 665: Specifying Installation Requirements for Python Projects

The commit "PEP 665: add some open issues" (python/peps@2147ddc on GitHub) added some open issues:

  1. Record the creation date by @ncoghlan
  2. Idea to pin build dependencies by @kushaldas
  3. Record the original inputs to the locker’s resolver by @ncoghlan and @njs
  4. Make keeping a lock file in pyproject-lock.d optional by @steve.dower (and privately @eric.snow)

I imagine this should only be used when there are multiple package entries under a name to “guide” installers to one of them directly, and in that case yes the installer can do less calculation.

Added the idea of allowing marker and tags in [package] tables via Add an open issue about `marker` and `needs` per package version · python/peps@2d7c608 · GitHub (and w/ follow-up commits to fix the markup :sweat_smile:).

Things have settled down here, so I think it’s time to drive towards closing out the open issues. Here they are. Any we can’t reach consensus around will be decided by the authors of the PEP.

Allow for Tool-Specific type Values

It has been suggested to allow for custom type values in the
code table. They would be prefixed with x- and followed by
the tool’s name and then the type, i.e. x-<tool>-<type>. This
would provide enough flexibility for things such as other version
control systems, innovative container formats, etc. to be officially
usable in a lock file.
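
To make the x-<tool>-<type> shape concrete, here is a minimal Python sketch of how a consumer might split such a value. The helper name split_custom_type and the rule that a tool name contains no hyphen are assumptions made for illustration; the open issue specifies neither.

```python
import re

# Hypothetical pattern for the proposed x-<tool>-<type> convention.
# Assumption: the tool name contains no hyphen, so the first hyphen after
# it separates the tool from the type.
_CUSTOM_TYPE = re.compile(r"^x-(?P<tool>[^-]+)-(?P<type>.+)$")


def split_custom_type(value):
    """Return (tool, type) for a tool-specific value, or None otherwise."""
    match = _CUSTOM_TYPE.match(value)
    if match is None:
        return None
    return match.group("tool"), match.group("type")
```

An installer could then skip (or reject) any entry whose tool name it does not recognise, while still treating non-prefixed values as standard ones.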

Support Variable Expansion in the url field

This could include predefined variables like PROJECT_ROOT for the
directory containing pyproject-lock.d so URLs to local directories
and files could be relative to the project itself.

Environment variables could be supported to avoid hardcoding things
such as user credentials for Git.
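
A rough sketch of what such expansion could look like, using Python's string.Template. The ${NAME} placeholder syntax, the expand_url helper, and the rule that PROJECT_ROOT overrides an environment variable of the same name are all assumptions; the open issue only names PROJECT_ROOT and environment variables as ideas.

```python
from string import Template


def expand_url(url, project_root, environ):
    """Expand $NAME / ${NAME} placeholders in a lock file URL.

    project_root: the directory containing pyproject-lock.d.
    environ: a mapping of environment variables (for example os.environ).
    """
    variables = dict(environ)
    # Assumption: the predefined variable wins over the environment.
    variables["PROJECT_ROOT"] = project_root
    # substitute() raises KeyError for an undefined variable instead of
    # silently leaving the placeholder in the URL.
    return Template(url).substitute(variables)
```

Failing loudly on an undefined variable is a deliberate choice in this sketch: a URL with an unexpanded credential placeholder is more likely a configuration mistake than something to pass along.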

Don’t Require Lock Files Be in a pyproject-lock.d directory

It has been suggested that since installers may very well allow users
to specify the path to a lock file that having this PEP say that
"MUST be kept in a directory named pyproject-lock.d" is pointless
as it is bound to be broken. As such, the suggestion is to change
“MUST” to “SHOULD”.

Record the Date of When the Lock File was Generated

Since the modification date is not guaranteed to match when the lock
file was generated, it has been suggested to record the date as part
of the file’s metadata. The question, though, is how useful is this
information and can lockers that care put it into their [tool]
table instead of mandating it be set?

Locking Build Dependencies

Thanks to PEP 518, source trees and sdists can specify what build
tools must be installed in order to build a wheel (or sdist in the
case of a source tree). It has been suggested that the lock file also
record such packages so as to increase how reproducible an installation
can be.

There is nothing currently in this PEP, though, that prohibits a
locker from recording build tools thanks to metadata.needs acting
as the entry point for calculating what to install. There is also a
cost in downloading all potential sdists and source trees, reading
their pyproject.toml files, and then calculating their build
dependencies for locking purposes, and not everyone will want to
pay that cost.

Recording the Requires-Dist Input to the Locker’s Resolver

While the needs key allows for recording dependency specifiers,
this PEP does not currently require the needs key to record the
exact Requires-Dist metadata that was used to calculate the
lock file. It has been suggested that recording the inputs would help
in auditing the outcome of the lock file.

If this were to be done, it would be a key named requested which
lived alongside needs and would only be specified if it would
differ from what is specified in needs.

Thank you. No further objections, your honour :smiley:

Thank you for this PEP, Brett.
I’m commenting here as a developer from a Platform Provider (Azure Functions, i.e. serverless) so hopefully I can provide that perspective. I should also note that this is my first time commenting on this platform, so please let me know if I’m ignoring any best practices; thanks.

Recording the date of when a lock file is generated would be really useful to my job. When investigating user-reported incidents, especially those where a customer app suddenly starts acting “different” in some way, having this kind of metadata is key to providing a root-cause analysis.

However, I don’t have strong feelings as to where to store these timestamps; just that ideally they would be stored somewhere.

In Azure Functions, we structure Python apps in a slightly unconventional way (with respect to local apps/libraries) and so I support having flexibility in where and how to store these lock files.

And that’s all I had to say :wink:.
The rest of the PEP seems reasonable to me and I think it would be a positive tool if accepted.

I still have a more general unresolved issue: fundamentally the spec still seems to be “lockers should do the same thing poetry/pdm do” and “installers should support whatever poetry/pdm generate”, even though no-one has explained how poetry/pdm actually work, what invariants they implement, whether the algorithms are sound, etc.

It might well be fine? But from the current text, I don’t think it’s possible to evaluate whether this spec will be usable in the long term, or whether it’s another autoSpaceLikeWord95.

PS: the insistence on using a non-standard verb for requires remains inexplicable. I don’t like the word requires, it took me ages to get used to it. But our existing bikesheds are all the same color, so why are we insisting that this one needs to be green instead?

Speaking as potential PEP-delegate for this, I’d like to see some clarity around this. It may be that my lack of experience with use cases for lockfiles is the issue here (and to that end, if anyone else wants to offer to be PEP-delegate I’d be happy to hand over the task!) but equally I’m not sure the PEP should need in-depth knowledge to be understandable (as a pip maintainer, I could be required to implement support for the PEP, so I don’t think I can realistically say “I’m not the target audience”).

I’ll re-read the latest version of the PEP in the light of the recent discussion, just to make sure there’s not been an improvement I’ve missed, but I don’t remember seeing anything in my notifications.

As a specific point regarding the installer side of things, my naive assumption is that a “lockfile” should lock everything. So talk about needing a resolver to install from a lockfile confuses me, surely that’s what “being locked” means? I certainly hope that a “lockfile installer” could be substantially simpler than pip or Poetry… (To be clear, while I don’t mind if someone wants to clarify here, ultimately my point is that this information belongs in the PEP so that people wanting to implement an installer know what they are getting into from reading the spec alone).

In OpenStack projects we’ve done something similar for roughly the
past 7-8 years now, originally in order to work around the
shortcomings of the old dep solver in pip. All our projects include
loose/open-ended install_requires lists. We install them together
into a virtualenv and then run pip freeze and record that to a
file. That file can later be supplied to pip -r or -c in order to
reproduce or constrain a subset of this reference environment, and
any updates to it can be similarly tested during code review to make
sure it’s coherent and usable.

It also serves to confirm that these projects developed by
independent teams are co-installable within a single environment,
and particularly with the advent of the new dep solver in pip, that
they’re not declaring incompatible versions of their requirements.
When our projects branch for stable maintenance, we stop updating
those frozen dependency sets in order to avoid introducing future
instability into tests for backported security fixes and the like.

This is essentially a “lockfile” as I’ve seen done in other package
ecosystems, used for recreating and tracking a reproducible test or
build environment for the software. I really hate that term though,
as it’s easily conflated with file-based mutexes for locking between
processes.

Thanks, that matches with my understanding of the term “lockfile” in this context.

Thinking about this some more, the key point is that I’m comparing PEP 665 with pip freeze output (i.e., requirements files with every package listed and fully locked versions). A minimal installer can consume such a requirements file even if:

  • It has no resolution logic at all (just “download a wheel for the current platform for project A version X.Y.Z”)
  • It doesn’t read dependency information from wheels at all (because the requirements specify everything)
  • It ignores extras (because extras only add dependencies, and we’ve already said those are all included in the requirements file).

I don’t think an installer that is this limited can consume a PEP 665 lockfile. Am I right? If so, then there is at least one use case that isn’t covered by the PEP¹. At a minimum, the PEP should be explicit that such installation environments will have to continue to rely on requirements files (and therefore project management tools will need to continue to support locking to requirements files if they expect to support such environments).
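
As an illustration of how little such a limited installer has to understand, here is a minimal Python sketch that turns fully pinned requirement lines into name/version pairs. The helper name parse_pins is hypothetical, and the sketch deliberately accepts only the plain name==version form; real requirements files allow much more (options, markers, hashes, URLs).

```python
import re

# Only "name==version" is accepted; anything else is rejected on purpose.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^=\s;]+)$")


def parse_pins(lines):
    """Turn fully pinned requirement lines into (name, version) pairs."""
    pins = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _PIN.match(line)
        if match is None:
            raise ValueError(f"not a plain pinned requirement: {raw!r}")
        pins.append((match.group(1), match.group(2)))
    return pins
```

No resolution, no wheel metadata, and no extras handling are involved: each pair maps directly to "download a wheel for this project at this version".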

Maybe the packaging library could include an API that reads a PEP 665 lockfile and emits a series of name/version pairs. If that were possible, then I guess that would address my concerns (as long as the authors of packaging felt that the spec was clear enough to allow such an implementation). If it’s not possible, though, then what makes it any more likely that the installer can implement the logic?
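
A sketch of what such an API might do, under stated assumptions. The function lock_to_pins is hypothetical (it is not part of packaging), and the lock file is represented as the dictionary a TOML parser would produce for [[package.<name>]] tables. The sketch also shows where flattening stops being possible: when one name is locked to more than one version.

```python
def lock_to_pins(lock):
    """Flatten parsed lock data into sorted (name, version) pairs.

    Raises ValueError when a name has entries with different versions,
    because a flat list of pairs cannot express that choice.
    """
    pins = []
    for name, entries in sorted(lock.get("package", {}).items()):
        versions = sorted({entry["version"] for entry in entries})
        if len(versions) != 1:
            raise ValueError(f"{name} is locked to several versions: {versions}")
        pins.append((name, versions[0]))
    return pins
```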

I’d also note that personally, even if I’m using an installer that’s more capable than this (for example, pip) I still think of locking in terms of this sort of pip freeze output. So if I were using a lockfile mechanism like PEP 665, and (for example) trying to debug an issue, I’d expect to be able to read the lockfile as if it were such a list of name/version constraints and reason about it in that form. In particular, I would want something that meant I didn’t have to think about dependency resolution logic. Because reasoning about dependency resolution for a project with hundreds of dependencies is hard. I’m locking so that I can avoid needing to do that in my deployment environment, and I can be sure that having done it in my build environment, I’m finished with the problem.

¹ It’s not completely theoretical, either - if I were to write a deployment tool, that’s precisely the level of installer I’d write - “given a list of name/version pairs, install them”.

This is not the intention, at least not mine, so if that’s the impression readers get, we should probably rewrite the text to avoid it. I explained my vision for the specification earlier in the thread; the goal is to encode a valid resolution result obtained by a dependency resolver, so the result can be used by another tool to perform the installation part. The fact that Poetry and PDM are prominently mentioned is mainly because they are the most widely used tools in the Python ecosystem that have this “resolving is separate from installing” notion (Pipenv is another one, but although it has lock files, the locking and installing steps are less separate in its workflow).

Or maybe the above is (a very brief version of) the explanation you’re looking for? If so, I can definitely add/rewrite some opening paragraphs in the document.

It should. The only requirements for an installer are:

  • Recursively find package entries from needs
  • Evaluate environment markers (to selectively eliminate needs and package entries it should not go into)
  • Choose a valid wheel for installation
  • Hash-checking

Personally I feel it is entirely possible to replace most requirement (and constraint) files with PEP 665. The only things not replaceable are index information and some pip-specific logic, which we can gradually standardise afterwards. I definitely envision pip freeze to emit this format instead of requirements.txt eventually.

To other authors of PEP 665: I think I’m going to draft some additional paragraphs to put in Motivation and Rationale that

  1. Describe the specify-install-freeze workflow (currently the most widely used in Python), and how a lock file can be used in it.
  2. Describe the specify-lock-install workflow (more widely adopted by modern packaging tools), and how it differs from and improves on the freeze-based model.
  3. Describe how the lock file (and PEP 621) fit in both workflows.
  4. Explain what lockers and installers are (there is already a brief description and I will incorporate it), why the locker and installer should be separated (also already included), and how information can be passed from locker to installer (via the lock file).
  5. Explain how to minimise what an installer needs to implement, but strike a balance to avoid overloading the resolver (lazily evaluate environment markers).

OK. So let’s be specific here. If I’m consuming a PEP 665 file, I start with metadata.needs. (BTW, I also hate the term “needs” here - it’s unlike anything else used in packaging. I’m not going to bikeshed, but I hope you change it). Using the example in the PEP that gives me “mousebender”.

package.mousebender.version is 2.0.0, so the first thing I install is mousebender 2.0.0. So far, so good. Now I look at package.mousebender.needs. That says ["attrs>=19.3", "packaging>=20.3"]. So I need attrs and packaging. I assume at this point that I ignore the version constraints, because I want exact versions, not constraints, so I hope I get them later, or I’m in trouble…

package.attrs.version says 21.2.0, and package.packaging.version says 20.9, so that’s the versions of those two that I want. Recursing one more time, packaging needs pyparsing and package.pyparsing.version says 2.4.7. So that’s the final thing I need.
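
The walk described above can be sketched in a few lines of Python. This is only an illustration of the naive algorithm from this post, not the PEP's installer specification: the lock data is written as the dictionary a TOML parser would produce, the names requirement_name and walk are hypothetical, version constraints are ignored exactly as described, and the specifier on pyparsing is omitted because it is not quoted in this post.

```python
import re

LOCK = {
    "metadata": {"needs": ["mousebender"]},
    "package": {
        "mousebender": [{"version": "2.0.0",
                         "needs": ["attrs>=19.3", "packaging>=20.3"]}],
        "attrs": [{"version": "21.2.0"}],
        "packaging": [{"version": "20.9", "needs": ["pyparsing"]}],
        "pyparsing": [{"version": "2.4.7"}],
    },
}


def requirement_name(requirement):
    """Return the leading project name, ignoring specifiers and markers."""
    return re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip()).group(0)


def walk(lock):
    """Follow needs from the metadata table and collect name -> version."""
    pending = list(lock["metadata"]["needs"])
    chosen = {}
    while pending:
        name = requirement_name(pending.pop())
        if name in chosen:
            continue
        entries = lock["package"][name]
        if len(entries) != 1:
            # This naive walk has no way to pick between several entries.
            raise ValueError(f"{name} has {len(entries)} entries")
        chosen[name] = entries[0]["version"]
        pending.extend(entries[0].get("needs", []))
    return chosen
```

The only place the walk can fail is when a name has more than one entry, which is the case where something has to read specifiers or markers to choose.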

OK, that’s fine. So I guess I have a couple of questions:

  1. What’s the point of the version constraints in needs? The above install algorithm ignores them. Are they just for information?
  2. Why are people saying you need a resolver to install? There’s clearly no resolving going on here - and yet the PEP authors seem to have taken the comments that a resolver is needed seriously. So either I’ve missed something, or a lot of confusion could have been avoided by just saying “nope, installers just have to install what’s specified”.

The package.xxx.code sections, it seems to me, just act as a (highly specific) set of package index entries, just for this install, so the installer doesn’t even need to be able to parse a simple index page. So they support the information I found by the algorithm above, as they tell me how to find the file(s) corresponding to version X of package Y.

Looking back in the discussion, though, Nathaniel said earlier

and Brett explicitly did not say that this was inaccurate:

So I’m confused, because what you said above seems to contradict Brett’s comment here. Or is there some other content that would provide the “flexibility” that Brett is alluding to, which a simple installer like I’m describing couldn’t handle? And if so, how would I know I’d been given such a file, and how would I communicate to the user that they must tell their locker not to create that sort of file if they want to provide that file to me?

This part I understand :-). My problem is: traditionally, a “dependency resolver” is a function from requirements → a list of (package, version) pairs, and an “installer” is a function from a list of (package, version) pairs → a runnable environment. But this PEP uses a new, different definition, where a “dependency resolver” is a function from one set of requirements → a new set of requirements, and an “installer” is a function from that new set of requirements → a runnable environment. Which might be fine? But if you’re using a non-standard framework you have to actually explain how it works and how it relates to the traditional approach, and so far all I’ve gotten is “go read the poetry source code”, which isn’t really helpful.

The specifier is indeed just information (so are marker, extra, URL etc.) in the provided example. But here’s another one that requires them:

[metadata]
needs = ["a > 1 ; python_version >= '3.6'", "a <= 1; python_version < '3.6'"]

[[package.a]]
version = "1"

[[package.a]]
version = "2"

The installer must choose exactly one of the a entries. This requires reading the specifier and marker, which counts as dependency resolution under some definitions.

Note that one of the Open Questions was proposed for this:

[metadata]
needs = ["a > 1 ; python_version >= '3.6'", "a <= 1; python_version < '3.6'"]

[[package.a]]
version = "1"
marker = "python_version < '3.6'"  # This new field.

[[package.a]]
version = "2"
marker = "python_version >= '3.6'"

This would make the information in needs (except the name) purely informational. But it is still uncertain whether this is easy/possible to implement in the resolver (perhaps @frostming can provide a more educated assessment). But this marker evaluation + version selection logic may still count as a dependency resolver (it’s almost what pip did before resolvelib…)
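
To show how small the selection step becomes with a per-entry marker, here is a hedged Python sketch. It understands only markers of the form python_version <op> 'X.Y', which is a tiny subset of real environment markers; marker_matches and pick_entry are hypothetical names, and treating an entry without a marker as always applicable is an assumption.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
_MARKER = re.compile(
    r"^\s*python_version\s*(<=|>=|==|!=|<|>)\s*'(\d+)\.(\d+)'\s*$"
)


def marker_matches(marker, python_version):
    """Evaluate a python_version-only marker against a (major, minor) tuple."""
    match = _MARKER.match(marker)
    if match is None:
        raise ValueError(f"unsupported marker: {marker!r}")
    op, major, minor = match.groups()
    return _OPS[op](python_version, (int(major), int(minor)))


def pick_entry(entries, python_version):
    """Return the single entry whose marker holds for this interpreter."""
    matching = [
        entry for entry in entries
        # Assumption: an entry without a marker always applies.
        if "marker" not in entry or marker_matches(entry["marker"], python_version)
    ]
    if len(matching) != 1:
        raise ValueError(f"expected exactly one entry, found {len(matching)}")
    return matching[0]


ENTRIES_A = [
    {"version": "1", "marker": "python_version < '3.6'"},
    {"version": "2", "marker": "python_version >= '3.6'"},
]
```

With this shape the installer never reads the specifiers in needs; it filters entries by marker and fails if the filter does not leave exactly one.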

Personally I categorise the PEP’s definition as “multiple lists of package-version pairs merged into one graph with conditional edges so that conditional package requirements are possible without much duplication”. The strict one list of pairs definition is only guaranteed to work if the specification only aims to be applied to one well-defined runtime environment, which Python package dependency graphs often do not satisfy (because we have environment introspection based conditional requirements).

An alternative example from OpenStack’s upper-constraints.txt,
where strict package===version pairs are combined with
python_version differentiators for use with pip:

autobahn===21.2.1;python_version=='3.6'
autobahn===21.3.1;python_version=='3.8'
contextvars===2.4;python_version=='3.6'
dataclasses===0.8;python_version=='3.6'
immutables===0.15;python_version=='3.6'
scipy===1.5.4;python_version=='3.6'
scipy===1.6.1;python_version=='3.8'
numpy===1.19.5;python_version=='3.6'
numpy===1.20.1;python_version=='3.8'

This doesn’t need a resolver which can handle package version
conflicts, it merely needs one which can filter the supplied list
based on the Python interpreter version under which it’s running.
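
That filtering step can be sketched as follows. This is a simplified illustration, not how pip parses constraints: it accepts only lines shaped like the ones above (name===version with an optional python_version=='X.Y' marker), and applicable_pins is a hypothetical helper name.

```python
import re

_LINE = re.compile(
    r"^(?P<name>[^=;\s]+)===(?P<version>[^;\s]+)"
    r"(?:;python_version=='(?P<py>[^']+)')?$"
)


def applicable_pins(lines, python_version):
    """Keep the pins whose marker is absent or equals python_version."""
    pins = {}
    for line in lines:
        match = _LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported constraint line: {line!r}")
        marker_version = match.group("py")
        if marker_version is None or marker_version == python_version:
            pins[match.group("name")] = match.group("version")
    return pins


CONSTRAINTS = [
    "autobahn===21.2.1;python_version=='3.6'",
    "autobahn===21.3.1;python_version=='3.8'",
    "contextvars===2.4;python_version=='3.6'",
    "scipy===1.5.4;python_version=='3.6'",
    "scipy===1.6.1;python_version=='3.8'",
]
```

Because each name has at most one line per interpreter version, the result is a plain mapping of name to version with no conflict handling required.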

By your definition of a dependency resolver :slightly_smiling_face: Not that I don’t agree with you, but the definition is not universal, which is the problem.

Yeah, for sure. But it’s not enough to show that the classic definition is inadequate, you also have to show that the replacement is workable and sound :-).

How?

Let’s change the example slightly:

[metadata]
needs = ["a >= 2 ; python_version >= '3.6'", "a >= 1; python_version < '3.6'"]

[[package.a]]
version = "1"

[[package.a]]
version = "2"

On Python 3.5, what is the installer to choose? Either that lockfile is invalid, or the installer needs a resolver (in this case, a simple one that just does “choose the latest version” is enough, but (a) it’s still a resolver, and (b) I can keep adding complexity as long as you want me to…).
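
The ambiguity can be checked mechanically. The Python sketch below uses a toy grammar that covers only requirements shaped like the ones in this example (integer versions and a single python_version marker); it is not a PEP 440 or PEP 508 implementation, and candidates is a hypothetical helper name.

```python
import operator
import re

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
OP = r"(<=|>=|==|!=|<|>)"
# Toy grammar: "<name> <op> <int> ; python_version <op> '<major>.<minor>'"
NEED = re.compile(
    r"^\s*(\w+)\s*" + OP + r"\s*(\d+)\s*;\s*python_version\s*" + OP
    + r"\s*'(\d+)\.(\d+)'\s*$"
)


def candidates(needs, versions, python_version):
    """Return the locked versions allowed by every applicable requirement."""
    remaining = list(versions)
    for need in needs:
        match = NEED.match(need)
        if match is None:
            raise ValueError(f"unsupported requirement: {need!r}")
        _name, ver_op, bound, py_op, major, minor = match.groups()
        if not OPS[py_op](python_version, (int(major), int(minor))):
            continue  # marker is false, so this requirement does not apply
        remaining = [v for v in remaining if OPS[ver_op](int(v), int(bound))]
    return remaining


NEEDS = ["a >= 2 ; python_version >= '3.6'", "a >= 1; python_version < '3.6'"]
```

For Python 3.5, candidates(NEEDS, ["1", "2"], (3, 5)) leaves both versions in play, so the specifiers and markers alone do not determine a single entry; for Python 3.9 only version 2 remains.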

So I repeat my question -

How would such an installer handle the lockfile I included above?

If it can’t, how can it claim to support PEP 665?

A variant of this was discussed previously, around here: PEP 665: Specifying Installation Requirements for Python Projects - #98 by njs

The new marker field is an end product of the discussion.

  • Dependencies in a Python runtime environment form a rooted directed graph. The root node represents the user specification (direct dependencies of a project), and nodes directly connected to the root node are the user-specified dependencies. Each other node represents a transitive dependency. A Requires-Dist metadata field is an edge going out of the parent package to its dependency.
  • Graphs are known to be mergeable, so dependencies from multiple Python runtime environments sharing the same user specification may be merged into one rooted directed graph.
  • A directed graph can be serialised into a set of nodes and a set of directed edges.
  • A lock file’s metadata table represents the root node. Each package.<name> entry represents a non-root node. A needs field specifies a node’s outward edges. A marker field (proposed previously) specifies the union of a node’s inward edge conditions.
  • Given a directed graph, a subgraph satisfying a given condition can be obtained by selecting all edges satisfying the condition and all nodes with at least one inward edge being selected, i.e. if the union of all inward edge conditions is satisfied.
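
The graph model in the bullets above can be sketched in Python as follows. The representation is an assumption made for illustration: edges are (parent, child, condition) triples where condition is a predicate over an environment, and select_subgraph is a hypothetical name. The sketch walks outward from the root, so a node is kept only if at least one selected edge reaches it.

```python
def select_subgraph(edges, root, env):
    """Return (nodes, edges) reachable from root via edges whose condition holds."""
    nodes = {root}
    selected = []
    pending = [root]
    while pending:
        node = pending.pop()
        for parent, child, condition in edges:
            if parent == node and condition(env):
                selected.append((parent, child))
                if child not in nodes:
                    nodes.add(child)
                    pending.append(child)
    return nodes, selected


# Merged graph for two environments; env is a (major, minor) Python version.
EDGES = [
    ("root", "a==1", lambda env: env < (3, 6)),
    ("root", "a==2", lambda env: env >= (3, 6)),
    ("a==2", "b==1", lambda env: True),
]
```

Selecting with (3, 5) keeps only root and a==1, while (3, 9) keeps root, a==2 and b==1, which matches the idea of one serialised graph serving several environments.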

A formal proof should be possible, but I’m hoping we don’t need to go there… Is the casual write-up good enough, or is a proof missing somewhere in the inference chain?
