PEP 665: Specifying Installation Requirements for Python Projects

One example would be pip’s resolver before 2020. It recursively visits requested dependencies (with matching environment markers) and picks the first thing it sees that works with what it currently knows. If anything comes up down the road that invalidates the previous solution, it simply ignores it and carries on (a more “robust” installer implementation can choose to error out immediately instead of ignoring the conflicting constraints).

My previous comment was ambiguous and is likely the source of misunderstanding here. The lock file does not promise to contain exactly one solution for each environment, but at least one; in this situation there are two valid solutions, and the installer would be free to choose either. What the lock file promises, though, is that whichever solution the installer chooses at this point is going to be valid in the end, and the installer will never need to perform backtracking or conflict resolution or whatever, i.e. the “complex” part. The approach taken by existing resolvers is that since the user does not specify further, they should accept either solution, so the tool just picks one arbitrarily (but consistently to avoid confusion in practice).

The Software Heritage project has developed the Software Heritage ID format, or SWHID. A SWHID is a persistent identifier. Looking forward, I suggest replacing the url field with a swhid field. In a SWHID the item of interest is separated from the URL at which it can be found. In Nixpkgs we do this as well behind the scenes, because it should not matter where an artifact is fetched from; it’s just additional info.

Note Software Heritage only deals with source code, so I am not quite sure how it (the SWHID) would have to be adapted to deal with artifacts such as sdists and wheels.

But that would require hitting an external service to resolve the ID to a URL, and a key design point of the PEP is that an installer does not need to contact any third parties to resolve what to download and install.

Why not both?

To go from just a core identifier to an object, a resolver is indeed needed. However, by also including the origin field (URI) that is not needed, as we effectively get the url as proposed now. Note that I do find it a bit of a pity that they use only SHA1 and do not support SRI.

How do lockers guarantee that backtracking is never needed? E.g. a simple case:

Top-level requirements: A, B

A’s requirements:

  • B == 1; sys_platform == "win32"
  • B == 2; sys_platform == "darwin"

Now, suppose the installer happens to process the top-level requirements in the order B first, then A. Since both B == 1 and B == 2 have to be listed in the lockfile, the installer has to pick one. Presumably it will pick B == 2, because that’s the latest version. But on Windows, this will later turn out to be a mistake, forcing it to backtrack…

Does that make this an invalid lockfile, or… what?
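
To make the scenario concrete, here is a minimal sketch of a greedy installer that never backtracks. The data layout, the `greedy_install` helper, and the "pick the latest version" rule are all assumptions made for illustration; none of them come from the PEP.

```python
# Hypothetical lock data for the example above; this is not the PEP 665 format.
LOCKED_VERSIONS = {"A": ["1"], "B": ["1", "2"]}

# A's pin on B depends on the platform (keyed by sys_platform).
A_NEEDS_B = {"win32": "1", "darwin": "2"}


def greedy_install(order, platform):
    """Pick a version for each package in turn and never revisit a choice."""
    chosen = {}
    pins = {}  # constraints discovered so far: name -> required version
    for name in order:
        if name in pins:
            chosen[name] = pins[name]
        else:
            # Assumption: versions are numeric strings and the highest wins.
            chosen[name] = max(LOCKED_VERSIONS[name], key=int)
        if name == "A":
            required = A_NEEDS_B[platform]
            # If B was already picked and disagrees, a real resolver would backtrack here.
            if chosen.get("B", required) != required:
                raise RuntimeError(
                    f"B=={chosen['B']} was already picked, but A needs B=={required}"
                )
            pins["B"] = required
    return chosen
```

With the order B then A, this succeeds on darwin and raises on win32, which is exactly the situation the question is about. With the order A then B it succeeds on both platforms.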

(Disclaimer: I don’t actually work on a platform-agnostic locker, so this is only what I presume they’d do.) One way would be to split the top-level B requirement into ["B==1; sys_platform == 'win32'", "B; sys_platform == 'darwin'"], although this could be undesired if we’re going to use needs to record the “raw” requirements.

Also I recall a lockfile format (Poetry? don’t really remember) I researched has a platform marker field on individual package sections to indicate the package entry is only valid on certain platforms (kind of like the top-level marker field in PEP 665 but for each package), which feels like a good solution to this. Having marker (and perhaps tags?) in each package entry also makes it more similar to the top-level metadata field, and I like this kind of mirroring personally.
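
As a rough sketch of how a per-package marker could let an installer go straight to the right entry, consider the following. The `PACKAGES` layout and both helper functions are hypothetical, and the marker evaluator is a toy that only understands a single `name == 'value'` comparison.

```python
import re

# Hypothetical per-package entries carrying their own marker, for illustration only.
PACKAGES = {
    "B": [
        {"version": "1", "marker": "sys_platform == 'win32'"},
        {"version": "2", "marker": "sys_platform == 'darwin'"},
    ],
}

_SIMPLE_MARKER = re.compile(r"""^\s*(\w+)\s*==\s*['"]([^'"]*)['"]\s*$""")


def marker_matches(marker, environment):
    """Toy evaluator: handles only one `name == 'value'` comparison."""
    match = _SIMPLE_MARKER.match(marker)
    if match is None:
        raise ValueError(f"unsupported marker: {marker!r}")
    name, value = match.groups()
    return environment.get(name) == value


def select_entry(name, environment):
    """Return the single entry whose marker matches, with no resolution step."""
    candidates = [
        entry for entry in PACKAGES[name]
        if marker_matches(entry["marker"], environment)
    ]
    if len(candidates) != 1:
        raise LookupError(
            f"expected exactly one entry for {name}, found {len(candidates)}"
        )
    return candidates[0]
```

A real tool would more likely evaluate markers with the `Marker` class from the `packaging` library than with a regular expression.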

If we propagate and combine all markers pertaining to a package in the lock file, does that mean we don’t need to keep the markers in needs, since the resolution of what to install would be at the package version level?

If we were to do this, my questions would become:

  1. Do we then make needs just list package names?
  2. Does this change people wanting to record the original input to the locker’s resolver?

Wouldn’t hurt for symmetry.

Would this lower the computational overhead for the installer?

PEP 665: add some open issues · python/peps@2147ddc · GitHub added some open issues:

  1. Record the creation date by @ncoghlan
  2. Idea to pin build dependencies by @kushaldas
  3. Record the original inputs to the locker’s resolver by @ncoghlan and @njs
  4. Make keeping a lock file in pyproject-lock.d optional by @steve.dower (and privately @eric.snow )

I imagine this should only be used when there are multiple package entries under a name to “guide” installers to one of them directly, and in that case yes the installer can do less calculation.

Added the idea of allowing marker and tags in [package] tables via Add an open issue about `marker` and `needs` per package version · python/peps@2d7c608 · GitHub (and w/ follow-up commits to fix the markup :sweat_smile:).

Things have settled down here, so I think it’s time to drive towards closing out the open issues. Here they are. Any we can’t reach consensus around will be decided by the authors of the PEP.

Allow for Tool-Specific type Values

It has been suggested to allow for custom type values in the
code table. They would be prefixed with x- and followed by
the tool’s name and then the type, i.e. x-<tool>-<type>. This
would provide enough flexibility for things such as other version
control systems, innovative container formats, etc. to be officially
usable in a lock file.
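
A minimal sketch of how an installer might tell standard type values apart from tool-specific ones is shown below. The set of standard values is a placeholder, and the character rules for tool and type names are assumptions; the proposal above only specifies the `x-<tool>-<type>` shape.

```python
import re

# Placeholder set of standard values; the authoritative list is whatever the PEP defines.
STANDARD_TYPES = {"wheel", "sdist"}

# Assumption: names are lowercase alphanumerics and the tool name contains no hyphen.
_CUSTOM_TYPE = re.compile(r"^x-(?P<tool>[a-z0-9]+)-(?P<type>[a-z0-9][a-z0-9-]*)$")


def classify_type(value):
    """Return ("standard", None, value) or ("custom", tool, type); raise otherwise."""
    if value in STANDARD_TYPES:
        return ("standard", None, value)
    match = _CUSTOM_TYPE.match(value)
    if match:
        return ("custom", match["tool"], match["type"])
    raise ValueError(f"unrecognised type value: {value!r}")
```

An installer that does not know the named tool could then skip or reject the entry explicitly rather than guessing.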

Support Variable Expansion in the url field

This could include predefined variables like PROJECT_ROOT for the
directory containing pyproject-lock.d so URLs to local directories
and files could be relative to the project itself.

Environment variables could be supported to avoid hardcoding things
such as user credentials for Git.
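
One possible shape for this, sketched with the standard library's `string.Template`. The `${NAME}` syntax and the `expand_url` helper are assumptions; the open issue does not say how variables would be written.

```python
import os
from pathlib import Path
from string import Template


def expand_url(url, lock_dir, environ=None):
    """Expand ${VARS} in a url value (the $-syntax is an assumption, not PEP text)."""
    env = dict(os.environ if environ is None else environ)
    # PROJECT_ROOT is the directory that contains pyproject-lock.d.
    env["PROJECT_ROOT"] = Path(lock_dir).parent.as_posix()
    # substitute() raises KeyError for an undefined variable instead of leaving it in place.
    return Template(url).substitute(env)
```

For example, `expand_url("file://${PROJECT_ROOT}/vendor/pkg.whl", "/srv/app/pyproject-lock.d", {})` gives `file:///srv/app/vendor/pkg.whl`. Failing loudly on an undefined variable seems safer than silently producing a URL with a hole in it.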

Don’t Require Lock Files Be in a pyproject-lock.d directory

It has been suggested that, since installers may very well allow users
to specify the path to a lock file, having this PEP say that lock files
"MUST be kept in a directory named pyproject-lock.d" is pointless
as it is bound to be broken. As such, the suggestion is to change
“MUST” to “SHOULD”.

Record the Date of When the Lock File was Generated

Since the modification date is not guaranteed to match when the lock
file was generated, it has been suggested to record the date as part
of the file’s metadata. The question, though, is how useful is this
information and can lockers that care put it into their [tool]
table instead of mandating it be set?

Locking Build Dependencies

Thanks to PEP 518, source trees and sdists can specify what build
tools must be installed in order to build a wheel (or sdist in the
case of a source tree). It has been suggested that the lock file also
record such packages so as to increase how reproducible an installation
can be.

There is nothing currently in this PEP, though, that prohibits a
locker from recording build tools thanks to metadata.needs acting
as the entry point for calculating what to install. There is also a
cost in downloading all potential sdists and source trees, reading
their pyproject.toml files, and then calculating their build
dependencies for locking purposes, a cost which not everyone will want
to pay.

Recording the Requires-Dist Input to the Locker’s Resolver

While the needs key allows for recording dependency specifiers,
this PEP does not currently require the needs key to record the
exact Requires-Dist metadata that was used to calculate the
lock file. It has been suggested that recording the inputs would help
in auditing the outcome of the lock file.

If this were to be done, it would be a key named requested which
lived alongside needs and would only be specified if it would
differ from what is specified in needs.

Thank you. No further objections, your honour :smiley:

Thank you for this PEP, Brett.
I’m commenting here as a developer from a Platform Provider (Azure Functions, i.e. serverless) so hopefully I can provide that perspective. I should also note that this is my first time commenting on this platform, so please let me know if I’m ignoring any best practices; thanks.

Recording the date of when a lock file is generated would be really useful to my job. When investigating user-reported incidents, especially those where a customer app suddenly starts acting “different” in some way, having this kind of metadata is key to providing a root-cause analysis.

However, I don’t have a strong feeling as to where to store these timestamps; just that ideally they would be stored somewhere.

In Azure Functions, we structure Python apps in a slightly unconventional way (with respect to local apps/libraries) and so I support having flexibility in where and how to store these lock files.

and that’s all I had to say :wink: .
The rest of the PEP seems reasonable to me and I think it would be a positive tool if accepted.

I still have a more general unresolved issue: fundamentally the spec still seems to be “lockers should do the same thing poetry/pdm do” and “installers should support whatever poetry/pdm generate”, even though no-one has explained how poetry/pdm actually work, what invariants they implement, whether the algorithms are sound, etc.

It might well be fine? But from the current text, I don’t think it’s possible to evaluate whether this spec will be usable in the long term, or whether it’s another autoSpaceLikeWord95.

PS: the insistence on using a non-standard verb for requires remains inexplicable. I don’t like the word requires, it took me ages to get used to it. But our existing bikesheds are all the same color, so why are we insisting that this one needs to be green instead?

Speaking as potential PEP-delegate for this, I’d like to see some clarity around this. It may be that my lack of experience with use cases for lockfiles is the issue here (and to that end, if anyone else wants to offer to be PEP-delegate I’d be happy to hand over the task!) but equally I’m not sure the PEP should need in-depth knowledge to be understandable (as a pip maintainer, I could be required to implement support for the PEP, so I don’t think I can realistically say “I’m not the target audience”).

I’ll re-read the latest version of the PEP in the light of the recent discussion, just to make sure there’s not been an improvement I’ve missed, but I don’t remember seeing anything in my notifications.

As a specific point regarding the installer side of things, my naive assumption is that a “lockfile” should lock everything. So talk about needing a resolver to install from a lockfile confuses me, surely that’s what “being locked” means? I certainly hope that a “lockfile installer” could be substantially simpler than pip or Poetry… (To be clear, while I don’t mind if someone wants to clarify here, ultimately my point is that this information belongs in the PEP so that people wanting to implement an installer know what they are getting into from reading the spec alone).

[…]

As a specific point regarding the installer side of things, my
naive assumption is that a “lockfile” should lock everything. So
talk about needing a resolver to install from a lockfile confuses
me, surely that’s what “being locked” means? I certainly hope that
a “lockfile installer” could be substantially simpler than pip or
Poetry… (To be clear, while I don’t mind if someone wants to
clarify here, ultimately my point is that this information belongs
in the PEP so that people wanting to implement an installer know
what they are getting into from reading the spec alone).

In OpenStack projects we’ve done something similar for roughly the
past 7-8 years now, originally in order to work around the
shortcomings of the old dep solver in pip. All our projects include
loose/open-ended install_requires lists. We install them together
into a virtualenv and then run pip freeze and record that to a
file. That file can later be supplied to pip -r or -c in order to
reproduce or constrain a subset of this reference environment, and
any updates to it can be similarly tested during code review to make
sure it’s coherent and usable.

It also serves to confirm that these projects developed by
independent teams are co-installable within a single environment,
and particularly with the advent of the new dep solver in pip, that
they’re not declaring incompatible versions of their requirements.
When our projects branch for stable maintenance, we stop updating
those frozen dependency sets in order to avoid introducing future
instability into tests for backported security fixes and the like.

This is essentially a “lockfile” as I’ve seen done in other package
ecosystems, used for recreating and tracking a reproducible test or
build environment for the software. I really hate that term though,
as it’s easily conflated with file-based mutexes for locking between
processes.

Thanks, that matches with my understanding of the term “lockfile” in this context.

Thinking about this some more, the key point is that I’m comparing PEP 665 with pip freeze output (i.e., requirements files with every package listed and fully locked versions). A minimal installer can consume such a requirements file even if:

  • It has no resolution logic at all (just “download a wheel for the current platform for project A version X.Y.Z”)
  • It doesn’t read dependency information from wheels at all (because the requirements specify everything)
  • It ignores extras (because extras only add dependencies, and we’ve already said those are all included in the requirements file).
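
For reference, the input side of such a minimal installer can be sketched in a few lines. The `parse_pinned` helper is hypothetical and deliberately handles only plain `name==version` lines; anything else that can appear in a requirements file (editable installs, direct URLs, markers, options) is rejected rather than interpreted.

```python
def parse_pinned(lines):
    """Yield (name, version) pairs from fully pinned `name==version` lines."""
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"not a fully pinned requirement: {raw!r}")
        yield name.strip(), version.strip()
```

Everything after this step is "download a wheel for this name and version and unpack it", with no dependency metadata consulted.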

I don’t think an installer that is this limited can consume a PEP 665 lockfile. Am I right? If so, then there is at least one use case that isn’t covered by the PEP¹. At a minimum, the PEP should be explicit that such installation environments will have to continue to rely on requirements files (and therefore project management tools will need to continue to support locking to requirements files if they expect to support such environments).

Maybe the packaging library could include an API that reads a PEP 665 lockfile and emits a series of name/version pairs. If that were possible, then I guess that would address my concerns (as long as the authors of packaging felt that the spec was clear enough to allow such an implementation). If it’s not possible, though, then what makes it any more likely that the installer can implement the logic?

I’d also note that personally, even if I’m using an installer that’s more capable than this (for example, pip) I still think of locking in terms of this sort of pip freeze output. So if I were using a lockfile mechanism like PEP 665, and (for example) trying to debug an issue, I’d expect to be able to read the lockfile as if it were such a list of name/version constraints and reason about it in that form. In particular, I would want something that meant I didn’t have to think about dependency resolution logic. Because reasoning about dependency resolution for a project with hundreds of dependencies is hard. I’m locking so that I can avoid needing to do that in my deployment environment, and I can be sure that having done it in my build environment, I’m finished with the problem.

¹ It’s not completely theoretical, either - if I were to write a deployment tool, that’s precisely the level of installer I’d write - “given a list of name/version pairs, install them”.

This is not the intention, at least not mine, so if that’s the impression readers get, we should probably rewrite the text to avoid it. I explained my vision for the specification earlier in the thread; the goal is to encode a valid resolution result obtained by a dependency resolver, so the result can be used by another tool to perform the installation part. The fact that Poetry and PDM are prominently mentioned is mainly because they are the most widely used tools in the Python ecosystem that have this “resolving is separate from installing” notion (Pipenv is another one, but although it has lock files, the locking and installing steps are less separate in its workflow).

Or maybe the above is (a very brief version of) the explanation you’re looking for? If so, I can definitely add/rewrite some opening paragraphs in the document.

It should. The only requirements for an installer are:

  • Recursively find package entries from needs
  • Evaluate environment markers (to selectively eliminate needs and package entries it should not go into)
  • Choose a valid wheel for installation
  • Hash-checking
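
A rough sketch of those steps, under heavy assumptions: the `LOCK` dictionary is a hypothetical in-memory stand-in for the lock file (the real file is TOML and its keys may differ), needs are reduced to bare names, markers are reduced to a single key/value equality, and wheel selection by tag is left out entirely.

```python
import hashlib

# Hypothetical in-memory lock data; the real file is TOML and its exact keys may differ.
LOCK = {
    "needs": ["A"],
    "packages": {
        "A": [{"version": "1", "marker": None, "needs": ["B"]}],
        "B": [
            {"version": "1", "marker": ("sys_platform", "win32"), "needs": []},
            {"version": "2", "marker": ("sys_platform", "darwin"), "needs": []},
        ],
    },
}


def applies(marker, environment):
    """Simplified marker check: None always applies, else one (key, value) equality."""
    return marker is None or environment.get(marker[0]) == marker[1]


def walk(lock, environment):
    """Collect {name: version} by following needs; no backtracking, no resolution."""
    selected = {}
    pending = list(lock["needs"])
    while pending:
        name = pending.pop()
        if name in selected:
            continue
        entries = [
            entry for entry in lock["packages"][name]
            if applies(entry["marker"], environment)
        ]
        if not entries:
            raise LookupError(f"no entry for {name} in this environment")
        entry = entries[0]  # any applicable entry is promised to be valid
        selected[name] = entry["version"]
        pending.extend(entry["needs"])
    return selected


def verify_hash(data, expected_sha256):
    """Hash-check downloaded bytes before installing them."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
```

The point of the sketch is that the walk is a plain graph traversal with filtering: nothing in it compares version specifiers or revisits an earlier choice.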

Personally I feel it is entirely possible to replace most requirement (and constraint) files with PEP 665. The only things not replaceable are index information and some pip-specific logic, which we can gradually standardise afterwards. I definitely envision pip freeze to emit this format instead of requirements.txt eventually.

To other authors of PEP 665: I think I’m going to draft some additional paragraphs to put in Motivation and Rationale that

  1. Describe the specify-install-freeze workflow (currently the most widely used in Python), and how a lock file can be used in it.
  2. Describe the specify-lock-install workflow (more widely adopted by modern packaging tools), and how it differs from and improves on the freeze-based model.
  3. Describe how the lock file (and PEP 621) fit in both workflows.
  4. Explain what lockers and installers are (there is already a brief description and I will incorporate it), why the locker and installer should be separated (also already included), and how information can be passed from locker to installer (via the lock file).
  5. Explain how to minimise what an installer needs to implement while striking a balance to avoid overloading the resolver (lazily evaluate environment markers).