PEP 665: Specifying Installation Requirements for Python Projects

I tried to address people’s requests below via PEP 665: try to use more specific terminology · python/peps@12c8d47 · GitHub.

It depends on how you develop, hence why the PEP doesn’t try to be overly prescriptive about what a lock file is meant for. Some people, for instance, are very strict in what they develop against and let CI take care of the potential variance. We are not here to dictate how you develop your software, but to help standardize some practices we have seen in the community.

I have done my best, but we don’t have any official definition to begin with, so this is still going to be personal opinion.

:man_shrugging: I have no way of knowing that, so we would both be guessing as to whether “requires” is less confusing to advanced users. Plus the beginners who “get” needs now will eventually be the advanced users, so it’s a temporary issue (especially since there will inevitably be more Python developers to come than there have been up to this point).

But you forgot to suggest what “different term” to use. :wink: Also note that whoever ends up being the PEP delegate for this will not accept/reject it based on the title.

:point_up: What they said.

Former; I superseded the latter in PEP 621. :wink:

You forgot to check the commit I referenced last time :wink: PEP 665: clarifications based on feedback · python/peps@ae53120 · GitHub

In our discussions for this PEP, the initial version did what you suggested, but then we brought in others whose tools provide a lock file today, and they said it wasn’t going to be tenable without this flexibility. If you look at Poetry and PDM (who provided the extra feedback), they needed the flexibility in order to support cross-platform lock files, which their users have shown them is a desired outcome.

Now there is nothing stopping you and anyone else from using a locker which is very strict about versions and such to get the more classic definition of “lock file”. And this was very much on purpose, as I want to be able to have very strict, fully reproducible installs on a specific platform.

In terms of PEP 650, I can’t speak for Dustin, but for me that isn’t right (same goes for this PEP). For me, they are on equal footing as I see needs in both arenas at work all the time. But what are you specifically trying to get out of a “primary” versus “secondary” ranking of motivations here?

Yes, url is required as you need some way to specify where to expect to get the code. If an installer wants to provide some potential fallback to something that still validates successfully against the hash, I guess that’s their call, but I personally wouldn’t want that for security reasons.

That’s a correct interpretation.

I don’t personally think so, but people seem to be getting tripped up without it, so I’ve done the best I can.


I did not think that they would. I suggested it because I thought rewording the title and abstract might alleviate some of the confusion. As for what a good alternative might be, the thread does open with “aka lock files”, so that’d probably do for the title. Using “requirements” and “dependencies” interchangeably isn’t great - I’d stick with the latter - and “installer” doesn’t appear to convey any meaning, so I’d drop it entirely. Also, in the abstract, I assume “list of projects” should be “list of packages” or “list of requirements” or “list of dependencies”; but “project” is used to mean something else entirely.

Some kind of cross-platform support is definitely important! I am just literally uncertain whether it’s even possible to generate these lock files correctly as currently described. And in particular I don’t see how to generate them using my resolver (which is using the PubGrub algorithm).

Maybe poetry or PDM can describe their resolution algorithm in more detail, and in particular how they handle cross-platform locking?

Also, @uranusjr’s response to me upthread has the implicit assumption that needs will contain exactly the same information as the package requires, so that you can compare them to check if the package has consistent requirements:

So if you’re imagining that we’d be rewriting the needs fields to use == and @ requirements exclusively, then you’d need some secondary place to store the original package requirements (if you want to detect inconsistencies).

I was just referring back to the bit of text I wrote about how PaaS providers definitely 100% need the Python version, so this lock file format isn’t sufficient for them on its own, while IDEs can do useful things with the lock file without that.

That’s a @sdispater or @frostming question.

If you want this, then yes (and I suspect that might be what @steve.dower is about to suggest now that I answered one of his clarification questions).

Sorry, I missed that in your reply. You can specify the Python version support via either the marker or tags keys in [metadata] (depending on preference and/or whether you want to support a range of versions or a specific interpreter version), so you can restrict a lock file to a specific Python version (and even generate different lock files for different Python versions).
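For example, something along these lines (illustrative only; key spellings per the current draft):

    [metadata]
    marker = "python_version == '3.9'"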


Hi, just a note: in this case, is git for Git specifically, or for a generic DVCS? Or can you use git, hg, etc. as with pip requirement URLs?

i.e.:

    [[package.packaging.code]]
    type = "hg"
    url = "https://foss.heptapod.net/mercurial/hg-git"
    commit = "d8e74be3a914fd4a80cbf8aa3b354d57855fc968"

Thanks


The lock file is highly environment-related, in the sense that the version and/or URL of a particular package varies according to which environment it is installed into. So there is a contradiction: the locker that generates the lock file doesn’t know which environment the packages might be installed into. That can only be known by the installer.

PDM (and Poetry similarly) tries to record all possible applicable variants of a particular package in the lock file, by loosening the environment constraints when locking. This requires the installer to perform a quick resolution at installation time – traversing the dependency tree and discarding unneeded entries. Ideally, no HTTP requests are needed in this process. In most use cases the locker and installer are run in different environments, and the latter is run much more frequently, so cross-platform locking will be very useful in this scenario. This is also why package.<name> is an array of tables (AoT) instead of a single table.
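As a rough sketch (illustrative only – real entries carry more fields), the same package can be recorded twice as an AoT:

    # The same package recorded twice, as an array of tables. A dependant's
    # needs entries (with markers) determine which entry an installer's
    # traversal actually reaches; the other entry is simply discarded.
    [[package.numpy]]
    version = "1.21.2"

    [[package.numpy]]
    version = "1.19.5"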


In that case, strong -1 on this line from the PEP:

Lock files MUST be kept in a directory named pyproject-lock.d.

Rewording it to a recommendation that “tools/consumers may automatically discover lock files in a pyproject-lock.d directory, but should allow users to specify any path to a specific file” would satisfy me.

Putting a lock file outside of that directory in no way stops it from being a perfectly good lock file, but the current text says that it does.

(I also really dislike the name itself, but I’m not prepared to start bikeshedding on it since that has presumably already happened extensively behind the scenes.)


For now this is Git-only. We do plan to support more VCS when we identify a need, but don’t want to go into too much detail for now (as you see there are a lot of high-level questions we need to sort out first). The x-<tool>-<type> Open Issue is partly meant to encourage tools trying to gather interest without putting everything in the spec at once.


I’m here with a wall of text again. I seem to be doing these a lot lately.

There seems to be some amount of “what exactly is this PEP for compared to what we have” undertones. Here’s my attempt at trying to address that, followed by responses to specific questions / comments.


The way I’ve been thinking about this file is that it’s a much better approach for doing what a requirements.txt file (with all the packages pinned, and with hashes for the specific files) wants to achieve. That kind of requirements.txt file is what pip-compile generates today – here’s an example. That’s effectively a lockfile. I’m gonna call these “locked requirements.txt” files for the rest of this thread.
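For concreteness, such a file looks roughly like this (truncated, with the hashes elided):

    #
    # This file is autogenerated by pip-compile
    #
    pytest==4.5.0 \
        --hash=sha256:<hash-of-sdist> \
        --hash=sha256:<hash-of-wheel>
    pytest-cov==2.7.1 \
        --hash=sha256:<hash>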

There’s many other kinds of requirements.txt files, like weaker attempts at being a lockfile (pinned without hashes), or a bundle of requirements (e.g. pip/tests/requirements-common_wheels.txt at main · pypa/pip · GitHub), or something else. This PEP does not care about those. Those are a different use case from what this PEP is trying to address, but they can be used to generate a valid file in the format this PEP describes. It is also not trying to do anything to substitute an individual package’s declaration of install_requires, although that information is taken from the distribution files and encoded into the generated file.


The initial version of this PEP was basically a 1:1 replacement for locked requirements.txt files [footnote 1].

This is meant to decouple the description of “install these exact packages into the environment” from pip’s requirements.txt format (which is only specified as “whatever pip implements”). Creating a format to describe this information is already something that existing tooling has been doing to varying degrees (pdm, poetry, pipenv). The goal here was to have a well-specified common format that describes what needs to be installed into an environment, to recreate it in a reproducible manner.

As already mentioned, we did a round of “hey, would this work for you?” feedback prior to posting the PEP here and as a result of that, we made the format much more flexible to enable this format to be platform-agnostic. The behaviors possible with this additional flexibility are not really achievable by using locked requirements.txt files today. [footnote 2]

There’s an example below that shows what the “additional flexibility” entails.


Well, you don’t have to be platform-agnostic with this format.

If you’re going to be resolving for a specific platform/environment, you can encode that into the lockfile using metadata.marker and metadata.tags. Those are the only pieces of information about the environment that can affect what dependency or wheel file could be used. By encoding that information, the installer tool can determine whether the lockfile is compatible with the environment it is installing into.
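For example, a lockfile generated only for CPython 3.9 on x86-64 glibc Linux might declare something like this (illustrative; I’m glossing over details like whether tags takes a single string or a list):

    [metadata]
    marker = "sys_platform == 'linux'"
    tags = "cp39-cp39-manylinux_2_17_x86_64"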

Off-topic-ish: poetry’s resolver is PubGrub-based, and it’d probably be a decent idea to take a look at how it handles generating the lockfile. :slight_smile:

No. This flexibility seems to be tripping people up, so I’m gonna try giving an example:

Say you have a requirements.txt file like this:

pytest < 4.6
pytest-cov

The potential set of versions of pytest and pytest-cov that are compatible with these exact requirements is pretty huge. You could generate a pyproject-lock.d/very-broad.toml that lists all the potential versions for each of them with all their assets. You could also generate a pyproject-lock.d/strict.toml that has as few versions of each package as possible – usually only the highest version that is possible given the constraints.

The strict.toml is conceptually equivalent to what a locked requirements.txt file, poetry.lock, Pipfile.lock, or pdm.lock is today. This is what people usually mean when they say a lockfile. This works exactly like how you’d expect a lockfile to work.
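To make that concrete, here’s a hypothetical excerpt of strict.toml (I’m guessing at the exact spelling of the artifact entry; the point is one pinned version per package, with an exact URL and hash):

    [[package.pytest]]
    version = "4.5.0"

    # one concrete artifact, pinned by exact URL and hash
    [[package.pytest.code]]
    type = "wheel"
    url = "https://files.pythonhosted.org/packages/<path>/pytest-4.5.0-py2.py3-none-any.whl"
    hashes = {sha256 = "<exact-hash>"}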

The very-broad.toml might seem not super useful at first. Why would you want something like that? Well, we’ve transformed what was a pip-specific file format with an implicitly declared package index into an interoperable, tool-agnostic format with exact hashes and URLs. With that, the installer no longer needs to make multiple requests for simple index pages. It can now download these packages in parallel. It can be much simpler than pip / poetry / pdm etc. since it no longer needs package index interaction logic. Notably, it can also be consumed by a platform, which can create an environment by picking a single version out of the sets of packages provided.

And, yes, all these benefits also apply to the strict.toml case. Hopefully, this clarifies how this format is both capable of being a lockfile and a lot more flexible than what the usual expectation would be from a lockfile.

The main reason that this flexibility exists, though, is because this format is trying to be platform-agnostic. You can have different dependency requirements for different environments, and some distribution files can be compatible with multiple environments, so the installer needs to be able to deal with that possibility if this format is going to allow generating a single platform-agnostic file.

I hope that’s clear by now – this is definitely the first one. It’s not replacing install-requires metadata or acting as a substitute for that in any way.

Could you elaborate further on this? What use cases do we know of where someone would actively want to specify different dependencies on a per-file basis that we want to support universally?

FWIW, it is definitely possible to pin only a single file per package in this format, so projects that do this are not excluded from this format in every case – only when the dependencies differ across the pinned files, which may be worth making an explicitly required error. I do think we’d want to figure out a good way to communicate this to the users though, with a decent example error message if we decide to make this an error.

to have different dependencies due to having different needs across Python versions or platforms

This is what environment markers are for.
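For instance (illustrative), a needs entry carrying a marker, so the dependency only applies on Windows:

    needs = ["colorama >= 0.4; sys_platform == 'win32'"]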

because setup.py makes it trivial to create a package that will have this characteristic

This reads like “it is possible for someone to do this” => “we should actively support this”, which… I disagree with. Not supporting all the misfeatures of existing tooling is a good idea IMO. :slight_smile:


[footnote 1]: Okay, so, uhm… the format was not exactly a 1:1 replacement.

It was making one important optimisation (or assumption, pick your word) – if you know the package name, package version, package index, and set of acceptable file hashes, then you effectively know the exact asset that you want to download. Right now, there’s one additional network request for getting the simple index page for the package and extracting the exact asset URL out of that. Theoretically, the package index could use a different URL for every request or whatever, but in practice they do not. And those URLs usually need to be stable if they want to benefit from pip’s caching anyway. The format, instead, encodes the exact URL into the file. This eliminates a network request and, when combined with the fact that dependency information is encoded into the format, this means that the entire process of “figure out what to install” is offline and deterministic.

[footnote 2]: So, theoretically you could be exceedingly hacky (read: intelligent) and generate a requirements.txt file that has “merged markers” but… uhm… I’d argue that it’s better and easier to write your own resolver + a lockfile format like this, given that is what most people have actually done. :wink:


PS: @brettcannon @uranusjr maybe we should’ve bikeshedded the name of the PEP a bit more. :slight_smile:
PPS: I’m not proof-reading this (I’ve already spent an hour on this) so there’s a decent chance that there’s some stupid typo or a glaring mistake here – please be kind when pointing those out.


But that doesn’t satisfy me. :wink:

Depends on your view of how independent a lock file is from a project. It’s not like you want pyproject.toml to be allowed anywhere. Now I suspect you’re going to say, “but that has metadata tied to the project”, and you’re right. But I would argue lock files come from some input, and disassociating them from that input (which is expected to be a pyproject.toml file) isn’t necessarily good either. I also don’t want people keeping these lock files in random locations in a project, as that kills their discoverability, and I need that for tooling purposes.

But why does specifying the directory name cause you issues? __pycache__ can’t be named anything else, same with various other things that can contain multiple files, like .dist-info directories.


You seem to be implying that these are always tool-generated, not human-written (and I know that’s been said explicitly in some places, but it doesn’t seem as pervasive an idea as it ought to be). Perhaps renaming the PEP to “Interchange Format for Installed Environment Requirements” would make it clearer up front?

Pradyun’s example with pytest is helpful here, mainly because of this line:

It would be even better if it went on to say “the installer no longer needs to resolve dependencies between packages and can simply download, validate and install the listed assets”. Then I’d put this in the abstract of the PEP, and in the motivation as well (before getting into the examples).

I still don’t see the criticality of the fixed directory name, as opposed to strongly recommending a convention, but at least if the files are very explicitly meant for generate-and-forget and there’s a strong description of how tools will know which specific one they should use - especially given the multi-platform angle - I guess I can accept that it’s something I don’t personally need to think about. (Still, either a .python or a __something__ name would be more consistent with everything else that is “not source code” but still checked into a repository.)


It’s not quite that simple, as there are still marker evaluations to do to choose the appropriate thing from the lock file, but yes, it is part of the design to cut out needing to hit a server to resolve what to install.

It’s in PEP 665 – A file format to list Python dependencies for reproducibility of an application | peps.python.org and covered in the abstract by:

… not requiring any information beyond the platform being installed for and the lock file listing the required dependencies to perform a successful installation of dependencies.

and in the motivation as part of:

… it can be used to speed up recreating that development environment by minimizing having to query the network or the scope of the possible resolution of dependencies.

Except you may need to look at the files on occasion (and especially during a PR review that updates the files), so hiding them away in a dot directory or something isn’t necessarily an appropriate relegation either.


It’s trivial in conda to create and use lock files.

Trivial to make, but traditionally not trivial to reproduce.

To be clear, the method you described creates conda environment.yml files, which do not guarantee reproducing identical environments.

I don’t recommend confusing environment.yml files with lock files. Lock files should always reproduce a complete environment. Sadly, environment.yml has been known to fail in the past when recreating large environments or environments with conda + pip packages.

A more robust method is to make what conda calls “spec files.”

conda list --explicit > spec-file.txt

However, spec files still fall short since they only handle conda packages and ignore included pip packages. conda-lock, mentioned previously, basically wraps around spec files, but it also does not (yet) handle pip packages.

See also conda docs on details of environment.yml.


Using an environment.yml with a list of packages is very different from specifying the artefacts down to the build strings and hashes. It is but a hair’s width removed from the spec files (which I considered mentioning, but decided to aim for brevity).

Conda will try to handle pip packages gracefully, but interoperability is not something where much can be done from the side of conda, and it is a distant concern for lockfiles & reproducibility.


Sure – the proposal is definitely flexible enough to describe hyper-specific just-this-one-environment lockfiles. I’m more trying to understand the implications for cross-environment lockfiles – does this proposal force everyone to use a poetry-like algorithm for that, what even is that algorithm and is it sound, are there other strategies for handling cross-environment lockfiles that we should consider. That kind of thing.

So I did poke around in poetry’s source a bit, and I think what I’m looking at is that they have two separate resolvers? One that does a simple kind of backtracking traversal ignoring conflicts/markers to try to approximate the set of packages that a full resolver would pick, that’s used to generate the lockfile, and then the precise pubgrub-based resolver that runs over the output of the first resolver, to pick the actual packages?

I could be totally wrong :slight_smile:

Resolving packages in a single environment is already NP-hard and allows for multiple solutions. IIUC, the idea in this proposal is that lockfiles should encode solutions to an infinite space of different NP-hard problems, parametrized by complex boolean expressions, and should produce a deterministic result in all cases for all implementations.

So like, I’m not trying to pick on poetry or anything, I’m not saying I don’t trust the authors, I’m just saying that this seems like an incredibly difficult algorithmic problem. Hopefully, the two-pass approximate-lock-then-resolve is provably guaranteed to produce the same results as running the resolver directly on the full dependency graph for a specific environment, and also provably guaranteed to produce lockfiles that only allow a single valid solution for each environment? But like, that’s some serious math, I can’t figure it out from spending half an hour squinting at some python code :-). Also those might not even be the guarantees that they’re trying to enforce? Can’t tell that from code either.


Ah, okie. It wasn’t clear from the way you phrased that question.

Yup, sounds about right.

Yes. And, we already have tools that do this. :slight_smile:

Note that the complex boolean expressions already directly affect the complexity of the dependency resolution process for Python packages – pip is evaluating the markers in the same loop as where it is finding the packages. This PEP is permitting delaying when they get evaluated and introducing a common communication format across the boundary of lock-generator → lock-installer.

I feel it might help to elaborate a bit – the main thing here is that we’re permitting additional complexity for lockers, to delay evaluating the environment markers until the installation step. This is done by separating the “trimming” of the dependency tree using the markers from the “discover all the package versions” step. Such lockers now need to be capable of spewing out a list of relevant versions across whatever environments they feel need to be described. As already noted, this additional complexity isn’t required to use the format, but it is certainly allowed.

The way I think about this is that the separation of evaluating the boolean expressions (markers and wheel tags) from the “find all the relevant things” step is opt-in, and entails additional complexity in the locking tool. This does mean that a tool generating this lockfile is going to need additional complexity to allow for this separation, but we already have tools doing that, and the authors of those tools have stated that this complexity is worth the user-facing benefits in their opinions.

Yep. An easier way to think about this would be that we’re moving “evaluate the markers + tags” to be separate from “find all the potential package sources”. We’re not changing the information that the dependency resolver would see – we’re changing how easy it is to get that information, by specifying a clear format that has all the information necessary.

No. You could have a tool that runs a “full” resolver with a bunch of different environment markers + wheel tags (say, for the multiple linux flavours that a company uses) and then spew out a single file that can be used for all those environments. Or multiple files, each describing the individual environments. Or bundling together your Linux environments into one lockfile and putting your Windows environments in another. The “merge” code for that will be much simpler than writing a resolver.

That said, at least two tools (poetry, pdm) have existing implementations that don’t “evaluate” environment markers in the first pass, go down both branches, and are capable of spewing out the entire list of versions that you could need when you run a resolver over them. This is about as environment-agnostic as you can get, and the PEP certainly should accommodate the existing solutions for this problem space IMO.


Given the level of flexibility you’re saying these files will have, what options do consumers of the files have to decide how much of that flexibility (complexity) they are willing to handle?

To give a concrete example, can I write a tool that is only interested in implementing basic single-platform lockfile functionality, and as such only accepts files that don’t require evaluating markers or choosing between more than one concrete artifact per package? What sort of UI would my tool be capable of providing in order to give helpful information to people who feed it a PEP 665 compliant file that it’s not capable of dealing with?

And what sort of options would be available for lockfile generators to limit the complexity of the files they output?

Or are we expecting all tools¹ to support the full generality of the PEP? (Which seems to require implementing a resolver, if I’m following this discussion…)

¹ At least, all tools that don’t, in effect, define their own subset of PEP 665 that they support.


On which point, I’ll note that although this presumably classes as an interoperability spec, I’m not particularly familiar with the sort of scenarios where lock files are important, so if someone else wants to volunteer to act as PEP delegate on this PEP, that would be great. There’s no rush, though, as it’s probably a bit too early to make a decision about this right now.


You definitely are allowed to. The information flow implied by PEP 665 is something like

                 locker                      installer
pyproject.toml  ------->  pyproject-lock.d/  -------->  Environment

The installer is “in charge” of managing the target environment (in the same sense pip is generally in charge of managing packages in a virtual environment) and should be allowed to do whatever it wants in it within the boundary of other standards (e.g. writing .dist-info directories appropriately). It would be perfectly acceptable if you just error out when you hit something you cannot handle while traversing the tree to collect things to install, e.g. if any of the entries in needs contain any markers. (Although I imagine if PEP 665 is accepted we’ll develop a common logic for traversing the file, like how packaging can parse requirements, so this particular case wouldn’t be an issue.)

You can go entirely environment agnostic (or as much as practical, since this is technically impossible due to the per-file dependency problem), or only target exactly one OS/interpreter/arch combination (which is what I imagine a future pip freeze implementation would do), or anything in between, as long as the target environment is described correctly in marker and tags (so GPU arch is out, targeting specifically Ubuntu is probably not easy, until you can get them into either the wheel tag or environment marker standard).


Can I suggest this gets added to the “Installer Expectations” section of the PEP? If (for example) pip were to implement PEP 665 support, but only handled fully pinned requirements with only a single possibility for each, I could easily imagine getting bug reports claiming that we “didn’t support PEP 665”. So I’d like to have something explicit to point to, to confirm that we did :slightly_smiling_face:

(To be honest, this was something of a strawman argument, as I’d be surprised if this sort of behaviour would stand up in reality, and I was expecting that a discussion of how we allow it will end up making that clear. Which in turn would help clarify the level of complexity we expect installers to implement. But if the PEP authors think it’s OK, maybe I was being too pessimistic.)
