PEP 665: Specifying Installation Requirements for Python Projects

As such, locking can be achieved trivially in conda for your current platform with conda env export -f my_env.lock and restored (anywhere, assuming the same OS/arch) with conda env create -f my_env.lock.

In my experience it’s not so trivial but your point is taken

1 Like

Thanks for your reply. Let’s try to take a step back. I agree that reproducibility is usually not that important, but since it was one of the two key points in the motivation, I picked it up. I propose to shelve that aspect for the time being (in the context of this discussion). :upside_down_face:

The much more important thing is that - from my understanding of the term - lock files only make sense for environments (and that can overlap with the needs of a single project, e.g. the environment that people use to be on the same page when co-developing) - but perhaps I’m not getting an important aspect here.

Assuming we understand lock files similarly, it’d be fine if the goal of this PEP were just to describe all the transitive dependencies necessary to install or work on a given library, but then it should IMO not use the words “installation requirements”, because that is a much broader concept in my view: people want to co-install packages (following the “installation requirements”) that need to share common dependencies (e.g. numpy), and then it becomes an environment question again, because different people will install different sets of packages.

This leads me to the second point. My mental yardstick is not a Python-only project, but something that needs to be compiled (a very common case). In such cases, a whole lot of other “dependencies” (in the sense of factors affecting the build) come into play. As a side note, I think it would be worth sharpening the language around installation & runtime requirements, since these do not necessarily overlap once the project includes non-Python code.

So IMO that’s a great goal, but not achievable for projects that aren’t just pure Python without diving into some very tricky questions about being explicit enough so that “things external to the local environment do not affect how the description would be interpreted”. This is what I meant by considering conda as prior art, because it has solved exactly that question (and not with reproducibility as the primary focus).

I think this might be a crossed wire on the grapevine somewhere. It’s trivial in conda to create and use lock files. After doing e.g. conda create -n my_env python=3.9 numpy (and activating the env), the output of conda env export -f my_env.lock is (here for windows):

name: my_env
channels:
  - conda-forge
  - defaults
dependencies:
  - ca-certificates=2021.5.30=h5b45459_0
  - certifi=2021.5.30=py39hcbf5309_0
  - intel-openmp=2021.3.0=h57928b3_3372
  - libblas=3.9.0=10_mkl
  - libcblas=3.9.0=10_mkl
  - liblapack=3.9.0=10_mkl
  - mkl=2021.3.0=hb70f87d_564
  - numpy=1.21.1=py39h6635163_0
  - openssl=1.1.1k=h8ffe710_0
  - pip=21.2.2=pyhd8ed1ab_0
  - python=3.9.6=h7840368_1_cpython
  - python_abi=3.9=2_cp39
  - setuptools=49.6.0=py39hcbf5309_3
  - sqlite=3.36.0=h8ffe710_0
  - tbb=2021.3.0=h2d74725_0
  - tzdata=2021a=he74cb21_1
  - ucrt=10.0.20348.0=h57928b3_0
  - vc=14.2=hb210afc_5
  - vs2015_runtime=14.29.30037=h902a5da_5
  - wheel=0.36.2=pyhd3deb0d_0
  - wincertstore=0.2=py39hcbf5309_1006
prefix: C:\Users\[...]\.conda\envs\my_env

This specifies all artefacts in the environment down to the version, build number & build hash, which means recreating an environment from this lockfile will (generally**) be bit-for-bit equivalent (again, on the same platform) to the environment at the point in time when the snapshot was taken.

** except in exceptional circumstances; happy to go into detail if desired.

1 Like

Comment: while what you write about is a real pain point for projects with complex dependencies @h-vetinari, I don’t think it’s helpful to discuss it in the context of this PEP. Nothing in this PEP changes that one way or another. The scope and assumptions of this PEP are: use PyPI and wheels, and standardize lock files for use cases that mostly already work today.

The answer for “I depend on this native library that’s not on PyPI” already was “just bundle it in, or write in your project’s docs how to install it separately”, and that remains unchanged here.

2 Likes

Yeah, I can see how things would work with only wheels, but then type="source tree" should not be part of the scope of the PEP.

1 Like

I disagree.

The type is clearly specified as “something to build a wheel from” and it uses an already-established-and-standardised meaning for “source trees”. Same for sdists.

2 Likes

I still feel we are largely talking past each other, probably mostly because we are not using the same words to describe the same things. I’m making another attempt to clear up the terminology, but I do wish you could try to clarify your own definitions of those terms, because I don’t understand your usage of them (and it’s not just me).

What is an environment? I gave my definition above, and PEP 665 is using lock files to describe an environment, and the project only comes in because it is tied to those environments. Or by environment do you mean not the runtime context itself (i.e. as in a virtual environment or conda environment), but the characteristics of an environment (i.e. as in PEP 508 environment marker or environment variable)?

In what ways is this related to what we are doing here? PEP 665 describes the structure of a file format, and how to put Python dependencies in it. It does that because Python packaging currently covers only Python stuff. If that is expanded to cover other things, the lock file format can grow to accommodate them. I don’t understand your insistence on the topic, because it is just not relevant to this discussion.

This is a good lock file for your usage, but not an adequate format for general usage, since it:

  1. Does not record intent. You probably know why each entry is in that file right now, but it cannot be evolved without manual input.
  2. Does not record context. This file is used on Windows (and does not work elsewhere), but nobody can know that by looking at the file.
  3. Is either too strict (only works if those exact files are installable on the target machine, due to the hashes), or depends on external setup (what files are provided by channels for a given dependency, if the hashes are removed).

This means the lock file basically only works on your machine (or an exactly identical setup), which is fine (and also achievable with PEP 665), but is not particularly useful, nor is it the use case lock files are generally designed to target. If this is your definition of a lock file, then PEP 665 does not qualify as a lock file for you. But that’s not the definition of lock file used by PEP 665.

1 Like

If this PEP is about lock files, can I suggest that a different term be used in the title and abstract? “Installation requirements” doesn’t appear anywhere else in the document and I have a hard time understanding what it means.

4 Likes

This part is not clear at all to me from the text, or rather, the title. “Installation requirements for python projects” sounds project-specific to me, and so does “This PEP specifies a file format to list the Python package installation requirements for a project.” from the motivation.

Perhaps my misunderstanding was to wrongly go from “installation requirements” to equating “project=package”? If so, I apologise.

Is the locker intended to also operate on an installed collection of packages (where none transitively requires all the others)? Mentioning that would have helped me avoid the misunderstanding.

This is where my biggest concern is. Python packaging emphatically is not only Python stuff, and the problems arising from that are compounded greatly by allowing dependencies to be specified as source tree / sdist. As soon as this PEP is accepted, people will begin using all the available capabilities, and paired with the expectations set by the name/motivation/abstract, there’s a tremendous risk of disappointment when the resulting lockfile does not actually work as advertised.

Perhaps I again was influenced too much by the “installation requirements” in the title, but I was pointing out that many immediate problems with the stated motivation (cross-platform installation requirements, reproducibility, …) have some substantial prior art, and I would have hoped for the PEP not to unwittingly block the path toward accommodating expansions based on functionality that already exists elsewhere.

I didn’t say conda’s lock files are perfect, just that they exist. Intent and context would be very beneficial to have. Regarding cross-platform use, it would be instructive to see how the lockfile will handle platform-specific dependencies (ideally with an example).

Still, fully cross-platform lockfiles sound like an unrealistic goal to me (what if no wheels have been published for Windows? What if the sources don’t compile trivially on OSX? etc.). Not that it wouldn’t be great; just that it’s very hard. There’s a reason why the conda stuff is platform-specific.

1 Like

I agree there’s some conflation between “environment” and “project” here. My loose-and-outdated-but-familiar definitions are that requirements.txt specifies an environment and setup(install_requires=[...]) specifies a project.
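
For concreteness, a rough illustration of that distinction (the package names and version ranges below are invented):

# An "environment" spec in the requirements.txt sense: exact pins for one
# concrete setup, e.g.
#
#   numpy==1.21.1
#   requests==2.26.0
#
# A "project" spec in the setup.py sense: loose ranges the project claims to be
# compatible with, leaving the final pinning to whoever assembles the environment.
from setuptools import setup

setup(
    name="example-project",  # invented name
    version="0.1",
    install_requires=[
        "numpy>=1.20",
        "requests>=2,<3",
    ],
)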

If the response from the PEP authors is “this does both”, then I think that’s a bad decision. So hopefully they’ll be able to clearly say “it is this one”. (And I assume that, since the version constraints are optional but the locked version/hash are mostly required, “this one” is requirements.txt.)

I have thoughts on the implications of this, but I’ll wait for an author to confirm their intent first.

To respond to some responses:

Sure, you can put them in a directory, but why specify the name of that directory in the PEP? Why not just let people put them wherever they like?

“IMO” is fine but hopefully it makes it into the PEP text :wink:

It’s attached to the package itself, rather than the thing that needs it. So if you’re, say, rendering a human-readable list of packages, you can read both the requested and the locked versions out of the same table. The needs are specified in different tables, so you need to search for references to the package to figure out what the requests were.

Again, depending on whether this file is meant to supersede requirements.txt, setup(install_requires=[...]) or both, the best approach here is going to vary. So I’ll hang out for that clarification first.

3 Likes

Re-reading the PEP and responses here, I’m actually much more confused than when I started out :slight_smile:

So one big thing I missed was that the lock file doesn’t actually have… locks? I.e. it doesn’t specify specific versions to install, and the installer is expected to have a full resolver, but just run it restricted to the package versions mentioned in the file instead of all of PyPI? This isn’t what I expect from a lock file :-).

Also, most resolvers actually take more than just the package requirements as input – e.g. what pip calls “constraints”, --pre, etc. How are these supposed to be encoded in the file?

Also, it seems that this lock file requires that you can figure out all the package versions that might possibly be required to satisfy all the requirements, under all possible marker environments at once. I’m not sure there even… exists a sound algorithm for doing that? How do you imagine this will be implemented? What are you supposed to do if the resolution includes an sdist that you can’t build on the platforms the user cares about? The lock file format seems to require that you somehow figure out its requirements to build the lock file at all, even if the user will never actually need them. Is this even possible to implement?

(In my resolver, my tentative plan was to just run the resolver several times for the environments the user says they care about, e.g. ["win32", "linux", "macos"] or whatever, record the locks restricted to the markers that were actually used during the resolution, and then at install time either apply the locks as-is if possible, or else warn the user that we don’t have a valid resolution for the current platform and re-run the resolver, sticking as close to the locked versions as possible. That strategy seems to be fundamentally incompatible with this lock format though?)
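
For illustration, a very rough Python sketch of that per-platform strategy; resolve() is a made-up placeholder for a real resolver, and the environment tags are just examples:

# Hypothetical sketch only; resolve() is a placeholder, not a real API.

TARGET_ENVS = ["win32", "linux", "macos"]  # platforms the user says they care about


def resolve(requirements, environment, prefer=None):
    """Placeholder for an actual resolver; not implemented here."""
    raise NotImplementedError


def lock(requirements):
    # Run one ordinary resolution per target environment and record the
    # resulting pins (a name -> version/hash mapping) for each platform.
    return {env: resolve(requirements, environment=env) for env in TARGET_ENVS}


def install(requirements, locks, current_env):
    if current_env in locks:
        return locks[current_env]  # apply the recorded pins as-is
    # No valid recorded resolution for this platform: warn, then re-resolve,
    # staying as close to the already-locked versions as possible.
    print(f"warning: no locked resolution for {current_env}; re-resolving")
    preferred = {name: pin for pins in locks.values() for name, pin in pins.items()}
    return resolve(requirements, environment=current_env, prefer=preferred)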

I don’t understand this at all either :-). The entire tree is described directly in the lock file itself, in the needs fields. [Though I agree they should be called requires to match every other packaging spec.] Scanning through to find all packages that mention another one inside a needs field is trivial.
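
(To illustrate the “trivial” part, a rough sketch that walks a parsed lock file and collects every table whose needs entries mention a given name; the schema details are assumptions here, and tomllib just stands in for any TOML parser.)

import tomllib  # Python 3.11+; the third-party tomli/toml packages work elsewhere


def mentioned_by(lock_path, target):
    """Return the names of lock-file tables whose 'needs' lists mention `target`."""
    with open(lock_path, "rb") as f:
        lock = tomllib.load(f)
    hits = []

    def walk(node, owner=None):
        if isinstance(node, dict):
            needs = node.get("needs", [])
            if isinstance(needs, list) and any(target in str(req) for req in needs):
                hits.append(owner)
            for key, value in node.items():
                walk(value, owner=key)
        elif isinstance(node, list):
            for item in node:
                walk(item, owner=owner)

    walk(lock)
    return hits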

Thanks! Just trying to visualize how this will work in practice. Looking at PEP 650, it seems like the motivations are:

  • Platforms like Heroku/Lambda/etc. want a way to take a source tree and convert it into a self-contained executable bundle. So I guess the idea is you’d put in a bit of configuration to Heroku or whatever that says “please use pyproject-lock.d/prod.toml”, and then it takes it from there.

    One limitation of this proposal: it doesn’t have any way to specify the python version. Platforms need that, and in my own drafts I include the python interpreter as part of the locked configuration. I could stuff that in a tool section of course, but it’s a bit awkward if the platform doesn’t know to read that tool section and you have to configure it separately so it can get out of sync etc.

  • IDEs want a way to figure out which packages you’re using, so they can do stuff like process type hints and suggest autocompletions.

    This might also benefit from a way of saying which python version you’re using, though presumably less so, since IDEs don’t necessarily need to create a running environment in order to benefit from knowing which packages are in use.

  • Developers on teams that prefer heterogeneous tooling: as PEP 650 puts it:

    Developers want to be able to use the installer of their choice while working with other developers, but currently have to synchronize their installer choice for compatibility of dependency installation. If all preferred installers instead implemented the specified interface, it would allow for cross use of installers, allowing developers to choose an installer regardless of their collaborator’s preference.

    I don’t think this applies to PEP 665 at all. You still need everyone to agree on which tool they use to create and update the lockfiles, and that’s the user-facing part where people have strong opinions. I find it hard to imagine a team where everyone agrees on using poetry to resolve dependencies, but some of them insist on using venv to create the environments while others are virtualenv-or-nothing.

  • Dependabot-like tools:

    Package upgraders and package infrastructure in CI/CD such as Dependabot [3], PyUP [9], etc. currently support a few installers. They work by parsing and editing the installer-specific dependency files directly (such as requirements.txt or poetry.lock) with relevant package information such as upgrades, downgrades, or new hashes.

    Again, I don’t think this applies to PEP 665 at all. Tools like Dependabot need to see the input to the resolution process. Seeing the output alone is not particularly useful.

So it seems like the motivation here is for IDEs and secondarily PaaS providers? Do I have it right?

1 Like

To clarify on this, as I suspect the answer is “discoverability for possible consumers”, how does GitHub/VSCode know which of multiple files in the directory pyproject-lock.d/ to automatically use for dependency analysis and environment setup? Is it always supposed to be user-specified? If so, why not allow any filename in the repo, and just suggest a prefix (like pyproject-lock-dev.toml)?


Is url a required field for wheel/sdist package code spec types? What if that URL becomes inaccessible in the future? Are installers allowed to ignore that URL and fall back to a package index, as long as the hash still matches?

1 Like

I share this opinion.

To be explicit, I think that the lock file should be solely for setting up a single reproducible Python environment for a single platform + arch pair (which MUST be in the file name) and metadata.marker MUST be exactly PEP 621’s requires-python + any desired keys of optional-dependencies.

1 Like

Maybe we are mixing terminologies again here? I first learned about this distinction between requirements.txt and setup.py from Donald’s blog post; the terms he used were libraries and applications, and that’s the terminology I’ve stuck to since, not projects and environments. To me, a project is a collection of Python code that is either a library or an application, and an environment is what you install a project’s dependencies (and maybe the project itself) into, so neither is directly related to the distinction.

5 Likes

Were you addressing anyone specifically?

I think the issue is that “project” is a very overloaded term. Writing a few notebooks or scripts is colloquially also called a project (with no infrastructure other than a requirements.txt), but falls neither under library nor application (IMO). Does the PEP (intend to) address such cases and if so, how? I don’t think they can be considered out-of-scope, because that’s the vast majority of code that people want to work on collaboratively (and therefore need something like locking to avoid mismatched behaviour)…

I think this is something that everyone so far agrees on. The issue is whether the locking happens on a per-project (=library-or-application) level, or on an environment level, where many such projects can be installed side-by-side. The PEP reads like it’s the former, while many use cases need the latter (which is strictly more general, but unlike PEP 621 cannot be attached to a single project).

1 Like

In my book these are applications, and the proposal for those applies. You think it doesn’t?

1 Like

I’m happy to hear if it does, I just wasn’t sure where the line for “project” is being drawn.

I haven’t seen an example of what information the locker would operate on, but can we agree that one conceivable case is that the locker processes a requirements.txt file (plus maybe OS/arch context) into a lockfile? That would then definitely cover the “low-infrastructure” case that most projects fall under.

1 Like

Conceptually, in my head at least, any code that needs to be installed first to run is a library. Anything that runs without being installed is an application.

2 Likes

I’d qualify this a bit further: a library is used with other code, whereas an application stands alone.

So when you install a library, any dependencies it has need to be compatible with the code the library is being used with. When you install an application, you just set up the application code and its dependencies, and you don’t need to care about being compatible with anything else.

The confusion here seems to come from the idea of using a single Python runtime (interpreter, virtualenv, environment, whatever) for multiple applications. That’s a completely different problem in terms of complexity and I’d say it’s generally not recommended practice.

The PEP seems to me to be talking about “specifying what you need to install when setting up an application”, on the assumption that you’re installing into a clean environment. Making that explicit in the PEP seems like a good idea to me (if that is indeed the correct interpretation).

3 Likes

Does the PEP actually need to be very specific about what the lock file is for, beyond a vague “these specific versions have been known to work together for some purpose” ?

A given project can provide a top-level application (CLI/GUI/REST API, …) and also a programmatic API, and as such can be embedded in another application. In that case a lock file helps people install a “known good” set of dependencies for running the application. But developers can also depend on the same project, in which case the looser dependencies are used, as expressed in the sdist/wheel/prepare_metadata_for_build_wheel metadata.

2 Likes

On prior art: for many years OpenStack projects have been relying on pip install’s --constraint option for such a purpose (in fact, the option was originally added to pip by OpenStack contributors with this precise use in mind):

A global set of exact (===) version constraints for all direct and transitive dependencies of the projects is maintained in a central repository, applied in integration test jobs to keep them reproducible and minimize failures related to new releases of dependencies. Updates to the central list are auto-proposed any time something in that list gets a new release, and these proposals are themselves integration tested with a representative subset of significant projects to verify they’re reasonably safe.

This is what the set currently looks like:

2 Likes