PEP 665: Specifying Installation Requirements for Python Projects

brettcannon · July 30, 2021, 1:04am

Also known as lock files.

Been working on this for almost 6 months with @pradyunsg and @uranusjr. If this gets accepted I think it would mean rejecting PEP 650 as redundant (the installer API PEP).

PEP: 665
Title: Specifying Installation Requirements for Python Projects
Author: Brett Cannon brett@python.org,
Pradyun Gedam pradyunsg@gmail.com,
Tzu-ping Chung uranusjr@gmail.com
PEP-Delegate:
Discussions-To: PEP 665: Specifying Installation Requirements for Python Projects
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jul-2021
Post-History: 29-Jul-2021
Resolution:

========
Abstract

This PEP specifies a file format to list the Python package
installation requirements for a project. The list of projects is
considered exhaustive for the installation target and thus
locked down, not requiring any information beyond the platform being
installed for and the lock file listing the required dependencies
to perform a successful installation of dependencies.

==========
Motivation

Thanks to PEP 621, projects have a way to list their direct/top-level
dependencies which they need to have installed. But PEP 621 also
(purposefully) omits two key details that often become important for
projects:

#. A listing of all indirect/transitive dependencies
#. Specifying (at least) specific versions of dependencies for
reproducible installations

Both needs can be important for various reasons when creating a new
environment. Consider a project which
is an application that is deployed somewhere (either to users as a
desktop app or to a server). Without a complete listing of all
dependencies and the specific versions to use, there can be a skew
between developers of the same project, or developer and user, based on
what versions of a project’s dependencies happen to be available at the
time of installation in a new environment. For instance, a dependency may
have v1 as the newest version on Monday when one developer installed the
dependency, while v2 comes out on Wednesday when another developer
installs the same dependency. Now the two developers are working against
two different versions of the same dependency, which can lead to different
outcomes. This is the use-case of developing a desktop or server
application where one might have a requirements.txt file which
specifies exact versions of various packages.

Another important reason for reproducible installations is for
security purposes. Guaranteeing that the same binary data is
downloaded and installed for all installations of an app makes sure that no
bad actor has somehow changed a dependency’s binary data in a malicious
way. A lock file can assist in this guarantee by recording the exact
details of what should be installed and how to verify that those
dependencies have not changed any bytes unexpectedly. This is the use-case
of developing a secure application using a requirements.txt file which
specifies the hash of all the packages that should be installed.

Tied into this concept of reproducibility is the speed at which an
environment can be recreated. If you created a lock file as part of
your local development, it can be used to speed up recreating that
development environment by minimizing having to query the network or the
scope of the possible resolution of dependencies. This makes recreating
your local development environment faster as the amount of work required
to calculate what dependencies to install has been minimized. This is the
use-case of when you are working on a library or some such project where
the lock file is not committed to version control and the lock file is used as
a local cache of installation resolution details, such as an uncommitted
poetry.lock file.

The community itself has also shown a need for lock files based on the
fact that multiple tools have independently created their own lock
file formats:

#. PDM_
#. pip-tools_
#. Pipenv_
#. Poetry_
#. Pyflow_

Other programming language communities have also shown the usefulness
of lock files by developing their own solution to this problem. Some
of those communities include:

#. Dart_
#. npm_/Node
#. Rust_

Below, we identify some use-cases applicable to stakeholders in the
Python community and anyone who interacts with Python package
installers who are the ultimate consumers of a lock file (this is not
considered exhaustive and is borrowed from PEP 650).

Providers

Providers are the parties (organization, person, community, etc.) that
supply a service or software tool which interacts with Python
packaging. Two different types of providers are considered:

Platform/Infrastructure Providers

Platform providers (cloud environments, application hosting, etc.) and
infrastructure service providers need to support package installers
for their users to install Python dependencies. Most only support
requirements.txt files and a smattering of other file formats for
listing a project’s dependencies. Most providers do not want to maintain
support for more than one dependency specification format
because of the complexity it adds to their software or service and the
resources it takes to do so (e.g. not all platform providers have
the staffing to support pip-tools, Poetry, Pipenv, etc.).

This PEP would allow platform providers to declare support for this PEP
and thus only have to support one dependency specification format. What
this would mean is developers could use whatever toolchain they preferred
for development as long as they could emit a file that implemented this
PEP. This then allows developers to not have to align with what their
platform provider supports as long as everyone agrees to implementing
this PEP.

IDE Providers

Integrated development environments may interact with Python package
installation and management. Most only support a select few tools, and
users are required to find workarounds to install
their dependencies using other package installers. Similar to the
situation with PaaS & IaaS providers, IDE providers do not want to
maintain support for N different formats. Instead, tools would only
need to be able to read files which implement this PEP to perform various
actions (e.g. list all the dependencies of the open project, which ones
are missing, install dependencies, generate the lock file, etc.).

As an example, the Python extension for VS Code has to have custom support
for each installer tool people may use: pip, Poetry, Pipenv, etc. This is
not only tedious by having to track multiple projects and any changes they
make, but it also locks out newer tools whose popularity isn’t great
enough to warrant inclusion in the extension.

Developers

Developers are teams, people, or communities that code and use Python
package installers and Python packages. Three different types of
developers are considered:

Developers using PaaS & IaaS providers

Most PaaS and IaaS providers only support one Python package
installer: requirements.txt. This dictates the installers that
developers can use while working with these providers, which might not
be optimal for their application or workflow.

Developers adopting this PEP would be able to use third party
platforms/infrastructure without having to
worry about which Python package installer they are required to use as
long as the provider also supports this PEP.

Developers using IDEs

Most IDEs only support pip or a few Python package installers.
Consequently, developers must use workarounds or hacky methods to
install their dependencies if they use an unsupported package
installer.

If the IDE uses/supports this PEP it would allow for
any developer to use whatever tooling they wanted to generate
their lock file while the IDE can use whatever tooling it wants to
perform actions with/on the lock file.

Developers working with other developers

Developers want to be able to use the installer of their choice while
working with other developers, but currently have to synchronize their
installer choice for compatibility of dependency installation. If all
preferred installers instead supported the specified lock file format, it
would allow for cross use of installers, allowing developers to choose
an installer regardless of their collaborator’s preference.

Upgraders & Package Infrastructure Providers

Package upgraders and package infrastructure in CI/CD such as
Dependabot_, PyUP_, etc. currently support a few formats. They work
by parsing and editing the dependency files with
relevant package information such as upgrades, downgrades, or new
hashes. Similar to Platform and IDE providers, most of these providers
do not want to support N different formats.

Currently, these services/bots have to implement support for each
format individually. Inevitably, the most popular
formats are supported first, and less popular tools are often never
supported. By implementing this specification, these services/bots can
support one format, allowing users to select the tool
of their choice to generate the file. This will allow for more innovation
in the space, as platforms and IDEs are no longer forced to prematurely
select a “winner” tool which generates a lock file.

Open Source Community

Specifying installer requirements and adopting this PEP will reduce
the friction between Python package installers and people’s workflows.
Consequently, it will reduce the friction between Python package
installers and 3rd party infrastructure/technologies such as PaaS or
IDEs. Overall, it will allow for easier development, deployment and
maintenance of Python projects as Python package installation becomes
simpler and more interoperable.

Specifying a single file format can also increase the pace of innovation
around installers and the generation of dependency graphs. By
decoupling generating the dependency graph details from installation, it
allows for each area to grow and innovate independently. It also allows
more flexibility in tool selection at both the dependency graph
and installation ends of this process.

=========
Rationale

To begin, two key terms should be defined. A locker is a tool
which produces a lock file. An installer is a tool which
consumes a lock file to install the appropriate dependencies.

The expected information flow to occur if this PEP were accepted, from
the specification of top-level dependencies to all necessary
dependencies being installed in a fresh environment, is:

#. Read top-level dependencies from pyproject.toml (PEP 621).
#. Generate a lock file via a locker in pyproject-lock.d/.
#. Install the appropriate dependencies based entirely on information
contained in the lock file via an installer.

Goals

The file format should be machine-readable, machine-writable, and
human-readable. Since the assumption is the vast majority of lock
files will be generated by a locker tool, the format should be easy
to write by a locker. As install tools will be consuming the lock
file, the format also needs to be easily read by an installer. But the
format should also be readable by a person as people will inevitably
be performing audits on lock files. Having a format that does not lend
itself towards being read by people would hinder that. This includes
changes to a lock file being readable in a diff format for auditing
changes. It also means that understanding why something is in
the lock file should be comprehensible in a diff to assist in auditing
changes.

The lock file format needs to be general enough to support
cross-platform and cross-environment specifications of dependencies.
This allows having a single lock file which can work on a myriad of
platforms and environments when that makes sense. This has been shown
as a necessary feature by the various tools in the Python packaging
ecosystem which already have a lock file format (e.g. Pipenv_,
Poetry_, PDM_). This can be accomplished by allowing (but not
requiring) lockers to defer marker evaluation to the installer, and
thus permitting the locker to include a wider range of possible
dependencies that the installer has to work with.

The lock file also needs to support reproducible installations. If
one wants to restrict what the lock file covers to a single platform
to guarantee the exact dependencies and files which will be installed,
that should be doable. This can be critical in security contexts for
projects like SecureDrop_.

When a computation could be performed either in the locker or
installer, the preference is to perform the computation in the
locker. This is because the assumption is a locker will be executed
less frequently than an installer.

The installer should be able to resolve what to install based entirely
on platform/environment information and what is contained within the
lock file. There should be
no need to use network or other file system I/O in order to resolve
what to install.

The lock file should provide enough flexibility to allow lockers and
installers to innovate. While the lock file specification provides a
common denominator of functionality, it should not act as a ceiling
for functionality.

Non-Goals

Because of the expected size of lock files, no effort was put into
making lock files human-writable.

This PEP makes no attempt to specify how installers should use a
lock file to install into a preexisting environment.
The assumption is the installer is installing into a new/fresh
environment.

=============
Specification

Details

Lock files MUST use the TOML_ file format thanks to its adoption by
PEP 518 for pyproject.toml. This not only prevents the need to
have another file format in the Python packaging ecosystem, but it
also assists in making lock files human-readable.

Lock files MUST be kept in a directory named pyproject-lock.d.
Lock files MUST end with a .toml file extension. Projects may have
as many lock files as they want using whatever file name stems they
choose. This PEP prescribes no specific way to automatically select
between multiple lock files and installers SHOULD avoid guessing which
lock file is “best-fitting” (this does not preclude situations where
only a single lock file with a certain name is expected to exist and
will be used by default, e.g. a documentation hosting site always
using a lock file named pyproject-lock.d/rftd.toml when provided).
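
A minimal sketch of how an installer might enumerate the lock files of a project so the user can pick one, rather than the tool guessing. The helper name `available_lock_files` is hypothetical and not defined by this PEP; it only applies the directory name and `.toml` extension rules stated above.

```python
from pathlib import Path


def available_lock_files(project_root):
    """Map each lock file stem to its path under pyproject-lock.d (hypothetical helper)."""
    lock_dir = Path(project_root) / "pyproject-lock.d"
    if not lock_dir.is_dir():
        return {}
    # Only regular files ending in .toml count as lock files.
    return {
        path.stem: path
        for path in sorted(lock_dir.glob("*.toml"))
        if path.is_file()
    }
```

A tool could show the returned stems (e.g. `dev`, `prod`) to the user and require an explicit choice when more than one is present.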

The following are the top-level keys of the TOML file data format.

`version`

The version of the lock file being used. The key MUST be specified and
it MUST be set to 1. The number MUST always be an integer and it
MUST only increment in future updates to the specification. What
constitutes a version number increase is left to future PEPs or
standards changes.

Tools reading a lock file whose version they don’t support MUST raise
an error.
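
As a sketch of the rule above, a tool could check the parsed `version` key before reading anything else. The function and exception names here are hypothetical, and `lock_data` is assumed to be the dictionary produced by whatever TOML parser the tool uses.

```python
SUPPORTED_VERSIONS = frozenset({1})


class UnsupportedLockFileVersion(Exception):
    """Raised when a lock file declares a version this tool cannot read."""


def check_version(lock_data):
    """Return the lock file version, or raise if it is missing or unsupported."""
    version = lock_data.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(version, int) or isinstance(version, bool):
        raise UnsupportedLockFileVersion(
            f"version must be an integer, got {version!r}"
        )
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedLockFileVersion(
            f"unsupported lock file version: {version}"
        )
    return version
```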

`[tool]`

Tools may create their own sub-tables under the tool table. The
rules for this table match those for pyproject.toml and its
[tool] table from the build system declaration spec_.

`[metadata]`

A table containing data applying to the overall lock file.

`metadata.marker`

An optional key storing a string containing an environment marker as
specified in the dependency specifier spec_.

The locker MAY specify an environment marker which specifies any
restrictions the lock file was generated under (e.g. specific Python
versions supported).

If the installer is installing for an environment which does not
satisfy the specified environment marker, the installer MUST raise an
error as the lock file does not support the environment.

`metadata.tags`

An optional array of inline tables representing
platform compatibility tags_ that the lock file supports. The locker
MAY specify tables in the array which represent the compatibility the
lock file was generated for.

The tables have the possible keys of:

- interpreter
- abi
- platform

representing the parts of the platform compatibility tags. Each key is
optional in a table. These keys MUST represent a single value, i.e.
the values are exploded and not compressed in wheel tag parlance.

If the environment an installer is installing for does not match
any table in the array (missing keys in the table means implicit
support for that part of the compatibility), the installer MUST raise
an error as the lock file does not support the environment.
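
The matching rule can be sketched as follows, assuming the installer already knows the tag triples its target environment supports. `tags_supported` is a hypothetical name, and the caller is assumed to skip the check entirely when `metadata.tags` is absent.

```python
def tags_supported(lock_tags, environment_tags):
    """Return True if any table in metadata.tags matches the environment.

    lock_tags: list of dicts with optional "interpreter", "abi" and
        "platform" keys, as read from the lock file.
    environment_tags: iterable of (interpreter, abi, platform) triples
        supported by the target environment.
    """
    keys = ("interpreter", "abi", "platform")
    candidates = [dict(zip(keys, triple)) for triple in environment_tags]
    for table in lock_tags:
        for candidate in candidates:
            # A key missing from the table implicitly matches anything.
            if all(table.get(k, candidate[k]) == candidate[k] for k in keys):
                return True
    return False
```

An installer would raise an error when this returns `False`.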

`metadata.needs`

An array of strings representing the package specifiers for the
top-level/direct dependencies of the lock file as defined by the
dependency specifier spec_ (i.e. the root of the dependency graph
for the lock file).

Lockers MUST only allow specifiers which may be satisfiable by the
lock file and the dependency graph the lock file encodes. Lockers MUST
normalize project names according to the simple repository API_.
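
For reference, the name normalization defined by the simple repository API (PEP 503) collapses runs of `-`, `_` and `.` into a single `-` and lowercases the result. A sketch, with a hypothetical function name:

```python
import re


def normalize_name(name):
    """Normalize a project name as described by the simple repository API."""
    return re.sub(r"[-_.]+", "-", name).lower()
```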

`[package]`

A table containing arrays of tables for each dependency recorded
in the lock file.

Each key of the table is the name of a package which MUST be
normalized according to the simple repository API_. If extras are
specified as part of the project to install, the extras are to be
included in the key name and are to be sorted in lexicographic order.

Within the file, the tables for the projects MUST be
sorted by:

#. Project/key name in lexicographic order
#. Package version, newest/highest to older/lowest according to the
version specifiers spec_
#. Extras via lexicographic order

`package.<name>.version`

A required string of the version of the package as specified by the
version specifiers spec_.

`package.<name>.needs`

An optional key containing an array of strings following the
dependency specifier spec_ which specify what other packages this
package depends on. See metadata.needs for full details.

`package.<name>.needed-by`

A key containing an array of package names which depend on this
package. The package names MUST match the package name as used in the
package table.

The lack of a needed-by key implies that the package is a
top-level package listed in metadata.needs.
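
One way a locker might derive `needed-by` is by inverting the `needs` arrays it has already recorded. The sketch below is an assumption about how that could be done, not something this PEP prescribes; the name extraction is deliberately simplified and a real locker would use a full dependency specifier parser.

```python
import re

_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(specifier):
    """Extract and normalize the project name from a dependency specifier.

    Simplified for illustration: extras, markers and URLs are ignored.
    """
    match = _NAME.match(specifier)
    if match is None:
        raise ValueError(f"cannot find a project name in {specifier!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def compute_needed_by(packages):
    """Map each package name to the sorted names of packages that need it.

    packages: the parsed [package] table, i.e. name -> list of entry dicts.
    """
    needed_by = {name: set() for name in packages}
    for name, entries in packages.items():
        for entry in entries:
            for specifier in entry.get("needs", []):
                needed_by.setdefault(requirement_name(specifier), set()).add(name)
    return {name: sorted(dependents) for name, dependents in needed_by.items()}
```

A package that maps to an empty list is a top-level package, so the locker would omit its `needed-by` key.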

`package.<name>.code`

An array of tables listing files that are available to satisfy
the installation of the package for the specified version in the
version key.

Each table has a type key which specifies how the code is stored.
All other keys in the table are dependent on the value set for
type. The acceptable values for type are listed below; all
other possible values are reserved for future use.

Tables in the array MUST be sorted in lexicographic order of the value
of type, then lexicographic order for the value of url.
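
The ordering rule for the array can be sketched as a two-part sort key. `sort_code_tables` is a hypothetical helper name, and each table is assumed to be a dictionary as produced by a TOML parser.

```python
def sort_code_tables(code_tables):
    """Return code tables ordered by type, then by url."""
    return sorted(
        code_tables,
        key=lambda table: (table["type"], table.get("url", "")),
    )
```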

When recording a table, the fields SHOULD be listed in the order
the fields are listed in this specification for consistency to make
diffs of a lock file easier to read.

For all types other than “wheel”, an installer MAY refuse to install
code to avoid arbitrary code execution during installation.

An installer MUST verify the hash of any specified file.
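
A minimal sketch of such a check using Python's `hashlib`, assuming the `hash-algorithm` value is a name that `hashlib.new()` accepts and that `hash-value` is a hexadecimal digest (both are assumptions, as the text here does not define the encoding). `verify_hash` is a hypothetical name.

```python
import hashlib
import hmac


def verify_hash(data, algorithm, expected_hex):
    """Return True if the digest of data matches the recorded hash value."""
    try:
        digest = hashlib.new(algorithm, data).hexdigest()
    except ValueError as exc:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}") from exc
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(digest, expected_hex.lower())
```

An installer would refuse to install a file for which this returns `False`.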

type="wheel"
''''''''''''''''

A wheel file_ for the package version.

Supported keys in the table are:

- url: a string of location of the wheel file (use the
  file: protocol for the local file system)
- hash-algorithm: a string of the algorithm used to generate the
  hash value stored in hash-value
- hash-value: a string of the hash of the file contents
- interpreter-tag: (optional) a string of the interpreter portion
  of the wheel tag as specified by the platform compatibility tags_
  spec
- abi-tag: (optional) a string of the ABI portion of the wheel tag
  as specified by the platform compatibility tags_ spec
- platform-tag: (optional) a string of the platform portion of the
  wheel tag as specified by the platform compatibility tags_ spec

If the keys related to platform compatibility tags_ are absent then
the installer MUST infer the tags from the URL’s file name. If any of
the platform compatibility tags_ are specified by a key in the table
then a locker MUST provide all three related keys. The values of the
keys may be compressed tags.
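
Inferring the tags from the URL can be sketched by relying on the wheel file name convention, in which the last three dash-separated fields are the interpreter, ABI and platform tags. The helper names below are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def wheel_tags_from_url(url):
    """Infer (interpreter, abi, platform) tag strings from a wheel URL."""
    filename = posixpath.basename(unquote(urlsplit(url).path))
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # The last three dash-separated fields of a wheel file name are the tags.
    parts = filename[: -len(".whl")].rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"cannot find compatibility tags in {filename!r}")
    _, interpreter, abi, platform = parts
    return interpreter, abi, platform


def expand_tags(interpreter, abi, platform):
    """Expand compressed tags (e.g. "py2.py3") into individual triples."""
    return [
        (i, a, p)
        for i in interpreter.split(".")
        for a in abi.split(".")
        for p in platform.split(".")
    ]
```

The expanded triples are what an installer would compare against the tags its target environment supports.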

type="sdist"
''''''''''''''''

A source distribution file_ (sdist) for the package version.

- url: a string of location of the sdist file (use the
  file: protocol for the local file system)
- hash-algorithm: a string of the algorithm used to generate the
  hash value stored in hash-value
- hash-value: a string of the hash of the file contents

type="git"
''''''''''''''

A Git_ version control repository for the package.

- url: a string of location of the repository (use the
  file: protocol for the local file system)
- commit: a string of the commit of the repository which
  represents the version of the package

The repository MUST follow the source distribution file_ spec
for source trees, otherwise an error is to be raised by the locker.

As the commit ID for a Git repository is a hash of the repository’s
contents, there is no hash to verify.

type="source tree"
''''''''''''''''''''''

A source tree which can be used to build a wheel.

- url: a string of location of the source tree (use the
  file: protocol for the local file system)
- mime-type: (optional) a string representing the MIME type of the
  URL
- hash-algorithm: (optional for a local directory) a string of the
  algorithm used to generate the hash value stored in hash-value
- hash-value: (optional for a local directory) a string of the
  hash of the file contents

The collection of files MUST follow the source distribution file_
spec for source trees, otherwise an error is to be raised by the
locker.

Installers MAY use the file extension, MIME type from HTTP headers,
etc. to infer whether they support the storage mechanism used for the
source tree. If the MIME type cannot be inferred and it is not
specified via mime-type then an error MUST be raised.

If the source tree is NOT a local directory, then an installer MUST
verify the hash value. Otherwise if the source tree is a local
directory then the hash-algorithm and hash-value keys MUST be
left out. The installer MAY warn the user of the use of a local
directory due to the potential change in code since the lock file
was created.

Example

::

    version = 1

    [tool]
    # Tool-specific table ala PEP 518's `[tool]` table.

    [metadata]
    marker = "python_version>='3.6'"

    needs = ["mousebender"]

    [[package.attrs]]
    version = "21.2.0"
    needed-by = ["mousebender"]

    [[package.attrs.code]]
    type = "wheel"
    url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
    hash-algorithm = "sha256"
    hash-value = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"

    [[package.mousebender]]
    version = "2.0.0"
    needs = ["attrs>=19.3", "packaging>=20.3"]

    [[package.mousebender.code]]
    type = "sdist"
    url = "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"
    hash-algorithm = "sha256"
    hash-value = "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"

    [[package.mousebender.code]]
    type = "wheel"
    url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
    hash-algorithm = "sha256"
    hash-value = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"

    [[package.packaging]]
    version = "20.9"
    needs = ["pyparsing>=2.0.2"]
    needed-by = ["mousebender"]

    [[package.packaging.code]]
    type = "git"
    url = "https://github.com/pypa/packaging.git"
    commit = "53fd698b1620aca027324001bf53c8ffda0c17d1"

    [[package.pyparsing]]
    version = "2.4.7"
    needed-by = ["packaging"]

    [[package.pyparsing.code]]
    type = "wheel"
    url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
    hash-algorithm = "sha256"
    hash-value = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"
    interpreter-tag = "py2.py3"
    abi-tag = "none"
    platform-tag = "any"

Installer Expectations

Installers MUST implement the
direct URL origin of installed distributions spec_ as all packages
installed from a lock file inherently originate from a URL and not a
search of an index by package name and version.

Installers MUST error out if they encounter something they are unable
to handle (e.g. lack of environment marker support).

Example Flow

#. Have the user specify which lock file they would like to use in
pyproject-lock.d (e.g. dev, prod)

#. Check if the environment supports what is specified in
metadata.tags; error out if it doesn’t

#. Check if the environment supports what is specified in
metadata.marker; error out if it doesn’t

#. Gather the list of package names from metadata.needs, and for
each listed package …

#. Resolve any markers to find the appropriate package to install
#. Find the most appropriate code to install for the package
#. Repeat the above steps for packages listed in the needs key
for each package found to install

#. For each project collected to install …

#. Gather the specified code for the package
#. Verify hashes of code
#. Install the packages (if necessary)
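
The graph walk in the flow above can be sketched as follows. This is an illustration under stated simplifications, not a reference implementation: markers and version specifiers are ignored, the first entry recorded for a name is used, and `collect_packages` is a hypothetical name. `lock_data` is assumed to be the dictionary a TOML parser returns for the lock file.

```python
import re


def collect_packages(lock_data):
    """Walk the dependency graph from metadata.needs and return what to install.

    Simplifications (assumptions, not part of the PEP): markers and version
    specifiers are ignored, and the first entry recorded for a name is used.
    """
    name_pattern = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
    packages = lock_data.get("package", {})
    pending = list(lock_data.get("metadata", {}).get("needs", []))
    selected = {}
    while pending:
        specifier = pending.pop(0)
        match = name_pattern.match(specifier)
        if match is None:
            raise ValueError(f"cannot parse specifier {specifier!r}")
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        if name in selected:
            continue
        if name not in packages:
            raise LookupError(f"{name!r} is not recorded in the lock file")
        entry = packages[name][0]
        selected[name] = entry
        # Queue this package's own dependencies for the same treatment.
        pending.extend(entry.get("needs", []))
    return selected
```

For each selected entry, an installer would then pick a suitable table from its `code` array, verify the hash, and install it.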

=======================
Backwards Compatibility

As there is no pre-existing specification regarding lock files, there
are no explicit backwards compatibility concerns.

As for pre-existing tools that have their own lock file, some updating
will be required. Most document the lock file name, but not its
contents, in which case the file name of the lock file(s) is the
important part. For projects which do not commit their lock file to
version control, they will need to update the equivalent of their
.gitignore file. For projects that do commit their lock file to
version control, what file(s) get committed will need an update.

For projects which do document their lock file format like Pipenv_,
they will very likely need a new major version release.

Specifically for Poetry_, it has an
`export command <https://python-poetry.org/docs/cli/#export>`_ which
should allow Poetry to support this lock file format even if the
project chose not to adopt this PEP as Poetry’s primary lock file
format.

=====================
Security Implications

A lock file should not introduce security issues but instead help
solve them. By requiring the recording of hashes of code, a lock file
is able to help prevent tampering with code since the hash details
were recorded. A lock file also helps prevent unexpected package
updates being installed which may be malicious.

=================
How to Teach This

Teaching of this PEP will very much be dependent on the lockers and
installers being used for day-to-day use. Conceptually, though, users
could be taught that the pyproject-lock.d directory contains files
which specify what should be installed for a project to work. The
benefits of consistency and security should be emphasized to help
users realize why they should care about lock files.

========================
Reference Implementation

No proof-of-concept or reference implementation currently exists.

==============
Rejected Ideas

File Formats Other Than TOML

JSON_ was briefly considered, but due to:

#. TOML already being used for pyproject.toml
#. TOML being more human-readable
#. TOML leading to better diffs

the decision was made to go with TOML. There was some concern over
Python’s standard library lacking a TOML parser, but most packaging
tools already use a TOML parser thanks to pyproject.toml so this
issue did not seem to be a showstopper. Some have also argued against
this concern in the past by the fact that if packaging tools abhor
installing dependencies and feel they can’t vendor a package then the
packaging ecosystem has much bigger issues to rectify than needing to
depend on a third-party TOML parser.

Alternative Name to `pyproject-lock.d`

The name __lockfile__ was briefly considered, but the directory
would not sort next to pyproject.toml in instances where files
and directories were sorted together in lexicographic order. The
current naming is also more obvious in terms of its relationship
to pyproject.toml.

Supporting a Single Lock File

At one point the idea of not using a directory of lock files but a
single lock file which contained all possible lock information was
considered. But it quickly became apparent that trying to devise a
data format which could encompass both a lock file format which could
support multiple environments as well as strict lock outcomes for
reproducible builds would become quite complex and cumbersome.

The idea of supporting a directory of lock files as well as a single
lock file named pyproject-lock.toml was also considered. But any
possible simplicity from skipping the directory in the case of a
single lock file seemed unnecessary. Trying to define appropriate
logic for what should be the pyproject-lock.toml file and what
should go into pyproject-lock.d seemed unnecessarily complicated.

Using a Flat List Instead of a Dependency Graph

The first version of this PEP proposed that the lock file have no
concept of a dependency graph. Instead, the lock file would list
exactly what should be installed for a specific platform such that
installers did not have to make any decisions about what to install,
only validating that the lock file would work for the target platform.

This idea was eventually rejected due to the number of combinations
of potential PEP 508 environment markers. The decision was made that
trying to have lockers generate all possible combinations when a
project wants to be cross-platform would be too much.

Being Concerned About Different Dependencies Per Wheel File For a Project

It is technically possible for a project to specify different
dependencies between its various wheel files. Taking that into
consideration would then require the lock file to operate not
per-project but per-file. Luckily, specifying different dependencies
in this way is very rare and frowned upon and so it was deemed not
worth supporting.

Use Wheel Tags in the File Name

Instead of having the metadata.tags field there was a suggestion
of encoding the tags into the file name. But due to the addition of
the metadata.marker field and what to do when no tags were needed,
the idea was dropped.

Using Semantic Versioning for `version`

Instead of a monotonically increasing integer, using a float was
considered to attempt to convey semantic versioning. In the end,
though, it was deemed more hassle than it was worth as adding a new
key would likely constitute a “major” version change (only if the
key was entirely optional would it be considered “minor”), and
experience with the core metadata spec_ suggests there’s a bigger
chance parsing will be relaxed or made more strict which is also a
“major” change. As such, the simplicity of using an integer made
sense.

Alternative Names for `needs`

Some other names for what became needs were installs and
dependencies. In the end a Python beginner was asked which term
they preferred and they found needs clearer. Since there wasn’t
any reason to disagree with that, the decision was to go with
needs.

Alternative Names for `needed-by`

Other names that were considered were dependents, depended-by,
supports and required-by. In the end, needed-by made
sense and tied into needs.

Only Allowing a Single Code Location For a Project

While reproducibility is serviced better by only allowing a single
code location, it limits usability for situations where one wants to
support multiple platforms with a single lock file (which the community
has shown is desired).

Support for Branches and Tags for Git

Due to the direct URL origin of installed distributions spec_
supporting the specification of branches and tags, it was suggested
that lock files support the same thing. But because branches and tags
can change what commit they point to between locking and installation,
that was viewed as a security concern (Git commit IDs are hashes of
metadata and thus are viewed as immutable).

Accepting PEP 650

PEP 650 was an earlier attempt at trying to tackle this problem by
specifying an API for installers instead of standardizing on a lock file
format (à la PEP 517). The
initial response <https://discuss.python.org/t/pep-650-specifying-installer-requirements-for-python-projects/6657/>__
to PEP 650 could be considered mild/lukewarm. People seemed to be
consistently confused over which tools should provide what functionality
to implement the PEP. It also potentially incurred more overhead as
it would require executing Python APIs to perform any actions involving
packaging.

This PEP chose to standardize around an artifact instead of an API
(à la PEP 621). This would allow for more tool integrations as it
removes the need to specifically use Python to do things such as
create a lock file, update it, or even install packages listed in
a lock file. It also allows for easier introspection by forcing
dependency graph details to be written in a human-readable format.
It also allows for easier sharing of knowledge by standardizing more
of what people need to know (e.g. tutorials become more portable
between tools when it comes to understanding the artifact they
produce). It’s also simply the approach other language communities
have taken and seem to be happy with.

===========
Open Issues

Allow for Tool-Specific `type` Values

It has been suggested to allow for custom type values in the
code table. They would be prefixed with x- and followed by
the tool’s name and then the type, i.e. x-<tool>-<type>. This
would provide enough flexibility for things such as other version
control systems, innovative container formats, etc. to be officially
usable in a lock file.
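
As a rough illustration of the suggested naming scheme, a consumer could split such a value into its tool and type parts. This is only a sketch: the function name ``parse_custom_type`` is hypothetical, and it assumes the tool name itself contains no hyphen, which the suggestion above does not specify.

```python
from typing import Optional, Tuple


def parse_custom_type(value: str) -> Optional[Tuple[str, str]]:
    """Split "x-<tool>-<type>" into (tool, type); return None otherwise."""
    prefix, sep, rest = value.partition("-")
    if prefix != "x" or not sep:
        return None
    # Assumption: the tool name has no hyphen, so the first hyphen after
    # the "x-" prefix separates the tool from the type.
    tool, sep, kind = rest.partition("-")
    if not tool or not sep or not kind:
        return None
    return tool, kind
```

With this reading, ``"x-mytool-hg"`` yields ``("mytool", "hg")``, while a standard value such as ``"wheel"`` yields ``None``.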

Support Variable Expansion in the `url` field

This could include predefined variables like PROJECT_ROOT for the
directory containing pyproject-lock.d so URLs to local directories
and files could be relative to the project itself.

Environment variables could be supported to avoid hardcoding things
such as user credentials for Git.
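
One possible shape for such expansion is sketched below. The ``${NAME}`` placeholder syntax and the function name ``expand_url`` are assumptions made for illustration; this PEP does not define either.

```python
from string import Template
from typing import Mapping


def expand_url(url: str, project_root: str, env: Mapping[str, str]) -> str:
    """Expand ${NAME} placeholders in a lock file URL."""
    # PROJECT_ROOT is predefined and takes precedence over the environment.
    variables = {**env, "PROJECT_ROOT": project_root}
    # substitute() raises KeyError for an undefined variable, so a missing
    # credential fails loudly instead of silently producing a broken URL.
    return Template(url).substitute(variables)
```

For example, ``expand_url("file:${PROJECT_ROOT}/wheels/a.whl", "/srv/app", {})`` gives ``"file:/srv/app/wheels/a.whl"``. With ``string.Template``, a literal ``$`` in a URL would have to be written as ``$$``.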

Don’t Require Lock Files Be in a `pyproject-lock.d` directory

It has been suggested that since installers may very well allow users
to specify the path to a lock file that having this PEP say that
"MUST be kept in a directory named pyproject-lock.d" is pointless
as it is bound to be broken. As such, the suggestion is to change
“MUST” to “SHOULD”.

Record the Date of When the Lock File was Generated

Since the modification date is not guaranteed to match when the lock
file was generated, it has been suggested to record the date as part
of the file’s metadata. The question, though, is how useful is this
information and can lockers that care put it into their [tool]
table instead of mandating it be set?

Locking Build Dependencies

Thanks to PEP 518, source trees and sdists can specify what build
tools must be installed in order to build a wheel (or sdist in the
case of a source tree). It has been suggested that the lock file also
record such packages so as to increase how reproducible an installation
can be.

There is nothing currently in this PEP, though, that prohibits a
locker from recording build tools thanks to metadata.needs acting
as the entry point for calculating what to install. There is also a
cost in downloading all potential sdists and source trees, reading
their pyproject.toml files, and then calculating their build
dependencies for locking purposes, a cost which not everyone will
want to pay.

Recording the `Requires-Dist` Input to the Locker’s Resolver

While the needs key allows for recording dependency specifiers,
this PEP does not currently require the needs key to record the
exact Requires-Dist metadata that was used to calculate the
lock file. It has been suggested that recording the inputs would help
in auditing the outcome of the lock file.

If this were to be done, it would be a key named requested which
lived alongside needs and would only be specified if it would
differ from what is specified in needs.

===============
Acknowledgments

Thanks to Frost Ming of PDM_ and Sébastien Eustace of Poetry_ for
providing input around dynamic install-time resolution of PEP 508
requirements.

Thanks to Kushal Das for making sure reproducible builds stayed a
concern for this PEP.

Thanks to Andrea McInnes for settling the bikeshedding and choosing
the paint colour of needs.

=========
Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


njs · July 30, 2021, 1:47am

Interesting! Some quick thoughts:

What’s the point of required-by? It’s purely redundant with the other information in the lock file, isn’t it?

(Also, if we’re going to have both, then shouldn’t needs and required-by use the same verb? needs + needed-by, or requires + required-by?)

Who owns the pyproject-lock.d namespace? Is it intended to be shared between different tools simultaneously? Should auto-managed lockfiles include the tool that manages them in the name somehow, to avoid collisions?

What’s the motivation for this PEP? What problems do we have today that this would solve? (I haven’t seen people wishing they could use pipenv for some environments and poetry for others in the same project, e.g.)

It is technically possible for a project to specify different
dependencies between its various wheel files. Taking that into
consideration would then require the lock file to operate not
per-project but per-file.

I’ve been noodling around a bit with locking, and in my sketch I handled this a third way: for each locked package, I also record the dependency metadata and provenance that I used during the resolution (so e.g. “for mousebender 0.21.3, the resolver looked at https://.../mousebender-0.21.3-py3-cp37-win32.whl, and it saw dependencies on anybender >= 0.1, urllib3 ~= 1.4”). This way during installation, I can check the final wheel I’m installing has the same requirements I was expecting, and if not I can explain what happened (“I’m installing mousebender-0.21.3-py3-cp39-win_amd64.whl, and its dependencies are XX, which doesn’t match mousebender-0.21.3-py3-cp37-win32.whl, so someone should fix that.”)

So basically: we don’t support different wheels having inconsistent dependencies, but we do acknowledge that it can happen and make sure to fail-fast and with usable debugging info.
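
A minimal sketch of that install-time check, purely for illustration: the function name and parameters are hypothetical, and requirement strings are compared as plain text after stripping whitespace, whereas a real tool would need to normalise them properly.

```python
from typing import Iterable


def check_requirements(
    installing: str,
    actual: Iterable[str],
    resolved_from: str,
    expected: Iterable[str],
) -> None:
    """Raise if the file being installed disagrees with the locked metadata."""
    # Assumption: plain string comparison stands in for real requirement
    # normalisation (name casing, specifier ordering, markers, ...).
    actual_set = {req.strip() for req in actual}
    expected_set = {req.strip() for req in expected}
    if actual_set != expected_set:
        raise ValueError(
            f"{installing} requires {sorted(actual_set)}, which does not match "
            f"{sorted(expected_set)} recorded from {resolved_from}"
        )
```

The error message names both files so that whoever hits the mismatch can see which wheel the resolver looked at and which one is actually being installed.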

dustin · July 30, 2021, 1:50am

Congrats on finishing this and thanks for sharing. Two quick thoughts:

I think the PEP might be making some assumptions about how rare this actually is in practice. I think it’s not that uncommon for different wheels for the same release to have different dependencies due to having different needs across Python versions or platforms, and because setup.py makes it trivial to create a package that will have this characteristic. I’m also not sure if this is actually something that’s discouraged currently.

The fact that PyPI currently stores metadata per-release and not per-file has caused a number of headaches and the current plan is to eventually fix that. Given that, it might be worth giving this more consideration, or at least addressing the issue here in more detail.

This PEP is implicitly rejecting PEP 650 as well – I think it’d be good to have an explanation here about why lockfiles should be preferred over what’s proposed in that PEP. I can think of a few reasons:

the benefits of having a single universal requirements standard for the ecosystem
only having to do dependency resolution once
not having to maintain an importable API across multiple tools

I’m sure you have more which helped motivate the creation of this PEP as well.

dustin · July 30, 2021, 1:53am

I think this PEP could largely borrow the motivation from PEP 650 verbatim. The use cases this is attempting to solve are essentially the same: https://www.python.org/dev/peps/pep-0650/#motivation

uranusjr · July 30, 2021, 5:40am

Dustin mentioned PEP 650’s motivation above and I agree it applies to this PEP as well. Following the same line of thought, the owner of pyproject-lock.d would be the one locker chosen by the project. This locker will generally be provided by the project’s project management tool of choice, e.g. Poetry, but the content generated by the lock (pyproject-lock.d) can be consumed by an installer implemented by a different party.

I agree this is not rare, but per-file dependency is extremely resource-consuming to resolve, and none of the existing resolver/locker implementations (including pip freeze) attempt to do this as a result.

So I feel it is OK to assume that those who do encounter per-file dependencies don’t care (e.g. they mandate the project should only be developed on a given platform, so everyone on the team is installing the same wheel). This is generally implicit in currently existing lock file implementations, and this PEP tries to provide an explicit way to express this intent with the top level marker and tag fields.

rgommers · July 30, 2021, 6:35am

Nice PEP, thank you for working on this.

It would be really useful to include a bit of context about assumed or recommended usage patterns of lock files. The only thing I see right now is in the backwards compatibility section, which says lock files can but don’t have to be checked into version control. Related questions:

Can/should lock files be included in an sdist and/or a wheel?
- If a lock file is included, does it do nothing, does it override pyproject.toml, or can it be opted into via an installer?
Are lock files only for standalone/top-level projects (applications, dev environments, etc.) or also for libraries that are dependencies for other end user facing functionality?

If there is a good reference with more extended discussion, that would be good to link to as well. My impression after reading this PEP is that I should somehow know exactly what a lock file is for and when to use it because I have already used it elsewhere.

Are usage patterns of lock files in all those languages the same, and things work smoothly - or are there pain points that this PEP has taken into account?

uranusjr · July 30, 2021, 9:04am

This is not in the PEP (and should be), but my understanding when working on the PEP is

The lock file is only for standalone project environments and should not be included in an sdist or wheel.
If it is in a wheel, it has no effect (the wheel’s dependency information is described solely by METADATA).
If it is in an sdist, the behaviour technically depends on the sdist’s build backend. We strongly encourage the backend to ignore the file.

This is the approach taken by all lock file usages across languages, from my understanding. Other than this, languages and their underlying packaging stacks differ too much for most pain points to be meaningful for Python, so we did not really address many of the pain points those other communities face (because most of them don’t make sense in Python), but tried more to identify and fix where their solutions are incompatible with Python packaging. Off the top of my head:

Many lock file designs pre-define dependency “groups”, but it’s clear that many Python users expect to have a lot more freedom in how many environments they can use and combine for a project (-r another file from a requirements.txt), hence the multi-file directory approach.
Python packaging provides some rather unique functionalities to specify dependencies (PEP 508 direct URL, pip’s ability to install a relative path, swap out the main index entirely, etc.), so there is quite a lot of extra information encoded in the file to support those.
Python users expect to be able to only copy the lock file somewhere (not the file that produced the lock file) and have it “just work”. Most languages expect you to either copy the entire project directory or at least the original user input (package.json, Cargo.toml, etc.). This requirement to make the file work entirely on its own is also somewhat special.

Yes, it’s technically redundant, but is very useful when upgrading and removing packages in a lock file. The idea is the locker can use it to find whether a package entry becomes dangling; say if b only has a in its required-by, and a got removed, then I can safely remove the entry b and look for b in other entries’ required-by. Without the field, the locker will need to reconstruct the entire tree top-down, which is more expensive for Python packaging than for other languages.
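
The basic idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name ``prune_dangling`` and the plain-dict representation of the needs and required-by (needed-by) fields are hypothetical, and top-level requirements are passed in separately so they are never pruned.

```python
from typing import Dict, List, Set


def prune_dangling(
    needs: Dict[str, List[str]],
    needed_by: Dict[str, List[str]],
    top_level: Set[str],
    removed: str,
) -> Set[str]:
    """Return every package that can be dropped once ``removed`` is gone."""
    dropped = {removed}
    queue = [removed]
    while queue:
        gone = queue.pop()
        # Only the dependencies of a dropped package need to be inspected,
        # so the rest of the graph is never traversed.
        for dep in needs.get(gone, []):
            if dep in dropped or dep in top_level:
                continue
            # dep is dangling once every package that needed it is gone.
            if all(parent in dropped for parent in needed_by.get(dep, [])):
                dropped.add(dep)
                queue.append(dep)
    return dropped
```

Removing ``a`` from a graph where only ``a`` needs ``b`` and only ``b`` needs ``d`` returns ``{"a", "b", "d"}``, while a package that is also needed by something else stays in place.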

Brett’s our de-facto arbiter for naming issues

I may be missing something, but it seems this is achievable with the current url and needs fields, right? It’s definitely an interesting idea we didn’t think of.

WhyNotHugo · July 30, 2021, 9:28am

I agree that there’s a bit of a weird asymmetry between needs and required-by. needed-by is more intuitively the other side of a needs relationship.

encukou · July 30, 2021, 9:36am

For background, perhaps it would be useful to mention that since lock files describe “the environment”, you cannot meaningfully combine different lock files (e.g. of different projects) together. If you need that, fall back to (unpinned) dependencies (and perhaps build a new lock file out of that).
So:

While lock files can be useful for library developers (e.g. for test setups or doc builds), they’re useless for users of the library.
Applications can (and should) use lock files for deployment. However, providing “traditional” (unpinned) dependencies is useful as well – both for building lock files and to enable installing into other environments (with the understanding that those environments need their own integration testing).

bhogan · July 30, 2021, 4:48pm

I think that needs and required-by need to have the same verb. Either needs and needed-by, or (my preference) requires and required-by. I prefer requires over needs because it’s how the python community at large already refers to requirements. They are “requirements” and typically listed in either: requirements.txt, install_requires / extras_require (in setup.py). I think that changing the nomenclature to needs is rather pointless at best, and at worst it could introduce confusion to new python programmers when they are trying to figure out how this relates to requirements.txt and install_requires in existing projects.

After reading this pep, I am also not sure how it handles the situation where one package is both a top level dependency, and also a dependency for another top level dependency. For example, imagine a project that uses flask and jinja2 directly. flask also has a top level dependency on jinja2. So, if I understand the pep correctly, we’d end up with a file that contained these sections (omitting non relevant sections to my question):

[metadata]
needs = ["flask", "jinja2"]

[[package.flask]]
needs = ["jinja2>=3.0", ...]

[[package.jinja2]]
required-by = ["flask"]

I can’t see why this would be a problem, but in light of Tzu-ping Chung’s comment:

I think it would be beneficial to explicitly state the hierarchy of dependencies appearing as both top level and sub dependencies. Otherwise an overeager implementation of this particular feature by any lockers could potentially result in them losing some top level dependencies in rare situations.

One last thing is I don’t understand why this should support having multiple different package.<name>.code tables for a single package, like the example does for mousebender. In such a situation, how does an installer pick which code to use? The pep seems to implicitly state that it prefers wheel types over any others when it states that installers may choose to refuse to install all types other than wheel, but at the same time the pep doesn’t actually say that wheel type code is preferred if multiple are found for the same package.

Additionally, since the motivation of this pep is to have a way to specify reproducible builds, it would stand to reason that if a package has multiple code blocks, they should all contain the exact same source code? If this is true, then why would you need multiple code blocks? If this is not true, then the build is not reproducible if installers are allowed to pick any of the code blocks to install the package from. One installer could produce a different build than another, which appears to violate the motivation behind this pep. I think it would be good to either require a single package.<name>.code block per package, or to specify a mechanism to indicate inside the file which code block is preferred. Potentially something along the lines of:

[[package.mousebender]]
version = "2.0.0"
needs = ["attrs>=19.3", "packaging>=20.3"]
preferred_code_type = "sdist"

Or, changing the specification for package.<name>.code so that the tables are not stored in lexicographic order, but instead stored in preferred install order. So installers would just have to use the first code block they found, and if for some reason the installer refuses to use that one, then it can continue to the next one.
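
A rough sketch of what that second option could look like on the installer side, assuming the code tables have already been parsed into a list of dictionaries in file order (the function name ``pick_code`` and the ``accepted_types`` parameter are hypothetical):

```python
from typing import Container, Mapping, Optional, Sequence


def pick_code(
    code_tables: Sequence[Mapping[str, str]],
    accepted_types: Container[str],
) -> Optional[Mapping[str, str]]:
    """Return the first code table whose type the installer accepts."""
    # Tables are assumed to be listed in preferred install order, so the
    # first acceptable one wins and later ones act as fallbacks.
    for table in code_tables:
        if table.get("type") in accepted_types:
            return table
    return None
```

An installer that only accepts wheels would then skip a leading sdist entry and fall through to the first wheel, or get ``None`` if nothing in the list is acceptable.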

uranusjr · July 30, 2021, 6:43pm

Thanks for pointing this out. My comment was only meant to illustrate the idea behind required-by, but I failed to communicate that it’s only the basic idea, not the complete logic to do that. In general, you need not required-by itself, but the sections referenced in it (and the top-level dependency specification, as you pointed out). The idea is to limit what needs to be traversed when a locker only intends to modify a part of the graph, which is a common operation for tools like Dependabot (to upgrade a dependency away from a vulnerable version), and not currently handled well by tools like pip and its derivatives.

This reminds me though; in my old proposal back in 2019, I used an empty string to designate the top level dependencies. This never caught on back then, but I’d definitely not object if we bring it back

[[package.jinja2]]
required-by = ["", "flask"]  # Jinja2 is specified by the user, and is also depended on by Flask.

In a perfect world, reproducibility means running the same code, with the same dependencies, on the same Python interpreter, in the same runtime environment. And that’s definitely a popular and valuable definition (also why containers are so popular nowadays). In reality though, projects tend to come with some level of assumptions and want to bend the rules. The runtime environment does not need to be exactly the same, as long as we don’t configure the environment in a way that significantly impacts the behaviour (a popular definition for web services). Maybe the application can run on multiple operating systems (and/or architectures) and they don’t need to all install the same wheel, as long as the dependency’s maintainers promise all those wheels behave the same, so we can develop on macOS and deploy to Ubuntu. In extreme situations, even a dependency’s version doesn’t need to be exactly the same, as long as only one unique resolution is possible for each platform, and the different resolved versions don’t affect the end application’s behaviour. These are all feature requests that came up during development of existing tools, and PEP 665’s design. You can argue some of those are bad practice (I don’t think PEP 665 handles that final example without workarounds, for example), but those are popular enough that we are convinced they need to be possible, otherwise the format won’t be able to catch on and we’ll be back to square one.

steve.dower · July 30, 2021, 8:58pm

Thanks for getting this written up! I know it’s been a big task. Just a few questions/comments:

Could you add a short summary putting this file in context given today’s de facto state of the world? e.g. “this is essentially the intermediate data generated by pip’s resolver in between parsing a requirements.txt/set of requirements, and just before it starts downloading/installing packages”. I think that will help place it correctly, in that this replaces neither of those steps, but actually separates them in a way that other tools can participate in either half of the process without having to do both.
Why require a directory? That seems to presume more about project structure than we should or should need to? (I guess it’s to make it easier for tools to be able to automatically build from a repo, but since they can’t pick the right lock file automatically anyway there’s nothing gained from the location being fixed.)
What should consumers (installers) written today do if the version field is not 1? We need to tell them now to either accept, accept-and-warn, accept-and-warn-on-unknowns, or fail fast. Even if the next version is mostly compatible, there’s a chance it may not be. (This would actually be the reason to bring back SemVer, so that we can make compatible-for-parsing changes without breaking existing tooling, but still have the ability to break it if needed.)
Where file:// is mentioned, could it be file: instead and allow relative paths? Main application would be to create a bundle of wheels and lock files where the whole bundle is deployed/downloaded and then the right lock file is selected on install. (The variables would also work, but “relative to the lock file location” is also valid. This scenario also gets weird given the arbitrary directory name requirement I mentioned above.)
For type="source tree", is the space really necessary? Would type="sources" suffice?
Any consideration for having a requested_version field in package references (alternatively, “constraint”)? This is along the lines of what Conda does, which would allow using a lock file to store an environment spec that can be updated later without needing to find the original source again. The needs fields are essentially this, but seem unnecessarily indirect compared to just putting the requested constraint on the package table.

brettcannon · July 30, 2021, 11:52pm

I’m going to do my best to answer everyone’s question which hasn’t been answered yet. If I missed your question then please let me know.

PEP 665: clarifications based on feedback · python/peps@ae53120 · GitHub should contain all the changes I mention below. PEP 665 – A file format to list Python dependencies for reproducibility of an application | peps.python.org will have the fastest available, rendered version of the updates. I have also updated the copy above.

Works for me! And I appreciate that the biggest ask so far has been about a name.

Done.

Would this eliminate the top-level needs key?

What if my lock file for development is different than production? What if I have different production lock files? What if I want a Read the Docs lock file, testing lock file, dev lock file, and prod lock file and they are all subtly different?

We could try to cram all of this into a single file and come up with some way to deal with conflicts (e.g. separate sections for each “grouping”), but we chose separate files as that more visibly separates things and keeps the individual files at least a little smaller than they might otherwise be for trying to navigate.

Error out IMO.
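
For an installer that only understands version 1, erroring out could look roughly like the sketch below. The exception class and function name are hypothetical, and the lock file is assumed to have already been parsed into a dictionary.

```python
SUPPORTED_LOCK_VERSION = 1


class UnsupportedLockVersion(ValueError):
    """Raised when a lock file declares a version this installer cannot read."""


def require_supported_version(lock_data: dict) -> None:
    """Fail fast unless the lock file's version is exactly the integer 1."""
    version = lock_data.get("version")
    # An exact type check rejects floats such as 1.0 and booleans, both of
    # which would otherwise compare equal to the integer 1.
    if type(version) is not int or version != SUPPORTED_LOCK_VERSION:
        raise UnsupportedLockVersion(
            f"unsupported lock file version: {version!r}"
        )
```

Checking the version before reading anything else keeps an installer from misinterpreting keys whose meaning may have changed in a later format.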

Sure (that was my mistake in writing it that way to begin with).

It’s not necessary, but “source tree” is an official, defined term which matches what this is meant for.

How is this different than having that specified in needs?

FRidh · July 31, 2021, 8:46am

The PEP says the lock file is for “Python projects”. Is this meant to cover both applications and project development environments? This is unclear to me from the PEP as the former isn’t really mentioned. I think the motivation should start with use cases.

Talking about applications, I’d imagine a tool such as pipx would want to install applications from a lock file. Would it make sense then for applications to eventually include the lock files in the sdist or wheel?

uranusjr · July 31, 2021, 6:06pm

It can, but all reasons we have both needs and required-by apply here as well, so no.

I believe it is. My personal view is that a Python library run in a local environment—development or otherwise—is an application (also a Python application is a library unless you vendor all of your dependencies including the interpreter, but that’s not relevant here), so the distinction is minimal. But I understand this application/library categorisation is very useful as a concept and important to many, so I agree the PEP should describe the use cases better.

That’s an interesting idea and definitely makes sense, but IMO there are many details to work out. One problem is a package release is basically immutable, and including a lock file in it means the dependency graph is frozen in time, along with all the security vulnerabilities and bugs discovered afterwards. This is not an issue with applications since by definition the project maintainers have control over the deployments as well and can upgrade the actual installations when needed, but this is not possible with versioned library releases. This feature is worth its own entire discussion and another PEP.

h-vetinari · August 1, 2021, 11:35am

I don’t understand several aspects about this:

Why mix project installation requirements with lock files? They serve different roles: reproducibility for the latter and specifying (ideally) timeless - i.e. unlocked - dependencies to build/run for the former
Lockfiles are usually for environments, not individual projects - in particular, lockfiles of individual projects cannot be combined trivially (as mentioned above already); it’s clear that dependencies need to be specified per project, but how lock files should then be used at scale becomes very difficult (i.e. one user installing one library more than another means their environments might be completely different)
The name of the PEP says “installation requirements”, but it seems mostly about locking. In particular, allowing type="source tree" opens up a Pandora’s box of ABI concerns (and lack of reproducibility) that seem unaddressed.

I’m quite surprised that conda is not discussed as prior art here at all (not least considering the previous discussions on this topic: 1, 2) - it has successfully come up with a sufficient set of scaffolding to build the entire ecosystem in an ABI-compatible way across all platforms & arches, with or without GPUs, with package variants (e.g. OpenBLAS vs. MKL), etc. etc.

In particular, one of the crucial elements is the different kinds of requirements that are distinguished in conda. It’s very different for a package to runtime-depend on numpy or to be using the C-API, where then the version used at build-time affects the version usable at runtime, etc. etc.

The following is a laudable goal:

It would then be very unfortunate to reinvent something in a way that then is incompatible with an approach that reached a much higher degree of functionality already.

I’m sure the conda(-forge) people would still be interested in this format discussion (ideally with the ability to eventually “speak the same language”), so tagging some people from the two previous threads (as well as a smattering of conda(-forge) people whose discuss-handle I found): @pzwang @teoliphant @msarahan @dhirschfeld @scopatz @jezdez @jakirkham @ocefpaf @kkraus14 @minrk

uranusjr · August 1, 2021, 6:07pm

I honestly don’t understand most of the points you tried to make, and suspect we are using the same terms to describe very different ideas. So this is my attempt to describe the terminology used in the PEP, and explain why I feel you are not thinking the same things when using the same terms.

For most of Python packaging (from what I understand), a “project” merely means a bunch of source files grouped logically together and used in one collective logical context. When a project’s code is invoked in an environment, it needs some run-time dependencies, and a lock is used to describe those dependencies in a way that things external to the local environment do not affect how the description would be interpreted. In this context, per-project and user-specified requirements are naturally a part of the lock, since they describe intent (why a dependency is needed).

Using the above definition, if the goal is reproducibility, a project’s runtime environment is naturally coupled to the project itself, since every environment created to run the project should be alike (if not identical), and the lock is describing that abstract likeness. The PEP also does not mention anything about combining lock files from different projects (as you said, it can’t be done easily and should not be done generally), so I’m not sure what to make of the rest of the paragraph.

As I mentioned in a comment above, a useful lock format is by no means one-size-fits-all, since there are many practical definitions of reproducibility. Yes, allowing things to build from source (not just source tree, but sdist as well) opens the door to things that technically break the strict definition of reproducibility, but from what I can tell (based on feedback from authors of existing tooling), most people don’t want that strict definition, and are willing and need to bend the idea for practical reasons. Since the PEP does not define what can create a lock file (but only an interoperable format between a locker and an installer), you are free to create and use a tool that guarantees the strictest reproducibility definition and only outputs such lock files if that’s your goal; and it would be usable by any installer consuming the lock file.

IMO this is out of the scope of a lock file format. As I mentioned, the format does not intend to force complete reproducibility. The PEP also does not invent any of the reproducibility features, because Python packaging already has ways to enforce those (wheel tags and environment markers), and the lock file format only needs to support them. Therefore, additional reproducibility features should not block the creation of a lock file format, since those features can and should be added to those existing mechanisms—and when they are, they automatically become a part of the lock file.

It is also weird to me that you feel Conda should be discussed as a prior art for a lock file format, because Conda (somewhat famously) does not have an equivalent to what other communities call a lock file (package-lock.json, Cargo.lock, etc.). The closest thing it has is environment.yml, which is a list of user intents, and addresses nothing about how those intents should be interpreted and the reproducibility issues that come with the interpretation. So at this point I’m completely at a loss and don’t know how to continue, since we are most definitely not on the same page.

nicholdav · August 1, 2021, 6:14pm

It is also weird to me that you feel Conda should be discussed as a prior art for a lock file format, because Conda (somewhat famously) does not have an equivalent to what other communities call a lock file ( package-lock.json , Cargo.lock , etc.).

to this point, there have been efforts to create lock files for conda

so clearly that community is not seeing conda envs as a lock file

nicholdav · August 1, 2021, 6:17pm

just wanted to echo this point since I think it got lost in the discussion.

I get that needs could be intuitive for beginners, but won’t a lot of ‘advanced beginners’ (like yours truly) be more familiar with requires?

h-vetinari · August 1, 2021, 6:30pm

Indeed, a conda environment is more than a lock file, in the sense that lock files are merely reproducible snapshots of an environment. As such, locking can be achieved trivially in conda for your current platform as conda env export -f my_env.lock and restored (anywhere, assuming the same OS/arch) as conda env create -f my_env.lock.

Where conda-lock comes in is that one might want to generate lockfiles for more platforms than the current one. That’s actually also a relevant question about the PEP: how does it deal with cases where requirements differ by platform?

PEP 665: Specifying Installation Requirements for Python Projects

======== Abstract

========== Motivation

Providers

Platform/Infrastructure Providers

IDE Providers

Developers

Developers using PaaS & IaaS providers

Developers using IDEs

Developers working with other developers

Upgraders & Package Infrastructure Providers

Open Source Community

========= Rationale

Goals

Non-Goals

============= Specification

Details

version

[tool]

[metadata]

metadata.marker

metadata.tags

metadata.needs

[package]

package.<name>.version

package.<name>.needs

package.<name>.needed-by

package.<name>.code