PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

PEP: 665
Title: A file format to list Python dependencies for reproducibility of an application
Author: Brett Cannon brett@python.org,
Pradyun Gedam pradyunsg@gmail.com,
Tzu-ping Chung uranusjr@gmail.com
PEP-Delegate: Paul Moore p.f.moore@gmail.com
Discussions-To: PEP 665: Specifying Installation Requirements for Python Projects
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jul-2021
Post-History: 29-Jul-2021, 03-Nov-2021, 25-Nov-2021
Resolution:

========
Abstract

This PEP specifies a file format to specify the list of Python package
installation requirements for an application, and the relation between
the specified requirements. The list of requirements is considered
exhaustive for the installation target, and thus does not require any
information beyond the platform being installed for and the file
itself. The file format is flexible enough to allow installing the
requirements across different platforms, which allows for
reproducibility on multiple platforms from the same file.

===========
Terminology

There are several terms whose definition must be agreed upon in order
to facilitate a discussion on the topic of this PEP.

A package is something you install as a dependency and use via the
import system. The packages on PyPI are an example of this.

An application or app is an end product that other external code
does not directly rely on via the import system (i.e. they are
standalone). Desktop applications, command-line tools, etc. are
examples of applications.

A lock file records the packages that are to be installed for an
app. Traditionally, the exact version of the package to be installed
is specified by a lock file, but specified packages are not always
installed on a given platform (according to a filtering logic described
in a later section), which enables the lock file to describe
reproducibility across multiple platforms. Examples of this are
package-lock.json from npm_, Poetry.lock from Poetry_, etc.

Locking is the act of taking the input of the packages an app
depends on and producing a lock file from that.

A locker is a tool which produces a lock file.

An installer consumes a lock file to install what the lock file
specifies.

==========
Motivation

Applications want reproducible installs for a few reasons (we are not
worrying about package development, integration into larger systems
that would handle locking dependencies external to the Python
application, or other situations where flexible installation
requirements are desired over strict, reproducible installations).

One, reproducibility eases development. When you and your fellow
developers all end up with the same files on a specific platform, you
make sure you are all developing towards the same experience for the
application. You also want your users to install the same files you
expect, to guarantee their experience is the one you developed
for them.

Two, you want to be able to reproduce what gets installed across
multiple platforms. Thanks to Python’s portability across operating
systems, CPUs, etc., it is very easy and often desirable to create
applications that are not restricted to a single platform. Thus, you
want to be flexible enough to allow for differences in your package
dependencies between platforms, while still having consistency
and reproducibility on any one specific platform.

Three, reproducibility is more secure. When you control exactly what
files are installed, you can make sure no malicious actor is
attempting to slip nefarious code into your application (i.e. some
supply chain attacks). By using a lock file which always leads to
reproducible installs, we can avoid certain risks entirely.

Four, relying on the wheel file_ format provides reproducibility
without requiring build tools to support reproducibility themselves.
Thanks to wheels being static and not executing code as part of
installation, wheels always lead to a reproducible result. Compare
this to source distributions (aka sdists) or source trees which only
lead to a reproducible install if their build tool supports
reproducibility due to inherent code execution. Unfortunately the vast
majority of build tools do not support reproducible builds, so this
PEP helps alleviate that issue by only supporting wheels as a package
format.

This PEP proposes a standard for a lock file, as the current solutions
don’t meet the outlined goals. Today, the closest we come to a lock
file standard is the requirements file format_ from pip.
Unfortunately, that format does not lead to inherently reproducible
installs (it requires optional features both in the requirements file
and the installer itself, to be discussed later).

The community itself has also shown a need for lock files based on the
fact that multiple tools have independently created their own lock
file formats:

#. PDM_
#. pip-tools_
#. Pipenv_
#. Poetry_
#. Pyflow_

Unfortunately, those tools all use differing lock file formats. This
means tooling around these tools must be unique. This impacts tooling
such as code editors and hosting providers, which want to be as
flexible as possible when it comes to accepting a user’s application
code, but also have a limit as to how much development resources they
can spend to add support for yet another lock file format. A
standardized format would allow tools to focus their work on a single
target, and make sure that workflow decisions made by developers
outside of the lock file format are of no concern to e.g. hosting
providers.

Other programming language communities have also shown the usefulness
of lock files by developing their own solution to this problem. Some
of those communities include:

#. Dart_
#. npm_/Node
#. Go
#. Rust_

The trend in programming languages in the past decade seems to have
been toward providing a lock file solution.

=========
Rationale


File Format

We wanted the file format to be easy to read as a diff when auditing
a change to the lock file. As such, and thanks to PEP 518 and
pyproject.toml, we decided to go with the TOML_ file format.


Secure by Design

Viewing the requirements file format_ as the closest we have to
a lock file standard, there are a few issues with the file format when
it comes to security. First is that the file format simply does not
require you to specify the exact version of a package. This is why
tools like pip-tools_ exist to help manage that for users of
requirements files.

Second, you must opt into specifying what files are acceptable to be
installed by using the --hash argument for a specific dependency.
This is also optional with pip-tools as it requires specifying the
--generate-hashes CLI argument.

Third, even when you control what files may be installed, it does not
prevent other packages from being installed. If a dependency is not
listed in the requirements file, pip will happily go searching for a
file to meet that need. You must specify --no-deps as an
argument to pip to prevent unintended dependency resolution outside
of the requirements file.

Fourth, the format allows for installing a
source distribution file_ (aka “sdist”). By its very nature,
installing an sdist requires executing arbitrary Python code, meaning
that there is no control over what files may be installed. Only by
specifying --only-binary :all: can you guarantee pip to only use a
wheel file_ for each package.

To recap, in order for a requirements file to be as secure as what is
being proposed, a user should always do the following steps:

#. Use pip-tools and its command pip-compile --generate-hashes
#. Install the requirements file using
pip install --require-hashes --no-deps --only-binary :all:

Critically, all of those flags, and both the specificity and
exhaustiveness of what to install that pip-tools provides, are optional
for requirements files.

As such, the proposal raised in this PEP is secure by design which
combats some supply chain attacks. Hashes for files which would be
used to install from are required. You can only install from
wheels to unambiguously define what files will be placed in the file
system. Installers must lead to a deterministic installation
from a lock file for a given platform. All of this leads to a
reproducible installation which you can deem trustworthy (when you
have audited the lock file and what it lists).


Cross-Platform

Various projects which already have a lock file, like PDM_ and
Poetry_, provide a lock file which is cross-platform. This allows
for a single lock file to work on multiple platforms while still
leading to the exact same top-level requirements to be installed
everywhere with the installation being consistent/unambiguous on
each platform.

As to why this is useful, let’s use an example involving PyWeek_
(a week-long game development competition). Assume you are developing
on Linux, while someone you choose to partner with is using macOS.
Now assume the judges are using Windows. How do you make sure everyone
is using the same top-level dependencies, while allowing for any
platform-specific requirements (e.g. a package requires a helper
package under Windows)?

With a cross-platform lock file, you can make sure that the key
requirements are met consistently across all platforms. You can then
also make sure that all users on the same platform get the same
reproducible installation.


Simple Installer

The separation of concerns between a locker and an installer allows
for an installer to have a much simpler operation to perform. As
such, it not only allows for installers to be easier to write, but
helps make sure installers create unambiguous, reproducible
installations correctly.

The installer can also expend less computation/energy in creating the
installation. This is beneficial not only for faster installs, but
also from an energy consumption perspective, as installers are
expected to be run more often than lockers.

This has led to a design where the locker must do more work upfront
to the benefit of installers. It also means package dependencies are
expressed in a lock file in a way that is simpler and easier to
comprehend, which avoids ambiguity.

=============
Specification


Details

Lock files MUST use the TOML_ file format. This not only prevents the
need to have another file format in the Python packaging ecosystem
thanks to its adoption by PEP 518 for pyproject.toml, but also
assists in making lock files more human-readable.

Lock files MUST end their file names with .pylock.toml. The
.toml part unambiguously distinguishes the format of the file,
and helps tools like code editors support the file appropriately. The
.pylock part distinguishes the file from other TOML files the user
has, to make the logic easier for tools to create functionality
specific to Python lock files, instead of TOML files in general.

The following sections are the top-level keys of the TOML file data
format. Any field not listed as required is considered optional.

version

This field is required.

The version of the lock file being used. The value MUST be a string
consisting of a number that follows the same formatting as the
Metadata-Version key in the core metadata spec_.

The value MUST be set to "1.0" until a future PEP allows for a
different value. The introduction of a new optional key to the file
format SHOULD increase the minor version. The introduction of a new
required key or changing the format MUST increase the major version.
How to handle other scenarios is left as a per-PEP decision.

Installers MUST warn the user if the lock file specifies a version
whose major version is supported but whose minor version is
unsupported/unrecognized (e.g. the installer supports "1.0", but
the lock file specifies "1.1").

Installers MUST raise an error if the lock file specifies a major
version which is unsupported (e.g. the installer supports "1.9"
but the lock file specifies "2.0").
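
The warn/error rules above can be sketched as follows. This is a minimal
illustration, not part of the specification: the names
``check_lock_version``, ``UnsupportedLockVersion``, ``SUPPORTED_MAJOR``
and ``SUPPORTED_MINOR`` are hypothetical, and the sketch assumes an
installer that understands version "1.0" only.

```python
import warnings

# Hypothetical: the highest lock file version this installer understands.
SUPPORTED_MAJOR = 1
SUPPORTED_MINOR = 0


class UnsupportedLockVersion(Exception):
    """Raised when the lock file's major version is not supported."""


def check_lock_version(version: str) -> None:
    """Warn on an unknown minor version, raise on an unknown major version."""
    major_text, _, minor_text = version.partition(".")
    major, minor = int(major_text), int(minor_text or 0)
    if major != SUPPORTED_MAJOR:
        raise UnsupportedLockVersion(
            f"unsupported lock file version {version!r}"
        )
    if minor > SUPPORTED_MINOR:
        warnings.warn(f"unrecognized lock file minor version {version!r}")
```

With these assumed constants, "1.0" passes silently, "1.1" produces a
warning, and "2.0" raises an error.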

created-at

This field is required.

The timestamp for when the lock file was generated (using TOML’s
native timestamp type). It MUST be recorded using the UTC time zone to
avoid ambiguity.

If the SOURCE_DATE_EPOCH_ environment variable is set, it MUST be used
as the timestamp by the locker. This facilitates reproducibility of
the lock file itself.
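
As a rough sketch of how a locker might pick the ``created-at`` value
under these rules (the function name ``lock_timestamp`` and its
``environ`` parameter are hypothetical, introduced only for
illustration):

```python
import datetime
import os


def lock_timestamp(environ=os.environ) -> datetime.datetime:
    """Return the UTC timestamp a locker would record as ``created-at``."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # SOURCE_DATE_EPOCH holds seconds since the Unix epoch.
        return datetime.datetime.fromtimestamp(
            int(epoch), tz=datetime.timezone.utc
        )
    return datetime.datetime.now(tz=datetime.timezone.utc)
```

Passing the environment as a parameter is a design choice of this sketch
so the behaviour can be exercised without mutating the real process
environment.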

[tool]

Tools may create their own sub-tables under the tool table. The
rules for this table match those for pyproject.toml and its
[tool] table from the build system declaration spec_.

[metadata]

This table is required.

A table containing data applying to the overall lock file.

metadata.marker

A key storing a string containing an environment marker as
specified in the dependency specifier spec_.

The locker MAY specify an environment marker which specifies any
restrictions the lock file was generated under.

If the installer is installing for an environment which does not
satisfy the specified environment marker, the installer MUST raise an
error as the lock file does not support the target installation
environment.

metadata.tag

A key storing a string specifying platform compatibility tags_
(i.e. wheel tags). The tag MAY be a compressed tag set.

If the installer is installing for an environment which does not
satisfy the specified tag (set), the installer MUST raise an error
as the lock file does not support the targeted installation
environment.

metadata.requires

This field is required.

An array of strings following the dependency specifier spec_. This
array represents the top-level package dependencies of the lock file
and thus the root of the dependency graph.

metadata.requires-python

A string specifying the supported version(s) of Python for this lock
file. It follows the same format as that specified for the
Requires-Python field in the core metadata spec_.

[[package._name_._version_]]

This array is required.

An array per package and version containing entries for the potential
(wheel) files to install (as represented by _name_ and
_version_, respectively).

Lockers MUST normalize a project’s name according to the
simple repository API_. If extras are specified as part of the
project to install, the extras are to be included in the key name and
are to be sorted in lexicographic order.
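
A minimal sketch of building such a key is shown below. The
normalization follows the regular expression given in the simple
repository API. The helper names and the choice to join multiple extras
with a comma are assumptions of this sketch; the PEP text above only
states that extras are included in the key and sorted.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name as described by the simple repository API."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_key(name: str, extras=()) -> str:
    """Build a hypothetical ``package`` table key for a project and extras."""
    key = normalize_name(name)
    if extras:
        # Assumption: extras are comma-separated inside the brackets.
        key += "[" + ",".join(sorted(extras)) + "]"
    return key
```

For example, ``package_key("Foo_Bar", ["b", "a"])`` would produce
``foo-bar[a,b]`` under these assumptions.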

Within the file, the tables for the projects SHOULD be sorted by:

#. Project/key name in lexicographic order
#. Package version, newest/highest to older/lowest according to the
version specifiers spec_
#. Optional dependencies (extras) via lexicographic order
#. File name based on the filename field (discussed
below)

These recommendations are to help minimize diff changes between tool
executions.

package._name_._version_.filename

This field is required.

A string representing the base name of the file as represented by an
entry in the array (i.e. what
os.path.basename()/pathlib.PurePath.name represents). This
field is required to simplify installers as the file name is required
to resolve wheel tags derived from the file name. It also guarantees
that the association of the array entry to the file it is meant for is
always clear.

[package._name_._version_.hashes]

This table is required.

A table with keys specifying a hash algorithm and values as the hash
for the file represented by this entry in the
package._name_._version_ table.

Lockers SHOULD list hashes in lexicographic order. This is to help
minimize diff sizes and the potential to overlook hash value changes.

An installer MUST only install a file which matches one of the
specified hashes.
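
The hash requirement could be checked along these lines. This is a
sketch under the assumption that the keys of the ``hashes`` table are
names that Python's ``hashlib`` recognizes (e.g. ``sha256``); entries
with names ``hashlib`` does not know are skipped rather than trusted.
The function name is hypothetical.

```python
import hashlib


def file_matches_hashes(data: bytes, hashes: dict) -> bool:
    """Return True if ``data`` matches at least one recorded hash."""
    for algorithm, expected in hashes.items():
        try:
            digest = hashlib.new(algorithm, data).hexdigest()
        except ValueError:
            # Algorithm name not known to hashlib; it cannot be verified.
            continue
        if digest == expected.lower():
            return True
    return False
```

An installer following this sketch would refuse to install any file for
which the function returns ``False``.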

package._name_._version_.url

A string representing a URL where to get the file.

The installer MAY support any schemes it wants for URLs. A URL with no
scheme MUST be assumed to be a local file path (both relative paths to
the lock file and absolute paths). Installers MUST support, at
minimum, HTTPS URLs as well as local file paths.

An installer MAY choose to not use the URL to retrieve a file
if a file matching the specified hash can be found using alternative
means (e.g. on the file system in a cache directory).
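
One way an installer might classify the ``url`` value is sketched below.
The function name and its return shape are hypothetical, and the sketch
supports only the minimum the text above requires (HTTPS and scheme-less
local paths). Note that a Windows drive letter such as ``C:`` would be
parsed as a scheme by ``urlsplit``, so a real installer would need extra
handling there.

```python
from pathlib import PurePath
from urllib.parse import urlsplit


def resolve_location(url: str, lock_file: PurePath):
    """Classify ``url`` as a local path or an HTTPS URL."""
    scheme = urlsplit(url).scheme
    if scheme == "":
        # No scheme: a local path. Relative paths are resolved against the
        # directory of the lock file; absolute paths are kept unchanged.
        return ("path", lock_file.parent / url)
    if scheme == "https":
        return ("https", url)
    raise ValueError(f"unsupported URL scheme: {scheme!r}")
```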

package._name_._version_.direct

A boolean representing whether an installer should consider the
project installed “directly” as specified by the
direct URL origin of installed distributions spec_.

If the key is true, then the installer MUST follow the
direct URL origin of installed distributions spec_ for recording
the installation as “direct”.

package._name_._version_.requires-python

A string specifying the supported version(s) of Python for this file. It
follows the same format as that specified for the
Requires-Python field in the core metadata spec_.

package._name_._version_.requires

An array of strings following the dependency specifier spec_ which
represent the dependencies of this file.


Example

::

    version = "1.0"
    created-at = 2021-10-19T22:33:45.520739+00:00

    [tool]
    # Tool-specific table.

    [metadata]
    requires = ["mousebender", "coveragepy[toml]"]
    marker = "sys_platform == 'linux'"  # As an example for coverage.
    requires-python = ">=3.7"

    [[package.attrs."21.2.0"]]
    filename = "attrs-21.2.0-py2.py3-none-any.whl"
    hashes.sha256 = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"
    url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
    requires-python = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"

    [[package.attrs."21.2.0"]]
    # If attrs had another wheel file (e.g. that was platform-specific),
    # it could be listed here.

    [[package."coveragepy[toml]"."6.2.0"]]
    filename = "coverage-6.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl"
    hashes.sha256 = "c7912d1526299cb04c88288e148c6c87c0df600eca76efd99d84396cfe00ef1d"
    url = "https://files.pythonhosted.org/packages/da/64/468ca923e837285bd0b0a60bd9a287945d6b68e325705b66b368c07518b1/coverage-6.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl"
    requires-python = ">=3.6"
    requires = ["tomli"]

    [[package."coveragepy[toml]"."6.2.0"]]
    filename = "coverage-6.2-cp310-cp310-musllinux_1_1_x86_64.whl"
    hashes.sha256 = "276651978c94a8c5672ea60a2656e95a3cce2a3f31e9fb2d5ebd4c215d095840"
    url = "https://files.pythonhosted.org/packages/17/d6/a29f2cccacf2315150c31d8685b4842a6e7609279939a478725219794355/coverage-6.2-cp310-cp310-musllinux_1_1_x86_64.whl"
    requires-python = ">=3.6"
    requires = ["tomli"]

    # More wheel files for `coverage` could be listed for more
    # extensive support (i.e. all Linux-based wheels).

    [[package.mousebender."2.0.0"]]
    filename = "mousebender-2.0.0-py3-none-any.whl"
    hashes.sha256 = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"
    url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
    requires-python = ">=3.6"
    requires = ["attrs", "packaging"]

    [[package.packaging."20.9"]]
    filename = "packaging-20.9-py2.py3-none-any.whl"
    hashes.blake-256 = "3e897ea760b4daa42653ece2380531c90f64788d979110a2ab51049d92f408af"
    hashes.sha256 = "67714da7f7bc052e064859c05c595155bd1ee9f69f76557e21f051443c20947a"
    url = "https://files.pythonhosted.org/packages/3e/89/7ea760b4daa42653ece2380531c90f64788d979110a2ab51049d92f408af/packaging-20.9-py2.py3-none-any.whl"
    requires-python = ">=3.6"
    requires = ["pyparsing"]

    [[package.pyparsing."2.4.7"]]
    filename = "pyparsing-2.4.7-py2.py3-none-any.whl"
    hashes.sha256 = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"
    url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
    direct = true  # For demonstration purposes.
    requires-python = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"

    [[package.tomli."2.0.0"]]
    filename = "tomli-2.0.0-py3-none-any.whl"
    hashes.sha256 = "b5bde28da1fed24b9bd1d4d2b8cba62300bfb4ec9a6187a957e8ddb9434c5224"
    url = "https://files.pythonhosted.org/packages/e2/9f/5e1557a57a7282f066351086e78f87289a3446c47b2cb5b8b2f614d8fe99/tomli-2.0.0-py3-none-any.whl"
    requires-python = ">=3.7"

Expectations for Lockers

Lockers MUST create lock files for which a topological sort of the
packages which qualify for installation on the specified platform
results in a graph for which only a single version of any package
qualifies for installation and there is at least one compatible file
to install for each package. This leads to a lock file for any
supported platform where the only decision an installer can make
is what the “best-fitting” wheel is to install (which is discussed
below).

Lockers are expected to utilize metadata.marker, metadata.tag,
and metadata.requires-python as appropriate as well as environment
markers specified via requires and Python version requirements via
requires-python to enforce this result for installers. Put another
way, the information used in the lock file is not expected to be
pristine/raw from the locker’s input and instead is to be changed as
necessary to the benefit of the locker’s goals.


Expectations for Installers

The expected algorithm for resolving what to install is:

#. Construct a dependency graph based on the data in the lock file
with metadata.requires as the starting/root point.
#. Eliminate all files that are unsupported by the specified platform.
#. Eliminate all irrelevant edges between packages based on marker
evaluation for requires.
#. Raise an error if a package version is still reachable from the
root of the dependency graph but lacks any compatible file.
#. Verify that all packages left only have one version to install,
raising an error otherwise.
#. Install the best-fitting wheel file for each package which
remains.
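
The first five steps can be sketched roughly as follows. This is a
simplified illustration, not a reference implementation. It assumes the
lock file has already been parsed into nested dictionaries, that each
``requires`` entry is a bare package key optionally followed by
``; marker`` (version specifiers are not handled), and that the caller
supplies two hypothetical callables: ``file_is_compatible`` (platform
and Python version check for one file entry) and ``marker_applies``
(environment marker evaluation). It follows the edges of every
compatible file and leaves the choice of best-fitting wheel to the
caller.

```python
class LockError(Exception):
    """Raised when the lock file cannot be installed on this platform."""


def resolve(lock, file_is_compatible, marker_applies):
    """Return ``{package key: (version, compatible file entries)}``."""
    packages = lock["package"]
    selected = {}
    pending = list(lock["metadata"]["requires"])  # step 1: start at the root
    while pending:
        key, _, marker = pending.pop().partition(";")
        key, marker = key.strip(), marker.strip()
        if marker and not marker_applies(marker):
            continue  # step 3: edge is irrelevant for this environment
        if key in selected:
            continue  # already reached through another edge
        versions = packages.get(key, {})
        if len(versions) != 1:
            # step 5: exactly one version may remain per package
            raise LockError(
                f"{key!r}: expected 1 locked version, found {len(versions)}"
            )
        [(version, files)] = versions.items()
        # step 2: drop files the platform does not support
        usable = [entry for entry in files if file_is_compatible(entry)]
        if not usable:
            # step 4: reachable package version without a compatible file
            raise LockError(f"{key!r} {version}: no compatible file")
        selected[key] = (version, usable)
        for entry in usable:
            pending.extend(entry.get("requires", []))
    return selected
```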

Installers MUST follow a deterministic algorithm to determine what the
“best-fitting wheel file” is. A simple solution for this is to
rely upon the packaging project <https://pypi.org/p/packaging/>__
and its packaging.tags module to determine wheel file precedence.
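
To illustrate what a deterministic choice could look like without
depending on any third-party library, the sketch below ranks wheel file
names against a caller-supplied list of supported tags ordered from most
to least preferred. The function names and the ``supported_tags``
parameter are hypothetical; ties are broken by file name so the result
does not depend on input order.

```python
from itertools import product


def wheel_tags(filename):
    """Expand the (possibly compressed) tag set in a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    python, abi, platform = filename[:-4].split("-")[-3:]
    return set(product(python.split("."), abi.split("."), platform.split(".")))


def best_fitting(filenames, supported_tags):
    """Return the file whose best tag ranks highest, or None if none fit."""
    rank = {}
    for position, tag in enumerate(supported_tags):
        rank.setdefault(tag, position)
    candidates = []
    for name in filenames:
        positions = [rank[tag] for tag in wheel_tags(name) if tag in rank]
        if positions:
            candidates.append((min(positions), name))
    # min() on (position, name) tuples gives a stable, repeatable choice.
    return min(candidates)[1] if candidates else None
```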

Installers MUST support installing into an empty environment.
Installers MAY support installing into an environment that already
contains installed packages (and whatever that would entail to be
supported).

========================
(Potential) Tool Support

The pip_ team has said <https://github.com/pypa/pip/issues/10636>__
they are interested in supporting this PEP if accepted. The current
proposal for pip may even
supplant the need <https://github.com/jazzband/pip-tools/issues/1526#issuecomment-961883367>__
for pip-tools_.

PDM_ has also said they would
support the PEP <https://github.com/pdm-project/pdm/issues/718>__
if accepted.

Pyflow_ has said they
"like the idea" <https://github.com/David-OConnor/pyflow/issues/153#issuecomment-962482058>__
of the PEP.

Poetry_ has said they would not support the PEP as-is because
"Poetry supports sdists files, directory and VCS dependencies which are not supported" <https://github.com/python-poetry/poetry/issues/4710#issuecomment-973946104>__.
Recording requirements at the file level, which is on purpose to
better reflect what can occur when it comes to dependencies,
"is contradictory to the design of Poetry" <https://github.com/python-poetry/poetry/issues/4710#issuecomment-973946104>__.
This also excludes export support to this PEP’s lock file as
"Poetry exports the information present in the poetry.lock file into another format" <https://github.com/python-poetry/poetry/issues/4710#issuecomment-974551351>__
and sdists and source trees are included in Poetry.lock files.
Thus it is not a clean translation from Poetry’s lock file to this
PEP’s lock file format.

=======================
Backwards Compatibility

As there is no pre-existing specification regarding lock files, there
are no explicit backwards compatibility concerns.

As for pre-existing tools that have their own lock file, some updating
will be required. Most document the lock file name, but not its
contents. For projects which do not commit their lock file to
version control, they will need to update the equivalent of their
.gitignore file. For projects that do commit their lock file to
version control, what file(s) get committed will need an update.

For projects which do document their lock file format like pipenv_,
they will very likely need a major version release which changes the
lock file format.

===============
Transition Plan

In general, this PEP could be considered successful if:

#. Two pre-existing tools became lockers (e.g. pip-tools, PDM,
pip_ via pip freeze).
#. Pip became an installer.
#. One major, non-Python-specific platform supported the file format
(e.g. a cloud provider).

This would show interoperability, usability, and programming
community/business acceptance.

In terms of a transition plan, there are potentially multiple steps
that could lead to this desired outcome. Below is a somewhat idealized
plan that would see this PEP being broadly used.


Usability

First, a pip freeze equivalent tool could be developed which
creates a lock file. While installed packages do not by themselves
provide enough information to statically create a lock file, a user
could provide local directories and index URLs to construct one. This
would then lead to lock files that are stricter than a requirements
file by limiting the lock file to the current platform. This would
also allow people to see whether their environment would be
reproducible.

Second, a stand-alone installer should be developed. As the
requirements on an installer are much simpler than what pip provides,
it should be reasonable to have an installer that is independently
developed.

Third, a tool to convert a pinned requirements file as emitted by
pip-tools could be developed. Much like the pip freeze equivalent
outlined above, some input from the user may be needed. But this tool
could act as a transitioning step for anyone who has an appropriate
requirements file. This could also act as a test before potentially
having pip-tools grow some --lockfile flag to use this PEP.

All of this could be required before the PEP transitions from
conditional acceptance to full acceptance (and give the community a
chance to test if this PEP is potentially useful).


Interoperability

At this point, the goal would be to increase interoperability between
tools.

First, pip would become an installer. By having the most widely used
installer support the format, people can innovate on the locker side
while knowing people will have the tools necessary to actually consume
a lock file.

Second, pip becomes a locker. Once again, pip’s reach would make the
format accessible for the vast majority of Python users very quickly.

Third, a project with a pre-existing lock file format supports at
least exporting to the lock file format (e.g. PDM or Pyflow). This
would show that the format meets the needs of other projects.


Acceptance

With the tooling available throughout the community, acceptance would
be shown via those not exclusively tied to the Python community
supporting the file format based on what they believe their users
want.

First, tools that operate on requirements files like code editors
would have equivalent support for lock files.

Second, consumers of requirements files like cloud providers would
also accept lock files.

At this point the PEP would have permeated out far enough to be on
par with requirements files in terms of general acceptance and
potentially more if projects had dropped their own lock files for this
PEP.

=====================
Security Implications

A lock file should not introduce security issues but instead help
solve them. By requiring the recording of hashes for files, a lock
file is able to help prevent tampering with code since the hash
details were recorded. Relying on only wheel files means what files
will be installed can be known ahead of time and is reproducible. A
lock file also helps prevent unexpected package updates being
installed which may in turn be malicious.

=================
How to Teach This

Teaching of this PEP will very much be dependent on the lockers and
installers being used for day-to-day use. Conceptually, though, users
could be taught that a lock file specifies what should be installed
for a project to work. The benefits of consistency and security should
be emphasized to help users realize why they should care about lock
files.

========================
Reference Implementation

A proof-of-concept locker can be found in the frostming/pep665_poc
repository on GitHub. No installer has been
implemented yet, but the design of this PEP suggests the locker is the
more difficult aspect to implement.

==============
Rejected Ideas


File Formats Other Than TOML

JSON_ was briefly considered, but due to:

#. TOML already being used for pyproject.toml
#. TOML being more human-readable
#. TOML leading to better diffs

the decision was made to go with TOML. There was some concern over
Python’s standard library lacking a TOML parser, but most packaging
tools already use a TOML parser thanks to pyproject.toml so this
issue did not seem to be a showstopper. Some have also argued against
this concern in the past by noting that if packaging tools abhor
installing dependencies and feel they can’t vendor a package, then the
packaging ecosystem has much bigger issues to rectify than the need to
depend on a third-party TOML parser.


Alternative Naming Schemes

Specifying a directory to install files to was considered, but
ultimately rejected due to people’s distaste for the idea.

It was also suggested to not have a special file name suffix, but it
was decided that hurt discoverability by tools too much.


Supporting a Single Lock File

At one point the idea of only supporting a single lock file which
contained all possible lock information was considered. But it quickly
became apparent that trying to devise a data format which could
encompass both a lock file format which could support multiple
environments as well as strict lock outcomes for
reproducible builds would become quite complex and cumbersome.

The idea of supporting a directory of lock files as well as a single
lock file named pyproject-lock.toml was also considered. But any
possible simplicity from skipping the directory in the case of a
single lock file seemed unnecessary. Trying to define appropriate
logic for what should be the pyproject-lock.toml file and what
should go into pyproject-lock.d seemed unnecessarily complicated.


Using a Flat List Instead of a Dependency Graph

The first version of this PEP proposed that the lock file have no
concept of a dependency graph. Instead, the lock file would list
exactly what should be installed for a specific platform such that
installers did not have to make any decisions about what to install,
only validating that the lock file would work for the target platform.

This idea was eventually rejected due to the number of combinations
of potential PEP 508 environment markers. The decision was made that
trying to have lockers generate all possible combinations as
individual lock files when a project wants to be cross-platform would
be too much.


Use Wheel Tags in the File Name

Instead of having the metadata.tag field there was a suggestion
of encoding the tags into the file name. But due to the addition of
the metadata.marker field and what to do when no tags were needed,
the idea was dropped.


Alternative Names for requires

Some other names for what became requires were installs,
needs, and dependencies. Initially this PEP chose needs
after asking a Python beginner which term they preferred. But based
on feedback on an earlier draft of this PEP, requires was chosen
as the term.


Accepting PEP 650

PEP 650 was an earlier attempt at trying to tackle this problem by
specifying an API for installers instead of standardizing on a lock
file format (ala PEP 517). The
initial response <https://discuss.python.org/t/pep-650-specifying-installer-requirements-for-python-projects/6657/>__
to PEP 650 could be considered mild/lukewarm. People seemed to be
consistently confused over which tools should provide what
functionality to implement the PEP. It also potentially incurred more
overhead as it would require executing Python APIs to perform any
actions involving packaging.

This PEP chooses to standardize around an artifact instead of an API
(ala PEP 621). This would allow for more tool integrations as it
removes the need to specifically use Python to do things such as
create a lock file, update it, or even install packages listed in
a lock file. It also allows for easier introspection by forcing
dependency graph details to be written in a human-readable format.
It also allows for easier sharing of knowledge by standardizing more
of what people need to know (e.g. tutorials become more portable
between tools when it comes to understanding the artifact they
produce). It’s
also simply the approach other language communities have taken and
seem to be happy with.

Acceptance of this PEP would mean PEP 650 gets rejected.


Specifying Requirements per Package Instead of per File

An earlier draft of this PEP specified dependencies at the package
level instead of per file. While this has traditionally been how
packaging systems work, it actually did not reflect accurately how
things are specified. As such, this PEP was subsequently updated to
reflect the granularity that dependencies can truly be specified at.


Specify Where Lockers Gather Input

This PEP does not specify how a locker gets its input. An initial
suggestion was to partially reuse PEP 621, but due to disagreements
on how flexible the potential input should be in terms of specifying
things such as indexes, etc., it was decided this would best be left
to a separate PEP.


Allowing Source Distributions and Source Trees to be an Opt-In, Supported File Format

After extensive discussion <https://discuss.python.org/t/supporting-sdists-and-source-trees-in-pep-665/11869/>__,
it was decided that this PEP would not support source distributions
(aka sdists) or source trees as an acceptable format for code.
Introducing sdists and source trees to this PEP would immediately undo
the reproducibility and security goals due to needing to execute code
to build the sdist or source tree. It would also greatly increase
the complexity for (at least) installers as the dynamic build nature
of sdists and source trees means the installer would need to handle
fully resolving whatever requirements the sdists produced dynamically,
both from a building and installation perspective.

Due to all of this, it was decided it was best to have a separate
discussion about supporting sdists and source trees after
this PEP is accepted/rejected. As the proposed file format is
versioned, introducing sdists and source tree support in a later PEP
is doable.

It should be noted, though, that this PEP does not stop an
out-of-band solution from being developed to be used in conjunction
with this PEP. Building wheel files from sdists and shipping them with
code upon deployment so they can be included in the lock file is one
option. Another is to use a requirements file just for sdists and
source trees, then rely on a lock file for all wheels.

===========
Open Issues

None.

===============
Acknowledgments

Thanks to Frost Ming of PDM_ and Sébastien Eustace of Poetry_ for
providing input around dynamic install-time resolution of PEP 508
requirements.

Thanks to Kushal Das for making sure reproducible builds stayed a
concern for this PEP.

Thanks to Andrea McInnes for initially settling the bikeshedding and
choosing the paint colour of needs (at which point people rallied
around the requires colour instead).

=========
Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

.. _build system declaration spec: Declaring build system dependencies — Python Packaging User Guide
.. _core metadata spec: Core metadata specifications — Python Packaging User Guide
.. _Dart: https://dart.dev/
.. _dependency specifier spec: Dependency specifiers — Python Packaging User Guide
.. _direct URL origin of installed distributions spec: Recording the Direct URL Origin of installed distributions — Python Packaging User Guide
.. _Git: https://git-scm.com/
.. _Go: https://go.dev/
.. _JSON: https://www.json.org/
.. _npm: https://www.npmjs.com/
.. _PDM: pdm · PyPI
.. _pip: https://pip.pypa.io/
.. _pip-tools: pip-tools · PyPI
.. _Pipenv: pipenv · PyPI
.. _platform compatibility tags: Platform compatibility tags — Python Packaging User Guide
.. _Poetry: poetry · PyPI
.. _Pyflow: pyflow · PyPI
.. _PyWeek: https://pyweek.org/
.. _requirements file format: Requirements File Format - pip documentation v22.0.dev0
.. _Rust: https://www.rust-lang.org/
.. _SecureDrop: https://securedrop.org/
.. _simple repository API: Simple repository API — Python Packaging User Guide
.. _source distribution file: Source distribution format — Python Packaging User Guide
.. _SOURCE_DATE_EPOCH: SOURCE_DATE_EPOCH specification
.. _TOML: https://toml.io
.. _version specifiers spec: Version specifiers — Python Packaging User Guide
.. _wheel file: Binary distribution format — Python Packaging User Guide

..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

Re-posting the link to the rst-rendered version for easier reference:

Is there a PR we can use to comment inline?

Unless it’s a typo or similar non-substantive issue, I’d prefer comments to be on this thread. I will not be reviewing github PR comments when making a decision on the PEP.

Of course, it’s up to you (and the PEP authors) where you choose to discuss anything that you’re happy to be excluded from the decision process (except in terms of any changes to the PEP that it provokes).

There are formulations that IMHO are sub-optimal. It’s more than a typo but less than questioning the whole existence. I could list them here, but that would be harder to follow and discuss…

The PR is PEP 665: take 2 by brettcannon · Pull Request #2131 · python/peps · GitHub

Is the std-lib’s inability to natively parse Z-suffixed UTC iso-format date-time strings the only reason to use the full time-zone offset in the created-at field? I feel if UTC is the only allowed time-zone for that format, then the PEP should follow the conventions recommended by database guides and use Z instead.

It’s pretty easy to write code which produces the same behaviour to support this:

from datetime import datetime, timezone

# parse (created_at is the string read from the lock file)
dt_str = created_at.replace("Z", "+00:00")
dt = datetime.fromisoformat(dt_str)

# produce
dt_utc = dt.astimezone(timezone.utc)
dt_str = dt_utc.isoformat("T")
created_at = dt_str.replace("+00:00", "Z")

Edit: as noted below, TOML handles the date-time format

That, and I think it’s less ambiguous than Z if you don’t know about it ahead of time. For instance, if some beginner opened up a .pylock.toml file and saw a datetime ending in Z, would they know what it meant? And how easy is it to find out what Z means?

Why do you think we should specifically follow what databases suggest in this instance? Does ISO 8601 have something specific to say on the matter? RFC 3339 is silent on the matter.

Basically, my question comes down to this: is using Z worth forcing potentially every consumer of this file to add that extra replace() call to parse the string?

For details as to what it might take to get Z into datetime, see Issue 35829: datetime: parse "Z" timezone suffix in fromisoformat() - Python tracker .

I have reached out to various projects to ask them if they would support this PEP:

If anyone knows of a project that you feel I missed and could/should ask for support of this PEP, please let me know and I will reach out.

I don’t know if tox/nox would be relevant in this context…

Depends on the level of optimizations they want to provide. I cannot speak for nox, but as far as tox goes we already implemented a pip requirements.txt parser, so we likely would be involved in this too.

To me the “decision” to not use Z is more like “meh, who cares”. If anything, since the timestamp can only be in UTC, we don’t need the timezone part in the first place, and the addition is simply for explicitness, so the goal should be to have something as simple as possible, and +00:00 is that.

There should be generators and consumers; among consumers, I think pipx is relevant?

It’s the business of the TOML parser, and the TOML v1.0.0 spec allows the Z-suffixed format. So IMHO it doesn’t make sense to restrict the datetime in the created-at field: both +00:00 and Z are parsed as the same datetime object in Python. The strings are to be consumed by the TOML parser, not datetime.fromisoformat, and it is not necessary to do validation against the raw TOML string.
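
A minimal sketch of that point, assuming Python 3.11+ (where tomllib is in the standard library) or the third-party tomli backport; the one-line TOML fragments are hypothetical stand-ins for a real lock file:

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Two hypothetical fragments that differ only in how UTC is written.
with_z = tomllib.loads("created-at = 2021-10-19T22:33:45Z")
with_offset = tomllib.loads("created-at = 2021-10-19T22:33:45+00:00")

# Both spellings come back as timezone-aware datetime objects.
same = with_z["created-at"] == with_offset["created-at"]
utc_offset = with_z["created-at"].utcoffset()
print(same)  # True
```

No replace() call is involved here because the parser produces the datetime object directly.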

17 posts were split to a new topic: Supporting sdists and source trees in PEP 665

Since we’re talking about reproducibility, what’s the use case for the non-reproducible created_at timestamp? The PEP doesn’t address it.

While it won’t affect the reproducibility of an application that a .pylock.toml file describes, it will affect the bit-for-bit reproducibility of anything that may contain a .pylock.toml file.
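
As a small illustration of the bit-for-bit concern, the sketch below hashes two file bodies that differ only in their timestamp. The file content is a hypothetical, heavily trimmed stand-in and not a complete lock file.

```python
import hashlib

# Hypothetical, trimmed lock file body; only the created-at value varies.
template = 'version = "1.0"\ncreated-at = {timestamp}\n'

first = template.format(timestamp="2021-10-19T22:33:45+00:00")
second = template.format(timestamp="2021-10-19T22:33:46+00:00")

digest_first = hashlib.sha256(first.encode("utf-8")).hexdigest()
digest_second = hashlib.sha256(second.encode("utf-8")).hexdigest()

# Everything else is identical, yet the files (and any archive that
# embeds them) no longer match byte for byte.
print(digest_first == digest_second)  # False
```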

Thank you for working on this!

As a distro packager, the PEP doesn’t affect my use case directly, but I’d be happy if it briefly mentioned that it doesn’t:

Development is one use case of many. Please consider expanding it into something like this (at the end of Motivation?):

Lock files are not meant as a replacement for specifying flexible dependencies: ranges and constraints as described in PEP 440.
Flexible dependencies are useful for development or as input for generating lock files. Also, they can be useful for integrating applications into larger environments which aren’t necessarily Python-centric, such as Linux distros. Such environments must of course handle integration, integration testing, and reproducibility on their own.


The PEP’s title is a mouthful, but I’ll still suggest making it just a bit longer. Could it be “PEP 665 – Lock file: A file format to list Python dependencies for reproducibility of an application”?

I’m terrible at remembering numbers, so I’d like to encourage people to write “PEP 665: Lock file” or “the lock file PEP” rather than “PEP 665”.


fromisoformat accepts other separators than T. Please either allow a space as well (for readability), or mention that a fixed format also benefits other (possibly non-Python) tools.
(This point may be moot if TOML handles the datetime.)
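
A quick check of the separator behaviour, using an arbitrary example timestamp:

```python
from datetime import datetime

# The same instant written with a "T" separator and with a space.
with_t = datetime.fromisoformat("2021-10-19T22:33:45+00:00")
with_space = datetime.fromisoformat("2021-10-19 22:33:45+00:00")

print(with_t == with_space)  # True
```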


Typo: unambiguously


I first read this as “project” being the thing managed by Poetry. Consider rewording this as: “even if Poetry chooses not to adopt this PEP as its primary lock file format.”

It is a conscious effort (well, my conscious insistence) to avoid Lock File in both the title and abstract because I observed some people have a preconceived perception of this term (that’s different from the definition used by PEP 665), and tend to want to force what they think the term means onto the PEP. I want to make the title and abstract as unambiguous as possible by only describing the idea with better defined terms instead.

Would “PEP 665 – pylock.toml: A file format …” work?
