This is the draft PEP proposal that follows from the thread "Pip freeze, vcs urls and pep 517 (feat. editable installs)".
I’ll integrate remarks at https://github.com/sbidoul/peps/blob/source_url-sbi/pep-9999.rst.
Looking forward to reading your comments.
-sbi
PEP: 9999
Title: Recording the origin of distributions installed from direct URL references
Author: Stéphane Bidoul <stephane.bidoul@acsone.eu>
Sponsor: Chris Jerdonek <???>
Discussions-To: https://discuss.python.org/t/recording-the-source-url-of-an-installed-distribution/1535
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Apr-2019
Post-History:
Abstract
========
Following PEP 440, a distribution can be identified by a name and either a
version, or a direct reference (see `PEP440 Direct References`_).
After installation, the name and version are captured in the project metadata,
but currently there is no way to obtain details of the URL used when the
distribution was identified by a direct reference.
This proposal defines
additional metadata, to be added to the installed distribution by the
installation front end, which records the direct reference for use by
consumers which introspect the database of installed packages (see PEP 376).
Motivation
==========
The main motivation of this PEP is allowing tools attempting to "freeze" the
state of a Python environment to work in a broader range of situations.
This PEP originated from the need to implement `pip issue #609`_,
i.e. improving the behavior of ``pip freeze`` in the presence of distributions
installed from direct URL references. It follows a
`thread on discuss.python.org`_ about the best course of action to implement
it.
Installation from direct references
-----------------------------------
Python installers such as pip are capable of downloading and installing
distributions from package indexes. They are also capable of downloading
and installing source code from requirements specifying arbitrary URLs of
source archives and Version Control Systems (VCS) repositories,
as standardized in `PEP440 Direct References`_.
In other words, two relevant installation modes exist.

1. The package to install is specified as a name and version specifier:

   In this case, the installer looks in a package index (or optionally
   using ``--find-links`` in the case of pip) to find the distribution to
   install.

2. The package to install is specified as a direct URL reference:

   In this case, the installer downloads whatever is specified by the URL
   (typically a wheel, a source archive or a VCS repository) and installs it.

   In this mode, installers typically download the source code in a
   temporary directory, invoke the PEP 517 build backend to produce a wheel
   if needed, install the wheel, and delete the temporary directory.

   After installation, no trace of the URL from which the user requested the
   package is left on the user's system.
Freezing an environment
-----------------------
Pip also sports a command named ``pip freeze`` which examines the Database of
Installed Python Distributions to generate a list of requirements. The main
goal of this command is to help users generate a list of requirements that
will later allow the re-installation of the same environment with the highest
possible fidelity.
The ``pip freeze`` command outputs a ``name==version`` line for each installed
distribution (except for editable installs). To achieve the goal of
reinstalling the same environment, this requires the (name, version)
tuple to refer to an immutable version of the
distribution. The immutability is guaranteed by package indexes
such as Warehouse. The package index to use is typically known from
environmental or command line parameters of the installer.
This freeze mechanism therefore works fine for installation mode 1 (i.e.
when the package to install was specified as a name plus version specifier).
For installation mode 2, i.e. when the package to install was specified as a
direct URL reference, the ``name==version`` tuple is obviously not sufficient
to reinstall the same distribution and users of the freeze command expect it
to output the URL that was originally requested.
The reasoning above is equally applicable to tools, other than ``pip freeze``,
that would attempt to generate a ``Pipfile.lock`` or any other similar format
from the Database of Installed Python Distributions. Unless specified
otherwise, "freeze" is used in this document as a generic term for such
an operation.
The importance of installing from (VCS) URLs for application integrators
------------------------------------------------------------------------
For an application integrator, it is important to be able to reliably install
and freeze unreleased versions of Python distributions.
For instance when a developer needs to deploy an unreleased patched version
of a dependency, it is common to install the dependency directly from a VCS
branch that has the patch, while waiting for the maintainer to release an
updated version.
In such cases, it is important for "freeze" to pin the exact VCS
reference (commit-hash if available) that was installed, in order to create
reproducible builds with the highest possible fidelity.
Note about "editable" installs
------------------------------
The editable installation mode of pip roughly lets a user insert a
local directory in ``sys.path`` for development purposes. This mode is somewhat
abused to work around the fact that a non editable install from a VCS URL
loses trace of the origin after installation.
Indeed editable installs implicitly record the VCS origin in the checkout
directory, so the information can be recovered when running "freeze".
The use of this workaround, although useful, is fragile, creates confusion
about the purpose of the editable mode, and works only when the distribution
can be installed with setuptools (i.e. it is not usable with other PEP 517
build backends).
For the sake of clarity, it is important to note that this PEP is otherwise
unrelated to editable installs.
Rationale
=========
This PEP specifies a new ``direct_url.json`` metadata file in the .dist-info
directory of an installed distribution.
The fields specified are sufficient to reproduce the source archive and `VCS
URLs supported by pip`_. They are also sufficient to reproduce
`PEP440 Direct References`_, as well as `Pipfile and Pipfile.lock`_ entries.
Since at least three different ways to encode the information exist (listed
above), this PEP uses a key-value format, to not make any assumption on how a direct
URL must ultimately be encoded in a requirement or lockfile. See also
the `Alternatives`_ section below for more discussion about this choice.
Information has been taken from Ruby's bundler manual to verify it has similar
capabilities and inform the selection and naming of fields in this
specification.
The JSON format allows additional fields to be added in the future.
Specification
=============
This PEP specifies a ``direct_url.json`` file in the ``.dist-info`` directory
of an installed distribution.
This file MUST be created by installers when installing a distribution
from a requirement specifying a direct URL reference (including a VCS URL
in *non*-editable mode).
This file MUST NOT be created when installing a distribution from another
type of requirement (i.e. name plus version specifier, or URL in editable mode).
The JSON content MUST be a flat dictionary where all keys and values are of string type.
For the sake of forward compatibility, tools SHOULD ignore values which are
not of string type.
If present, the file MUST contain at least one field named ``url``.
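
The reading rules above could be sketched as follows. This is only an illustration, not a reference implementation: the function name ``parse_direct_url`` is hypothetical, and raising ``ValueError`` on malformed content is an assumption, since the draft does not say how consumers should report errors.

```python
import json


def parse_direct_url(text):
    """Parse the text of a direct_url.json file (draft format).

    Hypothetical helper: returns a dict containing only the string-valued
    fields, and raises ValueError if the content is unusable.
    """
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("direct_url.json must contain a JSON object")
    # Forward compatibility: silently ignore values that are not strings.
    fields = {key: value for key, value in data.items() if isinstance(value, str)}
    if "url" not in fields:
        raise ValueError("direct_url.json must contain a 'url' field")
    return fields
```

A consumer written this way keeps working if a later revision of the format adds non-string values, because those are dropped rather than rejected.
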
``url`` MUST be stripped of any sensitive authentication information,
for security reasons. The user:password section of the URL MAY however
be composed of environment variables, matching the following regular
expression::

    \$\{[A-Za-z0-9-_]+\}:\$\{[A-Za-z0-9-_]+\}

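
As an illustration of the rule above, an installer might sanitize the requested URL as sketched below. The helper name ``sanitize_url`` is hypothetical, and the pattern assumes that variable names of one or more characters are intended (hence the ``+`` quantifiers).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: variable names may be longer than one character, so each
# character class is followed by "+".
_ENV_VAR_AUTH = re.compile(r"\$\{[A-Za-z0-9-_]+\}:\$\{[A-Za-z0-9-_]+\}")


def sanitize_url(url):
    """Return url without credentials, unless they are environment variables."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no user:password section at all
    userinfo, _, hostport = parts.netloc.rpartition("@")
    if _ENV_VAR_AUTH.fullmatch(userinfo):
        return url  # ${USER}:${PASSWORD} style placeholders are preserved
    # Anything else is treated as sensitive and dropped.
    return urlunsplit(parts._replace(netloc=hostport))
```

With this sketch, ``https://user:secret@example.com/app-1.0.tgz`` is recorded as ``https://example.com/app-1.0.tgz``, while a URL whose credentials are written as ``${USER}:${PASSWORD}`` is kept unchanged.
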
When ``url`` refers to a VCS repository:

- A ``vcs`` field MUST be present, containing the name of the VCS
  (i.e. one of ``git``, ``hg``, ``bzr``, ``svn``). Other VCS SHOULD be
  registered by amending this PEP.
- The ``url`` value MUST be compatible with the corresponding VCS,
  so an installer can hand it off without transformation to a
  checkout/download command of the VCS.
- A ``revision`` field MAY be present to reference the
  branch/tag/ref/commit/revision (in a format compatible with the VCS) that
  was requested for installation.
- A ``resolved_commit_id`` field MUST be present, containing the
  exact commit/revision number that was installed.
  If the VCS supports commit-hash based revision identifiers, such
  commit-hash MUST be used as ``resolved_commit_id`` in order to reference
  the immutable version of the source code that was installed.

When ``url`` refers to a source archive, a wheel, or a local directory:

- A ``hash`` field SHOULD be present, with value
  ``<hash-algorithm>=<expected-hash>``.
  It is RECOMMENDED that only hashes which are unconditionally provided by
  the latest version of the standard library's ``hashlib`` module be used for
  source archive hashes. At the time of writing, that list consists of 'md5',
  'sha1', 'sha224', 'sha256', 'sha384', and 'sha512'.

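
A minimal sketch of how the ``hash`` value could be computed with ``hashlib`` follows. The helper name ``hash_field``, the default of ``sha256`` and the rejection of algorithms outside the recommended list are assumptions made for illustration; the draft only RECOMMENDS that list.

```python
import hashlib
import io

# The algorithms listed as recommended in the text above.
RECOMMENDED_ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def hash_field(fileobj, algorithm="sha256"):
    """Return '<hash-algorithm>=<hex digest>' for a binary file object."""
    if algorithm not in RECOMMENDED_ALGORITHMS:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    # Read in chunks so large archives are not loaded into memory at once.
    for chunk in iter(lambda: fileobj.read(65536), b""):
        digest.update(chunk)
    return f"{algorithm}={digest.hexdigest()}"


# Usage with an in-memory stream; a real installer would open the
# downloaded archive in binary mode instead.
example = hash_field(io.BytesIO(b"archive bytes"))
```
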
.. note::

   When the requested URL points to a local directory that happens to contain a
   VCS checkout, installers MUST NOT attempt to infer any VCS information and
   therefore MUST NOT output any VCS-related information (such as the ``vcs``
   field) in ``direct_url.json``.

A ``subdirectory`` field MAY be present containing a directory path,
relative to the root of the VCS repository, source archive or local directory,
to specify where ``pyproject.toml`` or ``setup.py`` is located.
.. note::

   As a general rule, installers should as much as possible preserve the
   information that was provided in the requested URL when generating
   ``direct_url.json``. For example user:password environment variables
   should be preserved and ``revision`` should reflect the revision that was
   provided in the requested URL as faithfully as possible. This information is
   however *enriched* with more precise data, such as ``resolved_commit_id``.

Registered VCS
--------------
This section lists the registered VCS, along with details on how
to use the ``vcs``, ``revision`` and ``resolved_commit_id`` fields.
Tools MAY support other VCS although it is RECOMMENDED to register
them by amending this PEP. The ``vcs`` field SHOULD be the command name
(lowercased). Additional fields that would be necessary to
support such VCS SHOULD be prefixed with the VCS command name.
Git
+++
Home page
   https://git-scm.com/

vcs command
   git

vcs field
   git

revision field
   A tag name, branch name, Git ref, commit hash, shortened commit hash.

resolved_commit_id field
   A commit hash (40 hexadecimal characters, SHA-1).

Mercurial
+++++++++
Home page
   https://www.mercurial-scm.org/

vcs command
   hg

vcs field
   hg

revision field
   A tag name, branch name, changeset ID, shortened changeset ID.

resolved_commit_id field
   A changeset ID (40 hexadecimal characters).

Bazaar
++++++
Home page
   https://bazaar.canonical.com/

vcs command
   bzr

vcs field
   bzr

revision field
   A tag name, branch name, revision id.

resolved_commit_id field
   A revision id.

Subversion
++++++++++
Home page
   https://subversion.apache.org/

vcs command
   svn

vcs field
   svn

revision field
   ``revision`` must be compatible with the ``--revision`` option of
   ``svn checkout``. In Subversion, the branch or tag is part of ``url``.

resolved_commit_id field
   Since Subversion does not support globally unique identifiers,
   this field is the Subversion revision number in the corresponding
   repository.

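
The formats listed in this section could be checked by a consumer along the following lines. This is a sketch under assumptions: the function name ``is_plausible_commit_id`` is hypothetical, a Subversion revision number is assumed to be a plain decimal integer, and for Bazaar only non-emptiness is checked because the text above does not constrain the format of a revision id.

```python
import re

_FULL_HASH = re.compile(r"[0-9a-fA-F]{40}")  # Git commit hash, Mercurial changeset ID
_SVN_REVISION = re.compile(r"[0-9]+")        # assumed: decimal revision number


def is_plausible_commit_id(vcs, commit_id):
    """Check resolved_commit_id against the format described for each VCS."""
    if vcs in ("git", "hg"):
        return _FULL_HASH.fullmatch(commit_id) is not None
    if vcs == "svn":
        return _SVN_REVISION.fullmatch(commit_id) is not None
    if vcs == "bzr":
        return bool(commit_id)
    raise ValueError(f"unregistered VCS: {vcs}")
```
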
Examples
========
Example direct_url.json
-----------------------
Source archive:

.. code::

    {
        "url": "https://github.com/pypa/pip/archive/1.3.1.zip",
        "hash": "sha256=2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8"
    }

Git URL with tag and commit-hash:

.. code::

    {
        "url": "https://github.com/pypa/pip.git",
        "vcs": "git",
        "revision": "1.3.1",
        "resolved_commit_id": "7921be1537eac1e97bc40179a57f0349c2aee67d"
    }

Example pip commands and their effect on direct_url.json
--------------------------------------------------------
Commands that generate a ``direct_url.json``:
* pip install https://example.com/app-1.0.tgz
* pip install https://example.com/app-1.0.whl
* pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
* pip install ./app
* pip install file:///home/user/app
Commands that *do not* generate a ``direct_url.json``:
* pip install app
* pip install app --no-index --find-links https://example.com/
* pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
* pip install -e ./app
Use cases
=========
"Freezing" an environment
Tools, such as ``pip freeze``, which generate requirements from the Database
of Installed Python Distributions SHOULD exploit ``direct_url.json``
if it is present, and give it priority over the Version metadata in order
to generate a higher fidelity output. In the presence of a ``vcs`` direct URL,
the ``resolved_commit_id`` field SHOULD be used in priority in order to provide
the highest possible fidelity to the originally installed version. If
supported by their requirement format (such as `PEP440 Direct References`_),
tools are encouraged to output both ``revision`` and ``resolved_commit_id``.
Tools MAY choose another approach, depending on the needs of their users.
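
To make the use case concrete, here is a rough sketch of how a freeze tool might turn the fields into a requirement line. The function name ``freeze_line`` is hypothetical, and the output syntax (``name @ url``, a ``vcs+`` prefix, ``@commit`` suffix and ``#...`` fragments) is an assumption modelled on PEP 440 direct references and pip's VCS URLs, not something this PEP mandates.

```python
def freeze_line(name, version, direct_url=None):
    """Build one requirement line for an installed distribution.

    direct_url is the parsed content of direct_url.json, or None when the
    file is absent.
    """
    if not direct_url:
        return f"{name}=={version}"
    url = direct_url["url"]
    fragments = []
    vcs = direct_url.get("vcs")
    if vcs:
        # Prefer the exact installed commit over the requested revision.
        commit = direct_url.get("resolved_commit_id") or direct_url.get("revision")
        url = f"{vcs}+{url}"
        if commit:
            url = f"{url}@{commit}"
    elif "hash" in direct_url:
        fragments.append(direct_url["hash"])
    if "subdirectory" in direct_url:
        fragments.append("subdirectory=" + direct_url["subdirectory"])
    if fragments:
        url = url + "#" + "&".join(fragments)
    return f"{name} @ {url}"
```

Applied to the Git example given earlier, this sketch yields ``pip @ git+https://github.com/pypa/pip.git@7921be1537eac1e97bc40179a57f0349c2aee67d``, whereas a distribution without ``direct_url.json`` still produces ``pip==1.3.1``.
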
Backwards Compatibility
=======================
Since this PEP specifies a new file in the ``.dist-info`` directory,
there are no backwards compatibility implications.
Alternatives
============
PEP426 source_url
-----------------
The now withdrawn PEP 426 specifies a ``source_url`` metadata entry.
It is also implemented in `distlib`_.
It was intended for a slightly different purpose, for use in sdists.
This format lacks support for the ``subdirectory`` option of pip requirement
URLs. The same limitation is present in PEP440 direct references.
It also lacks explicit support for `environment variables in the user:password
part of URLs`_.
The introduction of a key/value extensibility mechanism and support
for environment variables for user:password in PEP440 would be necessary
for use in this PEP.
revision vs ref
---------------
The ``revision`` key was retained over ``ref`` as it is a more generic term
across various VCS and ``ref`` has a specific meaning for ``git``.
References
==========
.. _`pip issue #609`: https://github.com/pypa/pip/issues/609
.. _`thread on discuss.python.org`: https://discuss.python.org/t/pip-freeze-vcs-urls-and-pep-517-feat-editable-installs/1473
.. _PEP440: http://www.python.org/dev/peps/pep-0440
.. _`VCS URLs supported by pip`: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
.. _`PEP440 Direct References`: https://www.python.org/dev/peps/pep-0440/#direct-references
.. _`Pipfile and Pipfile.lock`: https://github.com/pypa/pipfile
.. _distlib: https://distlib.readthedocs.io
.. _`environment variables in the user:password part of URLs`: https://pip.pypa.io/en/stable/reference/pip_install/#id10
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: