I’ve just submitted PEP 722: Dependency specification for single-file scripts (python/peps pull request #3210), to specify a format for including dependency specifications in single-file Python scripts.
This is basically a formalisation of (a subset of) the syntax already supported in pip-run and pipx (the pipx code isn’t released yet), so it’s available in existing tools. By having it in a formal spec, hopefully other tools can rely on and work with this data as well.
See the PEP for details, but it’s intended to be for single-file scripts only, and is not intended in any way to replace or compete with project metadata stored in pyproject.toml. Basically, if you have a pyproject.toml, or can create one, you shouldn’t need this.
The rendered version is available here
Edit: The above was a preview link. The PEP is now published and is available from the normal PEP location. (The details below are for the original version, and are now out of date.)
End of edit
Full PEP text
PEP: 722
Title: Dependency specification for single-file scripts
Author: Paul Moore <p.f.moore@gmail.com>
PEP-Delegate: TBD
Discussions-To: https://discuss.python.org/t/29905
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 19-Jul-2023
Post-History: `19-Jul-2023 <https://discuss.python.org/t/29905>`__
Abstract
========
This PEP specifies a format for including 3rd-party dependencies in a
single-file Python script.
Motivation
==========
Nearly every non-trivial Python project will depend on one or more 3rd party
libraries. These dependencies are typically recorded in the project metadata, in
the file ``pyproject.toml``. This approach serves well for any code that is large
enough to be stored in its own project directory, but it does not work well for
single-file Python scripts (which are often kept in some sort of shared
directory). This PEP offers a solution for that use case, by storing the
dependencies in the script itself.
This proposal is (a subset of) behaviour that already exists in the ``pipx`` and
``pip-run`` tools, so it is simply formalising existing behaviour, rather than
defining a brand new capability.
Rationale
=========
In general, the Python packaging ecosystem is focused on "projects", which are
structured around a directory containing your code, some metadata and any
supporting files such as tests, automation, etc. However, beginners (and quite a
few more experienced developers) will often *start* a project by just opening up
an editor and writing a simple script. In some cases, the project might outgrow
the "simple script" phase and get restructured into a more conventional project
directory, but not all do, and even those that do still need to be usable in the
"simple script" phase.
These days, the idea that "simple scripts can just use the stdlib" is becoming
less and less practical, as the quantity, quality and usability of libraries on
PyPI steadily increases. For example, using the standard ``urllib`` library
rather than something like ``requests`` or ``httpx`` makes correct code
significantly harder to write.
Having to consider "uses 3rd party libraries" as the break point for moving to a
"full scale project" is impractical, so this PEP is designed to allow a project
to use external libraries while still remaining as a simple, standalone script.
Of course, *declaring* your dependencies isn't sufficient by itself. You need to
install them (probably in some sort of virtual environment) so that they are
available when the script runs. This PEP does not cover environment management,
as tools like `pip-run <https://pypi.org/project/pip-run/>`__ and `pipx
<https://pypi.org/project/pipx/>`__ already offer that ability. But by
standardising the means of declaring dependencies, this PEP allows scripts to
remain tool-independent.
Specification
=============
Any Python script may contain a *dependency block*, which is a specially
structured comment block. The format of the dependency block is as follows:
* A single comment line containing the (case sensitive) text "Requirements:"
* A series of comment lines containing :pep:`508` requirement specifiers.
* An empty comment or blank line.
To be recognised as a "comment line", the line must start with a ``#`` symbol.
Leading and trailing whitespace on the comment line is ignored.
The dependency block may occur anywhere in the file. There MUST only be a single
dependency block in the file - tools consuming dependency data MAY simply
process the first dependency block found. This avoids the need for tools to
process more data than is necessary. Stricter tools MAY, however, fail with an
error if multiple dependency blocks are present.
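The rules above can be sketched as a small reader. This is a minimal illustration, not a reference implementation: the function name ``read_requirements`` is hypothetical, and it assumes that the header comment consists of exactly the text "Requirements:" and that any non-comment line (not only a blank one) ends the block. It returns the requirement strings as written and does not validate them against :pep:`508`.

```python
def read_requirements(source: str) -> list[str]:
    """Return the requirement strings from the first dependency block.

    Assumptions (not stated by the specification): the header comment is
    exactly "Requirements:", and any non-comment line ends the block.
    """
    requirements = []
    in_block = False
    for raw_line in source.splitlines():
        # Leading and trailing whitespace on the line is ignored.
        line = raw_line.strip()
        if not line.startswith("#"):
            if in_block:
                break  # a blank (or code) line ends the block
            continue
        text = line[1:].strip()
        if not in_block:
            # The header text is case sensitive.
            in_block = text == "Requirements:"
        elif text:
            requirements.append(text)
        else:
            break  # an empty comment ends the block
    return requirements


EXAMPLE = """\
#!/usr/bin/env python

# Requirements:
#    requests
#    rich

import requests
"""

print(read_requirements(EXAMPLE))
```

Because the loop stops at the end of the first block, any later block is simply never read, which matches the "process the first dependency block found" behaviour. A stricter tool would need to keep scanning in order to report duplicates.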
Example
-------
The following is an example of a script with an embedded dependency block::
    #!/usr/bin/env python

    # In order to run, this script needs the following 3rd party libraries
    #
    # Requirements:
    #    requests
    #    rich

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])
Backwards Compatibility
=======================
As the dependency data is recorded in the form of a structured comment, this is
compatible with any existing code.
Security Implications
=====================
If a script containing a dependency block is run using a tool that automatically
installs dependencies, this could cause arbitrary code to be downloaded and
installed in the user's environment. This is only possible if an untrusted
script is run, and such a script can already cause arbitrary damage, so no new
risk is introduced by this PEP.
How to Teach This
=================
The format is simple, and should be understandable by anyone who can write
Python scripts. In order to add dependencies, a user needs to:
1. Understand how to specify a dependency - they should already have encountered
the format when installing their dependencies manually using a tool like pip.
2. Use a tool that recognises and processes dependency blocks. This PEP does not
cover teaching users about such tools. It is assumed that if they are
popular, users will find out about them as with any other library or tool.
Note that the core Python interpreter does *not* interpret dependency blocks.
This may be a point of confusion for beginners, who try to run ``python
some_script.py`` and do not understand why it fails. It is considered the
responsibility of the person sharing the script to include clear instructions on
how to run it.
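As a rough illustration of why plain ``python some_script.py`` is not enough, the sketch below builds (but does not execute) the commands a runner tool might issue: create a virtual environment, install the declared requirements with pip, then run the script with that environment's interpreter. The function name ``build_run_commands``, the environment directory and the three-step flow are all assumptions made for illustration; this is not a description of how pipx or pip-run work.

```python
import sys
from pathlib import Path


def build_run_commands(script, requirements, env_dir):
    """Return the command lines a hypothetical runner could execute in order."""
    env_dir = Path(env_dir)
    # Virtual environments keep their interpreter in "Scripts" on Windows
    # and in "bin" elsewhere.
    if sys.platform == "win32":
        env_python = env_dir / "Scripts" / "python.exe"
    else:
        env_python = env_dir / "bin" / "python"

    commands = [[sys.executable, "-m", "venv", str(env_dir)]]
    if requirements:
        # Only call pip when there is something to install.
        commands.append([str(env_python), "-m", "pip", "install", *requirements])
    commands.append([str(env_python), str(script)])
    return commands


for command in build_run_commands("some_script.py", ["requests", "rich"], ".script-env"):
    print(" ".join(command))
```

Each command could be passed to ``subprocess.run`` in turn. A real tool would also have to decide where environments live and when they can be reused, which this sketch leaves out.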
Reference Implementation
========================
This format is already supported `in pipx <https://github.com/pypa/pipx/pull/916>`__
and in `pip-run <https://pypi.org/project/pip-run/>`__.
Rejected Ideas
==============
Why not include other metadata?
-------------------------------
There is no obvious use case for other metadata, and if a project *does* need to
specify anything more than some 3rd party dependencies, it has probably reached
the point where it should be structured as a full-fledged project with a
``pyproject.toml`` file.
What about version?
-------------------
The one obvious exception is a script version number. The use cases for a
version are, however, very different from those for dependencies, and it seems
more reasonable to keep the two separate. There are already existing conventions
for keeping a version number in a script (a ``__version__`` variable is a common
approach) and these seem perfectly adequate.
Why not make the dependencies visible at runtime?
-------------------------------------------------
This would typically involve storing the dependencies as a (runtime) list
variable with a conventional name, such as::
    __requires__ = [
        "requests",
        "click",
    ]
This has a number of problems compared to the proposed solution.
1. The consumer has to parse arbitrary Python code, which almost certainly means
using the stdlib AST module, making it much harder for non-Python code to
read the data, as well as making Python code that does so significantly more
complex.
2. Python syntax can change in every release. While the requirement data only uses a
simple subset, the full file still needs to be parsed to *find* the
requirement data.
3. This would reserve a specific global name (``__requires__`` in the above),
potentially clashing with user code.
4. Users could assume that the value can be manipulated at runtime, and would
get unexpected results if they tried to do so.
Furthermore, there is no known use case where being able to read the list of
requirements at runtime is needed.
It is worth noting, though, that the ``pip-run`` utility does implement (an
extended form of) this approach. See `here <pip-run issue_>`_ for further
discussion.
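To make points 1 and 2 above concrete, here is a minimal sketch of what reading a ``__requires__`` variable could look like using the stdlib ``ast`` module. The function name ``read_requires_variable`` is hypothetical, the sketch assumes the value is a plain literal list assigned at module level, and it is not taken from ``pip-run`` or any other tool.

```python
import ast


def read_requires_variable(source: str) -> list[str]:
    """Return the literal list assigned to __requires__ at module level."""
    # The whole file must parse with the *running* interpreter's grammar,
    # even though only one assignment is of interest.
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "__requires__":
                # literal_eval accepts an AST node and only evaluates literals.
                return list(ast.literal_eval(node.value))
    return []


SCRIPT = """\
__requires__ = [
    "requests",
    "click",
]
import requests
"""

print(read_requires_variable(SCRIPT))
```

Any syntax the running interpreter does not accept elsewhere in the file makes ``ast.parse`` raise ``SyntaxError`` before the requirement data is reached. For instance, ``read_requires_variable("print 'hello'")`` fails on Python 3. A comment-based format avoids this, because the reader never has to parse the code itself.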
Should scripts be able to specify a package index?
--------------------------------------------------
The pip requirements file format allows a lot more flexibility than a simple
list of requirements - it allows pip options, including specification of
non-standard indexes. The requirements format is not standardised, though, and
never will be in its current form, as it includes a lot of pip-specific
functionality.
This proposal deliberately does not try to replicate the full feature set of a
requirements file. It would be possible to implement "some" features, for
example being able to add extra index locations. However, it is difficult to
know where to draw the line, and not all consumers of this data may be passing
the dependencies to pip (for example, a script vulnerability scanner).
If a script needs the full requirements file capabilities, it can be shipped
with an accompanying requirements file. While this means the code can no longer
be shipped as a single file, it has probably reached a point of complexity where
"having everything in a single file" is no longer an appropriate goal anyway.
There is more discussion of this point in `the previously mentioned pip-run
issue <pip-run issue_>`_.
What about local dependencies?
------------------------------
:pep:`508` does not allow local directories or files as dependency specifiers.
This is deliberate, as such forms are not portable, and the reasoning applies
equally to single-file Python scripts that are being shared.
For purely local use, however, it *is* possible that a script might want to
depend on a local library. While this specification does not allow this, it is
not unreasonable for tools to loosen the specification to "anything that can be
passed to pip as a requirement". In a practical sense, this is easier for tools
to implement, as they can simply pass the requirements to pip and let pip do the
validation.
To be compliant with this standard (and hence tool-independent) only :pep:`508`
requirements may be used, though. A standard cannot reasonably defer part of its
specification to an implementation-defined rule, like "whatever pip supports".
Why not use a more standard data format (e.g., TOML)?
-----------------------------------------------------
Simplicity. There is nothing in a list of requirements that can't be expressed
in the form of plain text, with one requirement per line. Using a more capable
format adds complexity in parsing and a higher learning curve for users, with no
gain. There are no obvious future enhancements to this format which might need a
more complex format - as has already been noted, once a project gets complex,
the next step is to transition to a ``pyproject.toml`` based structure, *not* to
try to push the bounds of the single script format any further.
Open Issues
===========
None at this point.
Footnotes
=========
.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.