PEP 631 - Dependency specification in pyproject.toml based on PEP 508

ofek · August 20, 2020, 7:01pm

Response to: PEP 621: how to specify dependencies?

This is the dependency option based on PEP 508: https://github.com/python/peps/pull/1571

PEP: 999
Title: Dependency specification in pyproject.toml based on PEP 508
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Paul Ganssle <paul@ganssle.io>
Discussions-To: https://discuss.python.org/t/pep-999-dependency-specification-in-pyproject-toml-based-on-pep-508/5018
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Aug-2020
Post-History: 20-Aug-2020


Abstract
========

This PEP specifies how to write a project's dependencies in a
``pyproject.toml`` file for packaging-related tools to consume
using the `fields defined in PEP 621`_.

Entries
=======

All dependency entries MUST be valid `PEP 508 strings`_.

Build backends SHOULD abort at load time for any parsing errors.

::

    from packaging.requirements import InvalidRequirement, Requirement

    ...

    try:
        Requirement(entry)
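    except InvalidRequirement as e:
        # Abort on the first entry that is not a valid PEP 508 string
        raise SystemExit(f'invalid dependency {entry!r}: {e}')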

Specification
=============

dependencies
------------

- Format: array of strings
- Related core metadata:

  - `Requires-Dist`_

Every element must be an `entry <#entries>`_.

::

    [project]
    dependencies = [
      'PyYAML ~= 5.0',
      'requests[security] < 3',
      'subprocess32; python_version < "3.2"',
    ]

optional-dependencies
---------------------

- Format: table
- Related core metadata:

  - `Provides-Extra`_
  - `Requires-Dist`_

Each key is the name of the provided option, and each value has the same type as
the `dependencies <#dependencies>`_ field, i.e. an array of strings.

::

    [project.optional-dependencies]
    tests = [
      'coverage>=5.0.3',
      'pytest',
      'pytest-benchmark[histogram]>=3.2.1',
    ]

Example
=======

This is a real-world example port of what `docker-compose`_ defines.

::

    [project]
    dependencies = [
      'cached-property >= 1.2.0, < 2',
      'distro >= 1.5.0, < 2',
      'docker[ssh] >= 4.2.2, < 5',
      'dockerpty >= 0.4.1, < 1',
      'docopt >= 0.6.1, < 1',
      'jsonschema >= 2.5.1, < 4',
      'PyYAML >= 3.10, < 6',
      'python-dotenv >= 0.13.0, < 1',
      'requests >= 2.20.0, < 3',
      'texttable >= 0.9.0, < 2',
      'websocket-client >= 0.32.0, < 1',

      # Conditional
      'backports.shutil_get_terminal_size == 1.0.0; python_version < "3.3"',
      'backports.ssl_match_hostname >= 3.5, < 4; python_version < "3.5"',
      'colorama >= 0.4, < 1; sys_platform == "win32"',
      'enum34 >= 1.0.4, < 2; python_version < "3.4"',
      'ipaddress >= 1.0.16, < 2; python_version < "3.3"',
      'subprocess32 >= 3.5.4, < 4; python_version < "3.2"',
    ]

    [project.optional-dependencies]
    socks = [ 'PySocks >= 1.5.6, != 1.5.7, < 2' ]
    tests = [
      'ddt >= 1.2.2, < 2',
      'pytest < 6',
      'mock >= 1.0.1, < 4; python_version < "3.4"',
    ]

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


.. _fields defined in PEP 621: https://www.python.org/dev/peps/pep-0621/#dependencies-optional-dependencies
.. _PEP 508 strings: https://www.python.org/dev/peps/pep-0508/
.. _Requires-Dist: https://packaging.python.org/specifications/core-metadata/#requires-dist-multiple-use
.. _Provides-Extra: https://packaging.python.org/specifications/core-metadata/#provides-extra-multiple-use
.. _docker-compose: https://github.com/docker/compose/blob/789bfb0e8b2e61f15f423d371508b698c64b057f/setup.py#L28-L61

ofek · August 20, 2020, 8:57pm

Regarding: https://github.com/python/peps/pull/1571#issuecomment-677885083

Would anyone here be willing to sponsor this PEP?

brettcannon · August 20, 2020, 10:04pm

I would change the title as there will be another PEP for the exploded format which could legitimately have the same title as well

ofek · August 20, 2020, 10:08pm

Sure thing!

bernatgabor · August 20, 2020, 10:37pm

I think @pganssle seemed agreeable to this choice, so maybe he can help you on this? It’s a +1 from me.

steve.dower · August 20, 2020, 11:59pm

Not opposed to Paul helping out here if he wants to, but wanted to call out that a sponsor doesn’t have to be in favour of the proposal (that would be a champion). The sponsor is just to make sure the quality and process are followed well enough that the steering council isn’t forced to allocate a delegate for any half-written document from any random person.

It’s more of a mentor role than a coauthor.

ofek · August 27, 2020, 12:24am

What is the usual path toward finding a sponsor?

pganssle · August 27, 2020, 3:15pm

Sorry, I thought this was assigned a PEP number already and that that implied you had a sponsor. I’m happy to be the sponsor.

Not sure what the “usual path” is. I would guess that the normal path is to propose an idea somewhere like python-ideas and, if it gets traction, ask for a sponsor in the thread and (failing that) e-mail the core-mentorship mailing list. Obviously in this case that’s all unnecessary.

You can send me an e-mail (paul at <my-last-name> dot io) if you have any questions about the process that PEP 1 doesn’t cover. You can add me as a sponsor in your PR. (I also sponsored PEP 609 in case you want to crib the Sponsor header from there).

ofek · August 27, 2020, 3:23pm

Thank you! I just added you to the PEP.

pganssle · August 27, 2020, 6:56pm

Regarding the substance of the PEP, I notice that you have added the option to specify a file by using an inline table, like so:

[project]
dependencies = { file = 'requirements.txt' }

I think that this is not a good idea, for a few reasons:

1. I think it encourages an anti-pattern (single-sourcing your dependencies with a requirements.txt file).
2. It means that the static metadata is not all contained within the pyproject.toml file: now you need to know that not only is the pyproject.toml file present, but also the requirements.txt file is, and you have to read both of them to get at the full metadata.
3. Even if it were desirable to have this feature, I’m not sure I like the false dichotomy between “source everything from a .txt file” and “write everything out in the pyproject.toml file”.

I think we can at least leave this out of the first version of the spec — though maybe for backwards compatibility reasons we would want to specify that a parser SHOULD expect the possibility that either dependencies or an individual dependency might at some point allow a table? (That really only applies to parsers other than backends, since for a backend you can use build-system.requires to declare a minimum version on your backend).

ofek · August 27, 2020, 8:09pm

You’re the sponsor so I’ll change the PEP to whatever you think is best, but I feel very strongly that this is the most desirable approach.

1. It makes the transition to utilizing PEP 621 as seamless as possible. Often, CI and testing scripts use the dependency file in multiple places with various tools. Allowing a separate file avoids a refactor that is large, or impossible if using outdated tools.
2. Many organizations (such as my employer Datadog) dynamically define dependencies at wheel build-time, particularly for internal packages. This could be based on feature flags, a dependency resolver step, etc.
3. Dependency vulnerability scanners like Snyk, PyUp, and GitHub’s Dependabot will likely not support the new standard for a while, and the use of such tools is often required for compliance reasons.
4. It makes it easier for tools to support the use case of apps rather than libraries since you almost always keep dependencies in a separate file at that point.

Now to respond directly:

1. Fair point, but that is what most packages do already e.g. how setup.py reads a requirements.txt.
2. The logic of PEP 621 will certainly be wrapped up in a library, so no one will actually be implementing it themselves.
3. The readme field already reads from a file in the exact same way: https://www.python.org/dev/peps/pep-0621/#readme

pganssle · August 27, 2020, 8:40pm

Just to clarify, my role as Sponsor has nothing to do with the contents of the PEP, and I don’t have to endorse it or like it at all. Sponsors only exist for procedural reasons — to ensure that PEPs meet a minimum quality threshold and to ensure that there’s a core developer to help with the process. Please do not defer to me on that basis.

With regards to your merits:

It makes the transition to utilizing PEP 621 as seamless as possible. Often, CI and testing scripts use the dependency file in multiple places with various tools. Allowing a separate file avoids a refactor that is large, or impossible if using outdated tools.

I don’t think this is right. setuptools does not allow you to use requirements.txt in a setup.cfg file, and there’s a very clear distinction drawn between install_requires and requirements.txt. The transition will already be seamless, especially because you can always just add dynamic = ['dependencies'] and specify your dependencies in setup.py or something else that allows a requirements.txt file.

I personally think that this is a workflow we don’t want to be easy, because “it’s hard to achieve this” is a good signal that you shouldn’t do it. I made a similar argument when I argued against including this sort of thing in setuptools.

If that’s the case, the field should be explicitly declared as dynamic. That you could use this field to circumvent the “everything specified here is static” requirement is actually a very strong reason to not include this feature.

They could already support it today if they were to use PEP 517 build hooks. That said, what goes in dependencies is not what goes in requirements.txt. They are two different things. If you are using dependabot or something, I would expect you to have an install_requires with loose pins and use something like pip-compile to generate requirements.txt (which dependabot can bump). This is another case where “this is an anti-pattern” and we want to make it harder, not easier.

(Response to the 4th point is basically the same).

This is a good point. I still think it’s a bit of a necessary evil rather than something that should proliferate.

Edit: Forgot this point:

I’m not sure that this is actually true (though it could be), but I was thinking of even simpler applications that are just doing something like “parse this pyproject.toml and see its dependencies”. It makes it much simpler if they don’t need to go through an additional layer of indirection (particularly if people are just collecting pyproject.toml files for analysis).

I don’t think it’s fair to say that complexity doesn’t matter because we’ll abstract it away with a library anyway.

pf_moore · August 27, 2020, 8:47pm

I agree strongly with @pganssle here.

In addition to all the points he makes, I’d also say that dependency specification for pyproject.toml is already extremely divisive. Adding extra features at this stage is likely to just split support even more, and increase the risk of not being able to get any sort of consensus (and consequently getting the PEP rejected).

ofek · August 27, 2020, 8:58pm

Alrighty then, I removed it. Hopefully we can add it back in the future.

pf_moore · August 27, 2020, 9:50pm

Personally, I hope we never do. I can’t think of a reason I’d support this option; see this for why being able to use your requirements.txt to specify your install_requires dependencies is a misfeature.

ofek · August 27, 2020, 10:19pm

Yes, I also link Donald’s post to people often.

There are, however, valid use cases for storing install_requires in a simple text file. Just because it’s called requirements.txt does not mean it’s used as an app’s concrete dependencies. The vast majority of requirements.txt files that libraries define are in fact intended to be and treated as requirements.in, and I’m perplexed as to why we don’t acknowledge this more.

pganssle · August 27, 2020, 10:43pm

I think it’s still an open question as to whether there are valid use cases and how to design it so this sort of thing doesn’t do more harm than good. I think that most people who have a reasonable use case and understand this distinction could be happy with any number of minor inconveniences, like reading the file as part of setup.py or using a dynamic dependency and a backend that supports loading dependencies from files.

Even if you want to single-source your install_requires from pyproject.toml, it’s actually not terribly difficult to create a “requirements.in” from a PEP 621 pyproject.toml (at least not in the current format):

import toml

with open("pyproject.toml", "rt") as f:
    deps = toml.load(f).get("project", {}).get("dependencies", [])

with open("requirements.in", "wt") as f:
    # writelines() does not add line separators, so append one per entry
    f.writelines(f"{dep}\n" for dep in deps)

Seems like it would be kinda easy for pip-compile or a simple script in the local directory to generate a temporary requirements.in from your install_requires. Again, this is not a terribly huge barrier for a project to overcome for the rare cases that actually would want this functionality.

In any case, I think we’ve settled that it won’t go in the initial version, but to leave the door open for this and other potential enhancements a bit, I’m thinking we may want to explicitly say that future versions of dependencies may either be a table or a list containing one or more tables, and that consumers of PEP 631-dependencies should choose an appropriate behavior (e.g. throwing an exception if the information is required and warning or considering the information unreliable otherwise).

I’m not sure if we’re still designing the spec to be implemented by anyone other than backends — if not, this kind of warning is less important (since you’d only use new features if they are supported by your backend anyway).
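A rough sketch of what that consumer behavior might look like, assuming the `[project]` table has already been parsed into a plain dict. The names `read_dependencies`, `UnsupportedDependencyFormat`, and the `required` flag are made up for illustration and are not part of any spec or library:

```python
import warnings


class UnsupportedDependencyFormat(Exception):
    """Raised when ``dependencies`` uses a form this reader does not understand."""


def read_dependencies(project, required=True):
    """Return the string entries of ``project['dependencies']``.

    ``project`` is the already-parsed ``[project]`` table (a plain dict).
    Anything that is not a string is treated as a possible future extension.
    """
    raw = project.get("dependencies", [])
    if isinstance(raw, dict):
        # Hypothetical future form: the whole field is a table
        entries, unknown = [], [raw]
    else:
        entries = [item for item in raw if isinstance(item, str)]
        # Hypothetical future form: individual entries are tables
        unknown = [item for item in raw if not isinstance(item, str)]

    if unknown:
        message = f"unrecognized dependency entries: {unknown!r}"
        if required:
            # The caller cannot proceed without complete information
            raise UnsupportedDependencyFormat(message)
        # Otherwise flag the result as incomplete and carry on
        warnings.warn(message)
    return entries
```

A tool that only wants a best-effort listing would call `read_dependencies(project, required=False)` and treat the result as possibly incomplete.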

ofek · August 27, 2020, 11:12pm

I think that is even more than what I speak of. My notion was simply storing a library’s dependencies in a file. Not to eventually be resolved by some tool into a tracked file, etc., literally just the concept of storing install_requires in its own file like many libraries do today.

ofek · August 28, 2020, 4:23am

Okay, here is just one use case.

All of the official integrations that the Datadog Agent ships are stored side by side in a monorepo and every integration’s dependencies are stored in a requirements.in. At Agent build-time, the dependencies are resolved and installed.

Our workflow makes heavy use of those files:

1. Our tooling ensures that all version/marker combinations of dependencies are the same. So for example if the tls integration pins cryptography to 2.8, then so must every other integration or else the CI fails.
2. When we modify a dependency, we do so en masse with a single command rather than manually making the same change to X number of files.
3. All integrations define an extra called deps that reflects what is in that file. Users can then install one of our hosted integration wheels when an integration (or its dependencies) is not shipped by default or when they want an upgrade outside of the Agent release cycle.
4. Dependency vulnerability scanners use these files.
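
To give an idea of the first check in the list above, here is a simplified sketch. It is not our actual tooling: `find_conflicts` and its input shape are invented for illustration, and the parsing is a crude regex where a real tool would more likely use `packaging.requirements.Requirement`.

```python
import re
from collections import defaultdict

# Simplified name matcher; does not handle every PEP 508 form (e.g. URLs).
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_conflicts(requirements_by_project):
    """Return {(name, marker): [specifiers]} for dependencies pinned differently.

    ``requirements_by_project`` maps a project name to the lines of its
    requirements file.
    """
    seen = defaultdict(set)
    for lines in requirements_by_project.values():
        for line in lines:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            requirement, _, marker = line.partition(";")
            match = _NAME.match(requirement)
            if match is None:
                continue  # not something this sketch understands
            name = normalize(match.group(1))
            # Strip whitespace so 'a == 1' and 'a==1' compare equal
            specifier = "".join(requirement[match.end():].split())
            seen[(name, " ".join(marker.split()))].add(specifier)
    return {key: sorted(specs) for key, specs in seen.items() if len(specs) > 1}
```

In CI you would fail the build whenever the returned dict is non-empty.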

After I finish Hatch v1, I was planning on building all integrations with it (introducing pyproject.toml) and removing all setup.py and MANIFEST.in files. However, it’s not so easy without the ability to use a standard requirements file.

The options are:

1. Implement dependencies / optional-dependencies in Hatch only for this feature, which is a real bummer because not only is this a standard thing many projects do, but the dependencies are very much static and not dynamic in the PEP 621 sense.
2. Put everything in pyproject.toml and still keep the requirements file (generated in this case from pyproject.toml) for the dependency vulnerability scanners. Additionally, I’ll need to be allowed work time to change all the tooling. In the case of the mass dependency pinning command, I’ll need to use a style-preserving TOML library like tomlkit (which is buggy) or write a custom parser, essentially treating the file as not-TOML. This would not get past code review, nor would I want it to.
3. Do nothing and continue using setuptools.

Among those options, I’d choose 3. It’s not the end of the world, but quite discouraging.

Many foundational libraries (again, not apps) do it this way, like lxml, the official Kubernetes client, etc.

Also, it’s quite common to define multiple requirements files and read them in as extras such as for test dependencies.

pf_moore · August 28, 2020, 7:39am

It’s intended as a supplement to PEP 621, so it would be used by the intended audience of that PEP, which includes people writing scripts/tools to introspect source trees. So definitely not just backends.
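
As an illustration of that kind of script, here is a minimal sketch that turns an already-parsed `[project]` table into the related core metadata lines (`Requires-Dist` and `Provides-Extra`). The function name `core_metadata_lines` is made up, and splitting on the first `;` is a simplification that assumes entries are not URL requirements containing a semicolon:

```python
def core_metadata_lines(project):
    """Map ``dependencies`` and ``optional-dependencies`` to core metadata lines.

    ``project`` is the parsed ``[project]`` table as a plain dict.
    """
    lines = []
    for entry in project.get("dependencies", []):
        lines.append(f"Requires-Dist: {entry}")

    for extra, entries in project.get("optional-dependencies", {}).items():
        lines.append(f"Provides-Extra: {extra}")
        for entry in entries:
            # Naive split of 'requirement; marker' (assumption: no ';' in URLs)
            requirement, sep, marker = entry.partition(";")
            if sep:
                # Keep the existing marker and additionally require the extra
                combined = f"({marker.strip()}) and extra == '{extra}'"
            else:
                combined = f"extra == '{extra}'"
            lines.append(f"Requires-Dist: {requirement.strip()}; {combined}")
    return lines
```

The point is only that a non-backend tool can do this with nothing but the pyproject.toml contents; a real implementation would parse and validate each entry as a PEP 508 string first.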