PEP 665: Specifying Installation Requirements for Python Projects

I’m down sick, so I haven’t really kept up with the discussions in this thread, other than reading Brett’s most recent comment. It’ll probably be close to the end of the week before I can look at the rest of this discussion properly.

FWIW, this can also be written as (assuming TOML 1.0, which is a safe assumption IMO):

hashes.sha256 = "..."

So, it’s not even an added set of brackets, but just adding the hash name. I think that’s fine.

Conceptually, you can’t do it without introducing another file (or CLI flag), since pyproject.toml is meant to be redistributed. Just because there’s a file here that can technically have some syntax typed into it doesn’t mean that syntax fits the mold of what that file contains.

If we are going to support URLs that point to binary blobs with no filename to parse, I think it would be more flexible to just have a filename field, and define it such that the filename field takes precedence over any filename found in the URL. That way, if we add additional concepts to wheel naming (GPU, CPU features, whatever), this spec doesn’t need to be updated with every iteration.

I may be missing something, but I don’t see how this is true at all. Right now the spec produces a dict that ends up looking like this:

{
    "mousebender": [
        {
            "needs": {"type" : "array", "value" : [
                {"type" : "string", "value" : "attrs>=19.3"},
                {"type" : "string", "value" : "packaging>=20.3"}
            ]},
            "code": [
                {
                    "hash-value": {"type" : "string", "value" : "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"},
                    "type": {"type" : "string", "value" : "sdist"},
                    "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"},
                    "hash-algorithm": {"type" : "string", "value" : "sha256"}
                },
                {
                    "hash-value": {"type" : "string", "value" : "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"},
                    "type": {"type" : "string", "value" : "wheel"},
                    "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"},
                    "hash-algorithm": {"type" : "string", "value" : "sha256"}
                }
            ],
            "version": {"type" : "string", "value" : "2.0.0"}
        }
    ]
}

All you would need to do is give each file its own entry (assuming no other changes happen to the spec besides making it work file by file):

{
    "mousebender": [
        {
            "needs": {"type" : "array", "value" : [
                {"type" : "string", "value" : "attrs>=19.3"},
                {"type" : "string", "value" : "packaging>=20.3"}
            ]},
            "code": {
                "hash-value": {"type" : "string", "value" : "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"},
                "type": {"type" : "string", "value" : "sdist"},
                "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"},
                "hash-algorithm": {"type" : "string", "value" : "sha256"}
            },
            "version": {"type" : "string", "value" : "2.0.0"}
        },
        {
            "needs": {"type" : "array", "value" : [
                {"type" : "string", "value" : "attrs>=19.3"},
                {"type" : "string", "value" : "packaging>=20.3"}
            ]},
            "code": {
                "hash-value": {"type" : "string", "value" : "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"},
                "type": {"type" : "string", "value" : "wheel"},
                "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"},
                "hash-algorithm": {"type" : "string", "value" : "sha256"}
            },
            "version": {"type" : "string", "value" : "2.0.0"}
        }
    ]
}
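Mechanically, going from the first shape to the second is a small transformation. A rough Python sketch on plain decoded data (the function name is hypothetical; URLs are shortened and hashes omitted for brevity):

```python
import copy


def split_per_file(entry: dict) -> list[dict]:
    """Split an entry whose "code" is a list of files into one entry per file.

    Works on plain decoded TOML data, not the tagged {"type", "value"}
    form shown above. Every other key (version, needs, ...) is deep-copied
    into each new entry, so per-file metadata can diverge afterwards.
    """
    shared = {key: value for key, value in entry.items() if key != "code"}
    return [
        {**copy.deepcopy(shared), "code": dict(file_info)}
        for file_info in entry.get("code", [])
    ]


mousebender = {
    "version": "2.0.0",
    "needs": ["attrs>=19.3", "packaging>=20.3"],
    "code": [
        {"type": "sdist", "url": ".../mousebender-2.0.0.tar.gz"},
        {"type": "wheel", "url": ".../mousebender-2.0.0-py3-none-any.whl"},
    ],
}

for per_file in split_per_file(mousebender):
    print(per_file["code"]["type"], per_file["version"], per_file["needs"])
# sdist 2.0.0 ['attrs>=19.3', 'packaging>=20.3']
# wheel 2.0.0 ['attrs>=19.3', 'packaging>=20.3']
```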

Each particular project already supports having multiple entries, so there should be no reason we can’t limit each entry to a single file and just add an entry for each file. This actually mimics how any Python installer that handles dependencies correctly already has to think about them, since every other part of the ecosystem (besides PyPI’s JSON API and web UI, but no installer should be using either of those), mainly the PEP 503 API, presents a list of files, and you have to get the metadata that is attached to each of those files (currently fetched primarily by downloading the file itself, soon by downloading a metadata file, and hopefully some day just baked in as part of the API).
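For reference, this is roughly what “a list of files” looks like on the PEP 503 side: each anchor on a project’s simple page is one file, and whatever per-file data exists (the `#<hashname>=<hashvalue>` URL fragment, the `data-requires-python` attribute) hangs off that link. A rough standard-library sketch; the page below is hand-written, and its URLs and Requires-Python value are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class SimpleIndexParser(HTMLParser):
    """Collect the file links from a PEP 503 "simple" project page."""

    def __init__(self) -> None:
        super().__init__()
        self.files: list[dict] = []
        self._current: dict | None = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        fragment = urlsplit(href).fragment
        hashes = {}
        if "=" in fragment:  # e.g. "sha256=<hex digest>"
            name, _, value = fragment.partition("=")
            hashes[name] = value
        self._current = {
            "url": href.split("#", 1)[0],
            "hashes": hashes,
            "requires_python": attrs.get("data-requires-python"),
            "filename": "",
        }

    def handle_data(self, data):
        if self._current is not None:  # anchor text is the filename
            self._current["filename"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["filename"] = self._current["filename"].strip()
            self.files.append(self._current)
            self._current = None


page = """<html><body>
<a href="https://files.example.invalid/mousebender-2.0.0.tar.gz#sha256=c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760">mousebender-2.0.0.tar.gz</a>
<a href="https://files.example.invalid/mousebender-2.0.0-py3-none-any.whl#sha256=a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c" data-requires-python="&gt;=3.6">mousebender-2.0.0-py3-none-any.whl</a>
</body></html>"""

parser = SimpleIndexParser()
parser.feed(page)
for f in parser.files:
    print(f["filename"], f["requires_python"])
```

Each record is per file, so an installer working from this API naturally ends up attaching metadata to files rather than versions.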

If a tool interacts with Python’s ecosystem and doesn’t already treat metadata as attached to a file rather than to a version, then that tool either breaks with certain completely valid wheels or its authors have gone out of their way to compensate in some other fashion.

It’s very weird to me that this PEP chooses to buck the established pattern that metadata is associated with a file. The only upside I can see is that it somewhat minimizes the size of the file (and if we’re that worried about that, there are a lot of other places we should look at trimming before we start sacrificing correctness), while the downside is that there are almost certainly going to be edge cases and wheels out there that will just have buggy behavior because of it.

I picked the requirements.txt replacement, because I think that covers both use cases perfectly well (and I think my initial set of suggestions almost covers it). Much like how pip freeze and pip-tools build on top of a more fully featured format to implement a lock file, I think the same would be true here. Given the power to replace requirements.txt, you also have the power to implement a lock file with it; you just have to be more careful with the locker, the installer, or both, to ensure that you’re using the features that give you the desired end result. I like things that are general enough to be cleanly usable for multiple use cases.

I would also be OK with just a traditional lock file. In that case I think there’s still room for a new requirements.txt format, but that new format could be pip-specific (just as requirements.txt is intended to be now), and pip could just “lock” or “compile” it down to the hypothetical traditional lock file.

I’m just not a fan of the current middle-ground approach, which feels like “lock file, but with whatever features are needed to also implement the thing Poetry/PDM call a lock file”, mostly because those features seem to mean that implementors have to pay most or all of the cost of the first thing without getting all of its benefits.

Oh nice, that’s actually even more readable (IMO), both on its own and in diffs, than the initial example, since it puts all the relevant information on a single line :smiley:

If we’re worried about the 3 extra characters of length, we could even shorten it slightly to:

hash.sha256 = "..."

Which only has a single extra character (for the sha256 case at least, which is going to be the most common case for a while, maybe always!).

You could chop that back down to parity, too, by going with hash.shaFF instead. :wink:

I’ve answered this poll, but in reality neither choice is particularly important to me. And I’d even be perfectly OK with the current “both”. What I want is not related to what use cases are supported, but rather how the PEP expresses what tools must do in order to be able to claim they support the PEP.

For producers, there are no requirements at all that I can see, other than producing a syntactically valid file. It’s fine to, for example, produce the following file:

[metadata]
needs = ["mousebender<2.0.0"]

[[package.mousebender]]
version = "2.0.0"

Note that this file isn’t consistent, as it doesn’t provide a package version that meets the top level requirement. And yet a locker that produced this would be considered perfectly valid. And there’s nothing requiring a locker to actually lock anything - as far as I can see, it could just dump out the input requirements and a suitable subset of PyPI into a “lockfile” and claim to be done.

For consumers, it’s the same. There’s no statement on whether a consumer has to check hashes, or even has to ensure that what gets installed actually satisfies the requirements in needs.

In fact, one of the two explicit requirements on installers is that they “MUST error out if they encounter something they are unable to handle”. Which in effect means that it’s legitimate for an installer to claim to “support PEP 665” and to not implement large chunks of it (as long as they error out if they encounter those chunks). But it’s not even clear what “unable to handle” means. If there’s a hash in a lockfile, is an installer required to either check the hash or error out because they “can’t handle” hashes? If an installer only supports wheels, is it required to error out if there’s a sdist in the lockfile? Does that change if there’s also a valid wheel specified?
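To make the hash question concrete, here is one possible strict reading sketched in Python: check every listed hash and refuse any algorithm the installer cannot compute. The exception name and the policy are assumptions; a different installer could reasonably skip unknown algorithms instead, which is exactly the incompatibility risk.

```python
import hashlib


class UnsupportedLockEntry(Exception):
    """Hypothetical error for 'something they are unable to handle'."""


def verify_artifact(data: bytes, hashes: dict[str, str]) -> None:
    """One strict reading: check every listed hash, refuse unknown algorithms.

    Note that an empty `hashes` dict passes silently; whether that should
    also be an error is another question the text leaves open.
    """
    for algorithm, expected in hashes.items():
        if algorithm not in hashlib.algorithms_guaranteed:
            raise UnsupportedLockEntry(f"cannot check {algorithm!r} hashes")
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(f"{algorithm} mismatch: expected {expected}, got {actual}")


artifact = b"pretend these are the bytes of a wheel"
verify_artifact(artifact, {"sha256": hashlib.sha256(artifact).hexdigest()})  # passes

try:
    verify_artifact(artifact, {"blake3": "00"})
except UnsupportedLockEntry as exc:
    print(exc)  # cannot check 'blake3' hashes
```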

Individual lockers and installers can make choices for all these unanswered questions, of course. But a stated goal of the PEP is to allow developers to choose any locker they want, and consumers to choose any installer they want. And as soon as we allow implementation defined choices, we risk incompatibilities.

I feel like I’m just repeating myself at this point, I don’t think I’m saying anything new any more. But I still feel like people haven’t understood what my concern is here. I don’t know if this post makes things any clearer, but I’m going to stop after this attempt, unless people ask me specific questions - if I haven’t made myself clear by now, I likely never will :slightly_frowning_face:

So if I do

[[package.six.code]]
type = "wheel"
url = "..."
hashes.sha256 = "8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"
hashes.md5 = "529d7fd7e14612ccde86417b4402d6f3"

This will automatically be turned into

{
  "type": "wheel",
  "url": "...",
  "hashes": [
    {"sha256": "8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"},
    {"md5": "529d7fd7e14612ccde86417b4402d6f3"}
  ],
}

?

If that’s the case, I think I’m OK with the change, provided we mandate the TOML formatting and the ordering of keys (by algorithm name) to reduce diff noise.

No, that’ll turn into a dictionary with two keys, sha256 and md5. See “dotted keys” in the TOML v1.0.0 specification.

I don’t see why we’d want this to be a list of dicts if it’s per-asset, which is what @dstufft’s example used.

Surely we should be defining the data structure and that it’s serialised using TOML, but not mandating any particular layout of the TOML output?

If we start defining the layout, tools won’t be able to use standard TOML libraries, and using the format becomes an order of magnitude harder. That’s not a cost that I’d consider to be worth paying just to get cleaner diffs…

There’s no way any of the currently available TOML libraries for Python serialise this using the dotted key syntax anyway, so if people are expecting them to produce “diffable” output, they might be in for a surprise.

A reminder that there are 12 hours left to fill in the poll in PEP 665: Specifying Installation Requirements for Python Projects - #140 by brettcannon.

For Nixpkgs, it is crucial that we get the locations, filenames and hashes of the artifacts for the supported platforms. Poetry works great with Nix. Even though Poetry leaves open exactly which artifact you would use (e.g. when there are multiple wheels), the consumer of the lock file, in our case poetry2nix, defines additional rules for selecting artifacts. While locking exact artifacts would be great, that realistically only works with source builds if you want to support multiple platforms. Since I think we all agree we want to be able to support binary wheels as well, to me the only realistic solution is the method Poetry uses.

Yup yup. It would still be a good idea to make recommendations on what we know would be easier to read/inspect.

Just a quick update: I have sent a new draft of the PEP to my co-authors and select folks for feedback. It’s practically a rewrite based on all the feedback we received here. Once it’s ready for public consumption I will start a new topic and cross-link from here.

I have posted our latest draft at PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application.