PEP 665: Specifying Installation Requirements for Python Projects

This is sort of what I feel, except I think it either doesn’t solve enough problems at once, or it solves too many problems at once :smiley:. I think we need to either trim it down or we need to make it more general. The middle ground it’s in feels like it has too much power for the “lock file” case, but not enough power for the other cases.

You can be if you want to be :slightly_smiling_face:

One quick note. Both PEP 503 and PEP 508 direct URL currently only allow exposing one hash algorithm/value pair per URL, so even if we make this a table, it would only contain one single element unless the lock file is built from non-standard sources. I’m OK changing that in a new revision of the file format in the future, but it is unnecessary to do it at this time IMO.

PEP 503 only supports a single hash because it wasn’t really designed and it developed organically, or rather the simple API did, and PEP 503 attempted to just document the status quo rather than make any drastic changes. When we did the migration from MD5 to SHA256, we did it by making any client that couldn’t understand the SHA256 hash simply not use any hash at all. That was “OK” at the time since, if my memory serves, all of those clients weren’t using TLS at all anyways.

If we were designing it from scratch, it would be silly not to include a mechanism for gracefully migrating hashes (and certainly any sort of new repository API would have that, and if we ever do have to migrate hashes again, we will shoehorn that into PEP 503).

Here’s the example from the PEP, rewritten to use what I would propose.

version = 1

[tool]
# Tool-specific table ala PEP 518's `[tool]` table.

[metadata]
marker = "python_version>='3.6'"

needs = ["mousebender"]

[[package.attrs]]
version = "21.2.0"
needed-by = ["mousebender"]

[[package.attrs.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
hashes = {sha256 = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"}


[[package.mousebender]]
version = "2.0.0"
needs = ["attrs>=19.3", "packaging>=20.3"]

[[package.mousebender.code]]
type = "sdist"
url = "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"
hashes = {sha256 = "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"}



[[package.mousebender.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
hashes = {sha256 = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"}

[[package.packaging]]
version = "20.9"
needs = ["pyparsing>=2.0.2"]
needed-by = ["mousebender"]

[[package.packaging.code]]
type = "git"
url = "https://github.com/pypa/packaging.git"
commit = "53fd698b1620aca027324001bf53c8ffda0c17d1"

[[package.pyparsing]]
version = "2.4.7"
needed-by = ["packaging"]

[[package.pyparsing.code]]
type="wheel"
url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
hashes = {sha256 = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"}
interpreter-tag = "py2.py3"
abi-tag = "none"
platform-tag = "any"

It seems the PEP is taking the stance that we’re not going to need to do that ever. So if we never need to use this functionality, it is almost exactly equal to the original example in terms of readability. The only real difference is it moves everything onto one line and adds a set of curly braces.
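To make the migration argument concrete, here is a minimal Python sketch of how a client could use a `hashes` table during a hash transition. The preference order and the `pick_hash` helper are assumptions for illustration only; nothing like them is specified in the PEP or in PEP 503.

```python
# Hypothetical preference order, strongest first; not taken from any PEP.
PREFERRED_ALGORITHMS = ("sha512", "sha256", "md5")


def pick_hash(hashes, supported=PREFERRED_ALGORITHMS):
    """Return the (algorithm, digest) pair this client should verify.

    `hashes` maps algorithm names to hex digests, like a `hashes` table.
    Returns None when the client understands none of the listed algorithms.
    """
    for algorithm in supported:
        if algorithm in hashes:
            return algorithm, hashes[algorithm]
    return None


# During a migration the file carries both digests.
entry = {
    "md5": "529d7fd7e14612ccde86417b4402d6f3",
    "sha256": "8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254",
}
new_client = pick_hash(entry)                      # prefers sha256
old_client = pick_hash(entry, supported=("md5",))  # still finds md5
```

With a single hash per file, the old client in this sketch would have nothing it could check, which is the situation the MD5 to SHA256 migration ran into.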

But what are those capabilities? That’s an undefined thing here since I don’t know what your installer does (not) support. It seems you’re asking for the PEP to define what potential capabilities an installer would need to have and then list those required capabilities somehow. You don’t have this with requirement files today either (but maybe you wish you had this?).

That’s fine and between you and your locker.

Sure, but users are not generally doing any of that day-to-day.

Right, so I still don’t know how to take that. :sweat_smile: Is another tool encroaching on pip’s territory and gaining traction a good or bad thing?

I wouldn’t read into the resolution behaviour too much; assume a proper resolver.

That’s between you and your locker just like it’s between you and pip today.

I’m not specifically, but it also requires innovating even more to support it as I don’t know any tooling that directly supports it short of creating a “lock” file that is very much tied to your platform and the exact files you installed (which is fine and possible with this PEP).

Not sure, but I don’t see any reason not to support this use-case just because pip doesn’t support it. In the end it’s bits off the wire, so providing out-of-band info so those bits can be interpreted appropriately doesn’t seem like a bad thing.


OK, I’m going to flat-out ask: what do people want here?

There’s the “give me requirements files” which still requires running a resolver and doesn’t really “lock” in a traditional sense, but it does restrict what is considered at installation time and allows for the potential cross-platform dependencies files that PDM and Poetry have found successful.

Then there’s the “give me requirements files, but w/o needing a resolver” which basically means a traditional lock file which doesn’t require a resolver (at most marker resolution), but which inherently means the lock file is platform-specific; what pip freeze/pip-tools found successful.

We tried to come up with something that services both needs, as you can view the more flexible PDM/Poetry solution as having a stricter subset that covers the pip freeze/pip-tools solution. To me, it seems to have failed based on the reaction we are getting (at least in its current form).

I have only one full rewrite left in me on this PEP (unless @pradyunsg and @uranusjr have more :grinning_face_with_smiling_eyes:), so I am now asking all of you to vote on what you want. I can then have a think on the topic and make a file format proposal we can iterate on and then based on the agreed-upon format, update the PEP.

  • Traditional lock file (i.e. no resolver necessary; pip freeze/pip-tools)
  • Basically pip requirements file (i.e. resolver required; PDM/Poetry)

I’m down sick, so I haven’t really kept up with the discussions in this thread – other than reading Brett’s most recent comment. It’ll probably be close to the end of the week that I can look at the rest of this discussion properly.

FWIW, this can also be written as (assuming TOML 1.0, which is a safe assumption IMO):

hashes.sha256 = "..."

So, it’s not even an added set of brackets, but just adding the hash name. I think that’s fine.

Conceptually you can’t do it without introducing another file (or CLI flag), since pyproject.toml is meant to be redistributed. Just because there’s a file here that can technically have some syntax typed into it, doesn’t mean that it fits the mold of what that file contains.

If we are going to support URLs that are binary blobs with no filename to parse, I think it would be more flexible to just have a filename field, and define it such that the filename field takes precedence over any filename found in the URL. That way if we add additional concepts to wheel naming (gpu, cpu features, whatever) this spec doesn’t need to be updated with every iteration.
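A rough Python sketch of that precedence rule, assuming a hypothetical `filename` key on each `code` entry (the key name and the `artifact_filename` helper are made up for illustration, and the blob URL is a placeholder):

```python
import posixpath
from urllib.parse import urlsplit


def artifact_filename(entry):
    """Return the filename an installer would use for one `code` entry.

    An explicit `filename` key (hypothetical, not in the PEP draft) wins;
    otherwise fall back to the last path segment of the URL.
    """
    explicit = entry.get("filename")
    if explicit:
        return explicit
    return posixpath.basename(urlsplit(entry["url"]).path)


from_url = artifact_filename({
    "url": "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl",
})
from_field = artifact_filename({
    "url": "https://blobs.example.invalid/149e90d6",  # opaque blob, nothing to parse
    "filename": "attrs-21.2.0-py2.py3-none-any.whl",
})
```

Both calls yield the same wheel filename, so everything downstream (tag parsing included) can keep working from the filename alone.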

I may be missing something, but I don’t see how this is true at all? Right now the spec produces a dict that ends up looking like:

{
    "mousebender": [
        {
            "needs": {"type" : "array", "value" : [
                {"type" : "string", "value" : "attrs>=19.3"},
                {"type" : "string", "value" : "packaging>=20.3"}
            ]},
            "code": [
                {
                    "hash-value": {"type" : "string", "value" : "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"},
                    "type": {"type" : "string", "value" : "sdist"},
                    "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"},
                    "hash-algorithm": {"type" : "string", "value" : "sha256"}
                },
                {
                    "hash-value": {"type" : "string", "value" : "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"},
                    "type": {"type" : "string", "value" : "wheel"},
                    "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"},
                    "hash-algorithm": {"type" : "string", "value" : "sha256"}
                }
            ],
            "version": {"type" : "string", "value" : "2.0.0"}
        }
    ]
}

All you would need to do (assuming no other changes happen to the spec besides making it work file by file):

{
    "mousebender": [
        {
            "needs": {"type" : "array", "value" : [
                {"type" : "string", "value" : "attrs>=19.3"},
                {"type" : "string", "value" : "packaging>=20.3"}
            ]},
            "code": {
                "hash-value": {"type" : "string", "value" : "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"},
                "type": {"type" : "string", "value" : "sdist"},
                "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"},
                "hash-algorithm": {"type" : "string", "value" : "sha256"}
            },
            "version": {"type" : "string", "value" : "2.0.0"}
        },
        {
            "needs": {"type" : "array", "value" : [
                {"type" : "string", "value" : "attrs>=19.3"},
                {"type" : "string", "value" : "packaging>=20.3"}
            ]},
            "code": {
                "hash-value": {"type" : "string", "value" : "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"},
                "type": {"type" : "string", "value" : "wheel"},
                "url": {"type" : "string", "value" : "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"},
                "hash-algorithm": {"type" : "string", "value" : "sha256"}
            },
            "version": {"type" : "string", "value" : "2.0.0"}
        }
    ]
}

Each particular project already supports having multiple entries, so there should be no reason that we can’t limit each entry to a single file and just add an entry for each file. This actually mimics how any Python installer that correctly handles metadata already has to think about dependencies, since every other part of the ecosystem (besides PyPI’s JSON API and Web UI, but no installer should be using either of those), mainly the PEP 503 API, presents a list of files, and you have to get the metadata that is attached to each of those files (currently fetched primarily by downloading the file itself, soon by downloading a metadata file, hopefully some day just baked in as part of the API).
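As a sketch of how mechanical that transformation is, the following Python turns the per-version shape into the per-file shape. It assumes the parsed data uses plain values rather than the `{"type": ..., "value": ...}` wrappers shown above, and `flatten_per_file` is a made-up helper name.

```python
def flatten_per_file(packages):
    """Turn {name: [{..., "code": [file, ...]}]} into one entry per file.

    Every key other than "code" (version, needs, ...) is copied onto each
    per-file entry, so metadata ends up attached to a single file.
    """
    flat = {}
    for name, entries in packages.items():
        flat[name] = []
        for entry in entries:
            shared = {key: value for key, value in entry.items() if key != "code"}
            for code in entry.get("code", []):
                flat[name].append({**shared, "code": code})
    return flat


per_version = {
    "mousebender": [
        {
            "version": "2.0.0",
            "needs": ["attrs>=19.3", "packaging>=20.3"],
            "code": [
                {"type": "sdist", "hash-algorithm": "sha256"},
                {"type": "wheel", "hash-algorithm": "sha256"},
            ],
        }
    ]
}
per_file = flatten_per_file(per_version)
```

Here `per_file["mousebender"]` holds two entries, one for the sdist and one for the wheel, each carrying its own copy of `version` and `needs`.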

If a tool interacts with Python’s ecosystem and doesn’t already treat the metadata as attached to a file instead of to a version, then that tool either breaks with certain completely valid wheels or has gone out of its way to compensate in some other fashion.

It’s very weird to me that this PEP chooses to buck the established pattern that metadata is associated with a file, which seems to me like it can do nothing positive but somewhat minimize the size of the file (and if we’re that worried about that, there are a lot of other places we should look at trimming first before we start sacrificing correctness) and has the downside that there are almost certainly going to be edge cases and wheels out there that will just have buggy behavior because of it.

I picked the requirements.txt replacement, because I think that covers both use cases perfectly fine (and I think my initial set of suggestions almost covers it). Much like how pip freeze and pip-tools build on top of a more fully featured format to implement a lock file, I think the same would be true here. Given the power to replace requirements.txt, you also have the power to use it to implement a lock file; you just have to be more careful with either the locker or the installer (or both) to ensure that you’re using the features that give you the desired end result. I like things that are general enough to be cleanly usable for multiple use cases.

I would also be OK with just a traditional lock file, I think in that case there’s still room for a new requirements.txt format, but it’s possible for that new format to be pip specific (just as requirements.txt is intended to be now), and pip could just “lock” or “compile” that down to the hypothetical traditional lock file.

I’m just not a fan of the current middle ground approach which feels like “lock file, but with whatever features needed to also implement the thing poetry/pdm calls a lock file”, mostly because it feels like those features mean that implementors have to pay most or all of the cost of the first thing, without getting all of the benefits of that thing.

Oh nice, that’s actually even more readable (IMO) both on its own and in diffs than the initial example since it puts all the relevant information on a single line :smiley:

If we’re worried about the 3 characters of extra length we could even shorten it slightly to:


hash.sha256 = "..."

Which only has a single extra character (for the sha256 case at least, which is going to be the most common case for a while, maybe always!).

You could chop that back down to parity, too, by going with hash.shaFF instead. :wink:

I’ve answered this poll, but in reality neither choice is particularly important to me. And I’d even be perfectly OK with the current “both”. What I want is not related to what use cases are supported, but rather how the PEP expresses what tools must do in order to be able to claim they support the PEP.

For producers, there are no requirements at all that I can see, other than producing a syntactically valid file. It’s fine to (for example) produce the following file:

[metadata]
needs = ["mousebender<2.0.0"]

[[package.mousebender]]
version = "2.0.0"

Note that this file isn’t consistent, as it doesn’t provide a package version that meets the top level requirement. And yet a locker that produced this would be considered perfectly valid. And there’s nothing requiring a locker to actually lock anything - as far as I can see, it could just dump out the input requirements and a suitable subset of PyPI into a “lockfile” and claim to be done.
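For what it’s worth, the kind of consistency check that the PEP doesn’t require is small. Here is a deliberately simplified Python sketch: it only understands a bare name or a single `name OP version` clause with purely numeric dotted versions, and `unsatisfied_needs` is a made-up helper. A real tool would want a proper requirement and version parser (e.g. the `packaging` library) rather than this regex.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        ">=": operator.ge, ">": operator.gt}
# Simplification: a bare name, or one clause with a numeric dotted version.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:(<=|>=|==|<|>)\s*([0-9.]+))?\s*$")


def _version(text):
    # "2.0.0" -> (2, 0, 0); no pre-releases, epochs, etc. in this sketch.
    return tuple(int(part) for part in text.split("."))


def unsatisfied_needs(lock):
    """Return the top-level `needs` entries that no locked version satisfies."""
    problems = []
    for requirement in lock.get("metadata", {}).get("needs", []):
        match = _REQ.match(requirement)
        if match is None:
            raise ValueError(f"unsupported requirement: {requirement!r}")
        name, op, wanted = match.groups()
        locked = lock.get("package", {}).get(name, [])
        if op is None:
            satisfied = bool(locked)
        else:
            satisfied = any(
                _OPS[op](_version(entry["version"]), _version(wanted))
                for entry in locked
            )
        if not satisfied:
            problems.append(requirement)
    return problems


inconsistent = {
    "metadata": {"needs": ["mousebender<2.0.0"]},
    "package": {"mousebender": [{"version": "2.0.0"}]},
}
problems = unsatisfied_needs(inconsistent)
```

For the file above, `problems` contains `"mousebender<2.0.0"`, since the only locked version is 2.0.0.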

For consumers, it’s the same. There’s no statement on whether a consumer has to check hashes, or even has to ensure that what gets installed actually satisfies the requirements in needs.

In fact, one of the two explicit requirements on installers is that they “MUST error out if they encounter something they are unable to handle”. Which in effect means that it’s legitimate for an installer to claim to “support PEP 665” and to not implement large chunks of it (as long as they error out if they encounter those chunks). But it’s not even clear what “unable to handle” means. If there’s a hash in a lockfile, is an installer required to either check the hash or error out because they “can’t handle” hashes? If an installer only supports wheels, is it required to error out if there’s a sdist in the lockfile? Does that change if there’s also a valid wheel specified?
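To illustrate just one possible reading of “unable to handle”, here is a Python sketch of an installer that refuses to proceed when it cannot check a recorded hash, instead of silently skipping it. The exception name, the `SUPPORTED` set and the `verify` helper are all assumptions for this example; the PEP doesn’t say which behaviour is intended, which is exactly the problem.

```python
import hashlib


class UnsupportedLockEntry(Exception):
    """Raised when the installer cannot handle something in the lock file."""


# Hypothetical: the algorithms this particular installer chooses to support.
SUPPORTED = {"sha256", "sha512"}


def verify(data, hashes):
    """Check `data` against every digest in `hashes`.

    Raises UnsupportedLockEntry for an algorithm the installer does not
    support, and ValueError when a digest does not match.
    """
    for algorithm, expected in hashes.items():
        if algorithm not in SUPPORTED:
            raise UnsupportedLockEntry(f"cannot check {algorithm} hashes")
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected:
            raise ValueError(f"{algorithm} mismatch")


payload = b"pretend this is a wheel"
verify(payload, {"sha256": hashlib.sha256(payload).hexdigest()})  # passes quietly
```

An equally “conforming” installer could ignore unknown algorithms, or ignore hashes entirely, and the same lock file would then give different guarantees depending on the installer.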

Individual lockers and installers can make choices for all these unanswered questions, of course. But a stated goal of the PEP is to allow developers to choose any locker they want, and consumers to choose any installer they want. And as soon as we allow implementation defined choices, we risk incompatibilities.

I feel like I’m just repeating myself at this point, I don’t think I’m saying anything new any more. But I still feel like people haven’t understood what my concern is here. I don’t know if this post makes things any clearer, but I’m going to stop after this attempt, unless people ask me specific questions - if I haven’t made myself clear by now, I likely never will :slightly_frowning_face:

So if I do

[[package.six.code]]
type="wheel"
url = "..."
hashes.sha256 = "8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"
hashes.md5 = "529d7fd7e14612ccde86417b4402d6f3"

This will automatically be turned into

{
  "type": "wheel",
  "url": "...",
  "hashes": [
    {"sha256": "8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"},
    {"md5": "529d7fd7e14612ccde86417b4402d6f3"},
  ],
}

?

If that’s the case, I think I’m OK with the change if we mandate the TOML format and ordering of keys (by algorithm name) to reduce diffs.

No, that’ll turn into a dictionary with two keys, sha256 and md5. See “dotted keys” in TOML: English v1.0.0

I don’t see why we’d want this to be a list of dicts if it’s per-asset, which is what @dstufft’s example used.

Surely we should be defining the data structure and that it’s serialised using TOML, but not mandating any particular layout of the TOML output?

If we start defining the layout, tools won’t be able to use standard TOML libraries, and using the format becomes an order of magnitude harder. That’s not a cost that I’d consider to be worth paying just to get cleaner diffs…

There’s no way any of the currently available TOML libraries for Python serialise this using the dotted key syntax anyway. Or if people are expecting that they’ll produce “diffable” output, they might be in for a surprise.
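If a locker did want dotted keys in a stable order, it would most likely have to emit those lines itself. A tiny Python sketch of what that could look like for the `hashes` table only, assuming algorithm names are valid TOML bare keys and digests are plain hex strings (`dump_hashes` is a made-up helper, not part of any TOML library):

```python
import json


def dump_hashes(hashes):
    """Emit `hashes.<algorithm> = "<digest>"` lines, sorted by algorithm name."""
    lines = []
    for algorithm in sorted(hashes):
        # json.dumps gives a double-quoted string, which is good enough
        # as a TOML basic string for plain hex digests.
        lines.append(f"hashes.{algorithm} = {json.dumps(hashes[algorithm])}")
    return "\n".join(lines)


text = dump_hashes({
    "sha256": "8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254",
    "md5": "529d7fd7e14612ccde86417b4402d6f3",
})
```

Sorting by algorithm name keeps the output independent of dict insertion order, which is the property that matters for diffs.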

A reminder that there are 12 hours left to fill in the poll in PEP 665: Specifying Installation Requirements for Python Projects - #140 by brettcannon.

For Nixpkgs it is crucial that we get, for the supported platforms, the locations, filenames and hashes of the artifacts. Poetry works great with Nix. Even though Poetry leaves open exactly which artifact you would use (e.g. when there are multiple wheels), the consumer of the lock file, in our case poetry2nix, defines additional rules for selecting artifacts. While locking exact artifacts would be great, that realistically only works with source builds if you want to support multiple platforms. Since I think we all agree we want to be able to support binary wheels as well, to me the only realistic solution is the method Poetry uses.
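As a rough idea of what consumer-side selection rules can look like, here is a Python sketch that picks one artifact from a package’s `code` entries. It is not how poetry2nix actually works; the `select_artifact` helper and the “preferred wheel first, then sdist” policy are assumptions for illustration, using the `interpreter-tag`, `abi-tag` and `platform-tag` keys from the example earlier in the thread.

```python
_TAG_KEYS = ("interpreter-tag", "abi-tag", "platform-tag")


def _matches(wheel, tag):
    # Compressed tag sets such as "py2.py3" list alternatives separated by dots.
    return all(part in wheel[key].split(".") for key, part in zip(_TAG_KEYS, tag))


def select_artifact(code_entries, supported_tags):
    """Pick the wheel matching the most preferred tag, else fall back to an sdist.

    `supported_tags` is a list of (interpreter, abi, platform) triples,
    most preferred first. Returns None when nothing is usable.
    """
    wheels = [entry for entry in code_entries if entry["type"] == "wheel"]
    for tag in supported_tags:
        for wheel in wheels:
            if _matches(wheel, tag):
                return wheel
    for entry in code_entries:
        if entry["type"] == "sdist":
            return entry
    return None


entries = [
    {"type": "sdist", "url": "..."},
    {"type": "wheel", "url": "...", "interpreter-tag": "py2.py3",
     "abi-tag": "none", "platform-tag": "any"},
]
chosen = select_artifact(entries, [("cp39", "cp39", "linux_x86_64"), ("py3", "none", "any")])
```

In this example `chosen` is the universal wheel; with a tag list that matches no wheel, the sdist is returned instead.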

Yup yup. It would still be a good idea to make recommendations on what we know would be easier to read/inspect.

Just a quick update: I have sent a new draft of the PEP to my co-authors and select folks for feedback. It’s practically a rewrite based on all the feedback we received here. Once it’s ready for public consumption I will start a new topic and cross-link from here.

I have posted our latest draft at PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application.