PEP 665: Specifying Installation Requirements for Python Projects

h-vetinari · August 5, 2021, 4:09pm

Thanks for this reply; the “locked requirements.txt” framing is much clearer IMO. I think it would be good to avoid the word “installation” in the name of the PEP, since it might mislead readers into thinking it’s about installing the project for which the lock file is generated, rather than all of its dependencies.

I have a couple more thoughts/questions though:

I still don’t understand how platform-specific dependencies would be expressed in the needs syntax, given that some projects do use those.
Should the PEP formulate what should happen in case files are yanked from PyPI after the lockfile has been generated?
Regarding source trees & sdists vs. cross-platform: why not enforce a platform-specific tag for the lockfile (or at least warn) as soon as uncompiled sources are specified? This wouldn’t be an undue constraint IMO, but set more realistic expectations.

How would that work without again resolving the broad set of dependencies down to a unique hash per package?

brettcannon · August 5, 2021, 6:44pm

PEP 508 markers.

So you can yank versions, but not individual files (you have to use build numbers to simulate “yanking” a specific file). And yanking purposefully doesn’t remove files so that they can still be accessed by installers for those wanting to use them. So the PEP does not and I think should not take a stance on yanked versions (that’s between you and your locker and/or installer).

That’s covered in the Rejected Ideas section. I also think this is better handled by the installer, which you can tell to skip anything that requires code execution (e.g. by requiring --allow-sdists and --allow-source-trees flags).
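As a rough illustration of that installer-side opt-in, here is a minimal sketch. It assumes each file entry in the lock file carries a "type" key with values such as "wheel", "sdist" and "source tree"; those value names, the function select_code_entries, and the example URLs are hypothetical, and the two keyword arguments simply mirror the flags floated above rather than any existing installer option.

```python
# Assumed "type" values; the discussion only names wheels, sdists and source trees.
SAFE_TYPES = {"wheel"}


def select_code_entries(code_entries, allow_sdists=False, allow_source_trees=False):
    """Keep only the file entries the user has agreed to install from.

    Wheels need no build step, so they are always allowed. Anything that
    would execute code at install time has to be opted into explicitly.
    """
    allowed = set(SAFE_TYPES)
    if allow_sdists:
        allowed.add("sdist")
    if allow_source_trees:
        allowed.add("source tree")
    return [entry for entry in code_entries if entry["type"] in allowed]


# Hypothetical entries for one locked package version.
code_entries = [
    {"type": "wheel", "url": "https://example.invalid/spam-1.0-py3-none-any.whl"},
    {"type": "sdist", "url": "https://example.invalid/spam-1.0.tar.gz"},
]

default_choice = select_code_entries(code_entries)
with_sdists = select_code_entries(code_entries, allow_sdists=True)
print([entry["type"] for entry in default_choice])
print([entry["type"] for entry in with_sdists])
```

With this shape, a lock file can stay platform-agnostic and the decision to run build code is made where it is actually executed.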

Because the “broad set of dependencies” is already recorded in the lock file and pip should have all the info it needs to resolve what to install from the lock file itself.

brettcannon · August 5, 2021, 7:01pm

Done, along with some more clarifications in PEP 665: add missing link targets · python/peps@05ef4c3 · GitHub.

h-vetinari · August 5, 2021, 7:23pm

I think it would be reasonable to mandate a warning for the installer in such cases.

I just looked over the rejected ideas and I’m not sure where this is covered.

I get this is not the main intended usecase, but since it was brought up specifically, I’m asking. I don’t see how the structure of the example file (e.g. [[package.attrs]] and [[package.attrs.code]]) would lend itself to encoding of:

For example, needs tags would need to be able to vary by version of the package, and so would hashes. I’m wondering how the lockfile can encode enough information that an installer doesn’t have to go fetch the metadata for individual builds again (you’ve already answered the part that a resolver is necessary in such cases).

uranusjr · August 5, 2021, 8:09pm

PEP 592 already does.

brettcannon · August 5, 2021, 8:35pm

peps/pep-0665.rst at 90600cc410291574f17e75ad638822e2f28ec151 · python/peps · GitHub unless I’m misunderstanding what you’re asking.

Let me flip the question around and ask you what metadata do you think is missing? The available files are all listed there and the needs key lists the projects that are depended on, and each package lists the version it represents, so I’m not sure what information you think pip typically uses that isn’t captured by the PEP.

pf_moore · August 5, 2021, 9:34pm

Was that the wrong link? That commit doesn’t change the “Installer Expectations” section at all.

pradyunsg · August 6, 2021, 7:28am

I think so. Here’s the commit that I think is relevant: PEP 665: more clarifications based on feedback · python/peps@90600cc · GitHub

pradyunsg · August 6, 2021, 7:37am

We can’t. The only tool that will definitely have this information is the locker. My reading of PEP 592 tells me that they have to behave as specified there.

The information about a package being yanked is encoded in the simple index pages. The installer is (by design) avoiding interacting with those pages – that’s the main optimisation.

This means that the installer does not have the information to present a warning. Also, note that this information is inherently temporal (a package you pinned/locked today could be yanked by tomorrow) so the only way to prevent that would be to somehow require the installer to interact with the index, which basically eliminates a bunch of important characteristics of this design.
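To make that concrete: PEP 592 records yanking as a data-yanked attribute on the file links of a project's simple index page, so only a tool that fetches that page can see it. Below is a minimal standard-library sketch of how a locker might spot yanked files at lock time. The sample page, project name, file names and yank reason are invented for illustration, and YankedLinkParser is a hypothetical helper, not part of any existing tool.

```python
from html.parser import HTMLParser


class YankedLinkParser(HTMLParser):
    """Collect file links that carry the PEP 592 ``data-yanked`` attribute."""

    def __init__(self):
        super().__init__()
        self.yanked = {}  # href -> yank reason (may be an empty string)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if "data-yanked" in attr_map:
            # A valueless attribute is reported as None; normalise it to "".
            self.yanked[attr_map.get("href")] = attr_map["data-yanked"] or ""


# Invented sample of a simple index page; this is not real PyPI content.
SAMPLE_PAGE = """
<a href="spam-1.0-py3-none-any.whl">spam-1.0-py3-none-any.whl</a>
<a href="spam-1.1-py3-none-any.whl" data-yanked="broken metadata">spam-1.1-py3-none-any.whl</a>
"""

parser = YankedLinkParser()
parser.feed(SAMPLE_PAGE)
yanked_files = parser.yanked
print(yanked_files)
```

An installer that only reads the lock file never fetches this page, which is why it has nothing to base a warning on.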

h-vetinari · August 6, 2021, 8:28am

Indeed, we were misunderstanding each other. I asked “why not enforce a platform-specific tag for the lockfile (or at least warn) as soon as uncompiled sources are specified?”, regardless of the mechanism - i.e. metadata.marker (rather than filename) would be fine. Also note the conditional on using sources directly.

Given a range of versions of package A that are legal in a broad lockfile, each version of A might have a different version range for its dependency B. The way I read the spec and the example, the needs-clause is on the package-level (not the .code level), and that would make it impossible to accurately represent such a situation.

Perhaps I’m also misunderstanding something, or not imaginative enough how this could fit into the given example lockfile. If the use case for “broad” lockfiles is in scope, then it would help to have an example of how that might look.

h-vetinari · August 6, 2021, 8:31am

OK, makes sense given the constraints.

brettcannon · August 6, 2021, 6:42pm

The needs clause is at the package version level (hence why it’s an array of tables and not a single table per package), but you’re right about it not being at the code level. I believe Pradyun spoke earlier in this thread about the rejected idea of trying to support differing requirements per file. But you can definitely have differing requirements per version of a package.
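A rough sketch of what that could look like, written as the Python data a TOML parser would produce. Only the array-of-tables layout per project, the version and needs keys, and the nested code tables come from this discussion. The project names a and b, the version numbers, the "type" values, and the helper needs_for are all invented for illustration.

```python
# Hypothetical lock data: each project maps to a list of per-version tables,
# mirroring [[package.a]] and [[package.a.code]] in the TOML file.
lock = {
    "package": {
        "a": [
            {
                "version": "1.0",
                "needs": ["b>=1,<2"],
                # Both files of a 1.0 share the needs above.
                "code": [{"type": "wheel"}, {"type": "sdist"}],
            },
            {
                "version": "2.0",
                "needs": ["b>=2"],
                "code": [{"type": "wheel"}],
            },
        ],
        "b": [
            {"version": "1.5", "needs": [], "code": [{"type": "wheel"}]},
            {"version": "2.1", "needs": [], "code": [{"type": "wheel"}]},
        ],
    }
}


def needs_for(lock_data, name, version):
    """Return the needs recorded for one specific version of a project."""
    for entry in lock_data["package"][name]:
        if entry["version"] == version:
            return entry["needs"]
    raise KeyError(f"{name}=={version} is not in the lock file")


print(needs_for(lock, "a", "1.0"))
print(needs_for(lock, "a", "2.0"))
```

So the range of B that each version of A accepts can differ, while every file listed under a given version shares the same needs.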

ncoghlan · August 9, 2021, 11:57pm

And the key thing to remember is that those version dependent requirements can have environment markers in them to restrict them to particular target environments.

The PEP’s qualification that it doesn’t allow for file specific dependencies in the “code” tables could probably be clarified a bit by restating that PEP 508’s environment marker support is the way to declare environment specific dependencies.

FWIW, I think this design decision makes sense on TOOWTDI grounds: projects with universal wheels may still have platform specific dependencies, so the latter shouldn’t require having multiple code tables in the lockfile. And once environment markers are allowed in the individual requirements, also allowing requirements to be listed in the code tables would be redundant.
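A small sketch of that idea: one needs list for a universal wheel, with a platform-specific entry expressed through a marker instead of a second code table. The evaluator below is deliberately a toy that understands only a single comparison of the form variable == 'value'. It is not a PEP 508 implementation, and a real tool would use a full one (for example the markers support in the packaging library). The needs entries and function names are made up for illustration.

```python
import sys


def split_requirement(requirement):
    """Split 'name-and-specifier; marker' into its two halves."""
    spec, _, marker = requirement.partition(";")
    return spec.strip(), marker.strip()


def toy_marker_matches(marker, environment):
    """Evaluate only the simplest marker form: <variable> == '<value>'.

    This is NOT a PEP 508 implementation: it ignores and/or, other
    operators, and version comparison.
    """
    if not marker:
        return True  # no marker means the requirement always applies
    variable, operator, value = marker.split()
    if operator != "==":
        raise ValueError(f"toy evaluator cannot handle {operator!r}")
    return environment[variable] == value.strip("'\"")


def applicable_needs(needs_list, environment):
    """Return the requirement specs whose marker matches the environment."""
    result = []
    for requirement in needs_list:
        spec, marker = split_requirement(requirement)
        if toy_marker_matches(marker, environment):
            result.append(spec)
    return result


# Hypothetical needs list for one locked version of a universal wheel.
needs = ["attrs>=21", "pywin32; sys_platform == 'win32'"]

print(applicable_needs(needs, {"sys_platform": "win32"}))
print(applicable_needs(needs, {"sys_platform": "linux"}))
print(applicable_needs(needs, {"sys_platform": sys.platform}))
```

The same list serves every target environment, which is why repeating requirements inside the code tables would add nothing.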

ncoghlan · August 10, 2021, 12:40am

Commenting on the overall PEP: I really like the proposal. Just a few suggestions:

I think it makes sense to lean into the “Needs aren’t quite the same thing as installation requirements, they’re locked installation requirements for exact versions that can be resolved from within the package set described in the lock file” terminology distinction
As Nathaniel suggested, I think it would make sense to include a copy of the “install_requires” that was used to generate each “needs” listing (including the top level one), both for debugging and to emphasise the distinction between specified requirements and locked needs
If the top level installation requirements are added, I’d like to see a metadata.constraints field added as well to capture any externally imposed version constraints on the locking process (e.g. when pipenv is told to only upgrade what it has to, or when an external constraints file is used)
I think it would potentially also make sense to include “lock_date” and “indexes” metadata in a standardised format. However, it’s definitely not necessary to achieve the goals of the PEP, and I’m not sure the auditing gains would justify the extra hassle for lockfile generators.

uranusjr · August 10, 2021, 2:08pm

Or maybe we could just use install_requires (or more technically Requires-Dist) as needs entries directly? (Maybe with some name and version normalisation.)

kushaldas · August 10, 2021, 5:52pm

This will also help to reproduce the environment in a better shape.

brettcannon · August 10, 2021, 6:57pm

I think this depends on …

What specifically would need to change in the PEP to allow for that since I feel it already is allowed? We say that PEP 508 specifiers are allowed and the PEP essentially mandates a resolver in the installer already, so what’s preventing using the needs field for this with the way it’s worded? Just a flat-out statement of “you don’t need to tighten the PEP 508 specifiers and can leave them as the original input to the locker”? Or are people explicitly concerned lockers may output different things for needs compared to their Requires-Dist input that was received and want to force lockers to record their original input?

Also don’t forget that lockers have a tool table to use, so they could choose to record this on their own.

How so?

Couldn’t that be covered by needs/install_requires? The concept of constraints on top of installation requirements is actually a novel concept when it comes to packaging specifications. If you want to record the requirements that the locker was working with, why split it across potentially two keys and not just what everything resolved to?

Once again, lockers can record anything they like in the tool table, so mandating something should have a very broad benefit. For me, this seems like just more bookkeeping instead of recording the input into the resolver.

I would say that’s something for lockers to put into their tool table.

njs · August 10, 2021, 7:57pm

These two parts of your reply seem to contradict each other? In the first part IIUC you’re saying the solution to wanting to know what the original Requires-Dist fields were is, the needs field will exactly match it, so installers can use that to confirm that the wheel they’re installing has consistent metadata with the wheel that was used to generate the lock file. In the second part IIUC you’re saying that the solution to constraints etc. is that the needs field won’t match the Requires-Dist fields. Can you clarify?

brettcannon · August 10, 2021, 8:44pm

The responses make different assumptions as to whether Nick’s request was implemented.

The key question from me is what exactly are people wanting recorded in a lock file that is not covered by the PEP? I have heard what was used as input into the lock file (i.e. Requires-Dist), but then Nick tossed in constraints (which are a pip-only concept). So I’m asking whether people are after what the inputs to the locker’s resolver were (which is the first response), or if they really want every single potential input to a locker recorded (the second response)?

And in the name of simplicity I’m pushing back, trying to find a great example of a need here which is worth shouldering people reading a diff of a lock file every time instead of a “nice to have” in a typical day-to-day scenario. For instance, the mention of auditing as a reason seems out of scope as that’s essentially saying you don’t trust the locker. If that’s true then what it records in the lock file as its “input” is the least of your concerns. And if a locker doesn’t record details you want it to record then you can always use another one or reference the original input you provided.

Now if you were to say, “it’s to independently verify the inputs didn’t change when examining a lock file diff”, then that makes a bit more sense to me. But then again, my question then becomes what does this get you if the PR/commits don’t contain changes to the input you provided in the first place? Are you thinking more of ad-hoc command-line generation and trying to force people to write down their requirements somehow? Basically what is the motivation I would put into the PEP to explain why people have yet another line in their diff to examine when the gist of why something is there is captured by the needs data?

ncoghlan · August 11, 2021, 12:30am

My interest is in recording the inputs to the lock process, which generally consists of four things:

when it was locked (lock_date)
which package indexes were queried (indexes)
which packages and versions were requested (top level and per package version install_requires entries)
what other version constraints were applied (top level constraints entry)

My motivation for the last item was actually pipenv’s --keep-outdated flag more so than constraint files, as that flag effectively uses a previous version of a lock file to impose soft version pins on the new version (the pins are only broken if necessary to satisfy requirements, rather than the default behaviour of upgrading everything to the latest version).

The general gist of these entries is to allow an assessment of “how up to date is this lock file?” looking only at info in the lock file. A recent lock file with loose input requirements, no external constraints, and locked directly against PyPI is going to have the latest version of everything. But lock files that were generated a long time ago, or have heavily constrained input requirements, or were locked against a private package index or a local directory would need to be treated with more scepticism.

However, a not-so-incidental benefit is to improve debuggability of lockers, by requiring them to emit both outputs and their parsed inputs in a standardised format. This isn’t so much a matter of failing to trust the locker implementations as it is a matter of wanting to be able to capture how they were invoked (e.g. locking directly against PyPI may not be desirable in some organisations, and CI could enforce that if the lock process inputs were captured in a standardised way. Similarly, a project like OpenStack could error out if the common version constraints hadn’t been applied)
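To sketch the kind of CI check I have in mind: the field names lock_date, indexes, install_requires and constraints are the ones proposed above and are not part of the PEP as written. The function audit_lock_metadata, its thresholds, and the sample values are hypothetical.

```python
from datetime import date

# Hypothetical metadata, using the field names proposed in this post.
metadata = {
    "lock_date": "2021-08-01",
    "indexes": ["https://pypi.org/simple/"],
    "install_requires": ["requests>=2"],
    "constraints": [],
}


def audit_lock_metadata(
    metadata,
    today,
    max_age_days=90,
    forbidden_indexes=("https://pypi.org/simple/",),
    required_constraints=(),
):
    """Return a list of human-readable policy problems (empty if none)."""
    problems = []

    # How stale is the lock file?
    age = (today - date.fromisoformat(metadata["lock_date"])).days
    if age > max_age_days:
        problems.append(f"lock file is {age} days old (limit {max_age_days})")

    # Was it locked against an index this organisation does not allow?
    for index in metadata["indexes"]:
        if index in forbidden_indexes:
            problems.append(f"locked against forbidden index {index}")

    # Were the organisation-wide constraints applied?
    for constraint in required_constraints:
        if constraint not in metadata["constraints"]:
            problems.append(f"missing required constraint {constraint}")

    return problems


problems = audit_lock_metadata(metadata, today=date(2021, 8, 11))
for problem in problems:
    print(problem)
```

None of this needs the locker itself or network access, only the lock file, which is the point of standardising where these inputs are recorded.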