Ambiguity in lock file spec when file names disagree

Let’s take wheels as an example. There is package.wheel.name, package.wheel.path and package.wheel.url. None of those are exclusive, so any combination can be set. The question that has come up thanks to @sbidoul and this PR is what if those keys disagree about file names (this also applies to sdists and archives)?

My thinking is name takes precedence no matter what, but if path and url are both set but not name and there’s a conflict with the file name then the lock file is rejected for being ambiguous. You could argue sdists are a special case since the file name doesn’t contain critical metadata thanks to package.name and package.version, but “Special cases aren’t special enough to break the rules”.

So my questions are:

  1. Do people like what I’m proposing?
  2. If so, is this just a doc update, a version bump on lock files, or a PEP-level change?
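To make the proposal concrete, here is a rough sketch of how an installer might apply it to a single wheel or sdist table. This is not taken from the spec or from any existing tool: `file_name_strict` and `AmbiguousFileName` are hypothetical names, and the entry is assumed to be a plain dict as you would get from parsing the TOML.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


class AmbiguousFileName(ValueError):
    """Raised when path and url disagree and no explicit name settles it."""


def file_name_strict(entry: dict) -> str:
    """Return the file name for a wheel/sdist table under the proposed rule."""
    # An explicit name wins no matter what path or url say.
    if "name" in entry:
        return entry["name"]

    candidates = {}
    if "path" in entry:
        # Assumption: treat both separators alike so the sketch stays small.
        candidates["path"] = PurePosixPath(entry["path"].replace("\\", "/")).name
    if "url" in entry:
        url_path = unquote(urlsplit(entry["url"]).path)
        candidates["url"] = PurePosixPath(url_path).name

    if not candidates:
        raise ValueError("entry has none of name, path, or url")
    # Without a name, path and url must agree or the lock file is rejected.
    if len(set(candidates.values())) > 1:
        raise AmbiguousFileName(f"path and url disagree: {candidates}")
    return next(iter(candidates.values()))
```

With this rule an entry whose path ends in `spam-1.0-py3-none-any.whl` and whose url ends in `spam-1.0.tar.gz` is rejected, unless it also carries a name.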

Sounds reasonable to me, and I think this can be just a doc update, because the problematic scenario should (as far as I can see) not happen in normal usage; it would take special effort to construct a lockfile that was inconsistent like this.

I’m fine with this rule for wheels, and for sdists as well, for symmetry.

For archives I’m less sure as there is no package.archive.name field and the archive name is normally not relevant, so if it differs for path and URL it should not matter in practice.

This also makes me think we should validate that names of package.wheels[] and package.sdist are consistent with package.name and package.version. I would also suggest that as a doc update.

What if something went horribly wrong and the URL says spam-1.0.tar.gz and the path said spam-1.0-py3-none-any.whl? Now you don’t know how to install the file (you don’t even know the file format). Otherwise I’m not sure how I would work with archive files from an installer perspective.

That I’m less concerned about as it doesn’t impact installation. But I would be okay making that a “SHOULD” if @pf_moore is since it’s a cheap consistency check.

Can we take a step back here?

We’re talking about a very specific case, where we have a [packages] entry with no name specified, but with both path and url given. My first question is why is this even allowed? What is the use case for specifying both path and url? But let’s ignore that for now - making the two mutually exclusive would be an incompatible spec change (at least in theory).

The key thing is what installers should do if both are specified. The spec says to prefer path over url for wheels and sdists, but for other types (presumably) leaves it to the installer to decide. And in any case, the installer would logically act as if only the entry that was chosen is present, ignoring the other entry. That would mean that inconsistency is irrelevant - there’s no situation where anyone would care about the two values having matching names, because the two values would never be used in the same context.

So I’m inclined to agree with @sbidoul that differences shouldn’t matter in practice.

I commented on the PR linked above - for that specific case (which only talks about sdists and wheels), I think the rule “pick the first of name, path and url which exists, and use that” is correct, and is covered by the existing spec.

We could add a requirement that the names match when more than one is specified, and/or extend all of this to cover other package types, but there’s a potential to break real-world uses if we do so[1]. And I’m not sure there’s enough benefit to be worth doing so. What do we gain here, given that I think we have an answer within the spec for the motivating PR?

I agree with this - it’s the obvious use for the package.name and package.version fields, and I’d be surprised if tools didn’t do this anyway. Explicitly stating that tools SHOULD do this makes sense to me.


  1. Although FWIW, pip lock appears to only generate one of path or url.

@sbidoul pointed out that the installation section of the spec isn’t normative, which is a point I’d missed. But I think I’d rather just make the rule on which attribute to prefer normative (which just formalises one part of the process of interpreting a lockfile) instead of creating new consistency rules, which potentially change what counts as a valid lockfile.

Potentially caching versus download. The hashes guarantee there wouldn’t be a difference where you get the file from.

I’m fine with that. So name > path > url?
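For comparison with the stricter idea at the top of the thread, a minimal sketch of plain name > path > url precedence could look like the following. `file_name_by_precedence` is a hypothetical helper, not something from the spec or from packaging, and the entry is again assumed to be a dict parsed from the lock file.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def file_name_by_precedence(entry: dict) -> str:
    """Pick the file name from the first of name, path, url that is present."""
    if "name" in entry:
        return entry["name"]
    if "path" in entry:
        # Assumption: treat both separators alike so the sketch stays small.
        return PurePosixPath(entry["path"].replace("\\", "/")).name
    if "url" in entry:
        return PurePosixPath(unquote(urlsplit(entry["url"]).path)).name
    raise ValueError("entry has none of name, path, or url")
```

Here a disagreement between path and url is never an error; the lower-priority key is simply ignored, so no new class of invalid lock file is created.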

If it is a SHOULD can packaging.pylock validate that?

Even if package.name and package.version do not directly influence installation, I would prefer if that was validated out of the box, for instance to ensure that all wheels and sdist of a package entry are actually for the same package name. Same for version.
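As an illustration of the kind of check meant here, below is a rough standard-library-only sketch. It is not how packaging.pylock works; `normalize`, `name_and_version` and `inconsistencies` are hypothetical names, and versions are compared as plain strings, which is a simplification (a real tool would more likely use `packaging.utils.parse_wheel_filename`, `packaging.utils.parse_sdist_filename` and proper version comparison).

```python
import re


def normalize(name: str) -> str:
    """Normalise a project name: runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def name_and_version(file_name: str) -> tuple[str, str]:
    """Split a wheel or sdist file name into (normalised name, version)."""
    if file_name.endswith(".whl"):
        # name-version[-build]-python-abi-platform.whl
        parts = file_name[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            raise ValueError(f"not a wheel file name: {file_name}")
        return normalize(parts[0]), parts[1]
    for suffix in (".tar.gz", ".zip"):
        if file_name.endswith(suffix):
            # name-version.tar.gz: the version follows the last "-".
            name, sep, version = file_name[: -len(suffix)].rpartition("-")
            if not sep:
                raise ValueError(f"not an sdist file name: {file_name}")
            return normalize(name), version
    raise ValueError(f"unrecognised file name: {file_name}")


def inconsistencies(package: dict, file_names: list[str]) -> list[str]:
    """Return the file names that disagree with package name or version."""
    expected_name = normalize(package["name"])
    expected_version = package.get("version")  # version is optional
    bad = []
    for file_name in file_names:
        name, version = name_and_version(file_name)
        version_ok = expected_version is None or version == expected_version
        if name != expected_name or not version_ok:
            bad.append(file_name)
    return bad
```

For a package entry named `spam` at version `1.0`, this flags `eggs-1.0-py3-none-any.whl` and `spam-2.0.tar.gz` while accepting `spam-1.0-py3-none-any.whl`.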

:+1:

I see no reason why not. Tools can choose not to use packaging.pylock if for some reason they really don’t want to do that validation. The only reason I prefer SHOULD over MUST is to avoid making any existing tools that don’t validate non-compliant.

If the consensus is that we want to be strict here, we can do that by declaring that lockfiles where package.name and/or package.version are inconsistent with the filename are invalid[1]. That requires producers to ensure they don’t produce garbage, rather than requiring consumers to do the checking (but still noting that they should validate).


  1. For version, the spec may already require this - it says that version MUST NOT be specified if it cannot be guaranteed to be consistent with the code, which seems to me to be a roundabout way of saying that if it is specified then it must be consistent.

I’m fine with a SHOULD as the strictness isn’t technically important to installation; it’s just weird if the file name doesn’t line up. But if people want to go further then I’m also fine with that.

Pull Request #2018 in pypa/packaging.python.org, “Clarify file name precedence for archive, sdist, and wheel specifications in pylock.toml” (by brettcannon), covers file name precedence. I’ll wait on file name parsing until we decide on “SHOULD” or not.