Let's permit `+local.version.label` in version specifiers

Removing the legacy version parsing and legacy specifier parsing? Absolutely. This removes some really horrible semantics (like `<= 4r843n03x4f87283` being a permitted version specifier, and `rg q39g82394gx,39p2gbpv2xev` being a valid version) and moves us to a place where PEP 440 is enforced not just on PyPI but across the ecosystem.

The first link in the first post will, with the right clicks, take you to https://github.com/pypa/packaging/issues/321 which has a lot of discussion around this. IMO, this is a case where any change is a breaking change. It will break things for some users and we’ll need to figure out how to roll this out gracefully in pip. That said, I am sure that we’ll be in a better place on the other side of this, even though a subset of the community will feel some migration pain / breakage due to this.

I had a look at fixing this up in NumPy & co, but on a closer read of PEP 440, it seems that this is also not valid? The version scheme doesn’t seem to allow anything after `.devN`. Is this right? If so, that’s pretty annoying. People go to great lengths to include git hashes in versions of nightlies, and it’s quite helpful when debugging (`.dev1234` is meaningless, while `.dev1234.<git-commit-hash>` is not). And the `.devN.xxx` scheme, where N is the number of commits since the last tag in the upstream repo and xxx is the commit hash, works perfectly fine, since the sorting on N happens first and that’s all that is interesting here.

pep-0440/#dvcs-based-version-labels seems to purposefully exclude this, but doesn’t account for the N meaning “commits from last tag” that various projects have arrived at. It’s also used by some of the more popular tools for version generation from a VCS (see python-versioneer#version-string-flavors and /setuptools_scm/#default-versioning-scheme).
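For anyone who wants to check a version string by hand, here is a minimal stdlib-only sketch. It assumes the canonical public-version regex from PEP 440's Appendix B, with an optional `+local` label appended. It only accepts the normalized spelling, and the helper name `is_canonical_pep440` is made up for illustration (it is not an API of `packaging` or pip).

```python
import re

# Canonical public-version pattern adapted from PEP 440 Appendix B, with an
# optional local label ("+" then dot-separated alphanumerics) appended.
# Simplification: only the normalized (canonical) spelling is accepted.
_NUM = r"(?:0|[1-9][0-9]*)"
_PUBLIC = (
    rf"(?:[1-9][0-9]*!)?{_NUM}(?:\.{_NUM})*"  # optional epoch, then release segment
    rf"(?:(?:a|b|rc){_NUM})?"                  # optional pre-release
    rf"(?:\.post{_NUM})?"                      # optional post-release
    rf"(?:\.dev{_NUM})?"                       # optional dev release
)
_LOCAL = r"(?:\+[a-z0-9]+(?:\.[a-z0-9]+)*)?"   # optional local version label
_VERSION = re.compile(_PUBLIC + _LOCAL)


def is_canonical_pep440(version: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern above."""
    return _VERSION.fullmatch(version) is not None


print(is_canonical_pep440("1.22.0.dev1234"))           # plain dev release
print(is_canonical_pep440("1.22.0.dev1234.abc1234"))   # hash appended after .devN
print(is_canonical_pep440("1.22.0.dev1234+gabc1234"))  # hash in the local label
```

Under this pattern nothing may follow `.devN` except a local label, so the hash has to go after a `+` rather than after another dot.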

I’m still unclear on the scope here. Disallowing pkgname >= 1.2.3+local seems fine, that has a limited impact. But PEP 440 explicitly disallowing uploading 1.2.3+local to any public index is unhelpful and seems gratuitous. Forbidding it on PyPI seems fine, but what about other public indexes like the PyTorch and the scipy-wheels ones?

It seems there are many topics mixed up here.

  1. The original discussion is simply about whether to allow local version segments in version specifiers. I think we’ve settled on no for that.
  2. Personally I think it’s OK-ish to upload packages with a local version segment to your own index. Anything connected to the internet is technically public so meh I say do what you want, especially for the purpose of putting Git hashes in the file name.
  3. The PyTorch issue is more complicated since they do want to distinguish between different cuXXX variants. But what they’re currently doing doesn’t actually achieve their intention (before or after the PEP 440 enforcement) anyway, and nobody is actually complaining about that (the complaint was about version specifiers, see first point), so there’s no rush getting it right.
  4. I guess we could introduce a new segment in the version string that can simply contain an arbitrary string (with some obvious limitations e.g. no dash). The package builder can put whatever they want and it will simply be ignored when the version is used for comparison. Perfectly suitable for nightly tagging, and PyTorch can also use this without abusing the local version segment (it still would actually “work”, but again, nobody is complaining about that).
  5. To actually fix PyTorch’s problem, we’ll need real virtual dependency logic. @njs briefly mentioned a proposal, but a lot of detail needs to be filled. This should be a separate, very involved discussion.
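
To make point 1 concrete, here is a rough stdlib-only sketch of flagging a specifier that combines an ordering operator with a `+local` label. It assumes, as I read PEP 440, that local labels are only meaningful with the equality-style operators. The function name and the operator grouping are hypothetical and do not mirror how `packaging` implements this.

```python
import re

# An operator followed by a version string. Longer operators are listed first
# so that "<=" is not read as "<" followed by "=...".
_CLAUSE = re.compile(r"\s*(~=|===|==|!=|<=|>=|<|>)\s*(\S+)\s*")

# Operators that imply ordering rather than equality (grouping chosen for this sketch).
_ORDERED = {"~=", "<=", ">=", "<", ">"}


def has_local_in_ordered_clause(specifier: str) -> bool:
    """Hypothetical check: does any comma-separated clause pair an ordering
    operator with a version that carries a '+local' label?"""
    for clause in specifier.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unrecognized clause: {clause!r}")
        operator, version = match.groups()
        if operator in _ORDERED and "+" in version:
            return True
    return False


print(has_local_in_ordered_clause(">=1.2.3+local"))  # ordering operator with local label
print(has_local_in_ordered_clause("==1.2.3+local"))  # equality operator with local label
```

This only looks at the operator and the presence of `+`; it does not validate the version itself.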

Thanks @uranusjr, that’s an excellent summary of this discussion!

Thanks @uranusjr, that’s a helpful split. Amending PEP 440 for (2) would be useful I think.

For PyTorch, it does work as intended by the PyTorch team, and it allows users to reliably install the correct build variant, via use of --extra-index-url rather than version specifiers. The assumptions about how it’s used in this thread are not correct, and neither is the bug report linked in Pradyun’s original message (that seems to be one confused user). See Start Locally | PyTorch for the actual instructions that PyTorch gives to end users if you’re interested.

PEP 440 doesn’t disallow it, it recommends against it unless you understand what you’re allowing.

PEP 440 says:

Local version identifiers SHOULD NOT be used when publishing upstream projects to a public index server, but MAY be used to identify private builds created directly from the project source. Local version identifiers SHOULD be used by downstream projects when releasing a version that is API compatible with the version of the upstream project identified by the public version identifier, but contains additional changes (such as bug fixes). As the Python Package Index is intended solely for indexing and hosting upstream projects, it MUST NOT allow the use of local version identifiers.

Note that for general public index servers, the language is SHOULD NOT, and it’s only MUST NOT for PyPI.

Taken from the RFC 2119 definition of SHOULD NOT:

SHOULD NOT This phrase, or the phrase “NOT RECOMMENDED” mean that
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed
before implementing any behavior described with this label.

I think that is the correct stance to take, which perfectly allows pytorch to do what they’re doing with tagging versions.

Thanks for clarifying @dstufft. Without knowing the exact definition you linked to, I have to say that SHOULD NOT and NOT RECOMMENDED sound very different. The latter fits better. I was, and still am, worried that the PEP only saying “SHOULD NOT upload to a public index” may lead someone to want to remove support from some needed infra package, like packaging or pip. If y’all agree that there’s no chance of that happening, then I guess the status quo is fine.

Not being a native English speaker, RFC 2119 is actually how I internally understand those words. When I was in grad school there was a course that taught us how research English is not really English but a variant of it with much more concrete definitions, implicitly put in place because not all researchers speak English natively and understand intricate differences between words like should and must, and even more importantly, even natives don’t always agree on them because language changes but text can’t. This is why things like RFC 2119 exist; they intentionally discourage you from choosing between words. I guess this is just my much-too-long way to say I strongly disagree with changing the wording.

Even as a native (and mostly monolingual) speaker of English, but
one steeped in Internet protocol engineering standards from an early
age, IETF BCP14/RFC2119 and the various earlier RFCs which
independently defined those terms in their boilerplate have caused
me to internalize those precise interpretations any time I see them
in technical documentation or discussion. In order to avoid
confusion, however, it’s convention to fully capitalize the words
when that is the intent, or better still add boilerplate in
documentation referring to the standard.

First sentence in PEP 440’s Definitions section:

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

All of the uses of them remain fully capitalized 🙂

Yes, I meant in the general sense of interpretation in any technical
document or discussion. I fully agree that PEP 440 is very clear
that it means them in the IETF standard sense (which I greatly
appreciate!).
