Removing the legacy version parsing and legacy specifier parsing? Absolutely. This removes some really horrible semantics (like <= 4r843n03x4f87283 being a permitted version specifier, and rg q39g82394gx,39p2gbpv2xev being a valid version) and moves us to a place where PEP 440 is enforced not just on PyPI but across the ecosystem.
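For anyone who wants to see what this looks like in practice, here's a minimal sketch using the packaging library, assuming a release that has already dropped the legacy classes (22.0 is where I believe that happened). The version and specifier strings are just the nonsense examples from above:

```python
# Minimal sketch: with legacy parsing removed, packaging rejects these
# outright instead of silently falling back to LegacyVersion/LegacySpecifier.
from packaging.specifiers import InvalidSpecifier, SpecifierSet
from packaging.version import InvalidVersion, Version

try:
    Version("rg q39g82394gx,39p2gbpv2xev")  # previously parsed as a LegacyVersion
except InvalidVersion as exc:
    print(exc)

try:
    SpecifierSet("<= 4r843n03x4f87283")     # previously fell back to a LegacySpecifier
except InvalidSpecifier as exc:
    print(exc)
```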
The first link in the first post will, with the right clicks, take you to https://github.com/pypa/packaging/issues/321 which has a lot of discussion around this. IMO, this is a case where any change is a breaking change. It will be a breaking change for some users and we’ll need to figure out how to roll this out gracefully in pip. That said, I am sure that we’ll be in a better place on the other side of this, even though we’ll have a subset of the community feel a bunch of migration pains / breakage due to this.
I had a look at fixing this up in NumPy & co, but on a closer read of PEP 440, it seems that this is also not valid? The version scheme doesn’t seem to allow anything after .devN. Is this right? If so, that’s pretty annoying. People go to great lengths to include git hashes in the versions of nightlies, and it’s quite helpful when debugging (.dev1234 is meaningless, while .dev1234.<git-commit-hash> is not). And .devN.xxx, where N is the number of commits since the last tag in the upstream repo and xxx is the commit hash, works perfectly fine in practice, since the sorting on N happens first and that’s all that is interesting here.
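To make the distinction concrete, here's a quick check using packaging as a PEP 440 reference parser. The version strings are made up for illustration; this just shows my reading of the spec:

```python
# My reading of PEP 440: nothing may follow .devN except a local version
# segment introduced by "+". The version strings below are hypothetical.
from packaging.version import InvalidVersion, Version

Version("1.25.0.dev1234")             # valid: plain .devN
Version("1.25.0.dev1234+git.abc123")  # valid: the hash lives in the local version segment

try:
    Version("1.25.0.dev1234.gabc123")  # invalid: a bare segment after .devN
except InvalidVersion as exc:
    print(exc)
```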
I’m still unclear on the scope here. Disallowing pkgname >= 1.2.3+local seems fine, that has a limited impact. But PEP 440 explicitly disallowing uploading 1.2.3+local to any public index is unhelpful and seems gratuitous. Forbidding it on PyPI seems fine, but what about other public indexes like the PyTorch and the scipy-wheels ones?
The original discussion is simply about whether to allow local version segments in version specifiers. I think we’ve settled on no for that.
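If it helps anyone reading along, my understanding of where this lands in packaging (hedged; the exact release where the behaviour applies is worth double-checking) is roughly this:

```python
# Sketch of my understanding: local version labels stay usable with ==/!=,
# but ordered comparisons against a version with a local label are rejected.
from packaging.specifiers import InvalidSpecifier, SpecifierSet

SpecifierSet("==1.2.3+local")       # accepted: pinning to a specific local build
try:
    SpecifierSet(">=1.2.3+local")   # rejected: no local labels in ordered comparisons
except InvalidSpecifier as exc:
    print(exc)
```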
Personally I think it’s OK-ish to upload packages with a local version segment to your own index. Anything connected to the internet is technically public, so meh, I say do what you want, especially for the purpose of putting Git hashes in the file name.
The PyTorch issue is more complicated since they do want to distinguish between different cuXXX variants. But what they’re currently doing doesn’t actually achieve their intention (before or after the PEP 440 enforcement) anyway, and nobody is actually complaining about that (the complaint was about version specifiers, see first point), so there’s no rush getting it right.
I guess we could introduce a new segment in the version string that can simply contain an arbitrary string (with some obvious limitations, e.g. no dash). The package builder can put whatever they want in it, and it will simply be ignored when the version is used for comparison. Perfectly suitable for nightly tagging, and PyTorch could also use this without abusing the local version segment (what they do now would still actually “work”, but again, nobody is complaining about that).
To actually fix PyTorch’s problem, we’ll need real virtual dependency logic. @njs briefly mentioned a proposal, but a lot of detail needs to be filled in. This should be a separate, very involved discussion.
Thanks @uranusjr, that’s a helpful split. Amending PEP 440 for (2) would be useful I think.
For PyTorch, it does work as intended by the PyTorch team, and it allows users to reliably install the correct build variant, via use of --extra-index-url rather than version specifiers. The assumptions about how it’s used in this thread are not correct, and neither is the bug report linked in Pradyun’s original message (that seems to be one confused user). See Start Locally | PyTorch for the actual instructions that PyTorch gives to end users if you’re interested.
PEP 440 doesn’t disallow it; it recommends against it unless you understand what you’re allowing.
PEP 440 says:
Local version identifiers SHOULD NOT be used when publishing upstream projects to a public index server, but MAY be used to identify private builds created directly from the project source. Local version identifiers SHOULD be used by downstream projects when releasing a version that is API compatible with the version of the upstream project identified by the public version identifier, but contains additional changes (such as bug fixes). As the Python Package Index is intended solely for indexing and hosting upstream projects, it MUST NOT allow the use of local version identifiers.
Note that for general public index servers, the language is SHOULD NOT, and it’s only MUST NOT for PyPI.
SHOULD NOT: This phrase, or the phrase “NOT RECOMMENDED”, mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.
I think that is the correct stance to take, which perfectly allows PyTorch to do what they’re doing with tagging versions.
Thanks for clarifying @dstufft. Without knowing the exact definition you linked to, I have to say that SHOULD NOT and NOT RECOMMENDED sound very different. The latter fits better. I was, and still am, worried that the PEP only saying “SHOULD NOT upload to a public index” may lead someone to want to remove support from some needed infra package, like packaging or pip. If y’all agree that there’s no chance of that happening, then I guess the status quo is fine.
Not being a native English speaker, I actually understand those words internally the way RFC 2119 defines them. When I was in grad school there was a course that taught us how research English is not really English but a variant of it with much more concrete definitions, implicitly put in place because not all researchers speak English natively or grasp the intricate differences between words like should and must, and, even more importantly, because even natives don’t always agree on them, since language changes but text can’t. This is why things like RFC 2119 exist; they intentionally take the guesswork out of choosing between such words. I guess this is just my much-too-long way of saying I strongly disagree with changing the wording.
Even as a native (and mostly monolingual) speaker of English, but one steeped in Internet protocol engineering standards from an early age, IETF BCP14/RFC2119 and the various earlier RFCs which independently defined those terms in their boilerplate have caused me to internalize those precise interpretations any time I see them in technical documentation or discussion. In order to avoid confusion, however, it’s convention to fully capitalize the words when that is the intent, or better still add boilerplate in documentation referring to the standard.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
Yes, I meant in the general sense of interpretation in any technical document or discussion. I fully agree that PEP 440 is very clear that it means them in the IETF standard sense (which I greatly appreciate!).
I just stumbled onto this. I note that this would address some of the issues that keep popping up regarding --extra-index-url (wanting some local builds to be prioritized over PyPI). Indeed, it would allow something such as:
torch >= 1.13.1 ; localversion=cu117
This would presumably allow specifying a constraint file such as:
mypackage ; localversion=myprivatetag
which would require that mypackage is only considered if it has the local version myprivatetag, which can’t be published to PyPI.
Hi @mboisson! This is a 1.5+ year old discussion that ended with a conclusion to do things a certain way and relevant rollouts have already happened. I think it would be good to not reopen this thread with a new idea.
Note that versions with local version labels are already supposed to be preferred over versions permitted on PyPI.
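As a minimal check of that ordering (the version numbers here are just examples):

```python
# A version with a local label sorts above the same public version, so pip
# should prefer it when both are visible -- but a newer public release still wins.
from packaging.version import Version

assert Version("1.13.1+cu117") > Version("1.13.1")
assert Version("1.13.1+cu117") < Version("1.13.2")
```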
That said, I also wanna respond to the idea proposed: I don’t think using local version labels as an environment marker works well. Their semantics don’t really fit what we want here: missing markers are treated as errors, and the marker environment differing based on the choice made for a different dependency makes this a no-go.
I think the wider consensus is that what we actually want is proper support for variants of a package based on platform capabilities/ABI etc., and how to make that feasible is being discussed in other topics right now. The discussion has moved forward meaningfully since this was brought up and is now happening over at Selecting variant wheels according to a semi-static specification (which has links to various related threads in its OP).
Hi @pradyunsg.
Thanks, and sorry for reopening this. There has been so much discussion that has led nowhere near a workable solution for us that we keep running into this kind of thread looking for one.
Local versions are preferred, but not when there’s a newer version on PyPI, so it is better than nothing, but not a proper solution. What we really need is a way, whatever shape or form it takes, and without requiring us to run a server/proxy, to have pip ignore PyPI for specific packages (and those specific packages only).
I’ll check that other link to see if that might help to achieve this.