Dynamic project names and PEP 621

vyasr · November 23, 2022, 11:17pm

I am a fan of the static package configuration espoused by pyproject.toml and PEP 621. As I migrate projects to use it, I find myself running up against various edge cases and limitations, but most of them seem to have relatively elegant solutions, with one exception: the requirement of a static project name:

A build back-end MUST raise an error if the metadata specifies the name in dynamic.
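For illustration, the rule quoted above could be enforced by a backend roughly as follows. This is only a minimal sketch, not any real backend's code: the function name `check_project_table` is hypothetical, and it assumes the `[project]` table has already been parsed into a plain dict.

```python
def check_project_table(project: dict) -> None:
    """Validate the name/version rules for an already-parsed [project] table.

    `project` is assumed to be the dict obtained from the [project] table of
    pyproject.toml (hypothetical helper, for illustration only).
    """
    dynamic = project.get("dynamic", [])

    # The name may never be listed in `dynamic`.
    if "name" in dynamic:
        raise ValueError("'name' must not be listed in project.dynamic")

    # The name must therefore be present as a static value.
    if "name" not in project:
        raise ValueError("project.name is required and must be static")

    # The version must be given exactly once: statically or via `dynamic`.
    has_static_version = "version" in project
    has_dynamic_version = "version" in dynamic
    if has_static_version == has_dynamic_version:
        raise ValueError(
            "project.version must be either static or listed in "
            "project.dynamic, but not both"
        )
```

Note the asymmetry the sketch makes visible: `version` has two valid spellings, `name` has only one.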

I would like to better understand the reason for this limitation and either a) see if PEP 621’s language could be relaxed, or b) come up with a community-approved solution within the confines of PEP 621. Hopefully I’m not rehashing a bunch of conversations that have already taken place, but please do link me if this has already been discussed beyond the threads I link below.

(A quick note: as a new user I can only post a limited number of links so instead of linking to other threads on these forums I just include the second half of the URL in parenthesis. Prefacing those with https://discuss.python.org/t/ should take you to the discussions.)

a) Relaxing the PEP language

Based on the very long discussion on PEP 621, I suspect that relaxing this restriction would be quite controversial and probably is not worth pursuing further. As far as I can tell, this restriction has been in place since the earliest iterations of the PEP (pep-621-storing-project-metadata-in-pyproject-toml/4513). Based on a comment on the final thread (pep-621-round-3/5472/5) it seems like making the name static was important to clearly define the filename, which is used for package installation purposes, but I assume that’s not the only reason since the version is also used for that but was deemed acceptable to be dynamic. Are there other important reasons that making the name dynamic would cause problems?

b) Working around the restriction

There seem to have been a few relevant discussions about this problem on this forum:

The question was raised on this prior thread (limitations-of-pep-621/12934) with the use case of producing nightly packages with a different name from the main line of the package, but the conversation ended up more focused on some of the other issues there.
Another thread (options-to-build-the-same-package-in-different-ways-with-different-build-dependencies/4458) also touches on the point tangentially but is more focused on dynamic dependencies than names.
Finally, a third thread (idea-selector-packages/4463) proposes a generic solution that would address one of the potential use cases of dynamic names, namely to delineate packages that would ideally be differentiated by something like platform tags but don’t fit into the exact set of platform tags supported by PEP 425. A good example of this is for packages targeting accelerators that require certain drivers to be installed, such as Cupy versions targeting a specific CUDA version.

The feature described by option 3 would be very powerful but seems like a much more complex task that would have broader implications across the entire packaging ecosystem. The solution proposed in option 1 is much smaller in scope and perhaps more appropriate: just modify pyproject.toml during the build process. However, this change seems to be somewhat against the spirit of PEP 621. We’re taking something that we claim is not dynamic and then modifying it outside the confines of the normal development process. Moreover, the fact that the Python standard library has a way to read TOML files but not write them means that any such modification will require third party libraries or perhaps non-Python solutions (e.g. a simple sed command). Do others find that concerning as well, or does moving forward with this type of modification of pyproject.toml seem like the best approach? If so, is there a suitable place that this could be documented as the preferred solution? It doesn’t quite feel like something that belongs in the PEP, but I also don’t know of any better place since documenting it in just the setuptools docs would leave other build backends and their users out of the loop.

Thanks!

brettcannon · November 23, 2022, 11:49pm

It stems from the fact that a project’s metadata must have a name and a version and people had demonstrated needs for a dynamic version number. Allowing name to be dynamic also makes it a bit more complicated to process the file and know what it applies to; all you know is it’s a file in some repo or sdist or something but otherwise have no identifier.

The PEP is done, so the spec now resides at pyproject.toml specification - Python Packaging User Guide. You would need to write a new PEP to relax that requirement.

vyasr · November 24, 2022, 12:39am

Could you provide some advice on what you think is the best path forward here? Do you think that the case for a dynamic version was simply much stronger than the case for a dynamic name at the time, or do you think that there are fundamental reasons that making the name dynamic would cause more headaches than the version? I’m trying to gauge the viability of such a PEP since I haven’t been involved in that process before. Are there places where it would make sense to gauge interest in or opposition to (e.g. from package managers) such a change?

On a more immediate note (given that the timelines for a PEP are potentially long), in lieu of any alternative existing in the short to intermediate term do you see any issues with the direct modification of pyproject.toml during a build process? Do you think there are better solutions here, or is that a viable approach that you don’t see issues with a project adopting unless and until a general solution is found?

CAM-Gerlach · November 24, 2022, 1:34am

@brettcannon might be able to elaborate further, but it seems to me that he addressed this above, mentioning both points played a role in the decision. As to the former,

and as to the latter,

Furthermore, this would mean that despite an sdist being uploaded to PyPI under a specific project name, when actually built it could have any arbitrary distribution name that may or may not match the sdist’s and could vary non-deterministically (I guess you could do this if you opted out of Pyproject metadata completely, but I’d argue there’s a big difference between that and endorsing it in the official interoperable standards).

While you do present a limited use case for being able to do this for development purposes (nightlies, building different distribution packages from the same pyproject.toml, etc), there doesn’t appear to be an equally strong case for doing this inside the source artifacts that are actually distributed to users and built by a standard Pyproject (PEP 517) build backend.

Brett might have a better answer for you, but the way I see it is that generating the project name (or indeed, the whole pyproject.toml file, as some projects do) is a preprocessing step that occurs prior to the actual standardized build step, rather than part of it. This would mean people naively invoking a Pyproject build frontend/integration frontend on your checked-in source tree would only get whatever the default static name for the package is without running some other step beforehand, but you’d be able to generate as many sdists and wheels with as many different names as you want for distribution, and those would be static and work just like any other package.

It seems to me like that would satisfy all the use cases yet presented (nightly builds, variant packages, etc), would be a lot simpler and more appropriate solution than trying to build dynamic name support into the Pyproject metadata spec and all the tools that support it, and can be easily implemented right now with a trivial shell or Python script (or more elaborate tooling, if desired, that may already exist).
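As a rough idea of what such a preprocessing script might look like, here is a minimal sketch using only the standard library. It is deliberately line-based rather than a real TOML writer, so it assumes the `name` key sits on its own line inside the `[project]` table; the helper `rename_project` and the example names are hypothetical.

```python
import re

# Matches `name = "..."` (or single-quoted) at the start of a line.
_NAME_ASSIGNMENT = re.compile(r'^(\s*name\s*=\s*)(["\']).*?\2')


def rename_project(toml_text: str, new_name: str) -> str:
    """Return `toml_text` with the `name` key of the [project] table replaced.

    Only the first `name = ...` line inside [project] is touched; `name` keys
    in other tables (e.g. [tool.*]) are left alone.
    """
    out = []
    in_project = False
    replaced = False
    for line in toml_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("["):
            # A table header: track whether we are inside [project].
            in_project = stripped == "[project]"
        elif in_project and not replaced:
            new_line, count = _NAME_ASSIGNMENT.subn(
                lambda m: f'{m.group(1)}"{new_name}"', line, count=1
            )
            if count:
                line = new_line
                replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no static name found in the [project] table")
    return "".join(out)
```

A release job could then read pyproject.toml, call something like `rename_project(text, "mypackage-nightly")`, write the result back, and run the normal build frontend, so that the resulting sdist carries a static name.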

pf_moore · November 24, 2022, 9:11am

Also note PEP 643, which mandates that the name and version metadata in a sdist must be marked as non-dynamic. This is not the same as PEP 621, so in theory if the project uses a backend that doesn’t support PEP 643, they could get away with this, but:

The project would break when the backend added PEP 643 support
Installers like pip assume the sdist name is definitive, so if the project name (or version!) changes, the installer would probably give an error.

For projects being built from a raw source tree, you could have a dynamic name, handled in the same way as dynamic versions. But it’s fighting the intent of the standards, IMO.

Note that dynamic versions are not intended to allow the version to change arbitrarily, rather they are intended to allow the (fixed) version to be stored somewhere other than in the metadata - essentially to solve the problem of people wanting to “single source” the version. I don’t believe anyone has provided a compelling example of somewhere that the project name would need to be “single sourced” in this way.

CAM-Gerlach · November 24, 2022, 9:32am

Just to make sure I have this clear, this means marked as dynamic in the PKG-INFO/Core Metadata (not necessarily in the pyproject.toml the sdist contains)—right? Since the whole point of the PEP is that tools can now rely on the core metadata fields in the PKG-INFO fields unless they are marked dynamic?

Just out of curiosity, are you aware of the implementation status? Last I seem to recall, few of the major backends had implemented it yet, which because of the serial nature of metadata versions (which I believe you and I discussed previously) means that in theory PEP 685 adoption is blocked too, and so too PEP 639 (though it seems to be less of a problem in practice).

(I assume you mean PEP 643 here)

steve.dower · November 24, 2022, 10:40am

I’ve faced similar limitations with Nuget. My approach there was to dynamically replace the metadata file entirely before building.

You definitely want your sdist to have a static name, and regardless of any environment settings or options, the sdist should always build a package with a static name.

But I don’t see any reason why your source directory needs to be static. You probably want some kind of sensible default in there, because dev tooling will likely start using it one day, but if your release process swaps out the pyproject.toml for separate sdist builds, I don’t see any issue there.

pf_moore · November 24, 2022, 10:43am

Yes

Everybody is (I believe) waiting on PyPI accepting metadata 2.2 uploads, which is being tracked in Support Metadata Version 2.2 · Issue #9660 · pypi/warehouse · GitHub. Apart from setuptools (which has unique issues due to the fundamentally dynamic nature of setup.py) I believe most other backends have indicated at some point that supporting PEP 643 isn’t that hard for them (but I may be wrong on this, don’t quote me!)

To be honest, I’m a bit concerned that Warehouse is such a bottleneck here. I’m not sufficiently familiar with the codebase to know what the problem is (apart from the normal “too much work, not enough people” issue) but being able to update metadata standards seems pretty critical to a lot of what we’re doing.

Whoops, yes, sorry - I’ve fixed that.

pradyunsg · November 24, 2022, 12:23pm

pip does already error out on inconsistent version or name metadata, compared to what it expects.
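To make the kind of check concrete, here is a minimal sketch of comparing the name and version an installer expected against what the built metadata reports. This is not pip's actual implementation: the function names are hypothetical, the version comparison is a plain string match for simplicity, and only the name normalization follows the usual PEP 503 rule.

```python
import re
from email import message_from_string


def canonicalize(name: str) -> str:
    """Normalize a project name per PEP 503 (runs of -, _, . become one -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_metadata_matches(
    expected_name: str, expected_version: str, pkg_info_text: str
) -> None:
    """Raise ValueError if core metadata disagrees with what was requested.

    `pkg_info_text` is the text of a PKG-INFO / METADATA file, which uses
    RFC 822 style headers and can be read with the stdlib email parser.
    """
    metadata = message_from_string(pkg_info_text)
    actual_name = metadata.get("Name", "")
    actual_version = metadata.get("Version", "")

    if canonicalize(actual_name) != canonicalize(expected_name):
        raise ValueError(
            f"expected project {expected_name!r}, metadata says {actual_name!r}"
        )
    # Simplification: real tools compare parsed versions, not raw strings.
    if actual_version != expected_version:
        raise ValueError(
            f"expected version {expected_version!r}, "
            f"metadata says {actual_version!r}"
        )
```

With a check like this in place, an sdist downloaded as `mypackage` that builds a distribution named `mypackage-nightly` would be rejected rather than silently installed.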

ofek · November 24, 2022, 4:55pm

I might have missed it, but what’s the use case?

pf_moore · November 24, 2022, 5:10pm

The one I saw mentioned was nightly builds. But I don’t know how that would work in practice, as you’d need to change the name every time in your install commands and dependencies. So yeah, I’ve yet to see a convincing use case for this.

steve.dower · November 24, 2022, 5:19pm

My similar use case was that I was building CPython for x86, x64 and ARM64, and wanted packages with different names for each of these (because it was easier than overriding the install tool when you wanted to install files for multiple platforms at once). You can see these packages here (all the *.nuspec files).

I assume the OP wants to do something similar, but because all of our tooling is built around a fixed filename, they’re trying to come up with an approach that preserves that. My suggestion is to have a “default” pyproject.toml that specifies a good enough package name, and when building alternate releases (e.g. mypackage vs mypackage-nightly), just modify the pyproject.toml before running the sdist build.

pf_moore · November 24, 2022, 5:32pm

Ah, so one “-nightly” package. That makes some sense, but I agree it seems to me sufficient to just modify the pyproject.toml at build time. I don’t think dynamic names are a pattern we particularly want to encourage, even if it is useful in some edge cases.

ofek · November 24, 2022, 5:48pm

For nightlies why not encode that in the version?

dustin · November 24, 2022, 6:31pm

I don’t think anything is blocking it, just a matter of someone (a maintainer or a contributor) prioritizing the work.

dustin · November 24, 2022, 6:42pm

Given OP’s work with Nvidia and the mention of Cupy/CUDA versions, I’m going to guess this is to ship projects that encode the CUDA version in the project name, like cupy-cuda92 · PyPI.

If that’s the case, I wonder if the actual problem is that the wheel spec has no way to specify GPU tags, as discussed in What to do about GPUs? (and the built distributions that support them)

ofek · November 24, 2022, 6:55pm

Ohhh I see, yes that’s a better solution for that use case

vyasr · November 24, 2022, 8:45pm

Thanks for elaborating @CAM-Gerlach. Reminding me of the emphasis on sdists is helpful, I forget about that. I agree, from that perspective preprocessing the pyproject.toml in builds seems perfectly reasonable since it would generate sdists with consistent package names and pyproject.toml specs without causing any issues. That also addresses my concern about this violating the spirit of the PEP since preprocessing the metadata still generates consistent sdists. Sounds like @steve.dower and others are already doing that without issue.

Note that dynamic versions are not intended to allow the version to change arbitrarily, rather they are intended to allow the (fixed) version to be stored somewhere other than in the metadata - essentially to solve the problem of people wanting to “single source” the version.

@pf_moore to clarify, one of the common and important examples of this is using a tool like setuptools_scm or versioneer to pull the version out of source control, right?

I might have missed it, but what’s the use case?

Given OP’s work with Nvidia and the mention of Cupy/CUDA versions, I’m going to guess this is to ship projects that encode the CUDA version in the project name, like cupy-cuda92 · PyPI.

If that’s the case, I wonder if the actual problem is that the wheel spec has no way to specify GPU tags, as discussed in What to do about GPUs?

Yup, this is exactly the use case that I have in mind, and what I alluded to in my original post (including the discussions of platform tags and selector packages):

Finally, a third thread proposes a generic solution that would address one of the potential use cases of dynamic names, namely to delineate packages that would ideally be differentiated by something like platform tags but don’t fit into the exact set of platform tags supported by PEP 425. A good example of this is for packages targeting accelerators that require certain drivers to be installed, such as Cupy versions targeting a specific CUDA version.

Specifically, I’m working on creating pip wheels for RAPIDS, so I’m working through the various different issues that we run into here. The “what to do about GPUs” thread is very long and touches on a large number of additional topics that I don’t want to bring into this thread (although they are very important and will need to be addressed eventually), but I’d be happy to try to find a better solution for the specific problem of GPU tags/environment markers. My recollection is that there were issues with adding GPU information to the platform tag because of the existing optional build tags, but environment markers were a potentially better solution that avoided some of the pitfalls of modifying the build tag. Selector packages also seemed viable (seems to be similar in spirit to a conda metapackage) but also seemed like it would require much broader changes to the packaging ecosystem (correct me if I’m wrong). For the moment I am likely to move forward with the pyproject.toml preprocessing as a way to keep my own projects moving, but I’m interested in implementing one of these more general solutions with some guidance from more informed members of the Python packaging world.

pf_moore · November 24, 2022, 8:48pm

Yes, that’s right. And that’s why I don’t see a parallel with name, as source control isn’t typically the “one true source” of the project name…

steve.dower · November 24, 2022, 10:34pm

They certainly don’t fit with the direction pip has taken with dependency resolution anymore, so yeah, probably have to rule them out completely now.

Still, I think you can emulate them somewhat by separating your packages into a “main” package with extras. So you might have rapids as the core package, then rapids_cpu, rapids_cuda92, rapids_cuda101 etc. that can all be installed in parallel, but rapids has the logic to choose the right one at runtime. And maybe the more specific packages only have platform-specific logic in them, and most of the generic logic is still in the central package, so you aren’t duplicating a ton.

Then users can pip install rapids-cuda101 (which depends on rapids) and they get the right one, or pip install rapids[cuda101] if you want to add extras, but they end up with two packages installed. Or they can install everything, change their hardware, and still have a working setup because it picks the right version dynamically. If it can’t find the one it needs, you can display the exact message at that time for what to install.
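The runtime selection described above might look something like the following in the core package. This is only a sketch under stated assumptions: the module names (`rapids_cuda101`, `rapids_cuda92`, `rapids_cpu`) are the illustrative ones from this post, and the `is_usable` hook, which would hold the real hardware or driver check, is hypothetical.

```python
import importlib
import importlib.util

# Illustrative backend modules, most specific first (hypothetical names).
CANDIDATES = ("rapids_cuda101", "rapids_cuda92", "rapids_cpu")


def pick_backend(candidates=CANDIDATES, is_usable=lambda name: True):
    """Import and return the first installed backend that is usable.

    `is_usable` is a placeholder for a real check (e.g. whether the matching
    driver is present); by default every installed candidate is accepted.
    """
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None and is_usable(name):
            return importlib.import_module(name)
    raise ImportError(
        "No usable backend is installed; tried: "
        + ", ".join(candidates)
        + ". Install one of the backend packages for your hardware."
    )
```

Because the failure happens at selection time, the error message is the natural place to tell the user exactly which package to install.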

I know the problem with this is that the way CUDA works is you end up duplicating a lot of code and everything gets huge. That’s unfortunate, and I’d love to see a better design for its builds, but it doesn’t mean that any other approach to packages is any better. Silently choosing the right build for a given GPU only makes it harder to choose a specific build, so I’m not a huge fan of that.