Yeah, that’s my concern. Without a feel for how dynamic and widespread that data is, I’m not sure whether it would vary too much per platform for locking build dependencies to be feasible.
That would avoid having to call any PEP 517 hooks for dependency resolution, which is nice. But what about build dependencies? It seems that would necessitate calling the hooks to get the dynamic build dependencies if they were to be locked as well, negating that performance bonus.
I guess there are three options when it comes to building sdists:
1. Lock down all build dependencies.
2. Create a constraints file, but allow for additional build dependencies as necessary.
3. Just install what’s needed.
A compromise between 1 and 2 would be to do 1, but allow for 2 if some flag were set. Otherwise you would be expected to create the lock file on the platform you want to lock, rather than being able to create lock files for various platforms from one machine.
I think there’s probably an embarrassing hole in the standards there, as there’s no way to statically record the build dependencies in an sdist. Oops. I think we should probably standardise something for that, but in the meantime, yes, we’ll have to read the sdist’s pyproject.toml and call get_requires_for_build_wheel at lock time to determine the build dependencies.
Locking build dependencies is an interesting question, though. Pip’s isolated builds just solve the build dependencies at build time, and I believe the build project does the same. So existing build tools would only be able to lock build environments in non-isolated mode. That’s OK, in principle, but I think we want to be careful not to specify something that needs a whole new build tool to be created…
Disclosure: I haven’t read a lot of the prior art (just skimmed through this discussion). So apologies if this is not a productive comment.
For sdists, would it be possible to just include as much information as possible in the lock file and let pip decide at install time, via a flag, what gets enforced? In other words, include a hash of the source and one of the produced wheel (is that how one checks that the output matches byte for byte?), and let the user choose, via a pip flag, whether the hashes of wheels built from sdists are checked. This way one could work around compiling on a different architecture or a non-reproducible package by saying “please don’t enforce sdist-built wheel hashes (but still enforce the source matching)”. Maybe with a warning listing the offending packages, so users can go bug the maintainers of those packages to make them reproducible.
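To illustrate the proposal (not an agreed design), here is a sketch assuming a hypothetical lock-file record with name, sdist_sha256 and wheel_sha256 keys, and a hypothetical enforce_wheel_hash switch standing in for the pip flag. The sdist hash is always enforced; a mismatched wheel hash is either an error or, when enforcement is off, a warning naming the package.

```python
import hashlib
import warnings


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_sdist_build(
    entry: dict[str, str],
    sdist_bytes: bytes,
    wheel_bytes: bytes,
    enforce_wheel_hash: bool = True,
) -> bool:
    """Check an sdist and the wheel built from it against a lock entry.

    Returns True if both hashes match, False if only the wheel hash
    mismatched and enforcement was turned off. Raises otherwise.
    """
    if sha256(sdist_bytes) != entry["sdist_sha256"]:
        # The source must always match what was locked.
        raise ValueError(f"{entry['name']}: sdist hash mismatch")
    if sha256(wheel_bytes) != entry.get("wheel_sha256"):
        if enforce_wheel_hash:
            raise ValueError(f"{entry['name']}: built wheel hash mismatch")
        # Opt-out path: report the non-reproducible package and carry on.
        warnings.warn(f"{entry['name']}: built wheel is not reproducible", stacklevel=2)
        return False
    return True
```

The replies that follow argue that relaxing the wheel check is a security risk, which is why the default here keeps enforcement on.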
I don’t see a benefit in that case. Since wheel files are already specific to a platform, if you’re specifying a wheel file then you want that wheel file, period. Otherwise you’re after the sdist and not the wheel at all. So I don’t quite see the benefit of dropping the hash check and the security risks that entails.
I was thinking of situations where the generated wheel is not deterministic byte for byte under the same wheel tag (I thought that was a concern raised earlier in the thread). I do see how it would be a security risk to allow that, even if it is opt-in.
If you’re generating the wheel from an sdist then there’s nothing to record about the generated wheel as the sdist itself should be what you’re locking against.
It may have been, but I personally think that’s the wrong approach to take with this. The lock file should be locking to files that you know are good inputs, not simply metadata about what may be.
How much do people care about locking the build dependencies for an sdist? If we go with the assumption that the install dependencies gathered from an sdist are consistent across platforms (by using markers appropriately and using Core Metadata 2.2 or later to let us rely on PKG-INFO), is resolving the dependencies down to a flat list of wheels and sdists (when people are okay with allowing sdists) good enough for people? I’m trying to constrain the scope here so we know how far we need (not) to go. Going from easiest to hardest to implement, the options are:
1. Lock wheel files
2. Lock wheel files and sdist files
3. Lock wheel files, sdist files, and the build dependencies for each sdist file
I think PEP 665 showed that option 1 isn’t necessarily tenable as a spec (but I personally think it is enough to potentially support in some tool). But is option 2 enough?
The reason I’m asking is right now you have to go through build APIs to get the build dependencies for a wheel. That means you’re already part way to building the wheel (and potentially all the way if you can’t rely on PKG-INFO), and so my brain is going, “then why can’t you just go one step farther and use the wheel?” With option 2 my brain isn’t tripping over itself like that, hence this question.
Personally, I don’t care in the slightest, but we’ve seen a number of issues on pip about details of what build dependencies get used, which suggests to me that there are people who do care. (One case I recall is that projects depending on numpy need to be built with the same version that is used at runtime. They have workarounds to ensure this, but I get the impression it’s all a bit tricky and fragile.)
I’d suggest getting input from people wanting to use sdists when locking scientific packages, to make sure their requirements are met.
It’s quite difficult to distribute those built wheels, basically requiring your own index server and either rebundling as manylinux wheels or distributing the base environment. I find it far easier to simply create a Docker image for the app and distribute that, negating the need for a lock file.
However, option 2 is a good start and a compromise until we perhaps have more comprehensive metadata for build-time dependencies. Option 2 is what Poetry users are seeing at the moment, and it would be a welcome improvement, I think.
That’s one way to solve it, but there are others. The difficulty of any solution is going to come down to how people choose to deploy code to where it needs to be. That means it’s varied, and there’s no way to please everyone.
That’s actually an interesting variant in all of this, as it shifts locking build dependencies from a per-sdist concern to a universal concern for the lock file.
But you’re right, scientific projects do sometimes treat numpy as an API target, hence conda supporting metadata to specify that sort of requirement.
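Treating it as a lock-file-wide concern could mean checking that “API target” packages such as numpy resolve to the same version in every sdist’s build environment as in the runtime environment. This sketch applies the strict same-version rule described earlier in the thread; a given package’s real compatibility rules may be looser. The function name and the dictionary layout (package name to version, build environments keyed by sdist name) are hypothetical.

```python
def mismatched_build_pins(
    runtime: dict[str, str],
    build_envs: dict[str, dict[str, str]],
    api_targets: set[str],
) -> dict[tuple[str, str], tuple[str, str]]:
    """Find API-target packages built against a different version than runtime.

    Returns {(sdist_name, package): (build_version, runtime_version)}.
    """
    problems: dict[tuple[str, str], tuple[str, str]] = {}
    for sdist_name, build in build_envs.items():
        for package in api_targets:
            if package in build and package in runtime and build[package] != runtime[package]:
                problems[(sdist_name, package)] = (build[package], runtime[package])
    return problems
```

A per-sdist lock can’t express this check at all, since it needs to see the runtime pins and every build environment together.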