Insights into how poetry.lock works cross-platform

Being able to see from the sdist metadata that dependencies are the same everywhere, without needing a build at all, is even better. I’d be OK with a locking tool that produced cross-platform lockfiles for sdists with non-dynamic dependencies, and platform-specific lockfiles if one or more sdists declared their dependencies to be dynamic.

It’s not particularly hard to construct reasonable situations where this can be useful. For instance, you could have a wheel for an older version of macOS, from before some system library was included as part of the “base” install, that uses a backport or alternative implementation, and then a wheel targeting a newer version of macOS that does not.

I don’t think it makes it impossible to support cross-platform lock files, since you can (as you mentioned) fetch the dependencies from the wheel files as part of generating the lockfile. It takes more effort, but that’s unavoidable IMO, since it’s a fundamental part of how our packaging currently works, and attempting to close the barn door after the horse has bolted is going to (silently) produce invalid lock files, which seems like the worst possible option.

But fetching every wheel’s metadata is only hard right now: PEP 691 is accepted and implemented, and PEP 694 is being worked on right now. Those are stepping stones on the path to having dependencies show up in the repository API, which would mean you can get all of those dependencies without fetching every single wheel.
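As a sketch of where that leads, here is the shape of a PEP 691 JSON Simple-index response a locker works with today (the payload is trimmed and the project, URLs, and hashes are made-up illustrative values; the field names come from the spec):

```python
import json

# A trimmed, illustrative PEP 691 JSON Simple-index response. A real
# client would GET /simple/<project>/ with the header:
#   Accept: application/vnd.pypi.simple.v1+json
sample = json.loads("""
{
  "meta": {"api-version": "1.0"},
  "name": "example",
  "files": [
    {"filename": "example-1.0.tar.gz",
     "url": "https://files.example.invalid/example-1.0.tar.gz",
     "hashes": {}},
    {"filename": "example-1.0-py3-none-any.whl",
     "url": "https://files.example.invalid/example-1.0-py3-none-any.whl",
     "hashes": {}}
  ]
}
""")

# Today a locker learns filenames and URLs this way; the hoped-for next
# step is per-file dependency metadata in the same response, so no wheel
# downloads are needed at lock time.
wheels = [f["filename"] for f in sample["files"] if f["filename"].endswith(".whl")]
print(wheels)  # → ['example-1.0-py3-none-any.whl']
```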

It’s impossible to support portable lockfiles for arbitrary sdists as things currently stand, and just writing in the docs that “hey, you can’t do that” isn’t going to stop people from doing it. It will just produce silent errors that pop up constantly and cause a low-level stream of friction.

If we wanted to do this, the only path forward would be to disallow dynamic dependencies in sdists, but you still have the problem of the 3.6 million releases, with 6.5 million individual files, already on PyPI today that don’t fit that constraint, and a hypothetical tool needs to deal with them as well.

In the past there was broad pushback on doing that in sdists, because some projects do interesting things to select dependencies, and in some cases they choose not to publish wheels at all, because not even environment markers or wheel tags are expressive enough for them.

I don’t object to wheels having different dependencies in principle, but should we not add to environment markers so they can express as much as the wheel tags, so the most common use cases can switch to environment markers and save a lot of people’s time?
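For context, this is roughly what environment markers already give us: a requirement is kept or dropped per environment at install time, instead of being baked into separate per-platform wheels. A minimal sketch, using a simplified subset of the PEP 508 marker variables (the names are from the spec; the `colorama` requirement is just an example):

```python
import sys

# PEP 508 environment markers resolve against per-environment values
# like these (a simplified subset of the spec's marker variables):
env = {
    "sys_platform": sys.platform,
    "python_version": "{}.{}".format(*sys.version_info[:2]),
}

# A requirement such as:  colorama ; sys_platform == "win32"
# is evaluated against the installing environment, so one artifact can
# serve every platform.
needs_colorama = env["sys_platform"] == "win32"
print(needs_colorama)
```

The question in the post above is whether this vocabulary could be extended to cover everything the wheel tags can distinguish (OS version, ABI, and so on).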

275, as far as I can tell.

Out of the 310,000+ latest versions of packages that ship wheels, there are 275 that ship wheels with different sets of dependencies, according to this query over the PyPI BigQuery dataset.
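The counting logic behind such a query can be sketched in miniature — hypothetical rows standing in for the BigQuery table, grouping each release’s files by their declared dependency set and counting releases where any two files disagree:

```python
from collections import defaultdict

# Rows as (project, version, filename, requires_dist) — illustrative
# data standing in for the PyPI distribution-metadata table.
rows = [
    ("pkg-a", "1.0", "pkg_a-1.0-py3-none-any.whl", ("requests>=2",)),
    ("pkg-a", "1.0", "pkg_a-1.0-win_amd64.whl", ("requests>=2", "colorama")),
    ("pkg-b", "2.0", "pkg_b-2.0-py3-none-any.whl", ("numpy",)),
    ("pkg-b", "2.0", "pkg_b-2.0-macosx.whl", ("numpy",)),
]

# Collect the distinct dependency sets seen per (name, version).
dep_sets = defaultdict(set)
for name, version, filename, requires in rows:
    dep_sets[(name, version)].add(frozenset(requires))

# A release is "divergent" if its files declare more than one set.
divergent = [key for key, sets in dep_sets.items() if len(sets) > 1]
print(divergent)  # → [('pkg-a', '1.0')]
```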

Be aware that information may be inaccurate, particularly for wheels that were uploaded prior to Jul 8, 2020.

Thanks for the tip. Limiting the query to upload_time > '2020-07-08' still leaves me with ~230,000 packages.

It would be great to understand how this compares to download stats. From a user perspective, it doesn’t matter if the absolute count of packages is low. It matters a lot if even one or two highly downloaded packages make use of this functionality, whether intentionally or accidentally. Certainly at my $dayjob we ran into issues a few times that forced us to stop sharing lockfiles, and we decided to generate a lockfile per installation target (which is annoying overhead and a pain for macOS).

I do, however, see this as a big opportunity in terms of a roadmap towards a lockfile. I would love your thoughts, @brettcannon.

It seems to me that Python packaging was never intentionally designed to behave this way. It does at the moment due to a specification gap and entrenched existing behavior, rather than by design. Correct me if I’m wrong.

Removing this ambiguity in the standards seems to me like it unlocks a huge amount of value and simplification towards a potential lockfile.

It unlocks the ability to create a cross-platform lockfile that includes sdists and bdists, as long as the locking platform is able to successfully install the packages (or perform a dry-run installation).

That is a massive amount of value added, in my opinion, and something the community has been asking for over the years. I could imagine it might be challenging to enforce this sort of behavior in Warehouse, but perhaps I’m wrong. Even if it isn’t enforceable in Warehouse, documenting that all distributions for a (package, version) pair MUST have the same dependencies provides desirable invariants that downstream tooling and designs can build on.

I know that @dstufft has some concerns, are these truly something that you won’t budge on, or can you see a world where this extra degree of freedom isn’t permitted?

The trouble with any proposal like this is that there are a huge number of private projects, unpublished on PyPI. Regrettably, in my experience working on pip, these are typically where the bulk of the cases that take advantage of these unfortunate loopholes in our specs live.

Yes, agreed. It’s not something that can be forced on users, but it’s still potentially workable.

Do we have a sense if it’s feasible for repositories to implement? Maybe the first non-sdist distribution uploaded for a (package, version) pair pins the dependency metadata?

If repositories can implement it, then perhaps Artifactory and Warehouse (and others) could implement a soft-fail warning, and then begin to hard fail after some time.
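That pinning rule can be sketched as a toy model — the first binary upload for a (package, version) records the dependency set, and later uploads that diverge get a warning (all names here are mine, not Warehouse’s; a real repository would also handle sdists, normalization, and Metadata versions):

```python
class Repository:
    """Toy model of a repository pinning dependency metadata per release."""

    def __init__(self):
        self._pinned = {}  # (name, version) -> frozenset of Requires-Dist

    def upload(self, name, version, requires_dist):
        key = (name, version)
        deps = frozenset(requires_dist)
        if key not in self._pinned:
            self._pinned[key] = deps  # first upload pins the metadata
            return "accepted"
        if self._pinned[key] != deps:
            # Soft-fail phase: warn now; this could become a hard
            # rejection after a deprecation period.
            return "warning: dependency metadata differs from pinned set"
        return "accepted"

repo = Repository()
print(repo.upload("example", "1.0", ["requests>=2"]))  # → accepted
print(repo.upload("example", "1.0", ["requests>=2"]))  # → accepted
print(repo.upload("example", "1.0", ["httpx"]))        # → warning: dependency metadata differs from pinned set
```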

Another option could be to make some future lockfile opt-in: users opt in to the invariant being true if they want to use a lockfile. Packages that are incompatible with lockfiles will be compelled to become compliant because users will ask for it, similar to the new resolver rollout.

It’s basically what Metadata 2.2 does. If the sdist doesn’t mark the dependency metadata as “dynamic”, tools can assume that all wheels generated from that sdist will have identical metadata.

What needs to be done is for PyPI to accept Metadata 2.2 (work in progress) and backends to start using it (which they can’t realistically do until it’s uploadable). Projects can still make their dependency metadata dynamic, but it will be marked as such and lockers can identify them and refuse to create cross-platform lockfiles.

The problem, of course, is that older sdists will not use Metadata 2.2, so everything will be assumed dynamic by default. But that just means there will be a delay before Metadata 2.2 is prevalent enough for cross-platform lockfiles to be common.
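Concretely, a locker’s Metadata 2.2 check can be sketched like this — parse an sdist’s PKG-INFO and treat `Requires-Dist` as reliable only when it isn’t listed under `Dynamic` (the `Dynamic` field is defined by Metadata 2.2; the PKG-INFO fragments and the helper name are made up for illustration):

```python
from email.parser import HeaderParser

def deps_are_static(pkg_info_text: str) -> bool:
    """Return True if an sdist's PKG-INFO declares non-dynamic dependencies.

    Under Metadata 2.2+, a field listed in `Dynamic` may differ between
    the wheels built from the sdist; anything not listed is fixed.
    """
    meta = HeaderParser().parsestr(pkg_info_text)
    dynamic = [v.strip().lower() for v in meta.get_all("Dynamic", [])]
    return "requires-dist" not in dynamic

# Illustrative PKG-INFO fragments for a hypothetical project:
static_sdist = (
    "Metadata-Version: 2.2\n"
    "Name: example\n"
    "Version: 1.0\n"
    "Requires-Dist: requests>=2.0\n"
)
dynamic_sdist = (
    "Metadata-Version: 2.2\n"
    "Name: example\n"
    "Version: 1.0\n"
    "Dynamic: Requires-Dist\n"
)
print(deps_are_static(static_sdist))   # → True  (safe to lock cross-platform)
print(deps_are_static(dynamic_sdist))  # → False (locker should refuse or build)
```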
