PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

The phrase “same exact packages” is part of the ambiguity.

For my own use case I’d be happy with just complete version pins. The output of pip freeze is even close to what I want, the main issue being that pip freeze gives you the packages in an environment rather than the resolution of a specific list of packages. A pip resolve command that took a list of dependency constraints and gave pip freeze-style output would be fine. So in practice today I’m pretty happy with pip-compile, and a version-pinned requirements.txt is reasonably well understood by a lot of tooling.

I don’t understand where the ambiguity comes from. Could my definition, as above, be interpreted differently by different people?

Bit-for-bit equality of packages after installation? Or are two packages equal if they have the exact same versions? This matters especially for packages where some build step (sdists) might occur.

It would install the appropriate packages for the OS/arch combination from the lockfile, potentially verifying each file’s checksum before installing. If sdists or VCS checkouts are involved, then it might not be a bit-for-bit match unless the strict reproducibility option was enabled (see above about that).

For me that would be fine - but personally I was always OK with the “pip-tools style pinned requirements” approach. I don’t have any major use cases myself, my interest is mainly just “as an installer maintainer, how do you expect me to implement this?”

From the POV of “is it an acceptable approach for a proposal to take”, my measure is “would people use it?” The problem with PEP 665 was that too many people were saying they wouldn’t use it in its existing form; they would “wait for the follow-up version that supported sdists”.

An installer implementation would be fairly straightforward. For each package in the lock file, it would look up the OS/arch-specific variant (if possible; otherwise a noarch wheel, then an sdist, in that order of preference, or a VCS checkout if specified). Each package would be downloaded if not already in the cache, and its integrity would be verified. For wheels, the checksum could potentially be calculated during extraction, but sdists would be verified prior to installation. Each package would then be installed, in the order they appear in the lock file. The packaging tool would already have sorted them so that no package is installed before its dependencies. Optionally, the installer could clear out any packages not belonging to the lock file (IIRC, Poetry can do this already).
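The variant-selection order described above can be sketched as a small lookup function. This is purely illustrative: the lock-entry shape, `pick_artifact`, and the tag strings are hypothetical, not anything from the PEP or an existing tool.

```python
# Hypothetical sketch of the preference order described above:
# OS/arch-specific wheel, then noarch ("any") wheel, then sdist,
# then a VCS reference. The lock-entry dict shape is made up.

def pick_artifact(lock_entry, platform_tag):
    """Return the preferred artifact for one locked package."""
    wheels = [a for a in lock_entry["artifacts"] if a["kind"] == "wheel"]
    # 1. OS/arch-specific wheel
    for wheel in wheels:
        if platform_tag in wheel["tags"]:
            return wheel
    # 2. Pure-Python ("any") wheel
    for wheel in wheels:
        if "any" in wheel["tags"]:
            return wheel
    # 3. sdist, then 4. VCS checkout, if present
    for kind in ("sdist", "vcs"):
        for artifact in lock_entry["artifacts"]:
            if artifact["kind"] == kind:
                return artifact
    raise LookupError(f"no installable artifact for {lock_entry['name']}")

entry = {
    "name": "example-pkg",
    "artifacts": [
        {"kind": "sdist", "url": "..."},
        {"kind": "wheel", "tags": ["any"], "url": "..."},
        {"kind": "wheel", "tags": ["manylinux2014_x86_64"], "url": "..."},
    ],
}
print(pick_artifact(entry, "manylinux2014_x86_64")["tags"])  # platform wheel wins
print(pick_artifact(entry, "win_amd64")["tags"])             # falls back to "any"
```

A real installer would of course layer hash verification and dependency ordering on top of this selection step.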

For sdists and VCS checkouts, build dependencies could be locked on a per-package basis.

Extras for all packages could be locked too, so that the user could install a different set of extras each time from the same lock file.

Yeah, sdists and VCS checkouts should definitely be supported in some form. As far as the “absolute reproducibility” use case goes, I would like to hear comments from someone who actually needs that.


This approach is currently blocked by the fact that wheel building is not reproducible, so it’s difficult to use a built wheel for checksums (since another environment can’t practically check against them). There have been efforts, but we’re not there yet.


Regarding whether absolute reproducibility must be achieved, I think there’s enough interest from people not needing that level of strictness to introduce a format. So arguably the effort should be put more into forming a narrative around this not-completely-strict use case, instead of trying to bend the format so it also satisfies absolute reproducibility.


What exactly are the blockers? File timestamps and permissions? Anything else? The wheel spec should be updated to cover both. I feel timestamps are pretty useless in wheels, so always setting them to 1980-01-01 should probably be in the spec. Permissions are a tad more problematic. Perhaps always setting them to 0o664 (regular files) and 0o775 (executables) or similar values should be standardized. The installer would respect the umask when installing those wheels.
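To illustrate what standardizing those values could look like in practice, here is a sketch of writing zip entries with the fixed 1980-01-01 timestamp and the 0o664/0o775 permissions suggested above. The values and the `add_normalized` helper are the proposal from this post, not anything currently in the wheel spec.

```python
# Sketch: zip entries with a constant timestamp and normalized permissions,
# using only the stdlib. The mode values (0o664 / 0o775) are the ones
# proposed above, not anything standardized today.
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can store

def add_normalized(zf, arcname, data, executable=False):
    info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE)
    mode = 0o775 if executable else 0o664
    info.external_attr = mode << 16  # unix permissions live in the high bits
    zf.writestr(info, data)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    add_normalized(zf, "pkg/__init__.py", b"")
    add_normalized(zf, "pkg/cli.py", b"#!/usr/bin/env python\n", executable=True)

# Reading the archive back shows every entry has the same metadata,
# regardless of when or where it was built.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    for info in zf.infolist():
        print(info.filename, info.date_time, oct(info.external_attr >> 16))
```

Two archives built this way from the same inputs differ only in file contents, which is the point: metadata stops being a source of checksum churn.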

Does this sound good? If it does, I’ll open a new discussion to get the ball rolling on that.

I have that use case, hence why I wrote the PEP. :wink:

I’m working on it. :grin:

Unfortunately that can quickly become too naive of a view due to requirement markers. For instance, what about libraries that you only need on certain OSs or Python versions? And what if they have their own dependencies that change what version should be installed?

And that’s a lot more work than simply relying on wheels, hence why I designed the first version the way I did in the PEP.

What does “same exact package” mean here? For instance, if a library on PyPI has both a pure Python wheel and one that comes with an extension module, is it the “same exact package” regardless of which of those two wheels you installed? What about an sdist that specifies different dependencies based on what’s in the OS? Does “OS” mean “exact same files on the machine”, or simply “same OS version” regardless of what’s on the file system/installed by the user at the time the sdist is built?

I’m approaching this from a beginner/“works on my machine” perspective, where being very precise with what you expect to be installed leads to less frustration/tears.

Probably need to ask @kushaldas what blockers there are for wheel reproducibility.

  • Fixed path for wheel building (required for C extensions)
  • SOURCE_DATE_EPOCH value (we need a default value across projects)
  • Different library files (versions/checksums) in the manylinux wheels
  • ??? — needs more inspection of things written in C/Rust/other languages for extensions

I skipped the last two points (by not building manylinux wheels at all) and provide the rest in asaman.


SOURCE_DATE_EPOCH is the only thing on this list that I understand. The wheel package supports this, but it would be nice to amend the standard so that timestamps are constant.

Could you elaborate on the rest?

Maybe, but if a large number of users require support for sdists (and VCS checkouts?), then dropping sdist support from the spec is a non-starter.

Any sdist that produces a different set of requirements based on the current environment in the way you described is a lost cause as far as strict reproducibility is concerned. We can’t expect locking to work reliably with such projects, and that should not prevent us from proceeding with the work. Conditional dependencies, on the other hand, are fine. We just need to resolve each branch of these dependencies and record the results.

As far as pure-Python/binary wheels for the same dependency go, I consider reproducibility to mean that all variants of the package (for any applicable versions) are recorded in the lock file, and the installer then determines, based on the OS and architecture, which variants to install. Thus, with the same installer options, OS and architecture, the installer would produce the exact (bit-by-bit) same result every time, at least when only wheels are involved. VCS checkouts and sdists might not produce the same results, obviously, if the surrounding compilation environment has changed, as we have no control over that.

The definition of “OS” here should really be the “platform” which is the combination of platform.system(), platform.architecture() and platform.libc_ver() (did I miss anything critical here?).
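As a concrete reading of that definition, here is one possible “platform” key built from exactly the three calls mentioned above. The `platform_key` name is made up for illustration.

```python
# One possible "platform" key from the three calls named above. Note that
# platform.libc_ver() returns ("", "") on non-glibc systems, so the key
# degrades gracefully there.
import platform

def platform_key():
    system = platform.system()                # e.g. "Linux", "Darwin", "Windows"
    arch = platform.architecture()[0]         # e.g. "64bit"
    libc, libc_version = platform.libc_ver()  # e.g. ("glibc", "2.35") or ("", "")
    return (system, arch, f"{libc} {libc_version}".strip())

print(platform_key())
```

A lock file could index its per-platform package variants by a tuple like this, though real wheel tags encode finer distinctions (CPU architecture, manylinux level) than these three calls alone capture.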


I’ve tried to explain my perspective on what I expect out of a lockfile, but it didn’t really seem to get traction or get my point across.

I think that discussions about true reproducibility go off-track, because they mean different things to different people, and in the strictest sense, are probably impossible to achieve in a practical way (or would require boiling the ocean). I don’t believe that byte-for-byte reproducibility is required for a lockfile to be useful. I think it’s a noble goal to pursue, but I don’t see it being a requirement for a lockfile standard. This can be demonstrated to be true because pip-tools and poetry (and others I’m sure) have created non-reproducible lockfiles that are still very useful. And if / when true reproducibility comes along for python wheel builds, then even better, it still wouldn’t make the lockfile standard any less useful.

I’ll try again with another metaphor for what I’d like to see from a lockfile standard.

I would like a lockfile to be a standalone specification for a “wheelhouse” that I can use across platforms. It would contain a list of wheels and sdists that are required at build time and runtime. The key point being that NO other artifacts or indexes should be consulted that aren’t in the lockfile.

pip install --no-index --find-links=/tmp/wheelhouse --requirement lockfile.txt

I think many would find such a lockfile useful, particularly if it was possible to produce them across platforms (I know this isn’t possible at the moment without some tweaks to PEPs around what metadata can be dynamic across platforms).

I personally would have loved PEP 665 as an incremental step by the way and would be comfortable if sdist support came later, however I do recognise and understand that there would likely be many who wouldn’t use a lockfile until sdist support came along.


All of this sounds compatible with what I’ve suggested: having a list of wheels, sdists (and maybe VCS references), pinned to specific versions. The installer would not install anything not explicitly in the lock file, but not necessarily all of them either. I’m sure you won’t mind the checksumming feature. Are we in alignment?

I would want hash checking of all input artifacts (wheels, sdists, VCS bundles) listed in the lockfile that are used during installation. That’s functionality that I would expect and that exists in pip already.
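That hash check is conceptually very simple. Here is a minimal sketch in the spirit of pip’s `--require-hashes` mode; the `verify_artifact` helper and the `sha256=<hex>` string format (which mirrors how hashes commonly appear in pinned requirements files) are illustrative assumptions, not the lockfile format itself.

```python
# Minimal sketch of verifying a downloaded artifact against the hash
# recorded in a lock file. The "sha256=<hex>" format and helper name
# are illustrative, not part of any spec.
import hashlib

def verify_artifact(data: bytes, locked_hash: str) -> bool:
    algorithm, _, expected = locked_hash.partition("=")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected

payload = b"pretend this is a wheel"
recorded = "sha256=" + hashlib.sha256(payload).hexdigest()
print(verify_artifact(payload, recorded))            # matching hash -> True
print(verify_artifact(b"tampered bytes", recorded))  # mismatch -> False
```

An installer doing this on every artifact, and refusing to install on any mismatch, gives the “no one pulls a fast one” guarantee discussed elsewhere in the thread without requiring byte-for-byte reproducible builds.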

I think this PEP fell short because:
a) it didn’t support sdists or vcs
b) “reproducibility” discussions side-tracked the conversations because people thought this would be a compelling use-case to justify the PEP

I don’t think strict “reproducibility” is the compelling use case.

I think “deterministic inputs” or SBOM or some other use-case with weaker guarantees than strict reproducibility, but still very likely to be “practical reproducibility” would hopefully be enough to convince a PEP delegate to approve a PEP.


I’m in complete agreement.

@brettcannon I’m sorry if I haven’t seen you explain your use case (where absolute reproducibility is a requirement) in detail, but could you elaborate?

I don’t think you should be thinking in terms of “convincing a PEP delegate”, but in terms of “satisfying user requirements”. At least that’s what you need to do to convince this PEP delegate :slightly_smiling_face:

First of all, work out what people mean when they say they want lockfile support. Seriously. No-one has yet done this. PEP 665 ducked the question by avoiding the term “lockfile” and focusing on reproducible builds. But it then failed to make the case that reproducible builds were a sufficiently important use case. Doing the same thing with a different scope won’t change the outcome. What will change it is if a PEP focuses on what people want to do.

And standardising existing practice is a great way of doing this. Look at Poetry’s lockfiles. Or PDM’s. Or even pip-tools and pinned requirement files. Work out what they do well, and what people complain about - real world complaints, not theoretical limitations. Fix the problems and tidy up the good points.

If reproducible builds are an important use case, there should be people doing them now. What locking tools do they use? Add those into the mix, by all means. Work out how to support reproducible builds without making it a problem to use lockfiles for cases where reproducibility isn’t the key factor. If people doing reproducible builds don’t currently use lockfiles, document that in the PEP and declare the use case out of scope.

Or not. Ultimately, how to propose a standard is the choice of the PEP author and sponsor. All I can say is how I, as PEP delegate, will review a PEP. And what I’ll be looking for is a PEP that:

  • Addresses actual use cases
  • Builds on existing lockfile solutions
  • Has broad community consensus that this is the right solution
  • Actually solves the problem (i.e., we’re not going to get a “lockfile v2” proposal appearing any time soon).

We have a full website/project dedicated to reproducible builds. You can start reading from there.

I just want to note that various wordings in the proposal were somewhat awkward to me. Take this sentence for example:

Result in byte-for-byte installation output iff install dependency versions for a target platform are all bdist and the lockfile was produced on the target platform (PEP 665)

The iff part would indicate that if not all dependencies are bdists, or the target platform doesn’t match, the process should not result in byte-for-byte installation output, even though that’s a totally logical thing to do in many scenarios (wheel installation is simply unpacking, and multiple runs can easily be byte-for-byte identical).

Various points under MUST NOT also have the same wording issue, since the use of the key words indicates the format would be invalidated if it does any of them, under any circumstances. I’m assuming that is not the intention, but it certainly does not help get your point across.


And I don’t have any insight on what “large” is in this case.

For them, sure, but since I’m doing all the work right now it isn’t a blocker for me. :wink: And I’m not suggesting something couldn’t come about later, but I need to start somewhere, and my needs are explicitly met by restricting myself to wheel files, so I’m managing scope creep by worrying about the simplest requirement that gets what I need out of this.

It really depends on whether you want a lock file to be usable across platforms upfront, or whether you’re okay with amending the lock file to add new platforms as necessary. For instance, does the lock file need to work on Windows, manylinux, macOS, etc. all at once in one go, or can you start with your platform and add others as people come forward with the appropriate details for their platform (e.g., wheel tags, etc.)?

I’m personally not aware of anything requiring a PEP to make this work. What are you specifically thinking of?

To start, I think there’s some over-assigning of the term “reproducibility” to what I was proposing. To be clear, my goal was to make sure users got the same wheels downloaded and installed when using a lock file with the same markers and wheel tags. So there was never any expectation that the files themselves would be exactly the same simply because stuff in the scripts/ directory will end up with their shebang rewritten.

There are basically three use cases I had in mind when I wrote the PEP.

One is security/consistency. For work we download wheels and install them for distribution within VS Code extensions (e.g., the Black extension ships with a copy of Black). We need to make sure that what we use in CI is the same as what gets shipped ultimately to the VS Code Marketplace (so version pinning). But we also have to do our due diligence as much as we can to make sure no one “pulls a fast one” and substitutes out some file for something that is completely different (that’s where verifying hashes comes in). We also have a desire to have lock files be platform-specific as we can ship extensions that are platform-specific, e.g. ship the accelerated version of Black on macOS, Windows, and Linux for Python 3.11 - 3.7 and then fall back to the pure Python support on other platforms.

Two, consistency when your dev and production environments are different. This is for the macOS or Windows developers who push to a Unix production environment. Agreeing that what you are all developing against locally is consistent, as well as agreeing on what is in production, takes a lock file that can handle different platforms appropriately (e.g., pip-tools doesn’t do this because it creates its resolved list of packages and versions for the current platform and then fakes cross-platform support by listing every wheel possible when you specify --generate-hashes; that doesn’t guarantee any platform-specific stuff on other platforms is included).

Three is simply avoiding “it works on my machine”. Beginners do not always understand that they may have accidentally installed something different from what their friends did, even if things are pinned down to the version, as one may have gotten the pure Python wheel while the other got the accelerated version. And try as they might, not all packages will nail the compatibility story perfectly (Python’s own stdlib can’t even pull that off). So having a lock file that can differentiate down to the wheel means it’s potentially easier to diagnose when there’s drift in what was installed.

The people who really care will cache the wheels they use and restrict their installers to pulling only from their personal index/cache, so they effectively cut out PyPI and any potential external source of bytes. I have also used pip-tools with --generate-hashes, stripped out the hashes for any files that don’t pertain to me, and then used pip install --no-deps --require-hashes --only-binary :all: -r requirements.txt (this is what the pip-secure-install action on GitHub Marketplace does, and thus what we use at work).
