Pre-PEP: Add ability to install a package with reproducible dependencies

Context

PEP 751 introduced a standardized format for reproducible package installation when working on a Python project.
At the same time, tools like pipenv, poetry, uv, and pipx made it much easier to install a Python project in an isolated virtual environment (venv).
As a result, most Python projects are tested against a reproducible (locked) set of dependencies and Python interpreters before being released as a Python package.
However, there is currently no defined way to reproduce this environment when installing the Python package for use with a Python project.

Motivation

Python projects are released[1] with dependencies as abstract as possible in order to prevent dependency hell. This especially means that upper version bounds of dependencies are discouraged.
With a growing number of packages and dependencies, the likelihood that a new release of a (sub-)dependency introduces a conflicting or, in the worst case, malicious change increases significantly.

This proposal aims to keep the abstract nature of Python dependencies while allowing users to install a verified set of dependencies for one Python package into their venv.

In other words: Add the possibility to install one Python package into a user’s venv with dependencies that were verified to be valid.

Use cases:

  • A data science Python project that wants to provide reproducible results[2], while still being usable by others for their own projects.
  • Authors of Python libraries with many dependencies providing a way for users to set up a venv in a state that was verified to work.
  • Companies that need to ensure a specific, validated[3] set of external dependencies is used.
  • Maintainers of Python infrastructure who want to give less-technical users a cross-platform way to install a reproducible set of packages without requiring them to learn new tools.
  • Authors of tools that want to allow Python users to use them as reliably as possible.
  • Making deployments of Python applications into venvs reproducible while keeping flexibility[4].
  • Allow backporting to a working version of a Python package and/or its dependencies.

Non-Goals

  • Install more than one Python package as locked.

    • Due to mismatched pinning, it is nearly impossible to install two or more packages using their individual lock files.
    • Still, authors of package B could use pinning provided by package A to generate their own lock file with additional or changed dependencies[5].
  • Create applications / End-user distribution.

    • This proposal is only intended for installing one package into a venv in a reproducible way.
    • While many Python projects serve both as a library and as an application, and can be used as applications from a venv, this proposal does not cover end-user application deployment.
    • It touches only a small subset of the end-user application problem space. That space requires solving interpreter deployment and platform-specific distribution. See BeeWare or PyInstaller for tools that address that space.

:page_facing_up: Proposal: Including and handling pylock.toml in Wheels

To allow for reproducible installation of Python packages, it is proposed to include the pylock.toml (or named lock files like pylock.<name>.toml, as defined in PEP 751) inside a wheel’s *.dist-info/pylock/ folder.

As explained in PEP 770, adding new files or folders to a wheel does not require a new metadata version.

The existence of a lock file SHALL NOT be required.
The content of the lock file SHALL reflect a verified state of dependencies[6] for the given Python package release.
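To make the proposed layout concrete, here is a minimal sketch of how a tool could discover bundled lock files in a wheel archive. Note that the `*.dist-info/pylock/` path is what this proposal suggests, not an existing standard, and the helper name is made up:

```python
import zipfile
from pathlib import PurePosixPath

def find_lock_files(wheel_path):
    """Return the archive paths of any pylock*.toml files stored under a
    wheel's *.dist-info/pylock/ directory (the layout proposed here)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            name
            for name in wheel.namelist()
            if PurePosixPath(name).parent.name == "pylock"
            and PurePosixPath(name).parent.parent.name.endswith(".dist-info")
            and PurePosixPath(name).name.startswith("pylock")
            and name.endswith(".toml")
        ]
```

Since a wheel is just a zip archive, no new tooling primitives are needed for discovery; installers would only need to agree on the path convention.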


Package Manager Handling

Python package installers SHALL NOT consider this file unless explicitly requested by the user.
Special-purpose tools (like pipx, uv tool, etc.) that focus on installing main entry-point packages MAY differ in behavior.

A package installer SHALL require user confirmation if any requirement in the lock file is to be installed from a source different from the one used for the original wheel[7].


How to Include Lock Files in Wheels

Including lock files in wheels would be the responsibility of the build system.

The build system could either copy an existing pylock.toml file or generate one dynamically during the build process[8].
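As an illustration of the "copy an existing file" path, here is a toy post-processing step. The helper is hypothetical; a real build backend would also have to record the new file (with its size and hash) in the wheel's RECORD:

```python
import zipfile

def inject_lock_file(wheel_path, lock_path, dist_info_dir):
    """Hypothetical post-build step: copy an existing pylock.toml into the
    wheel's *.dist-info/pylock/ directory.  A real implementation would
    also need to append the new entry to the wheel's RECORD file."""
    with zipfile.ZipFile(wheel_path, "a") as wheel:
        wheel.write(lock_path, f"{dist_info_dir}/pylock/pylock.toml")
```

Generating the lock file dynamically during the build would instead require the backend (or a plugin) to run a resolver, which is a much bigger ask.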


Changes to requires-python

The strict lower-bound restriction for requires-python in the lock file specification (PEP 751) shall be relaxed to support full version specifiers, like those used in packages.requires-python.

This allows packages to document the full range of Python interpreter versions they were tested with[9].
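For illustration, the difference in a pylock.toml might look like this (versions are hypothetical, and the "lower bound only" restriction is as described above):

```toml
# Today: a lock file typically records only a lower bound
requires-python = ">=3.10"

# With the proposed relaxation, the full tested range could be recorded:
# requires-python = ">=3.10,<3.14"
```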


FAQ

Why link the lock file to the wheel?

  • Wheels are releases of Python projects and typically represent a tested state of the package.
  • So it’s natural to link the lock file used during testing with the final wheel.

Why not distribute the lock file separately via URL?

  • Adds a dependency on a second service being available at install time.
  • Requires the user to trust an additional installation source.
  • Makes the lock file mutable (since it can be changed after release). In contrast, a wheel is already safely stored with hashes in the index. If you trust the index, you can trust the lock file bundled within the wheel.

Alternatives

Here are possible alternatives — some are currently used as workarounds:

  • Custom package index with only a subset of validated packages[10]

    • :cross_mark: No real locking
    • :cross_mark: Hard to maintain
  • Dynamically add locked dependencies as Requires-Dist during build

    • :white_check_mark: Works today without standard changes
    • :cross_mark: Loss of abstract dependency model
    • :cross_mark: Locked dependencies are hidden from tools
  • Distribute lock files alongside wheels, similar to .METADATA (warehouse issue #8254)

    • :white_check_mark: No standards change needed
    • :cross_mark: Manual coordination needed to upload lock files with releases
    • :cross_mark: Requires extra HTTP requests to detect and fetch
    • :cross_mark: Difficult to support named lock files
    • :cross_mark: Requires changes to wheel-hosting services like PyPI or devpi
  • Add lock file URL

    • :cross_mark: Requires additional services to be trusted and available
    • :cross_mark: Would require a metadata version change (unlike the current proposal)

For Later

There are still some open questions if this proposal moves forward. Since the discussion isn’t there yet, let’s keep these as placeholders:

  • What should tools do if a user installs a package with a lock file and later adds another dependency that conflicts with the lock? Should behavior be defined?
  • Will this place additional strain on PyPI? Who should be contacted to coordinate?

Disclosure: I used an LLM to proofread/improve (NOT generate) this post as I am not a native English speaker and a bit dyslexic.

This is the continuation of: Pre-PEP: Include pylock.toml files inside wheels


  1. A word about terminology: A released Python project is a Python package ↩︎

  2. Which is very important, e.g. when releasing a scientific paper ↩︎

  3. e.g. regarding licenses (which may change between project releases), CVEs, compatibility with CPU flags, etc. ↩︎

  4. In an organization I work for, services are deployed within a venv, and it is common for the people responsible for deployment to dynamically install extra packages/debuggers during daily business ↩︎

  5. A standard would allow reusable tooling to be developed, while still supporting very project-specific needs ↩︎

  6. This is most likely achieved by requiring the wheel to be built as the result of a successful CI/CD pipeline that executed tests with the given lock file ↩︎

  7. This is a security requirement. If a user installs a package from a trusted index, they should not unknowingly install dependencies from unexpected sources ↩︎

  8. This makes the process fully customizable per project, while still enabling standardized tools ↩︎

  9. As discussed in the thread on Locking a PEP 723 single-file script, special attention must be given to Python version compatibility. Wheels are typically built once for a specific Python interpreter. With newer Python versions, users are often forced to build from sdist — which may be difficult or even impossible in some environments ↩︎

  10. This is used in one of my organizations and creates many problems. ↩︎

2 Likes

Note that I am really not a fan of having a lock file supplied via an additional URL.

The environment I am working in (energy sector) is very tight on security. Systems are often not connected to the internet and only very limited services are allowed to be connected from customers to service operators.
So requiring an extra service is next to impossible.

And I also don’t see the mutability of lock file resources as a feature, since their whole purpose is to provide a kind of immutability that increases security.


To continue the discussion of Pre-PEP: Include pylock.toml files inside wheels, here is a copy of my last post:

I’m also considering reaching out to uv and pipx about supporting lock files included within the package folder itself, not just in the dist-info directory.
(Though to clarify: while there can be multiple package folders, the standard does not forbid placing files in the dist-info folder — only subdirectories are reserved.)


What I still don’t fully understand is:
How would a completely optional standard harm the packaging ecosystem?

I genuinely don’t have your depth of experience, so I’d appreciate any concrete examples or past situations where optional features have caused harm — especially if there’s a story/example you could share. That would really help me understand your concerns better.


From my perspective (as detailed in the revised proposal), there are several strong reasons to include lock files inside wheels:

  • Security & reliability: No dependency on external services during installation
  • Reproducibility: Since the build system includes the lock file that the tests passed with, most pipelines can ensure the resulting environment will work.
  • Simplicity: It’s easy to associate a lock file with the package it belongs to, and tools (as well as humans) can find and use it without needing additional APIs, tokens, or infrastructure.

So far, the only downsides I’ve seen mentioned are:

  • The confusion between “wheels” and “applications” (which I’ve tried to address more clearly in the rewritten proposal)
  • Slightly increased wheel size

If I’ve missed any additional concerns about including a lock file in the wheel, or if there are others you’d like to raise, I’d really appreciate if you could point me to them.


P.S. I’m a slow writer — the initial and revised versions of the proposal took me more than 10 hours in total — so I may only be able to follow up or respond toward the end of the week. Thanks for your patience!

3 Likes

OK, so this phrase seems to indicate that the changes are purely optional. I would like to clarify whether the following changes are also completely optional (i.e. may not be implemented depending on the tool used):

To be blunt, this is a terrible argument. PyPI is not a secure system, so if you’re that tight on security, you should not be installing from PyPI anyway, but should be using a (curated, security vetted) mirror. And if you’re doing that, you can “easily” (I appreciate that when it comes to corporate security, nothing is “easy” :slightly_frowning_face:) host lockfiles for the applications you want to support as well.

I appreciate that “we can’t just install from a random URL on the internet” is a real issue in some corporate environments, but I don’t think the Python packaging standards should be designed to work around such ill-advised policies.

Also, if this is your concern, another option is for PyPI to host lockfiles as well as wheels. Then tools could install from the lockfile, without needing that file to be included in the wheel. This better separates the two use cases and makes it easier to audit the lockfile (you don’t need to download and unpack the wheel). I’m not specifically proposing this option, just pointing out that you’ve not really thought through the possibilities, but rather you’ve thought of one specific approach and leapt straight to proposing it as a standard.

Actually, I see that you did consider this option, as “Distribute lock files alongside wheels, similar to .METADATA”. But frankly, your downsides seem pretty weak. It feels like you simply went looking for reasons to reject this option, rather than giving it serious consideration.

Hosted lockfiles don’t have to be mutable. If they were hosted on PyPI, for example, they wouldn’t be mutable. If they were hosted on github, they could be referenced with an explicit commit ID, and hence would be immutable.

If you publish lockfiles in an insecure manner, they are insecure. This is hardly surprising.

To be perfectly honest, my concern about harm to the ecosystem is largely a “gut feeling” based on my years of experience with packaging. That’s not a very good answer, though, so I’ll try to articulate my concerns more concretely. Please understand, though, that these are simply my best attempts to describe my concerns - point by point rebuttals won’t help here, you need to look at the underlying points I’m trying to express.

First of all, the big problem with standards is that once we have them, getting rid of them is next to impossible. If we don’t get the design right first time, we’ll be stuck with something suboptimal. The best example we have of this is the wheel format itself. When we developed it[1], we thought that it was clear and pretty future proof. But we have found over the years that there are limitations, and migrating to a new format is far harder than we expected. To the extent that there’s a significant amount of work going on in the wheel-next project looking at how we can implement a smooth transition to a new version. This proposal isn’t as major as the wheel spec, but on the other hand, it feels like it’s not been thoroughly thought out, so it would be very easy for us to end up with a standard that needs revision or replacement, and that simply isn’t an easy thing to do. Better to wait until we have a better understanding of the problem before committing to a design, IMO.

Also, describing the proposal as “optional” is naive, I’m afraid. Once anything gets standardised, there’s a serious social pressure to use that solution, and not to look at alternatives or to innovate any more. So even an optional standard stifles progress, by locking us into a design.

My second big concern is that even after all the work that’s gone into packaging, we still don’t have a good understanding of how people package applications. We don’t know the constraints they work under, or the use cases they have. That’s a huge gap in our knowledge, and in my view, until we have a much better appreciation of these points, we’re simply not equipped to know what a good solution would even look like. I don’t want this to sound like some sort of perfectionist “we can’t do anything until we have the ultimate solution” demand, it’s not that at all. But even incremental improvements need to fit into the overall strategy, and we don’t have a strategy yet, and I’m not even sure we have enough information to formulate one.

It’s worth remembering that the wheel format was designed for publishing Python packages, not applications. There are significant limitations in capabilities that make wheels inappropriate for many applications. There’s no “post install script” capability, which would be needed to (for example) register the application with the system (start menu and registry entries on Windows, and similar things on Unix). Similarly, there’s no customisation of uninstalls - it’s just a simple “delete all the files that got installed and hope that’s enough”.

Honestly, if we are serious about distributing applications like this, I suggest that we create a new format, designed explicitly for applications. This “pyapp” format could include the application code and a lockfile that describes any dependencies. It could have additional metadata to support things like start menu registration, file type associations, additional cleanup to be done on uninstall, etc. I’m not personally convinced such a format would be the right solution, as opposed to integrating with existing application managers like Windows Store, chocolatey, homebrew, apt and dnf - but it seems like a much more reasonable approach than forcing (part of) what’s needed for application installation into the wheel format.

Basically, though, this proposal just feels like a “quick fix” rather than a well thought out evolution in how Python packaging supports application deployment.

I’ll also make some specific comments on your new proposal.

Most Python projects (certainly libraries) support use with a range of versions of their dependencies. That’s what the dependency specifiers are for, and it’s necessary if the project is to be installed alongside other projects in the same environment. Locking dependencies is only feasible if you only ever intend to install the project in a dedicated, isolated environment. That’s not the normal situation for the vast majority of Python projects.

This proposal only targets the minority of projects that are designed to be installed into a dedicated environment, rather than alongside other projects. Maybe that’s a reasonable subset of projects to consider, but your motivation suggests this is useful for a much broader range of projects than it is.

These are all use cases for locking, but they don’t demonstrate any need for lockfiles to be included in a wheel. If anything, they suggest that lockfiles can be useful even for projects that don’t include them in the wheel, meaning that the proposal isn’t the right solution.

This suggests a misunderstanding of lockfiles. A lockfile doesn’t “lock a package”, it describes an environment, which typically contains many packages. If I do pip lock requests sympy, I get a lockfile which describes an environment containing both requests and sympy. I find it hard to reconcile the fact that you can do this with the view of lockfiles that you seem to hold.

??? So this isn’t a proposal to distribute applications that can be installed via pipx install app? But that’s precisely what you originally claimed it was.

You’ve not given any reason why the lockfile has to be included in the wheel, rather than being published separately. And as I pointed out above, by requiring the lockfile to be in the wheel, you exclude use cases where someone wants a locked installation of a package that doesn’t include a lockfile in the wheel.

The second option is a huge demand on build backends, as you’re requiring them to implement a resolver. IMO, this is an unreasonable thing to expect.

What makes you think that testing is done with the exact requirements in the lockfile? There’s nothing in the proposal that ensures that’s the case. Most projects test with a matrix of configurations, and there’s no reason to assume that this matrix can be captured in a single lockfile.

Without seeing your original, it’s hard to be sure, but overall I don’t think the LLM improved anything. The proposal is in my opinion too wordy, and the arguments given are plausible-sounding but weak, and don’t really stand up to scrutiny (all of which are common problems with LLMs in my experience).


  1. and I was involved in that, so this is first hand knowledge ↩︎

9 Likes

Speaking with my PyInstaller hat on, there’s a lot more than one answer to this question. You can ask it to N different people and get N different answers with most of the answerers thinking that their answer is the answer and not one of many. This proposal is a good example of that – how many people do you think would say that application packaging is about smuggling a lockfile into a wheel? :slight_smile:

With that in mind, I do want to encourage some open mindedness towards ideas that either don’t solve all application distribution problems (like your later mentioned start menu items) [1] or aren’t targeting the application packaging that you call application packaging.

That said though, I also don’t see what putting the lockfile in the wheel achieves. The “downloading wheels is secure but downloading lockfiles isn’t” argument (where the lockfiles are most likely internally hosted) sounds backwards to me. And given that any generic file storage server (most likely Artifactory) would do, the argument about needing custom hosting/tooling also doesn’t hold any sway with me.


  1. My personal belief is that the Python ecosystem would be better served by different tools for different kinds of application deployment than by one be-all/end-all solution ↩︎

8 Likes

+1 on that. And if it seemed like I wasn’t being open to such possibilities I apologise. But I think that to start with, we need at least some broad consensus on what role packaging (including packaging standards[1]) has to play in application distribution.


  1. For example, should we have considered PyInstaller when discussing lockfiles? ↩︎

6 Likes

@abravalheri indeed, the requires-python change might be harder to make optional. The build system is the only place that could add such a file, though; if it doesn’t implement this, that would still be okay, as the existence of the file is optional.

Well, actually that is what we are doing: we use our own index with self-curated packages. That’s why there is

A package installer SHALL require user confirmation if any requirement in the lock file is to be installed from a source different from the one used for the original wheel.

This gets significantly harder when the lock file does not come from an index.

The original can be found in the Diff (Rev 5 vs Rev 6) of the first original thread.

This got lost by switching threads:

After reading some of the replies, I realized the discussion has shifted more toward the topic of application deployment.
Because of that, I’ve rewritten the original post (please have another look before continuing the discussion), starting more strongly from the use cases and clearly limiting the scope.

As was pointed out, creating an application is a very different problem space, so I revisited the reasons behind my idea of adding the lock file to the wheel. They are more about defining a common way to distribute and use a lock file that reproduces the environment a package was tested with, and is therefore safe to install.

But I feel too many things are still open, or understood differently, to move forward with answering that question.

Maybe at a later point someone else will take it from here, or at least know what not to do.

I will probably create an internal tool that includes a lock file in the wheel and installs from it, because that really suits my personal use case.
It just seemed so straightforward and easy to me that I thought: why not propose it as a general approach?
At least now I understand why it is not as easy as I thought.

1 Like

I’m a strong -1 on this idea for reasons others have already mentioned. However, I think if you have the bandwidth to drive a proposal it might be useful to add a new distribution subdirectory inside artifacts that would allow for arbitrary contents such as the lock file you are using: Binary distribution format - Python Packaging User Guide

edit: Alternatively, you could add a subdirectory called tool that would have subdirectories that could be utilized exclusively for various tools. I don’t know which one I prefer.

3 Likes

Rather than create a new standard for embedding lockfiles in wheels, I would personally prefer to understand which other parts of the app deployment ecosystem would benefit.

This new proposal seems to have pivoted from the original proposal of application distribution.

Now it’s back at locking projects, so I’m unsure why VCS or github are not sufficient for these use-cases?

It seems to address the main use-cases in my opinion:

  • data science project - pylock.toml in repo
  • OSS library authors - pylock.toml in repo
  • Companies validate dependencies - pylock.toml in repo, private package mirrors, CI validation
  • Maintainers of python infrastructure - Is this an app deployment use case?
  • Authors of tools - Is this an app deployment use case?
  • Deployment of apps - This is an app deployment use case?
  • Backporting - Feels like robust tests and pylock.toml in repo can get there
2 Likes

Even though I don’t see that it makes sense to continue here, I would still like my use case to be understood:

My use case is not about locking the dependencies of a project I am developing, but rather about using the pylock.toml of a dependency in my project.

Let’s say I want to use apache-airflow version 1.2.3 in my project and want to ensure that it works as tested by the apache-airflow developers (i.e. as the pipeline that released it tested it).
I currently need to find the repo and somehow locate the correct lock file for that version.[1]

Once I have found the lock file, I need to copy it into my own project, and then it should work. But when upgrading or downgrading, I again have to find the correct lock file for the given release.

I hope this was understandable.


  1. To be precise, the apache-airflow maintainers currently provide a pip constraints file ↩︎

Do you have examples beyond airflow? Airflow is a bit of an outlier (though popular).

You likely wouldn’t want to use the airflow lockfile, but rather their constraints file as described in the airflow installation guide

pip install "apache-airflow[celery]==3.0.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.0.3/constraints-3.9.txt"

2 Likes

As the proposal of this thread seems to be the wrong way / too premature, I could imagine doing that if there are more people interested in this.

Like this message if you are interested, and send me a PM if you could imagine helping out. Knowing that the process takes a long time, I don’t want to do it alone.

Off the top of my head, no public projects, only organisation-internal ones.

PS: The constraints file is mentioned in my footnote.

1 Like

For the record, although I’m not interested in the proposal itself, I think we benefit from learning about the use cases which motivate it.

Does this strategy work for you today? I wouldn’t expect it to handle a second package.
If you have pkgA and pkgB as dependencies, and want to “import their locks/constraints”, they will quite likely conflict.

I think that there is another layer to unpack here in terms of your requirements. You say you want to get the same environment as what the library maintainers tested, but that’s impossible to guarantee. You could have a different Python version – even a different point release – or be running on a different OS, etc etc. If your tests pass, and your application has its own lockfile, ideally that would be good enough.

I get that you want to be assured – to the best of our ability with existing tools – that you get a working environment. But something is not lining up here for me in understanding what the “ingredients” are for the environment you’re building. Is there only one dependency?

6 Likes

I agree that a lock file isn’t a good answer, for many of the reasons already given. I haven’t seen anyone suggest a solution like “exclude all releases from any dependencies done after the package in question was released” though. That seems like a better thought out answer that addresses the problem of new releases of other packages breaking something, and is implemented by uv’s exclude-newer option, which you can give a timestamp: https://docs.astral.sh/uv/reference/settings/#exclude-newer.
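For reference, a minimal sketch of that setting (the timestamp is an arbitrary example):

```toml
# pyproject.toml
[tool.uv]
# Resolve as if no distribution published after this instant exists
exclude-newer = "2024-03-25T00:00:00Z"
```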

3 Likes

FWIW, if app projects want to publish their “as tested” CI locks in a way that end users can readily opt in to using, while still clearly separating the abstract dependency declarations from the full transitive lock, then publishing tools adopting a conventional extra name like “app-lock” or “tested-lock” seems like a nicer way to do it than embedding additional files that tools have to know how to extract.

1 Like

That doesn’t offer all of the other benefits that lockfiles do - hashes, ability to install without a resolution step, etc.

I still think that the best option[1] here is to set up a means of distributing lockfiles, independent of existing distribution artefacts. Although I don’t see why publishing them on an existing channel like github isn’t enough, people clearly do want some sort of “central lockfile index”, so maybe there’s enough interest to motivate someone to take such an idea forward.


  1. Although I will reiterate that I don’t think lockfiles were designed as a format for publishing environment (or application) definitions, they were designed as a way of recording them for use in the development process. So it’s only the “best option” in a qualified sense - I expect there will still be rough edges resulting from trying to use lockfiles in a way they weren’t designed for. ↩︎

The sigstore project ships a lockfile: https://github.com/sigstore/sigstore-python/blob/main/install/requirements.txt

This is used to give end users a ‘more secure’ installation option (i.e. with version pins and hashes already set): https://www.python.org/downloads/metadata/sigstore/

Sure, but that level of reproducibility isn’t generally appropriate for a publisher to enforce (just because the publisher tested against PyPI or their own private index server doesn’t mean all consumers of that app should do so).

The part that seems most reasonable to me is having a conventional way to publish “tested dependency combinations” for projects that care to do so. I agree extras aren’t ideal for that task, but lock files aren’t ideal for it either (as you mentioned in your footnote, so I don’t think we really disagree here).

I mainly mentioned the extras option because of this rejected alternative listing in the original post:

Putting the transitive lock into an extra ticks at least the second box in that list (since the main dependency set is still abstract), and arguably checks the third one as well (for tools that are aware of any conventions adopted in this space)

It’s unfortunate that a transitive lock extra would have to list direct URLs instead of pinned versions to allow hashes to be included in that approach, but adding hash=<hash-algorithm>:<expected-hash> specifiers (or something along those lines) to the environment marker syntax feels like it would be a smaller change than devising a way to distribute full lock files via package index servers.
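To make that concrete, here is a sketch of what such an extra plus hash marker might look like in a pyproject.toml. This is purely hypothetical: the hash marker does not exist in any current marker grammar, and the extra name is a made-up convention:

```toml
[project.optional-dependencies]
# Hypothetical "tested lock" extra pinning the transitive set as tested.
tested-lock = [
    # The `hash == "..."` marker is NOT valid today; it illustrates the
    # marker-syntax extension floated in this post.
    'requests == 2.32.3 ; hash == "sha256:<expected-hash>"',
]
```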

1 Like

Further, most projects address this pinned requirements issue by providing a requirements.txt in their repo, and their install instructions are either to clone the repo and pip install the requirements file, or to run a script which does it for them.

Very few have the infrastructure and knowledge that Airflow has to provide a constraints file, which is a tool specific feature.

First, the semantics of extras would need to be made more rigorous. Currently it is ambiguous in the spec whether the requirement foo[locked] requires the extra locked to be installed in the user environment; for at least pip and uv it does not.

Second, this discards all the reasons why lock files were developed, and why extras weren’t used for this purpose in the first place.