Pre-PEP: Recording virtual environment provenance

A recurring problem with virtual environments is that it’s possible for them to become “stale” due to changes made outside the virtual environment:

  • the underlying Python runtime is removed or upgraded/downgraded to a different feature release
  • the environment is copied or otherwise transferred (network drive, USB key) to a machine with a different operating system or CPU architecture
  • the environment was set up for an external software component that is removed or upgraded/downgraded to a different feature release (e.g. CUDA, ROCm)
  • the environment was set up for some other hardware that is no longer present (e.g. an NVIDIA GPU being replaced with an AMD GPU)

The concept below aims to help mitigate this problem for the first two examples, while leaving room for future extensions that address recording expectations regarding other external hardware and software dependencies.

It doesn’t attempt to specify what consumers should do with the information once it is available; it just aims to define a way for tools that create and manipulate virtual environments to record the assumptions those tools were making when installing packages.

The proposal also covers a couple of other problems that come up with virtual environments:

  • attempting to manage a single virtual environment with multiple tools, leading to it getting into an unexpected state
  • not having a record of what the expected state of a virtual environment actually is (the package level RECORD files only record what is actually present, not what is expected to be present)

venv-info folder

The first part of the proposal is to standardise on venv-info as a new top level folder in virtual environments to hold information about the assumptions made when installing packages into that virtual environment.

The suggested name is derived from the venv stdlib module and the .dist-info suffix used on package installation metadata folders.

On its own, the folder serves as an unambiguous marker that a directory is a virtual environment (rather than something like a portable base Python runtime, such as those provided by python-build-standalone).

Custom metadata files

To avoid name clashes, all custom metadata files added to venv-info should be prefixed with the PyPI package name of the tool defining the custom metadata file (for example, venvstacks_layer.json).

Standardised metadata files

venv-info/MANAGER

Directly analogous to the INSTALLER file in dist-info directories, this file is for tools that manage virtual environments (like tox, nox, pdm, uv, pipx, poetry, venvstacks, etc) to record that the environment has been created by a specific tool.

This will allow for a better UX when someone does something like run uv lock in a project folder managed by pdm, or vice-versa (at least in cases where the .venv/venv-info/MANAGER file already exists).

venv-info/pylock.toml

This internal lock file would record the list of packages that are expected to be present in the environment. Optionally, it may also record the provenance of those packages (by including the details of the wheels or source artifacts that were used to perform the package installation).

These lock files should NOT include any non-empty marker fields, as they’re a record of packages that are actually installed in the environment, not packages which might need to be installed under different circumstances.

venv-info/environment.json

This would be a JSON file, initially defined with a single top-level markers key, whose value would hold a record of the environment markers that were in effect when the environment was set up. The extra, extras, and dependency_groups keys that are used for package selection at installation time are intentionally omitted.

For example:

{
  "markers": {
    "implementation_name": "cpython",
    "implementation_version": "3.13.7",
    "os_name": "posix",
    "platform_machine": "x86_64",
    "platform_python_implementation": "CPython",
    "platform_release": "6.6.87.2-microsoft-standard-WSL2",
    "platform_system": "Linux",
    "platform_version": "#1 SMP PREEMPT_DYNAMIC Thu Jun  5 18:30:46 UTC 2025",
    "python_full_version": "3.13.7",
    "python_version": "3.13",
    "sys_platform": "linux"
  }
}

Nesting the data under a top-level “markers” key leaves room to add more information to this file in the future. Possible additions include more hardware details, operating system details in platform specific formats (e.g. capturing the contents of /etc/os-release on Linux), or other elements used in wheel selection (such as the target macOS version, or the Linux libc compatibility target).
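For illustration, here is a sketch of how a creation tool might populate this file from the running interpreter. The function names are hypothetical; the marker values mirror the standard PEP 508 environment marker definitions (the third-party packaging library’s markers.default_environment() produces an equivalent mapping):

```python
import json
import os
import platform
import sys
from pathlib import Path


def current_markers() -> dict[str, str]:
    """Collect the PEP 508 environment marker values for this interpreter."""
    impl = sys.implementation
    iver = impl.version
    implementation_version = f"{iver.major}.{iver.minor}.{iver.micro}"
    if iver.releaselevel != "final":
        implementation_version += iver.releaselevel[0] + str(iver.serial)
    return {
        "implementation_name": impl.name,
        "implementation_version": implementation_version,
        "os_name": os.name,
        "platform_machine": platform.machine(),
        "platform_python_implementation": platform.python_implementation(),
        "platform_release": platform.release(),
        "platform_system": platform.system(),
        "platform_version": platform.version(),
        "python_full_version": platform.python_version(),
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "sys_platform": sys.platform,
    }


def write_environment_json(venv_path: Path) -> None:
    """Record the current markers in the proposed venv-info/environment.json."""
    info_dir = venv_path / "venv-info"  # proposed folder name
    info_dir.mkdir(exist_ok=True)
    (info_dir / "environment.json").write_text(
        json.dumps({"markers": current_markers()}, indent=2, sort_keys=True) + "\n"
    )
```

Since everything here comes from the stdlib, any tool (or the stdlib venv module itself) already has the data to hand at environment creation time.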

Out of scope

Capturing info for base runtime environments

The environment.json and pylock.toml files would potentially also be relevant for base Python runtime environments.

As Python runtimes across various platforms and use cases have all sorts of conflicting layouts with different rules regarding where data files that need to be modified at installation time are allowed to be stored, defining a location for such files is outside the scope of this potential PEP.

Checking this info on runtime startup

When CPython (or another runtime) detects that it is running in a virtual environment, it could potentially look for and read in venv-info/environment.json. Alternatively, it could run that check lazily the first time an ImportError is raised (avoiding the performance hit on every startup, while still being able to report that modules might be missing due to an ABI mismatch).

Leaving this out is part of the “not attempting to specify what consumers should do with this information” scope management approach. There are a lot of UX improvements that could be made if the information is made available by virtual environment creation tools. Working out which ones should be made is a sufficiently complex discussion that it isn’t feasible to have it at the same time as standardising the exact information to be recorded (even though it’s useful to speculate on the possibilities when deciding what information should be made available).


While I’ve thought about this problem at various times for a while, actually drafting this pre-PEP was prompted by a recent discussion with @brettcannon, @pf_moore, @sirosen, and @glyph after Glyph pointed out that the wheel variants proposal makes it even easier to run into this environment inconsistency problem.

10 Likes

+1 on the idea in general.

One procedural question I have is whether this would be a packaging PEP or a core PEP. Most of the implications are on packaging tools, but virtual environments themselves are a core feature. Given the traditional reluctance of the core devs to get involved in packaging, I’d be concerned that if we make this a packaging PEP we won’t get the necessary buy-in from the core team to implement and maintain the invariants the PEP specifies. Although in practice, the only real impact on core is that the core venv library must create the venv-info directory (and see below for why I don’t even think that is necessary).

We already have that with the pyvenv.cfg file. I don’t think we need two markers, so I’d stick with the existing one, and allow venv-info to be optional. We may need to formally document that every virtual environment contains a pyvenv.cfg file - the current venv documentation mentions the file, but doesn’t commit to it being a mandatory part of a venv.

As far as the standardised metadata files are concerned, I have some specific questions:

  • venv-info/MANAGER - presumably this is optional, and can be omitted if the tool that created the environment (stdlib venv or virtualenv) doesn’t consider itself a “manager”? In reality, tools will need to be prepared for the file to be missing for backwards compatibility in any case.
  • venv-info/pylock.toml - who would write this? I assume (for example) that pip wouldn’t, as pip doesn’t manage environments. I’m struggling to understand the lifecycle of this file - it seems to me, for example, that any unmanaged invocation of pip would invalidate it.

In general, I agree with the principle of just making the information available without overspecifying what consumers might do with it, but I think we need some exploration of use cases, as tools that might potentially write such files will either need information to decide whether to do so, or justifications if the standard requires them to do so.

3 Likes

Eh, I don’t think that’s really true anymore - between myself & Brett, plus the folks that were already involved in packaging before becoming core devs, the overlap’s at least 5 people at this point (and includes a couple of ex-Steering Council members). Small compared to the total number of active core devs, but more than usually take an interest in the maintenance of any particular stdlib module.

This is the reason I didn’t list stdlib changes as definitely being out of scope: while it wouldn’t make sense for venv to create the MANAGER or pylock.toml files, it could potentially be reasonable for it to create venv-info/environment.json, since it has all the data it needs to populate the “markers” field.

The argument against doing it that way is that leaving this to the actual environment managers makes it easier to include additional non-marker fields for different platforms, like the libc variant, the libc version, and the macOS deployment target (as well as anything that comes up as part of the variants proposal).

For example, one possibility that occurred to me was a top-level "compatibility" field, with subfields keyed by sys_platform (so we can define them clearly as a JSON schema). Then we could have something like:

{
  "compatibility": {
    "linux": {
      "libc_variant": "glibc",
      "libc_version": "2.28"
    }
  }
}
{
  "compatibility": {
    "darwin": {
      "deployment_target": "14"
    }
  }
}

If there’s no “compatibility” key, then the default behaviour would be “install for the running platform environment”.

The stdlib couldn’t reasonably fill out such a compatibility field, while venv managers that support package installation almost certainly could, so I’m personally leaning towards “Make it a Packaging PEP, leave the stdlib out of it, at least for now, and most likely forever”.

Yeah, I didn’t explicitly say that, but everything here has to be optional by definition, as there are so many existing venvs that don’t have them (and so many tools that won’t create them). The actual PEP will need to be more explicit about that (and I like the idea of calling out pyvenv.cfg as a marker that will necessarily exist, since the runtime needs it to locate the base Python installation).

Correct. Once venv or virtualenv set up an environment they’re done with it, unlike the actual management tools that expect to have full control of the environments they create. The existence of MANAGER is meant to indicate “A tool owns this environment, and if that’s not you, you should probably leave it alone”.

The intent of the file is to be able to detect unmanaged invocations of pip by virtue of the divergence between the recorded pylock.toml and the current state of the environment. The internal lock file would be written by commands like pdm sync or uv sync as a snapshot of the external lock file that was used to drive the sync operation (whether that external lock file uses the standard format or not).

That actually makes sense to me, too. I wanted to avoid having to do that, but you’re right that omitting it will raise more questions than trying to come up with something useful.

We can use SHOULD and MAY wording to indicate that tools aren’t obliged to write any of these files, but doing so may enable UX improvements that would otherwise be challenging.

Note: Where the following ideas talk about actively managing an environment, think uv sync/pdm sync/etc. For simple installation into an environment, think pip install/uv pip install/etc. For this purpose, uv pip can be thought of as a separate tool from uv itself (since uv pip is a simple installer, while uv actively manages its environments).

For venv-info/MANAGER:

  • tools that actively manage the virtual environments they create (installing and removing packages, migrating to different Python versions, etc) SHOULD write the venv-info/MANAGER file to indicate that the environment is actively managed by that tool
  • tools that actively manage virtual environments SHOULD check for the venv-info/MANAGER file and emit a warning or error if the environment they’re being asked to operate on is actively managed by a different tool
  • tools that simply install into virtual environments SHOULD NOT check for the venv-info/MANAGER file (as they may be used as installation tools by the higher level environment managers, so checking for the file would pose a compatibility problem)
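A manager implementing the first two bullets might look something like this (a sketch only; the exception and function names are hypothetical, and the MANAGER file format is assumed to be a single tool name, analogous to the INSTALLER file):

```python
from pathlib import Path


class ManagedByOtherTool(RuntimeError):
    """Raised when a different manager has already claimed the environment."""


def claim_environment(venv: Path, tool_name: str) -> None:
    """Record this tool as the environment's manager in venv-info/MANAGER.

    Raises ManagedByOtherTool if another manager has already claimed it;
    re-claiming by the same tool is a no-op.
    """
    info_dir = venv / "venv-info"
    manager_file = info_dir / "MANAGER"
    if manager_file.exists():
        recorded = manager_file.read_text().strip()
        if recorded and recorded != tool_name:
            raise ManagedByOtherTool(
                f"{venv} is managed by {recorded!r}, not {tool_name!r}"
            )
    info_dir.mkdir(exist_ok=True)
    manager_file.write_text(tool_name + "\n")
```

Per the third bullet, a plain installer would never call anything like this: it neither writes nor checks the file.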

For venv-info/pylock.toml:

  • tools that actively manage virtual environments and keep them consistent with an external lock file (which may or may not use the standardised pylock.toml format) MAY choose to write an internal venv-info/pylock.toml that records a snapshot of the last synchronised state of the environment
  • tools that actively manage virtual environments, or separate environment auditing tools, MAY provide commands that check whether the state of a virtual environment is still consistent with the snapshot recorded in venv-info/pylock.toml

For venv-info/environment.json:

  • any tool installing into or running software from a virtual environment MAY check for the venv-info/environment.json file, and MAY emit warnings or errors if environment markers that are likely to affect the compatibility of installed packages are no longer consistent with the running environment. In particular, if sys_platform, platform_machine, implementation_name, or python_version are different from those recorded in the environment, any extension modules previously installed into the environment may not work (either at all, or as intended).

(That last section would need more recommendations for installers if we decided to define the compatibility section in the environment file)
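The consistency check described in that last bullet could be sketched as follows (function name hypothetical; the set of ABI-critical keys mirrors the ones called out above):

```python
import json
from pathlib import Path

# Marker mismatches most likely to break installed extension modules
ABI_CRITICAL_MARKERS = (
    "sys_platform",
    "platform_machine",
    "implementation_name",
    "python_version",
)


def check_environment_markers(venv: Path, current: dict[str, str]) -> list[str]:
    """Compare recorded markers against the running environment.

    Returns human-readable warnings for ABI-critical mismatches; an empty
    list means either no recorded file or no relevant drift was detected.
    """
    env_file = venv / "venv-info" / "environment.json"
    if not env_file.exists():
        return []
    recorded = json.loads(env_file.read_text()).get("markers", {})
    warnings = []
    for key in ABI_CRITICAL_MARKERS:
        if key in recorded and recorded[key] != current.get(key):
            warnings.append(
                f"{key}: environment recorded {recorded[key]!r}, "
                f"but the running interpreter reports {current.get(key)!r}"
            )
    return warnings
```

Whether a mismatch produces a warning or a hard error would be up to the individual tool, per the MAY wording above.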

2 Likes

Apologies, I was possibly over-reacting[1]. With particular regard to venv, though, I know that @steve.dower has in the past been very reluctant to commit to any details of how virtual environments are implemented, and while there’s good reason in some cases, I think that standardising and documenting the details of the venv layout so that tools can interact meaningfully with them is important. But I shouldn’t assume the worst here.

My main point was that as a packaging standard, do we even have the authority to dictate what the core venv module must do? Because if not, this might need to go to the SC if only to verify that the requirements placed on the core are acceptable.

Here, should we add the possibility of a tool (or series of user actions) that allows recording the current state of an environment into the lock file? I’m thinking of something like

venv/scripts/python -m pip freeze --format=pylock >venv/venv-info/pylock.toml

  1. I’m also a core dev with a packaging interest, although I don’t do much on venv ↩︎

2 Likes

FWIW, I share @ncoghlan’s assessment here - it’s a core PEP. The relevant community is the packaging community (which has a bunch of core devs familiar with & active within as well) and it probably gets a SC decision (which I expect they’ll delegate to someone).

5 Likes

To be clear, I am genuinely undecided, and currently leaning towards making the initial incarnation of the idea a pure packaging PEP. We could then potentially propose stdlib support in a subsequent language level PEP.

However, there’s one piece that means I’m not sure that’s the best approach (emphasis added):

If it’s only commands like pdm run or uv run emitting warnings about potentially incompatible environments and venv isn’t creating venv-info/environment.json by default that’s a lot of potential confusion reduction that we’re going to be missing out on.

At the same time, working out how to incorporate such warnings into the core runtime without making Python startup in virtual environments slower in general is a non-trivial design problem, so putting it in the same PEP that defines how to record the environment state in the first place feels like it could be a significant distraction.

I guess I could draft it as one PEP, and then we could split it later if the language level changes prove sufficiently controversial.

If we did that, I would define a separate runtime.json file, and make it solely contain the environment markers that are provided by the base runtime. environment.json would be renamed to installing.json, as a separate packaging level file of interest only to installers. It would still contain the markers field, to detect cases where the runtime had been updated, but the installed packages hadn’t been checked for consistency with that yet.

The language level changes I would propose:

  • venv records the runtime details when creating environments, and updates them if the environment already exists (emitting a message with the change details in the latter case). Nothing is done with MANAGER, pylock.toml, or installing.json, those remain packaging level concerns
  • when displaying unhandled exceptions, the runtime checks and reports any inconsistencies between itself, runtime.json and installing.json
  • faulthandler, when enabled inside a virtual environment, runs that consistency check on startup rather than waiting for a segfault to occur
  • the API for collecting the data and running the consistency check would most likely live in sysconfig
3 Likes

Would we allow tools to backfill environment.json (or any other name for that idea) in existing environments?
If we do, I start to think about how a tool could fill that in post hoc. If we don’t, it’s a long wait before that data is universally usable.

The lockfile record might be interesting for pip-sync. We could definitely write it. I can’t quite see what benefit we get from reading it back though. It seems like a good fit, but I think I’d like more guidance on when to read that file.
Would it be read by some other tool when executing in that environment? If this was covered in the thread already, I didn’t see it.

I think this might not give the intended ownership over names. Can we add a venv-info/tool/ directory? Otherwise, whenever trying to standardize new files, we need to consider any existing packages.

I also suggest requiring that any suffix follow the package name after a dot. So pip-tools.requirements.txt would be allowed, but pip-tools-requirements.txt belongs to a hypothetical pip-tools-requirements package.

The possibility of a well defined space for custom per-tool metadata is the part of this which first caught my eye. I think that will be very fun to play with, and will lead to some interesting new UX to explore.

4 Likes

Aye, we would definitely want to let tools fill in the info for previously created environments, including adding it for underlying venv creation interfaces that don’t populate it (e.g. any existing version of the stdlib venv module).

It does make schema versioning an open question, though (e.g. what if we define new relevant environment markers in the future?)

I like the tool directory idea, so my venvstacks layer example would become venv-info/tool/venvstacks.layer.json. Alternatively, we could make it venv-info/tool/venvstacks/layer.json (with tools creating their own subdirectory rather than prefixing their file names).

Edit, I missed this one the first time:

pip-sync would mostly just write pylock.toml for the benefit of venv auditing (i.e. being able to answer the question “Is this venv still in the state pip-sync put it into?” without having to look back at the input requirements list)

It might make sense to have pip-sync do that audit itself (reading the pylock.toml back, and emitting a warning like “Environment state no longer matches previous sync state” if that’s the case), but that depends on how common it is for people to intentionally mix pip-sync with ad hoc package installation (if mixing is common, an implicit audit would give too many false alarms to be helpful).
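The audit step described above could be roughly sketched like this, assuming the internal lock file has already been parsed down to a name-to-version mapping (the function names are hypothetical, the pylock.toml parsing is elided, and full PEP 503 name normalisation is simplified to lowercasing):

```python
from importlib import metadata


def installed_versions() -> dict[str, str]:
    """Map lowercased distribution names to versions on the current sys.path."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken .dist-info folders with no readable metadata
            result[name.lower()] = dist.version
    return result


def audit_against_lock(locked: dict[str, str]) -> list[str]:
    """Report drift between a recorded {name: version} snapshot and what
    is actually installed; an empty list means the snapshot still matches."""
    installed = installed_versions()
    problems = []
    for name, version in locked.items():
        actual = installed.get(name.lower())
        if actual is None:
            problems.append(f"{name}=={version} is recorded but not installed")
        elif actual != version:
            problems.append(f"{name}: recorded {version}, installed {actual}")
    extras = set(installed) - {name.lower() for name in locked}
    problems.extend(f"{name} installed but not recorded" for name in sorted(extras))
    return problems
```

A real audit would run this against the target venv’s site-packages rather than the auditing tool’s own sys.path, but the comparison logic is the same.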

1 Like

The idea itself seems fine: it’s just a canonical place to store such expectations.

Small worry about the direction here in particular:

PyPI isn’t the only index in many scenarios, so maybe just say “package name” here; there shouldn’t be multiple packages with the same name in an environment anyway.


Beyond that, I have a feeling this isn’t going to accomplish what it sets out to do without some other surrounding ecosystem work. There are quite a few things not listed in your example of markers that can break a venv while still falling under the first two cases, but I don’t think that’s a problem this proposal has to solve now to be of value. This is just laying the foundation for a place to put the info; there’s plenty of room to then cooperatively explore what info tools need, and room for those tools to do much of that without any further specification required.

3 Likes

Or simply have a single packaging PEP, and as part of the pronouncement process, refer it to the SC for their approval of the core-specific parts. Or just ask the SC to confirm that they are OK for the packaging PEP delegate to be PEP delegate for the core elements of this PEP as well.

I don’t think we need to make this too process-heavy, as long as we make the SC aware that unlike other packaging PEPs, this one has core implications.

4 Likes

We already apply the “name reserved on PyPI” rule for the tool table in pyproject.toml.

Private projects are technically free to ignore the rule in both cases, since it’s only in public distribution that the risk of name clashes becomes a genuine concern.

1 Like

There are other public indexes. If private projects are free to ignore this rule, what should a public package hosted on another package registry do here?

I think this language reinforces thinking about PyPI as if it is the only index when it’s strictly unnecessary to do so here.

1 Like

Projects that wish to reserve a space in the tool folder need to register the name on PyPI. As noted earlier, this is the same rule as is used for [tool] in pyproject.toml: pyproject.toml specification - Python Packaging User Guide

If you would like to change that approach, it will need to be its own PEP, as I’m not going to propose a different scheme in this one.

2 Likes

For the record, I’m not against an alternative approach to name reservation. But finding one that

  • Preserves existing registrations via PyPI
  • Guarantees that a reservation is exclusive
  • Doesn’t suck (no Java style reverse-dns names, please!)

will be pretty hard. It’s not like this wasn’t discussed when the existing scheme was chosen.

But as @ncoghlan said, it’s a separate PEP, so please take any further discussion to a new thread.

3 Likes

In general, :+1:. I think it’s an improvement over what we have now, though I’m not sure if it addresses the same concerns as I was thinking of. In particular, it seems to be focused on the idea of virtual environments being “managed”, rather than just created and then effectively modified by installers via pip install … and so on. Perhaps that is sufficient for the relevant use cases.

When I first saw the issue mentioned for the wheel variants proposal, my thinking was more oriented toward having logs of what installers do — while definitely far from perfect, knowing that a particular package was added or removed at a particular time can definitely help in figuring out what may have caused a regression. Do you think it would make sense to add such a log to this proposal, or is that beyond the scope?


vs.

I think this is a bit inconsistent — we assume that pylock.toml would be updated whenever the virtual environment state changes, but environment.json would continue holding the initial environment markers, even though they may well be outdated (e.g. because glibc or the kernel changed). Perhaps environment.json should be updated whenever pylock.toml is; presumably the environment manager will be re-evaluating all the requested packages based on the current environment markers anyway.

2 Likes

FWIW, this sounds to me like a new tool first (unless the design is based on an existing tool that has proven itself and I just missed all the references to it). I don’t see anything here that requires changes to core, even if such changes may (one day) improve the performance of it (e.g. a venv-like tool doesn’t have to use the real Python binary, so it can use extra checks if its users want them before it loads/launches the actual runtime).

Without a working implementation (not a prototype: a real, production-ready, in-active-use implementation), this concept is going to struggle with YAGNI and/or DRY. Putting it straight into core is all but guaranteed to ingrain what turn out to be poor decisions made without sufficient context or experience. It can absolutely prove itself as a separate tool first.

2 Likes

That’s a big chunk of why I was personally leaning towards “Packaging PEP first, maybe native integration later”.

I do think we want to consider the implications of potential future native integration, but design the proposal so it’s still useful even if that never happens.

1 Like

I’ve been thinking about the pylock.toml file idea a bit more:

  • as a record of what was actually installed (whether generated by a tool’s sync operation, or an environment scanning operation like pip freeze), it should record an exact Python version pin rather than just a lower bound (in addition to omitting any marker fields)
  • I’m wondering if we might also want to allow tools to optionally save an environment level RECORD file that listed the other metadata files, along with the RECORD files of any installed packages. I’m not sure the extra complexity would be worth it though, since we’re only trying to detect inadvertent modifications with a simple installation command, which a lock file would be sufficient to detect

I’ll call YAGNI[1] on the latter idea for now, though, and instead list it as a deferred idea rather than make it part of the initial proposal.


  1. You Ain’t Gonna Need It ↩︎

1 Like

The things I spotted were:

  1. environment.json being set up when the environment was created. Given that the stdlib venv is what creates environments, I assumed that would be done by venv. If the idea is that this file is “a file that a tool can create immediately after creating the environment”, then it’s not a core matter. Although that would require the tool to run an additional Python process[1] to get the information that venv has immediately to hand. And it weakens the proposal if environment.json is only present for managed environments.
  2. The formalisation of “how to recognise a virtual environment”. Whether that is the presence of a venv-info directory or the presence of pyvenv.cfg, this should be documented in the core docs, and it represents a commitment that core needs to maintain. We could just omit this, though (although I like the idea of having it).

We do need to be careful not to water this down to the point where it isn’t even a useful packaging standard, though. The impact on core isn’t particularly because there’s any special core features in the proposal - it’s mainly just that it’s a packaging standard, and the core venv module is a packaging tool[2], that we’d expect to adhere to those standards.


  1. Or use a custom venv builder class ↩︎

  2. In fact, it’s the reference implementation of virtual environment creation, used by essentially all other tools ↩︎

1 Like

While these are both true, they’re also necessarily going to be true for any environment running an older Python version (since the earliest the stdlib venv would start creating any new marker files would be with the 3.15 release next year).

With at least 3.10 → 3.14 running in that “venv not directly participating” state until they’re end-of-life, it makes sense to me to explicitly cover that case separately from the “What more can we do if venv and other parts of the language runtime are actively participating?” case.

Assuming you and Bernat (wearing your virtualenv maintainer hats) agree the extra files are worth writing, it wouldn’t quite be splitting the proposal into a “managed environments proposal” and an “unmanaged environments proposal”, since virtualenv would fill the gap until we had consensus for venv to emit them natively.

3 Likes