Making venvs relocatable friendly

It’s possible to make virtual environments portable if you place a bunch of restrictions on how they’re deployed, such as:

  • always using a peer installation of python-build-standalone as a runtime layer (or some other runtime that’s decoupled from system Python installations)
  • disallowing direct wrapper script execution without activating the venv first
  • disallowing both direct wrapper script execution and venv activation (requiring the use of module execution instead)
  • ensuring (or at least assuming) there aren’t any absolute paths in data files or symlinks
  • ensuring any compiled Python files generated in the old location are destroyed and recreated in the new location

conda-pack aims to do that for conda environments. My own venvstacks work project imposes these kinds of limitations to ensure environments can be successfully deployed with a few post-installation tweaks rather than a full reinstall on the target system.

Having to regenerate pyvenv.cfg in the post-install step is mildly irritating, but it’s far from being the trickiest problem in making redeployment to a different location work. So while we could make the startup calculations in CPython even more complicated than they already are, I don’t think doing so solves enough of the venv portability problem to be worth proposing as a standalone change.
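As a rough illustration of the kind of post-installation tweaks mentioned above, here is a minimal sketch that rewrites the home key of pyvenv.cfg and regenerates bytecode after a venv has been moved. The helper names rewrite_home and refresh_bytecode are hypothetical, and this is not a description of how venvstacks or conda-pack actually do it; it only assumes the key = value format of pyvenv.cfg and the standard compileall module.

```python
import compileall
from pathlib import Path


def rewrite_home(venv_dir: Path, new_home: Path) -> None:
    """Point the `home` key of pyvenv.cfg at the runtime's new directory."""
    cfg = venv_dir / "pyvenv.cfg"
    lines = []
    found = False
    for line in cfg.read_text(encoding="utf-8").splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() == "home":
            line = f"home = {new_home}"
            found = True
        lines.append(line)
    if not found:
        # Hypothetical fallback: add the key if the file never had one.
        lines.append(f"home = {new_home}")
    cfg.write_text("\n".join(lines) + "\n", encoding="utf-8")


def refresh_bytecode(venv_dir: Path) -> bool:
    """Delete stale .pyc files, then recompile everything in place."""
    for pyc in list(venv_dir.rglob("*.pyc")):
        pyc.unlink()
    # compile_dir returns a true value only if every file compiled cleanly.
    return bool(compileall.compile_dir(str(venv_dir), quiet=1, force=True))
```

Note that both steps write into the environment, which is only an option when the deployed tree is writable on the target machine.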


It’s possible to make them portable if <5 things>

Your overall point that there are certain constraints imposed is true, but I’d quibble over some in that list. In any case, Bazel projects are built upon things being hermetic, isolated, relocatable, etc., so those sorts of constraints are part and parcel of the approach, not barriers.

Having to regenerate pyvenv.cfg post-install is mildly irritating, but far from the trickiest part
Make the startup complications even more complicated

pyvenv.cfg is the only remaining problem and it’s much more than just “mildly irritating”, insofar as Bazel projects are concerned (see further down).

The logic for this is already in Python’s startup, too. What prompted this thread was a regression we noticed in 3.14 beta:
venv using symlinks and empty pyvenv.cfg isn't recognized as venv / able to find python home · Issue #135773 · python/cpython · GitHub, which regresses the functionality Bazel projects require for effective venvs. In the bug, I go into detail about the minor change needed to fix it.

Like I said, it already works today (i.e. rules_python and its users rely on this behavior), has been working for about a year, and works back to, I think, Python 3.10.


regenerating pyvenv.cfg post install
reproduce venvs on demand. if you can’t, that’s the real problem
If the packaging is going to bundle the Python runtime itself does it even need a venv?

My first post explains this lightly, but it’s probably time to restate it and expound a bit about why modifying pyvenv.cfg, or creating one later, isn’t possible.

I don’t want to go into Bazel-isms too much, but I’ll have to a bit. The two important things to keep in mind are that remote machines may create files you (or another remote machine) use (e.g. pyvenv.cfg), and that the outputs are immutable (e.g. you may be reading a cached, shared, artifact from a network filesystem).

First: there isn’t an install or setup phase, not in the typical sense. There’s just e.g. bazel build //:my_test. Running that command handles what e.g. docker bla; python -m venv; bin/activate; pip install would typically do. The output of this command is a directory tree of everything necessary and a wrapper to run the program. For Python, that means a venv and supporting files. You can then do things like bazel-bin/my_test to directly run it locally, or bazel test //:my_test to e.g. concurrently run 100 test variations on a remote cluster.

That command isn’t a simple “helper shell script” to do those tasks, though. It’s a distributed system optimizing to minimize the amount of repeated work, maximize the amount of parallelism, and prevent “it works on my machine but not yours” types of problems. In order for this distributed system to realize those benefits, it uses lots of isolation, remote workers, caching, and sharing of artifacts.

The implication of this distributed-ness, caching, and sharing is that outputs can’t be machine-specific or modifiable. If the build wrote /home/user1/bin/python to pyvenv.cfg, then the venv wouldn’t work on user2’s machine, and neither it nor anything downstream of it could be cached and reused. By not modifiable, I don’t simply mean “doesn’t have +w”; I mean the files and directories you see locally may be symlinks to a cached artifact on a network filesystem, and you simply cannot modify them. Even if they were modifiable, that wouldn’t work, because such a modification is local to your machine and can’t be passed along to remote workers. This is important because, going back to the example of concurrently running 100 test variations, it allows workers to pull everything from cache, and the only setup cost they really pay is execv() with different args.

Second, the “packaging” phase is usually very simple. e.g. tar’ing up the outputs and untar’ing them later. There’s no need for domain-specific (e.g. python) installation logic because all the heavy lifting (e.g. creating a venv for a python program) was handled earlier and is already relocatable. In the Bazel-verse, this isn’t unique to Python – all the languages act similarly.

“So just create a venv at runtime in a temporary directory”

This is surprisingly non-trivial. A sibling Bazel project to mine actually started with that idea, and ultimately abandoned it. Ensuring cleanup of the temp files is problematic. Subprocesses interfere with signals. Race conditions occur when the same program is invoked twice. Platform-specific differences with shells and coreutils crop up. The OS may reap /tmp while the program is running. It’s just an endless variety of headaches.

It is vastly simpler, more correct, and more reliable when Python itself recognizes it’s in a venv and is able to find its home without having to create a bunch of file system state tied to the program’s life cycle.


there’s two definitions of relocatable being talked about

Hah, yeah. In retrospect, I probably should have used a different term to narrow the discussion scope. Ah well.

How the python -m venv tool (or similar) behaves isn’t too interesting to my environment (we don’t use it); essentially rules_python implements its own venv management tooling. Hence why I’m more interested in e.g. specifying how pyvenv.cfg gets understood by the runtime, since we otherwise control the rest (or nearly so).

Is this bundling a runtime with multiple venvs that need to be independent of each other?

Yes, that’s something people could do, if they wanted. A variation of this is one program might depend on another program with a different set/version of dependencies or Python version, and it’ll nest without issue. But to be clear, insofar as Bazel projects are concerned, this isn’t special, and doesn’t require any special support to make work. It simply Just Works because programs are more intrinsically isolated and relocatable under Bazel.

If you’re including a Python runtime that you’ve built can you modify it to do this already?

No, because the runtime used is decided by the user. “Built” here doesn’t mean “everything is built from source”; it means something more like “collecting all the artifacts and laying them out in a directory tree”. Bazel and rules_python provide abstractions so users can wire in an arbitrary Python runtime, be it python-build-standalone, from source, the system, or otherwise.

As a general principle, I don’t think that features should be added to Python solely to support one particular tool and its behavior. The language should be providing general features that have applicability in a broad sense. So your statements that “Bazel needs this” aren’t very compelling to me, no matter how important this is to Bazel. I appreciate that you may know of no other use cases (and indeed, maybe there aren’t any other use cases) but you should still base your arguments on more general principles than “what Bazel does” (at least if you want to convince me - I can’t speak for others).

As was noted in the linked issue, you have been relying on undocumented (and hence unsupported) behaviour. If the behaviour mattered to you, you could, and probably should, have raised a request at some point for the feature to be documented (and therefore protected from sudden changes without a deprecation and discussion). By your own admission, you’ve had since Python 3.10 to do this…

It appears that we made the change back in November, specifically to get feedback as early as possible on any impact from the change. It’s unfortunate that you’ve not identified and reported the issue before now - I suspect it’s far too late to revert the whole of the change you claim caused the problem, and there’s very little time to develop any other fix for the regression.

I’ll further note that you don’t seem to be asking for the regression to be reverted, but rather for a new feature (interpretation of relative paths in pyvenv.cfg) - this is clearly not going to happen before Python 3.15 (3.14 is closed for new features at this point) so what do you plan on doing for 3.14?

I appreciate that you’re in a very difficult position here, but I don’t think there’s much chance of you getting a fix in Python 3.14 at this late stage, so you need to be prepared for that.

Yes. All of this is in fact nothing to do with “relocatable venvs” in any practical sense. Even if we accept that you mean “relocatable as long as you move the base Python along with the venv”, I don’t think that just interpreting relative paths is enough to achieve that. It may be enough for your purposes, but as I say, with language features we need to look beyond individual use cases to provide features that are generally applicable.

Maybe we can develop some concept of “a venv that is relocatable before you install any 3rd party packages into it” and make that useful while dumping the problem of either maintaining or not supporting relocatability when packages get installed onto the packaging ecosystem. I don’t personally think that’s particularly helpful to our user base, but maybe there’s a solution here. But it’s a long way from a “quick fix”, and even if we did do that, I’d strongly argue for such “relocatable venvs” relying on the base interpreter not being relocated with the venv (as that’s what the overwhelming majority of our users would expect).

Going back to your request, maybe someone would be interested in supporting an undocumented[1] change to pyvenv.cfg processing to interpret relative paths as relative to the location of pyvenv.cfg rather than to CWD. I personally think it’s far too risky at this late stage in the 3.14 cycle, but I’m not the one who would be making the change, so it’s not my call. But even if that did happen, you’d still be basing your code on undocumented behaviour, and I hope that if nothing else, you’ve learned what a bad idea that is in practice. So you still need an exit strategy to a supported approach.


  1. Because a documented change would definitely be a new feature, albeit an obscure one


I don’t think that features should be added to Python solely to support one particular tool and its behavior. The language should be providing general features that have applicability in a broad sense. So your statements that “Bazel needs this” aren’t very compelling to me, no matter how important this is to Bazel.

I tend to agree, hence why I’ve avoided talking about Bazel as much as possible. When people ask why it can’t be done another way (e.g. generating pyvenv.cfg later), I have to give some details to explain why that won’t work in my environment.

That said, the feature under discussion – allowing a relative path for home, and having it relative to pyvenv.cfg, not CWD – isn’t particularly onerous.
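To make the proposed semantics concrete, here is a minimal sketch of resolving a relative home against the directory containing pyvenv.cfg instead of the current working directory. It is illustrative only: read_home and resolve_home are hypothetical names, and this is not CPython’s actual startup code.

```python
import os
from pathlib import Path
from typing import Optional


def read_home(cfg_path: Path) -> Optional[str]:
    """Return the raw value of the `home` key, or None if it is absent."""
    for line in cfg_path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "home":
            return value.strip()
    return None


def resolve_home(cfg_path: Path) -> Optional[Path]:
    """Resolve `home`, anchoring relative values on pyvenv.cfg's directory."""
    home = read_home(cfg_path)
    if home is None:
        return None
    if os.path.isabs(home):
        return Path(home)
    # Anchor on the directory holding pyvenv.cfg, never on os.getcwd().
    # normpath collapses ".." lexically; it does not resolve symlinks.
    return Path(os.path.normpath(cfg_path.parent / home))
```

With a file containing `home = ../runtime/bin`, the result is the same no matter which directory the program is launched from, which is the property that makes the file shareable between machines.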

Regarding general principles: the general principle I’m arguing for is that associating a venv with its Python runtime shouldn’t be machine/user specific. Similarly, we can rephrase it in the terms you used: the home key being required to be present and/or an absolute path is not a broadly applicable behavior.

Why? Because creating a venv isn’t cheap. Sure, it’s cheap in the sense that a bare-minimum venv is simply a pyvenv.cfg file and bin/python3 symlink, but that’s not hugely helpful. In practice, it also means processing requirements.txt and populating additional files, which is expensive.
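For reference, the “bare-minimum venv” described above can be sketched in a few lines. This assumes a POSIX layout (bin/python3 as a symlink); make_minimal_venv is a hypothetical helper, not part of the stdlib venv module or rules_python.

```python
from pathlib import Path


def make_minimal_venv(venv_dir: Path, base_python: Path) -> Path:
    """Create only pyvenv.cfg and a bin/python3 symlink; return the symlink."""
    bin_dir = venv_dir / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    # `home` names the directory that contains the base interpreter.
    # Writing it as an absolute path is the machine-specific part.
    (venv_dir / "pyvenv.cfg").write_text(
        f"home = {base_python.parent}\n", encoding="utf-8"
    )
    link = bin_dir / "python3"
    link.symlink_to(base_python)
    return link
```

Everything that makes a venv useful in practice (resolving requirements, populating site-packages) happens after this step, and that is where the real cost lies.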

Someone linked to uv’s relocatable feature, which sounds pretty neat. I think it’s evidence I’m not alone in the opinion that venvs should be more portable/relocatable, by some definition.

Another general principle is: the smaller the install phase and the fewer user/machine-specific points you have, the more reliable the software you get.

Finally, I’ll turn the question back: Why must the Python home path be absolute? I could, for example, create a top-level directory in my venv called “pypi_libs”. Then in $venv/lib/pythonX.Y/site-packages populate relative symlinks pointing to e.g. ../../pypi_libs/somepackage, and this will work just fine. The requirement that the pyvenv.cfg home key be an absolute path prevents doing this for the Python runtime itself, though. The net effect is, if you want to check out and start working on the project, you’ve got to do extra setup, and hopefully our machine states are compatible.
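A minimal sketch of that symlink layout, assuming a POSIX filesystem. The helper link_package and the default version string are hypothetical; the relative target is computed with os.path.relpath rather than hard-coded, so the number of “..” segments always matches the actual directory depth.

```python
import os
from pathlib import Path


def link_package(venv_dir: Path, package_name: str, py_version: str = "3.13") -> Path:
    """Symlink site-packages/<package> to <venv>/pypi_libs/<package>, relatively."""
    site_packages = venv_dir / "lib" / f"python{py_version}" / "site-packages"
    site_packages.mkdir(parents=True, exist_ok=True)
    target = venv_dir / "pypi_libs" / package_name
    # A relative target keeps working when the whole venv directory moves.
    relative_target = os.path.relpath(target, start=site_packages)
    link = site_packages / package_name
    link.symlink_to(relative_target, target_is_directory=True)
    return link
```

Because the link never mentions the venv’s own location, renaming or copying the venv directory leaves the package importable from site-packages.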

As was noted in the linked issue, you have been relying on undocumented (and hence unsupported) behaviour.

As I describe in the issue, I think calling it “unsupported” isn’t accurate. There’s a lot of existing code to make a pyvenv.cfg without a home key work.

By your own admission, you’ve had since 3.10 to do this

Not quite. Getting Bazel + rules_python + venvs working is something that’s only come about in the last year or so, and it was only in the last month or so that we considered all the bugs shaken out.

The code path in the Python runtime being relied upon goes back to 3.10. Specifically, the logic for how an empty pyvenv.cfg file is handled.

it’s far too late to revert the whole change, and very little time to develop a fix.

I’m not asking for the whole change to be reverted. By my estimation, the fix is one line, as I described in the issue. I actually have a pending change, with test, about ready.

I’ll further note that you don’t seem to be asking for the regression to be reverted, but rather for a new feature (interpretation of relative paths in pyvenv.cfg ) - this is clearly not going to happen before Python 3.15 (3.14 is closed for new features at this point)

Not quite.

In the context of that issue, I’m asking for the 3.14 regression to be fixed by restoring the 3.13 behavior. Like I said above, this doesn’t look onerous. In the PR of the change, there was some concern that ignoring pyvenv.cfg without home keys would cause issues.

In the context of this thread, I’m trying to find a path forward that doesn’t require absolute paths be used in pyvenv.cfg. From looking around at other PRs and issues while investigating the change that broke that behavior, I very much get the impression the intent is to eventually require that pyvenv.cfg have a home key and that the value be absolute. At which point, well, Bazel-based projects are more-or-less entirely prevented from using venvs. Which you may not care about, but its

so what do you plan on doing for 3.14?

You mean, assuming the 3.13 behavior isn’t restored? I don’t know. More code spelunking to find some sort of work around, I guess. And probably tell our users they’re in for a rough/impossible time.


But note that that is the meaning of a relocatable venv that everyone else assumes (entrypoints find the virtual env’s python relatively, path to the original interpreter is still absolute), not the one that you’re asking for (which is the polar opposite).

Can I ask why exactly Bazel doesn’t allow absolute paths? Are these virtual environments really being moved or distributed across machines or is it just an extreme policy applied to all build collateral in pursuit of reproducible builds? Either way, why aren’t the absolute paths in the bin/Scripts entry-points also an issue?

Yes they may be moved across machines. Bazel has distributed caching and files produced may be copied over to another machine as is.

To make matters worse, it’s possible that the files live on a mounted, network-backed filesystem that ends up shared across multiple machines, so even if you allowed some kind of post-write hook, it still would not work out well.


This is actually why I’m broadly in favour of the idea. As far as I’m aware, the main reason for requiring them to be absolute is because allowing them to be relative would have been a nightmare to support in CPython given the state of the sys.path initialisation code at the time. (I checked PEP 405, and whether the home key should be allowed to be relative apparently didn’t come up at the time)

Thanks to the subsequent clean up work, and the effort to migrate the initialisation to frozen Python code, I don’t believe that motivation holds anymore.

It would need a PEP to make it official, though, since CPython wouldn’t be the only implementation affected.


If it helps, this is also how I think about the problem. Disclaimer: I’m also interested in this functionality from a Bazel perspective, but also in non-Bazel contexts such as nixpkgs and container runtimes.

I also really enjoy referencing this post from @brettcannon: How virtual environments work

Really what may be needed here isn’t strictly a “venv”, but a hook or an easier way to launch a standalone python “site” or “home”. If a “venv” can be used, and it’s a practical path forward to unlock utility here, then great! But if there’s another approach that isn’t a “venv”, that’s OK too. But PYTHONHOME isn’t a solution at the moment either.

This is similar to how python-build-standalone created some custom CPython builds that make it possible to relocate a Python installation. It comes with various caveats, but in general, it works as advertised for 80%+ of scenarios and is leveraged by uv and a few other tools due to the utility that is unlocked when you are able to easily copy around Python interpreters.

A true “venv” was originally created as a development tool to make it easier to work on numerous python projects without polluting the global (or user) python installations. It became a very popular tool and started to be used to run and install python projects in CI and in containers. In the container scenario, it’s actually not even recommended to use a “venv”, but there are still many who do this. Similarly, many also believe that a venv needs to be “activated” which is also not entirely true in general. The “activate.sh” scripts are intended for the interactive terminals in development scenarios.

What would be really nice would be a mechanism to create a “site” like the user site, but without making modifications to a python installation.

Essentially a solution that delivers on:

  • Bring Your Own Interpreter
  • Bring Your Own “site” (site-packages and friends)
  • At runtime, the interpreter is bootstrapped and the majority of things work as expected in terms of Python programs (the onus is on the user to ensure that the BYO “site” is relocatable and that script shebangs etc are either ignored or rewritten etc)
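A rough sketch of what such a bootstrap could look like today, using only site.addsitedir and runpy from the standard library. The run_with_site helper is hypothetical, and this approach does not isolate the program from the interpreter’s own site-packages the way a venv does; it only illustrates pairing an arbitrary interpreter with an arbitrary “site” directory at launch time.

```python
import subprocess
from pathlib import Path

# Runs inside the chosen interpreter: register the site directory (this also
# processes any .pth files in it), then execute the requested module as __main__.
_BOOTSTRAP = (
    "import runpy, site, sys; "
    "site.addsitedir(sys.argv[1]); "
    "runpy.run_module(sys.argv[2], run_name='__main__', alter_sys=True)"
)


def run_with_site(interpreter: Path, site_dir: Path, module: str) -> subprocess.CompletedProcess:
    """Launch `module` under `interpreter` with `site_dir` added as a site directory."""
    return subprocess.run(
        [str(interpreter), "-c", _BOOTSTRAP, str(site_dir), module],
        capture_output=True,
        text=True,
        check=True,
    )
```

Nothing here is written to disk, so the interpreter and the site directory can both live on read-only or shared storage.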

On the surface, this looks a LOT like a “venv”, but it also looks a bit like a global or user site installation.

I would see this being useful in scenarios like:

  • Container runtimes that execute Python applications
  • Build systems that compose discrete Python interpreters with packages on disk (site-packages)

It might be that a “venv” can remain a “developer side” tool and a “site” is a runtime tool. However, the differences do seem superficial to me. But just sharing some thoughts in case that helps.


I’ve started drafting a PEP for this at “Comparing python:main...rickeylev:relative.venvs · python/peps · GitHub” (an RTD preview can be seen via the PEP PR, “PEP xxx: relative virtual environments, initial draft by rickeylev · Pull Request #4476 · python/peps · GitHub”).

I’m also looking for a sponsor.

If someone is interested in contributing to the PEP, get in touch with me (Discourse DM, or some way via GH). If you’re a venv tool or package installer owner, then your thoughts are definitely appreciated.

(I’m not sure if this should be one PEP or multiple PEPs (core runtime, stdlib venv relative-venv, informational packaging PEP); for now I just put it in one)

are they really being copied between machines.

Yes, exactly what Mehdi says.

Either way, why aren’t the absolute paths in the bin /Scripts entry-points also an issue?

Those, too, would be an issue, yes. This is something package installers would have to handle. uv does some polyglot shell tricks for this, for example. (I think a native polyglot binary would be ideal, but shell works pretty well as an alternative).

the differences between developer venv and deployment site are superficial

Yeah, this captures my thinking as well. If you’re deploying an app, then two of the biggest things you’re concerned about are the python runtime and the installed packages, both of which are issues venvs are designed to handle. So, why not just use a venv? I also think there’s inherent value in the development environment closely matching the deployed environment.

(For context/disclaimer/transparency: Greg is involved with and uses rules_python.)


Sorry, I remain a strong -1 on this idea, to the extent that (as a pip maintainer) I’d want pip to not support the part of the proposal relating to installer behaviour, even if it gets approved[1]. I’ve already pointed out that the virtualenv-clone project documents that there is a lot more complexity to making a venv relocatable than the simplistic suggestions in this PEP, so I don’t want to get pip sucked into implementing a half-solution, even on a basis of “the standard made us do it”.

If you want to strip the proposal back to only the part about allowing a relative path in the home key, I’m -0 on that part of the proposal (I don’t think it’s a good idea, but I don’t care enough to argue about it).


  1. I’m only one of a group of maintainers, so it’s possible I’d get overridden by the other maintainers, but I wouldn’t go down without a fight…


I agree with @pf_moore that the initial PEP should be scoped to just the interpreter startup behaviour. That’s a clearly defined improvement that may get some “Why bother?” reactions, but is unlikely to get anyone to argue too hard against it.

Logistically, it also makes it a pure interpreter PEP that doesn’t need to get the packaging tool developers involved in reviewing it.

I would sponsor a descoped PEP (since this would simplify a few things in venvstacks, too), but not the currently expansive version.

For the packaging side, shebang lines are already a torturous platform-specific mess, and that’s before we get into the complexity of platform-dependent binary formats and other details (for example, a Linux venv isn’t going to work on even macOS, let alone Windows). When a tool like Bazel or venvstacks or conda-pack handles setting up an environment for portability, then it can specify the constraints it is imposing to achieve that, but a fully general portability mechanism isn’t feasible - there are some things you can do that are fine when venvs are machine-specific but need to be avoided when aiming for portability.

Pragmatically, it might be helpful for the install package to offer some additional tooling for editing shebang lines (and other files) while keeping RECORD files consistent, but that’s a feature request and PR, not a PEP.

(Terminology note: I favour “relocatable” for moving venvs around within a machine, “portable” for moving them between machines running the same platform, and “cross-platform” for pure Python code that can run on multiple platforms. Cross-platform venvs are unlikely due to the layout differences)


Thank you, @ncoghlan ! I’ll update the PEP text and continue working on the pep PR.
