It’s possible to make them portable if <5 things>
Your overall point that there are certain constraints imposed is true, but I’d quibble over some in that list. In any case, Bazel projects are built upon things being hermetic, isolated, relocatable, etc., so those sorts of constraints are part and parcel and not barriers.
Having to regenerate pyvenv.cfg post-install is mildly irritating, but far from the trickiest part
Make the startup complications even more complicated
pyvenv.cfg is the only remaining problem and it’s much more than just “mildly irritating”, insofar as Bazel projects are concerned (see further down).
The logic for this is already in Python’s startup, too. What prompted this thread was a regression we noticed in 3.14 beta: “venv using symlinks and empty pyvenv.cfg isn't recognized as venv / able to find python home” (Issue #135773, python/cpython, on GitHub), which regresses the functionality Bazel projects require for effective venvs. In the bug, I go into detail about the minor change needed to fix it.
Like I said, it already works today (i.e. rules_python and its users use this behavior), has been working for about a year, and works back to, I think, Python 3.10.
regenerating pyvenv.cfg post install
reproduce venvs on demand. if you can’t, that’s the real problem
If the packaging is going to bundle the Python runtime itself does it even need a venv?
My first post explains this lightly, but it’s probably time to restate it and expound a bit about why modifying pyvenv.cfg, or creating one later, isn’t possible.
I don’t want to go into Bazel-isms too much, but I’ll have to a bit. The two important things to keep in mind are that remote machines may create files you (or another remote machine) use (e.g. pyvenv.cfg), and that the outputs are immutable (e.g. you may be reading a cached, shared artifact from a network filesystem).
First: there isn’t an install or setup phase, not in the typical sense. There’s just e.g. bazel build //:my_test. Running that command handles what e.g. docker bla; python -m venv; bin/activate; pip install would typically do. The output of this command is a directory tree of everything necessary and a wrapper to run the program. For Python, that means a venv and supporting files. You can then do things like bazel-bin/my_test to directly run it locally, or bazel test //:my_test to e.g. concurrently run 100 test variations on a remote cluster.
That command isn’t a simple “helper shell script” to do those tasks, though. It’s a distributed system optimizing to minimize the amount of repeated work, maximize the amount of parallelism, and prevent “it works on my machine but not yours” types of problems. In order for this distributed system to realize those benefits, it uses lots of isolation, remote workers, caching, and sharing of artifacts.
The implication of this distributed-ness, caching, and sharing is that outputs can’t be machine-specific or modifiable. If it wrote /home/user1/bin/python to pyvenv.cfg, then it wouldn’t work on user2’s machine, and neither it nor anything downstream of it could be cached and reused. By not modifiable, I don’t simply mean “doesn’t have +w”; I mean the files and directories you see locally may be symlinks to a cached artifact on a network filesystem, and you simply cannot modify them. Even if they were modifiable, that wouldn’t work, because such a modification is local to your machine and can’t be passed along to remote workers. This is important because, going back to the example of concurrently running 100 test variations, it allows workers to pull everything from cache, and the only setup cost they really pay is execv() with different args.
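To make the "machine-specific" point concrete, here is a minimal sketch that flags a pyvenv.cfg whose `home` is an absolute path. The helper names and the file contents are hypothetical, made up for illustration; this is not how Bazel or rules_python check anything.

```python
from pathlib import PurePosixPath


def parse_pyvenv_cfg(text: str) -> dict[str, str]:
    """Parse the simple `key = value` lines of a pyvenv.cfg file."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip().lower()] = value.strip()
    return settings


def is_machine_specific(cfg_text: str) -> bool:
    """Return True if `home` is an absolute path baked in at build time."""
    home = parse_pyvenv_cfg(cfg_text).get("home")
    return home is not None and PurePosixPath(home).is_absolute()


# Hypothetical file contents, for illustration only.
baked_in = "home = /home/user1/bin\nversion = 3.12.0\n"
no_home = "version = 3.12.0\n"

print(is_machine_specific(baked_in))  # True
print(is_machine_specific(no_home))   # False
```

An artifact like `baked_in` is only valid on the machine that produced it, so it can't be shared from a cache; one like `no_home` carries nothing tied to a particular machine.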
Second, the “packaging” phase is usually very simple. e.g. tar’ing up the outputs and untar’ing them later. There’s no need for domain-specific (e.g. python) installation logic because all the heavy lifting (e.g. creating a venv for a python program) was handled earlier and is already relocatable. In the Bazel-verse, this isn’t unique to Python – all the languages act similarly.
“So just create a venv at runtime in a temporary directory”
This is surprisingly non-trivial. A sibling Bazel project to mine actually started with that idea, and ultimately abandoned it. Ensuring cleanup of the temp files is problematic. Subprocesses interfere with signals. Race conditions occur when the same program is invoked twice. Platform-specific differences with shells and coreutils crop up. The OS may reap /tmp while the program is running. It’s just an endless variety of headaches.
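For reference, the naive version of that idea looks roughly like the sketch below, using only the standard library's `tempfile` and `venv` modules. The function name is made up, and a real launcher would also need to wire in the program's dependencies; the point is only to show where the lifecycle-bound state comes from.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_runtime_venv() -> tempfile.TemporaryDirectory:
    """Create a throwaway venv; the caller owns cleanup of the directory."""
    tmp = tempfile.TemporaryDirectory(prefix="runtime-venv-")
    # with_pip=False keeps this fast; symlinks are avoided on Windows,
    # where creating them may require extra privileges.
    venv.create(tmp.name, with_pip=False, symlinks=(os.name != "nt"))
    return tmp


tmp = create_runtime_venv()
try:
    cfg = Path(tmp.name) / "pyvenv.cfg"
    # The generated file records this machine's interpreter directory as `home`.
    print(cfg.read_text())
finally:
    # Never reached if the process is killed outright, which leaves stale files.
    tmp.cleanup()
```

Even this small version has to decide who owns the directory, what happens on abnormal exit, and how concurrent invocations avoid each other.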
It is vastly simpler, more correct, and more reliable when Python itself recognizes it’s in a venv and is able to find its home without having to create a bunch of file system state tied to the program’s life cycle.
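As a rough illustration of the idea (not CPython's actual startup logic), the sketch below finds "home" by following the venv's interpreter symlink when pyvenv.cfg exists but has no `home` key. The function name and directory layout are hypothetical, and it assumes a POSIX system where unprivileged symlinks work.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_home_via_symlink(venv_python: Path) -> Optional[Path]:
    """Return the real interpreter's directory if `venv_python` sits in a venv.

    Illustrative only: requires a pyvenv.cfg one level above bin/, and
    ignores any `home` key in it.
    """
    cfg = venv_python.parent.parent / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    # Following the symlink yields the real binary; its directory is "home".
    return venv_python.resolve().parent


# Build a fake layout in a temp dir (hypothetical names throughout).
with tempfile.TemporaryDirectory() as root:
    root_path = Path(root).resolve()
    runtime_bin = root_path / "runtime" / "bin"
    venv_bin = root_path / "app.venv" / "bin"
    runtime_bin.mkdir(parents=True)
    venv_bin.mkdir(parents=True)
    (runtime_bin / "python3").touch()
    (root_path / "app.venv" / "pyvenv.cfg").touch()  # empty: no `home` key
    # Relative symlink, so the whole tree can be moved or tarred up.
    os.symlink("../../runtime/bin/python3", venv_bin / "python3")

    home = find_home_via_symlink(venv_bin / "python3")
    print(home == runtime_bin)  # True
```

Nothing in that tree names an absolute path, so it stays valid when the tree is cached, shared, or unpacked somewhere else.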
there’s two definitions of relocatable being talked about
Hah, yeah. In retrospect, I probably should have used a different term to narrow the discussion scope. Ah well.
How the python -m venv tool (or similar) behaves isn’t too interesting to my environment (we don’t use it); essentially rules_python implements its own venv management tooling. Hence why I’m more interested in e.g. specifying how pyvenv.cfg gets understood by the runtime, since we otherwise control the rest (or nearly so).
Is this bundling a runtime with multiple venvs that need to be independent of each other?
Yes, that’s something people could do, if they wanted. A variation of this is one program might depend on another program with a different set/version of dependencies or Python version, and it’ll nest without issue. But to be clear, insofar as Bazel projects are concerned, this isn’t special, and doesn’t require any special support to make work. It simply Just Works because programs are more intrinsically isolated and relocatable under Bazel.
If you’re including a Python runtime that you’ve built can you modify it to do this already?
No, because the runtime used is decided by the user. “Built” here doesn’t mean “everything is built from source”; it means something more like “collecting all the artifacts and laying them out in a directory tree”. Bazel and rules_python provide abstractions so users can wire in an arbitrary Python runtime, be it python-build-standalone, from source, the system, or otherwise.