Virtualenv 20.0.0 beta1 is available

bernatgabor · January 28, 2020, 12:16pm

Why the major release

PEP-405 defines Python virtual environments, and people generally tend to believe it’s a solved problem set. I am a maintainer of the tox project, in which step 1 is usually creating virtual environments. Drawing from two and a half years of maintainership of that project, I identified three main pain points:

Creating a virtual environment is slow (it takes around 3 seconds, even in offline mode; 3 seconds does not seem that long, but if you need to create tens of virtual environments, it quickly adds up).
The API used within PEP-405 is excellent if you want to create virtual environments; however, only that. It does not allow us to describe the target environment flexibly or to do that without actually creating the environment.
The duality of virtualenv versus venv. Right, python3.4 has the venv module as defined by PEP-405. In theory, we could switch to that and forget virtualenv. However, it is not that simple. virtualenv offers a few benefits that venv does not, and I’ve talked about this in my EuroPython presentation:
- Ability to discover alternate versions (-p 2 creates a python 2 virtual environment, -p 3.8 a python 3.8, -p pypy3 a PyPy 3, and so on).
- virtualenv ships the wheel package out of the box as one of the seed packages; this significantly improves package installation speed, as pip can then use its wheel cache when installing packages.
- It is guaranteed to work even when distributions decide not to ship venv (Debian derivatives notably make venv an extra package, not part of the core binary).
- Can be upgraded out of band from the host python (often via just pip/curl - so can pull in bug fixes and improvements without needing to wait until the platform upgrades venv).
- Easier to extend, e.g., we added Xonsh activation script generation without much pushback, as well as support for PowerShell activation on POSIX platforms.
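For reference, the PEP-405 API mentioned in the second pain point can be exercised through the standard library's venv module. The snippet below is only a minimal sketch: the helper name create_env is made up for illustration, and with_pip=False is chosen so that no seed packages are installed. Note that the only way to learn anything about the resulting environment is to actually create it.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at target and return its pyvenv.cfg path."""
    # with_pip=False skips ensurepip, so no seed packages are installed.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(target)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "env")
        # pyvenv.cfg records, among other things, the home of the base interpreter.
        print(cfg.read_text())
```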

In October 2018, I also became a maintainer of the virtualenv project. The project had been on life support for years, and for a good reason. We had much code that did not have any tests, and the project is a single file runtime (with 2.6k lines of code, plus a whole lot of code that’s embedded as base64). It’s a long script in which, at various points, multiple if/else statements try to cater the logic to the current interpreter type and platform. This made adding Jython, IronPython, or even improving PyPy support very hard. There have been a few rewrites that attempted to fix this; notably, Donald Stufft got reasonably far. Nevertheless, the creators of these eventually moved on to other projects, so the rewrites never got promoted.

What changed

I present to you my attempt at the rewrite. Initial design goals were published under this RFC issue. The goal of the rewrite was not just to improve ease of maintenance of the project, but also to address most of the above pain points:

Moved away from the single file format: allows separating virtual environment creation logic per target interpreter (CPython 2, CPython3, PyPy2, PyPy3 supported).
Python 3 virtual environments created by virtualenv are now pyvenv.cfg based (in essence, they are equivalent to venv).
Python 2 virtual environments created by virtualenv are now pyvenv.cfg based. Instead of injecting our own site.py, we now only add a slight shim site.py that fixes sys.prefix and sys.exec_prefix by reading pyvenv.cfg and then delegates site-packages setup by triggering the import of the host site.py. Python 2 virtual environments now look a lot like Python 3 venv as a side effect. CPython 2 might now be EOL, but this is very handy for PyPy2, which is still supported, and for anyone still stuck on CPython 2 for any reason.
Add a venv-based creator; in essence, this delegates virtual environment creation to the target Python’s venv module (note we still control activation script generation and pip/setuptools/wheel seeding). The default is still the builtin method (mostly because calling processes can be expensive, especially on Windows), but one can select this mode via --creator venv.
CentOS/Fedora pythons supported (all other platforms should be too; we no longer assume via if/else what the platform’s folders are but instead use sysconfig/distutils to query the python interpreter about where things should go).
Be upfront about what interpreter we support and what we don’t. When we discover an interpreter, we check if our expectations about the interpreter are met. Microsoft Store Python is now supported: we automatically discover that it does not support our builtin virtualenv generation method (as the python executable is read-only), and provide only the venv route.
Provide a Describe interface that provides information about a virtual environment without creating it.
Significantly improved activation scripts that now support Unicode (emoji) characters. If the file system can encode a character, you can pass it.
Historically, adding the seed packages (pip/setuptools/wheel) has been done by invoking pip and pointing it to the embedded wheels via --find-links. This is now available under the --seed pip flag.
The default seed mechanism is now --seed app-data. This new model tries to address the performance issues mentioned at the start of this post. 98 percent of those three seconds (on Linux at least, Windows is even slower) is spent on installing the seed packages. Instead of always installing packages from scratch, we use a cache. The first time we install a seed package, we install it into the user application data folder and make it read-only. Then, instead of installing it into the virtual environment’s pure library path (often site-packages), we link it from the app-data. We also improved the wheel extraction mechanism, getting it down to 1.8 seconds. The first virtual environment creation will still be slow (2 seconds). However, subsequent ones will run in just 50 milliseconds.
zipapp support. The advantage of the single file mode was that it made it easy to bootstrap virtualenv itself. You just downloaded virtualenv.py and you were good to go. To mitigate the fact that we now have multiple files and multiple dependencies, we now ship a zipapp - 20.0.0b1 version available here, that one can use in the same fashion. Download virtualenv.pyz, pass it to a python interpreter, and you should be good to go.
All CLI defaults can be changed via virtualenv.ini inside the user config folder (or use an environment variable to specify the location of this).
Now extensible via package entry points (install packages alongside virtualenv to enable):
- interpreter discovery mechanism (you have some custom logic specifying where you can find compatible pythons - use this),
- virtual environment creation logic (want to load Python from a database, sure thing!),
- seed package creation (you have a better idea than the app data design described above, try it by writing your own)
- activation scripts (you have a new shell, create your own activator script via this).
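The app-data seeding idea from the list above (install once into a shared cache, then link into each environment) can be illustrated roughly as follows. This is a sketch of the general technique, not virtualenv's implementation: seed_from_cache is a hypothetical helper, wheel_dir stands in for an already unpacked wheel, the read-only step is omitted, and os.symlink is assumed to be permitted (on Windows that depends on privileges).

```python
import os
import shutil
from pathlib import Path


def seed_from_cache(wheel_dir: Path, app_data: Path, site_packages: Path) -> str:
    """Expose the contents of wheel_dir inside site_packages via a shared cache.

    Returns "installed" when the cache had to be populated, else "cached".
    """
    image = app_data / wheel_dir.name
    status = "cached"
    if not image.exists():
        # First use: pay the copy cost once, into the shared cache.
        shutil.copytree(wheel_dir, image)
        status = "installed"
    site_packages.mkdir(parents=True, exist_ok=True)
    for entry in image.iterdir():
        link = site_packages / entry.name
        if not os.path.lexists(link):
            # Every environment only needs one cheap link per top-level entry.
            os.symlink(entry, link, target_is_directory=entry.is_dir())
    return status
```

Calling the helper for a second environment with the same app_data folder skips the copy entirely, which is where the difference between the first and subsequent creations comes from in this model.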

These are just some of the changes. The idea is that this package should be fully compatible with virtualenv 16.x at the CLI level, yet have many improvements internally.
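As an illustration of the sysconfig-based approach mentioned in the list of changes, the sketch below asks an interpreter where it expects things to go instead of guessing from the platform. The helper name query_paths and the use of a JSON round trip through a subprocess are assumptions made for this example, not a description of virtualenv's internals.

```python
import json
import subprocess
import sys

# Small program executed by the target interpreter; it prints its own layout.
_QUERY = "import json, sysconfig; print(json.dumps(sysconfig.get_paths()))"


def query_paths(python_exe: str) -> dict:
    """Return the install locations reported by the given interpreter."""
    out = subprocess.check_output([python_exe, "-c", _QUERY], text=True)
    return json.loads(out)


if __name__ == "__main__":
    paths = query_paths(sys.executable)
    # purelib/platlib are the site-packages folders; scripts is bin/ or Scripts/.
    for key in ("purelib", "platlib", "scripts"):
        print(key, paths[key])
```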

Call for feedback

Today I released beta 1 in the hope that some people can try it out and report back the bugs they find. Once we fix all the issues people run into, we’ll release it as version 20.0.0. The rewrite branch within the virtualenv repository will become the master, with the current master moving to legacy. A final note: the documentation has not been updated yet, but I’ll try to work on this in the following days.

PyPI 20.0.0b1 - https://pypi.org/project/virtualenv/20.0.0b1/#files
Zipapp 20.0.0b1 - https://drive.google.com/open?id=1RPoLprfsexuO-AEFcpdSB2DupMdsWcgC (hosted on my personal Google Drive)

pf_moore · January 28, 2020, 2:16pm

On my nasty slow Windows 10 laptop with corporate antivirus, network proxy and all sorts of stuff, new virtualenv takes around 11 sec. Old version took 1 min 10 sec. I’m calling that a major win on speed, if nothing else.

Also the zipapp deployment is really easy to use and works perfectly for me.

Many thanks for this upgrade - it’s an amazing bit of work that has been badly needed for years now.

bernatgabor · January 28, 2020, 2:19pm

@pf_moore note you can enable symlinks (see https://community.perforce.com/s/article/3472), which should speed things up even further. If virtualenv is installed under python3, this will work even when creating Python 2 environments. Can you provide a link to the creation log by adding in -vvv? Is the 11 seconds the first run, or the second run (cached)? 11 seconds still seems a lot, and I wonder what’s making that up (could be just copy overhead).

pf_moore · January 28, 2020, 3:37pm

I’m in the Administrators group, so it looks from that article like I can’t use symbolic links without running in an administrative prompt (which I don’t routinely do). It’s bizarre that it’s harder to use symlinks if you’re an admin than if you’re a non-admin… (I can’t add my user to the privilege, as I have a domain account). [Update: It turns out that what I need to do is switch on “Use developer features” and symlinks work fine - still not as fast as on my home PC, but as I say I put most of that down to AV and corporate lockdown]

Here is the logfile. This time it took 8 seconds, and I’d pretty much expect that if the virus scanner checks each pyd being copied, or something silly like that. Honestly, 8 seconds seems perfectly reasonable to me on that PC.

I just checked on my home PC, which has much less aggressive AV settings, and apparently has symlinks enabled, and it takes 0.8 sec to create a virtualenv. That is seriously impressive.

brettcannon · January 28, 2020, 7:31pm

Do you know where the time is going? Is it mostly the symlinking on UNIX and the file copying on Windows? (I ask because I have an idea for the Python launcher for UNIX to potentially help with this on Windows.)

bernatgabor · January 28, 2020, 8:46pm

It goes in many places: copy/symlink is part of it, probably the biggest, but starting subprocesses is another, antivirus blocking is another, and a significant amount goes to pip startup time (setting up things and ensuring things are sane - configuration parse, etc).

bernatgabor · January 28, 2020, 8:47pm

What would be the problem you’re trying to solve, and what’s the solution?

brettcannon · January 28, 2020, 9:37pm

Overhead of copying over all the files for Python on Windows due to lack of symlink support.

Make the Python launcher recognize when it is running in a virtual environment (i.e. ../pyvenv.cfg exists) and then run itself such that it acts like a virtual environment. That way creating a virtual environment on Windows is just copying the launcher over and creating some empty directories.
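The detection step in this idea follows the PEP-405 rule that pyvenv.cfg lives either next to the executable or one directory above it. A rough Python sketch of that lookup is shown below; the helper names are hypothetical and this is not how the launcher itself is implemented.

```python
from pathlib import Path
from typing import Optional


def find_pyvenv_cfg(executable: Path) -> Optional[Path]:
    """Look for pyvenv.cfg next to the executable or one directory above it."""
    for folder in (executable.parent, executable.parent.parent):
        candidate = folder / "pyvenv.cfg"
        if candidate.is_file():
            return candidate
    return None


def read_pyvenv_cfg(cfg: Path) -> dict:
    """Parse the simple 'key = value' lines of a pyvenv.cfg file."""
    values = {}
    for line in cfg.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip().lower()] = value.strip()
    return values
```

The home key of the parsed file then tells the caller where the base interpreter lives.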

bernatgabor · January 28, 2020, 9:51pm

Not sure I follow why the py.exe needs to copy files.

brettcannon · January 29, 2020, 7:04pm

There are no files to copy over except py.exe and renaming it python.exe. Then with the appropriate environment variables set it would operate like a normal python running in a virtual environment, all without copying over stuff like the stdlib.

brylie · January 31, 2020, 12:40pm

Any plans to support PEP-518/pyproject.toml?

bernatgabor · January 31, 2020, 1:09pm

Can you please qualify what type of support you are expecting here?

brylie · January 31, 2020, 1:24pm

Sorry. I was confused about the context. I was thinking of Pipenv, not virtualenv. My, admittedly off-topic, idea was for Pipenv to converge on supporting pyproject.toml (as does Poetry). I suppose that virtualenv is agnostic of how project dependencies are defined?

bernatgabor · January 31, 2020, 3:19pm

Yes; it’s not related.

steve.dower · February 1, 2020, 12:48am

The main overhead is that our copyfile implementations are terribly inefficient. I think sendfile is now optimal, but if we try and replicate that to copyfile it’ll break people who care about POSIX semantics (like attributes, metadata, etc.) and not just getting the file into the right place.

The fewer file open/closes, the better. Even better if we just use the OS CopyFile API (or one of the batch copy APIs, when I get my module that exposes them written).

eryksun · February 1, 2020, 7:18pm

This depends on the policy of your organization’s IT department regarding granting SeCreateSymbolicLinkPrivilege directly to users and groups. A user does not have to be an elevated administrator to use symlinks, and developer mode does not have to be enabled either.

By design, when LSA logs a user on for an interactive session, if the user is an administrator and UAC is enabled, it creates a split token pair. Of the two, it returns the limited-user token at medium integrity level, with administrator privileges stripped out. This token has a linked administrator token at high integrity level, with full admin privileges, which is accessible by system processes that have SeTcbPrivilege (e.g. the system process that hosts the AppInfo service that handles UAC elevation requests). In this case, if the user has the symlink creation privilege by way of the Administrators group, then it gets stripped out of the limited token.

On the other hand, if a user (admin or standard, doesn’t matter) has the symlink creation privilege directly assigned to them, or to one of their other groups, then it will not be stripped out of the limited-user access token. In this case, the user can create symlinks without elevating to the linked administrator token and without enabling developer mode.
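Whichever way the privilege ends up in the token, a simple way to see whether the current process can actually create symlinks is to try it in a scratch directory. The function below is a minimal sketch with a made-up name; it only probes the temporary directory's file system, so a different volume may behave differently.

```python
import os
import tempfile


def can_create_symlinks() -> bool:
    """Return True if this process can create a symlink in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        open(target, "w").close()
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Typically a missing privilege or an unsupported file system.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    print("symlinks available:", can_create_symlinks())
```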

eryksun · February 1, 2020, 8:02pm

The --symlinks option works and creates a functional environment when using a python.org installation, assuming the user is allowed to create symlinks. But it does not work and fails ungracefully with the store app distribution. It would be nice to fail more gracefully in the app context, with a message that symlinks aren’t supported.

steve.dower · February 1, 2020, 9:22pm

The symlinks option only saves copying about three files, so it’s basically irrelevant (though it does reduce startup overhead a little).

The real performance issue is installing pip. Once we can easily run pip from outside the environment and have it install into the environment, we can have sub-second venvs on all platforms.

bernatgabor · February 1, 2020, 9:43pm

This is what this rewrite does with the app-data seeder, hence @pf_moore’s 0.8s creation on Windows; that’s just 100ms on POSIX.

eryksun · February 1, 2020, 9:43pm

I just wanted to clarify that symlinks are still supported in some cases, and to note the ungraceful failure when it’s used with the store app. It’s useful to me for testing bpo bugs to have an environment that doesn’t use the new launcher, i.e. one where sys.executable == sys._base_executable. Performance-wise it’s not all that useful, especially since the contents of the DLLs directory are no longer getting copied unnecessarily.