The release of pip 19.x seems to have broken installation for a large swathe of the ecosystem because of the changes with PEP 517. While we’re working to figure out what to do, I can’t help but wonder if there’s anything we could have done to detect that this was going to be a problem before the release.
One idea I’ve come to is that we probably could have detected this if we had some better “integration testing”: using the master branch of pip to install some real projects, in a virtualenv, outside a virtualenv, etc.
I’ve already suggested on setuptools that we start testing against master of pip, and I’m thinking we may want to start testing against master for other tools as well. I’m wondering if maybe it would make sense for us to broaden the scope of this testing and create something akin to CPython’s buildbots to detect early incompatibilities in Python’s build ecosystem.
Roughly what I’m thinking would be that for each environment we could install the master branch of each of the relevant tools (pip, setuptools, virtualenv, tox, etc), and try to install a bunch of real projects from PyPI and see if they fail. To guard against noise when some project introduces a bug in its build, if there are failures we can re-run the failing installations using the release versions of the build tools and only hard-fail if the release tools succeed.
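To make that a little more concrete, here is a rough sketch of what the environment setup could look like, assuming the master branches of the tools are installable straight from their GitHub URLs (the pypa repository URLs are real; the helper and directory name are just illustrative):

```python
# Rough sketch only: create a fresh virtual environment and upgrade its build
# tools to their master branches. Assumes each tool installs cleanly from its
# git URL, which is exactly the kind of thing this testing would exercise.
import subprocess
import sys
from pathlib import Path

MASTER_TOOLS = [
    "git+https://github.com/pypa/pip",
    "git+https://github.com/pypa/setuptools",
    "git+https://github.com/pypa/wheel",
    "git+https://github.com/pypa/virtualenv",
]

def make_master_env(env_dir: Path) -> Path:
    """Create a venv, install the master build tools into it, return its python."""
    subprocess.run([sys.executable, "-m", "venv", str(env_dir)], check=True)
    bindir = "Scripts" if sys.platform == "win32" else "bin"
    python = env_dir / bindir / "python"
    subprocess.run(
        [str(python), "-m", "pip", "install", "--upgrade", *MASTER_TOOLS],
        check=True,
    )
    return python

if __name__ == "__main__":
    make_master_env(Path("master-tools-env"))
```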
Presumably the release versions could be assumed to succeed (or at least be out of scope if they fail there), and any prerelease test failure would need to be investigated regardless, which means this is really just a few pip install commands in CI? It’s certainly easy enough to get a nightly build set up, and Azure Pipelines has enough capacity (or I can get more if necessary).
The challenge is having people who will watch and diagnose problems, and then assign them to the relevant project. If enough people put their hand up and say they’ll watch it, I’ll happily help out getting it set up.
Presumably the release versions could be assumed to succeed (or at least be out of scope if they fail there), and any prerelease test failure would need to be investigated regardless
Not sure I understand what you mean by this, maybe we agree? I’m imagining the algorithm to go like this:
1. Install the master versions of the build tools.
2. Run pip install -r requirements-A.txt to install a collection of packages.
3. If step 2 succeeded, move on to the next requirements.txt; otherwise:
4. Create and/or activate an environment with the latest release of all the build tools.
5. Repeat step 2; if it fails, log it and move on; if it succeeds, register a test failure.
That way, if PyPI goes down or something breaks in the build setup of one of the packages (again these are real-life packages on PyPI, not crafted test packages), we don’t get erroneous notifications. Failures with the real pip will likely be noticed and reported to the project (and to pip if it’s a bug in pip).
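As a sketch of that retest logic in code (master_python and release_python stand in for interpreters from environments provisioned with the master and released build tools, e.g. via a helper like the one sketched earlier):

```python
# Sketch of the retest logic described above.
import subprocess
from pathlib import Path

def pip_install(python: Path, requirements: Path) -> bool:
    """Return True if `pip install -r <requirements>` succeeds in that environment."""
    result = subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(requirements)]
    )
    return result.returncode == 0

def check_requirements(requirements: Path, master_python: Path, release_python: Path) -> str:
    # Step 2: try the install with the master build tools.
    if pip_install(master_python, requirements):
        return "pass"
    # Steps 4-5: retry with the released build tools. A success here means the
    # breakage was introduced by the master tools, so register a test failure.
    if pip_install(release_python, requirements):
        return "test-failure"
    # Both failed: the package (or PyPI) is broken regardless of the tools,
    # so just log it and move on.
    return "broken-upstream"
```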
… which means this is really just a few pip install commands in CI?
In addition to the admittedly simple algorithm above, I think there’s a class of bugs that only presents if you have certain unusual features like building a C or C++ extension. We will likely need to do something to specify “native” dependencies in some way, which may make things slightly more complicated, but we can probably start with things that don’t have any dependencies above and beyond the standard libraries you can expect (maybe restricted to the allowed libraries in manylinux2010), and add other tests as necessary.
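As a purely hypothetical illustration of how that restriction might be expressed (none of these names or categories are settled):

```python
# Hypothetical tagging of test packages with the "native" build requirements
# they need beyond a standard manylinux2010-style environment.
TEST_PACKAGES = {
    # pure Python: no compiler or external libraries needed
    "requests": {"native": []},
    "attrs": {"native": []},
    # building from source needs a C compiler (and possibly more)
    "numpy": {"native": ["c-compiler"]},
}

# A first iteration could simply restrict itself to the pure-Python packages:
FIRST_PASS = [name for name, meta in TEST_PACKAGES.items() if not meta["native"]]
```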
I guess my assumption is that if they’re failing to install with the release version, that will be noticed before failing with master anyway, so we don’t need the automatic retest step. As in, it’s not worth the complexity it would create (specifically, I don’t think you have a simple cross-platform script with easily captured output at that point, whereas letting it fail and having those who review it also check PyPA status defers the effort until the few times it’ll be needed).
We can try, but usually there’s a decent gap between the time when these sorts of things are noticed and the time that a fix is released. That will add a lot of noise to the “buildbots”. Not to mention, some packages end up considering source distributions to be a “best effort” kind of thing, and won’t necessarily rush to fix a problem.
That said, I’m probably over-designing this. The first thing to do is definitely to set up some kind of recurring CI job that does the builds.
I was thinking we could just create a dedicated repo for this a la packaging-problems.
That sounds like a good approach. This is very much about how the various tools interact, so having the tests outside of any one project’s CI makes that clearer. It also makes it more obvious that anyone can help investigate and/or address issues that do arise, and they aren’t “setuptools issues” or “pip issues”.
I’d assume that the key projects we’d want to be tracking are setuptools, wheel and pip. Any others?
Also, we’d need to build versions of the tools from master and make them globally available in some sort of local index or directory. Pip, for example, will install setuptools and wheel in the build environment, so it has to have access to the master version at that point. Just having master setuptools installed in the system site-packages won’t be sufficient.
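One way this could work (just a sketch, and it glosses over how the locally built wheels would take precedence over the PyPI releases) is to build wheels of the master branches into a local directory and expose it via PIP_FIND_LINKS, which should also apply when pip provisions isolated build environments, though that is something to verify:

```python
# Sketch: build wheels of master setuptools and wheel into a local directory
# and point pip at it. Whether this is the mechanism we end up using (vs. a
# proper local index) is an open question, and the wheels' version numbers
# would need to sort above the PyPI releases for pip to prefer them.
import os
import subprocess
import sys
from pathlib import Path

LOCAL_WHEELS = Path("local-wheels").resolve()

def build_master_wheels() -> None:
    """Build wheels of the master branches of setuptools and wheel."""
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel",
         "git+https://github.com/pypa/setuptools",
         "git+https://github.com/pypa/wheel",
         "-w", str(LOCAL_WHEELS)],
        check=True,
    )

def env_with_local_wheels() -> dict:
    """Environment for subprocesses so pip also looks in the local wheel directory."""
    env = dict(os.environ)
    env["PIP_FIND_LINKS"] = str(LOCAL_WHEELS)
    return env
```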
Not a pypa project, but I also wonder if we should throw tox into the mix as well. The most common problems I see with my CI tend to be weird interactions between pip, setuptools, tox and virtualenv, especially around versioning.
It would be wonderful if we had integration testing between all these essential Python tools (tox, virtualenv, pip, …), so we could considerably lower the chance of breaking something on a new release.
In the past I have used the pip install --pre approach to test compatibility with code that hasn’t been released yet. The only issue with that is that it assumes there is enough time between when a prerelease is made and the final release. I know that more recently Travis started to allow custom triggered builds, which could, at least in theory, be used to test changes made in a dependency. I never managed to find time to implement such a test, but I will try, as I maintain a few Python libraries and the best way to know whether a newer version of a library breaks a consumer is to run the consumer’s CI with the newer package.
pip’s test suite does install certain packages from PyPI, and more besides (like VCS installs), though not quite in the same vein. pip also uses the master branch of virtualenv in our CI tests, but we don’t use master of setuptools + wheel. I’m not too keen on adding more things to pip’s test matrix, though (it takes too long already).
As for the separate CI for testing the installation of a bunch of popular projects, I like the idea. Let’s figure out how to set that up on some pypa/<nice-name>.
@bernatgabor asked me to set up a pypa/integration-test repository. I can do this; I just need to know who we’d like to have as administrators of the repository.
The repository has been created at https://github.com/pypa/integration-test, and the listed folks have been invited to join the GitHub team that has been granted Admin permissions.
That is likely sufficient for Admin coverage on that repo.
Per the discussion above, Steve and I offered to put together an Azure CI build, though I would expect to define the test part behind some tox files, maybe with pytest included.
Much as I love Tox, I think we will avoid noise by sticking closer to the metal for this. Pipelines has some neat templating support that should give us a way to easily have many independent jobs for each package of interest, as well as coping with platform differences, and all we really want is the basic install command, right?