The release of pip 19.x seems to have broken installation for a large swathe of the ecosystem because of the changes with PEP 517. While we’re working to figure out what to do, I can’t help but wonder if there’s anything we could have done to detect that this was going to be a problem before the release.
One idea I’ve come to is that we probably could have detected this if we had some better “integration testing”: using the master branch of pip to install some real projects, in a virtualenv, outside a virtualenv, etc.
I’ve already suggested on setuptools that we start testing against master of pip, and I’m thinking we may want to start testing against master for other tools as well. I’m wondering if maybe it would make sense for us to broaden the scope of this testing and create something akin to CPython’s buildbots to detect early incompatibilities in Python’s build ecosystem.
Roughly what I’m thinking would be that for each environment we could install the master branch of each of the relevant tools (pip, setuptools, virtualenv, tox, etc), and try to install a bunch of real projects from PyPI and see if they fail. To guard against noise when some project introduces a bug in its build, if there are failures we can re-run the failing installations using the release versions of the build tools and only hard-fail if the release tools succeed.
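To make that a little more concrete, here is a rough sketch of what the environment setup could look like, assuming the master branches of the tools are installable straight from their GitHub URLs (the pypa repository URLs are real; the helper and directory name are just illustrative):

```python
# Rough sketch only: create a fresh virtual environment and upgrade its build
# tools to their master branches. Assumes each tool installs cleanly from its
# git URL, which is exactly the kind of thing this testing would exercise.
import subprocess
import sys
from pathlib import Path

MASTER_TOOLS = [
    "git+https://github.com/pypa/pip",
    "git+https://github.com/pypa/setuptools",
    "git+https://github.com/pypa/wheel",
    "git+https://github.com/pypa/virtualenv",
]

def make_master_env(env_dir: Path) -> Path:
    """Create a venv, install the master build tools into it, return its python."""
    subprocess.run([sys.executable, "-m", "venv", str(env_dir)], check=True)
    bindir = "Scripts" if sys.platform == "win32" else "bin"
    python = env_dir / bindir / "python"
    subprocess.run(
        [str(python), "-m", "pip", "install", "--upgrade", *MASTER_TOOLS],
        check=True,
    )
    return python

if __name__ == "__main__":
    make_master_env(Path("master-tools-env"))
```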
Presumably the release versions could be assumed to succeed (or at least be out of scope if they fail there), and any prerelease test failure would need to be investigated regardless, which means this is really just a few pip install commands in CI? It’s certainly easy enough to get a nightly build set up, and Azure Pipelines has enough capacity (or I can get more if necessary).
The challenge is having people who will watch and diagnose problems, and then assign them to the relevant project. If enough people put their hand up and say they’ll watch it, I’ll happily help out getting it set up.
Presumably the release versions could be assumed to succeed (or at least be out of scope if they fail there), and any prerelease test failure would need to be investigated regardless
Not sure I understand what you mean by this, maybe we agree? I’m imagining the algorithm to go like this:
1. Install the master versions of the build tools.
2. Run pip install -r requirements-A.txt to install a collection of packages.
3. If step 2 succeeded, move on to the next requirements.txt; otherwise:
4. Create and/or activate an environment with the latest release of all the build tools.
5. Repeat step 2; if it fails, log it and move on; if it succeeds, register a test failure.
That way, if PyPI goes down or something breaks in the build setup of one of the packages (again these are real-life packages on PyPI, not crafted test packages), we don’t get erroneous notifications. Failures with the real pip will likely be noticed and reported to the project (and to pip if it’s a bug in pip).
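As a sketch of that retest logic in code (master_python and release_python stand in for interpreters from environments provisioned with the master and released build tools, e.g. via a helper like the one sketched earlier):

```python
# Sketch of the retest logic described above.
import subprocess
from pathlib import Path

def pip_install(python: Path, requirements: Path) -> bool:
    """Return True if `pip install -r <requirements>` succeeds in that environment."""
    result = subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(requirements)]
    )
    return result.returncode == 0

def check_requirements(requirements: Path, master_python: Path, release_python: Path) -> str:
    # Step 2: try the install with the master build tools.
    if pip_install(master_python, requirements):
        return "pass"
    # Steps 4-5: retry with the released build tools. A success here means the
    # breakage was introduced by the master tools, so register a test failure.
    if pip_install(release_python, requirements):
        return "test-failure"
    # Both failed: the package (or PyPI) is broken regardless of the tools,
    # so just log it and move on.
    return "broken-upstream"
```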
… which means this is really just a few pip install commands in CI?
In addition to the admittedly simple algorithm above, I think there’s a class of bugs that only presents if you have certain unusual features like building a C or C++ extension. We will likely need to do something to specify “native” dependencies in some way, which may make things slightly more complicated, but we can probably start with things that don’t have any dependencies above and beyond the standard libraries you can expect (maybe restricted to the allowed libraries in manylinux2010), and add other tests as necessary.
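As a purely hypothetical illustration of how that restriction might be expressed (none of these names or categories are settled):

```python
# Hypothetical tagging of test packages with the "native" build requirements
# they need beyond a standard manylinux2010-style environment.
TEST_PACKAGES = {
    # pure Python: no compiler or external libraries needed
    "requests": {"native": []},
    "attrs": {"native": []},
    # building from source needs a C compiler (and possibly more)
    "numpy": {"native": ["c-compiler"]},
}

# A first iteration could simply restrict itself to the pure-Python packages:
FIRST_PASS = [name for name, meta in TEST_PACKAGES.items() if not meta["native"]]
```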
I guess my assumption is that if they’re failing to install with the release version, that will be noticed before failing with master anyway, so we don’t need the automatic retest step. As in, it’s not worth the complexity it would create (specifically, I don’t think you have a simple cross-platform script with easily captured output at that point, whereas letting it fail and having those who review it also check PyPA status defers the effort until the few times it’ll be needed).
We can try, but usually there’s a decent gap between the time when these sorts of things are noticed and the time that a fix is released. That will add a lot of noise to the “buildbots”. Not to mention, some packages end up considering source distributions to be a “best effort” kind of thing, and won’t necessarily rush to fix a problem.
That said, I’m probably over-designing this. The first thing to do is definitely to set up some kind of recurring CI job that does the builds.
I was thinking we could just create a dedicated repo for this a la packaging-problems.
That sounds like a good approach. This is very much about how the various tools interact, so having the tests outside of any one project’s CI makes that clearer. It also makes it more obvious that anyone can help investigate and/or address issues that do arise, and they aren’t “setuptools issues” or “pip issues”.
I’d assume that the key projects we’d want to be tracking are setuptools, wheel and pip. Any others?
Also, we’d need to build versions of the tools from master and make them globally available in some sort of local index or directory. Pip, for example, will install setuptools and wheel in the build environment, so it has to have access to the master version at that point. Just having master setuptools installed in the system site-packages won’t be sufficient.
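One way this could work (just a sketch, and it glosses over how the locally built wheels would take precedence over the PyPI releases) is to build wheels of the master branches into a local directory and expose it via PIP_FIND_LINKS, which should also apply when pip provisions isolated build environments, though that is something to verify:

```python
# Sketch: build wheels of master setuptools and wheel into a local directory
# and point pip at it. Whether this is the mechanism we end up using (vs. a
# proper local index) is an open question, and the wheels' version numbers
# would need to sort above the PyPI releases for pip to prefer them.
import os
import subprocess
import sys
from pathlib import Path

LOCAL_WHEELS = Path("local-wheels").resolve()

def build_master_wheels() -> None:
    """Build wheels of the master branches of setuptools and wheel."""
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel",
         "git+https://github.com/pypa/setuptools",
         "git+https://github.com/pypa/wheel",
         "-w", str(LOCAL_WHEELS)],
        check=True,
    )

def env_with_local_wheels() -> dict:
    """Environment for subprocesses so pip also looks in the local wheel directory."""
    env = dict(os.environ)
    env["PIP_FIND_LINKS"] = str(LOCAL_WHEELS)
    return env
```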
Not a pypa project, but I also wonder if we should throw tox into the mix as well. The most common problems I see with my CI tend to be weird interactions between pip, setuptools, tox and virtualenv, especially around versioning.
It would be wonderful if we had integration testing between all these essential Python tools (tox, virtualenv, pip, …), so we could considerably lower the chance of breaking something on a new release.
In the past I have used the pip install --pre approach to test compatibility with code that hasn’t been released yet. The only issue with that is that it assumes there is enough time between when a prerelease is made and the final release. I know that more recently Travis started to allow custom triggered builds, which could, at least in theory, be used to test changes made in a dependency. I never managed to find time to implement such a test, but I will try, as I maintain a few Python libraries and the best way to know whether a newer version of a library breaks a consumer is to run the consumer’s CI with the newer package.
pip’s test suite does install certain packages from PyPI, and more besides (like VCS installs), though not quite in the same vein. pip also uses the master branch of virtualenv in our CI tests, but we don’t use master of setuptools + wheel. I’m not too keen on adding more things to pip’s test matrix, though (it takes too long already).
As for the separate CI for testing the installation of a bunch of popular projects, I like the idea. Let’s figure out how to set that up on some pypa/<nice-name>.
@bernatgabor asked me to set up a pypa/integration-test repository. I can do this; I just need to know who we’d like to have as administrators of the repository.
The repository has been created at https://github.com/pypa/integration-test, and the listed folks have been invited to join the GitHub team that has been granted Admin permissions.
That is likely sufficient for Admin coverage on that repo.
Per the discussion above, Steve and I offered to put together an Azure CI build, though I would expect to define the test part behind some tox files, maybe with pytest included.
Much as I love Tox, I think we will avoid noise by sticking closer to the metal for this. Pipelines has some neat templating support that should give us a way to easily have many independent jobs for each package of interest, as well as coping with platform differences, and all we really want is the basic install command, right?