Hi all, I saw in the post about the Language Summit that @colesbury suggested running third-party testsuites as part of core development. We did exactly this during Pyston development, so I wanted to share our experience and tooling.
We ran the following testsuites as part of our normal development process: django, urllib3, setuptools, six, requests, sqlalchemy, pandas, and numpy. You can see the Makefile entrypoint here, and some of our helper code here.
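For anyone curious what that kind of entrypoint looks like, here's a minimal sketch of the general approach (not our actual tooling; the `SUITES` pins, paths, and `run_suite` helper are just illustrative): clone a pinned revision of each project, install it into the interpreter under test, and save the raw pytest output so a later step can compare it against the reference run.

```python
# Hypothetical sketch of an external-testsuite runner (illustrative only).
import subprocess
import sys
from pathlib import Path

SUITES = {
    # project name -> (git URL, pinned tag); example pins, not the ones we used
    "six": ("https://github.com/benjaminp/six", "1.16.0"),
    "requests": ("https://github.com/psf/requests", "v2.28.1"),
}

def run_suite(name: str, workdir: Path) -> Path:
    url, rev = SUITES[name]
    workdir.mkdir(parents=True, exist_ok=True)
    repo = workdir / name
    if not repo.exists():
        # Clone a pinned revision so the suite doesn't drift under us.
        subprocess.run(
            ["git", "clone", "--depth", "1", "--branch", rev, url, str(repo)],
            check=True,
        )
    # Install the project into the interpreter being tested.
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", str(repo)], check=True)
    # Run pytest and capture everything; don't fail here -- the pass/fail
    # decision happens later by comparing against the reference output.
    log = workdir / f"{name}.log"
    with log.open("w") as f:
        subprocess.run([sys.executable, "-m", "pytest", str(repo)],
                       stdout=f, stderr=subprocess.STDOUT)
    return log

if __name__ == "__main__":
    logs = [run_suite(name, Path("suites")) for name in SUITES]
    print("wrote", [str(p) for p in logs])
```

The key design point is that the runner itself never fails on test failures; it only records output, and the comparison step decides whether the run is acceptable.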
We decided pretty quickly not to target passing the test suites: you would hope that it's straightforward to clone a repo and get its tests to pass, but we found this ranged from difficult to impossible depending on the requirements of the test suite and how well it supported non-standard testing environments. So our success criterion is fuzzier: we pass the same number of tests as a reference implementation (the CPython version we target). Even that was tricky, because many of the testsuites are nondeterministic and/or not consistent over time due to their use of external resources (particularly the libraries designed for making network requests), and a few required monkey-patching to disable their runtime selection of which tests to run based on the Python version. We did build some tooling that lets you specify fuzzy match criteria and do some fuzzy diffing between the produced test output and the reference test output, and maybe this could be useful.
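As a rough illustration of what "fuzzy matching" means here (again just a sketch, not our actual helper code; the summary regex and the per-suite `tolerance` parameter are assumptions), the comparison boils down to parsing the pytest summary out of each log and allowing a small amount of noise instead of requiring an exact match:

```python
# Hypothetical sketch of a fuzzy comparison between a test run and a reference run.
import re
import sys
from pathlib import Path

SUMMARY_RE = re.compile(r"(\d+) passed")  # matches e.g. "1234 passed, 5 failed in 60.1s"

def passed_count(log: Path) -> int:
    m = SUMMARY_RE.search(log.read_text())
    if m is None:
        raise RuntimeError(f"no pytest summary found in {log}")
    return int(m.group(1))

def compare(current: Path, reference: Path, tolerance: int = 0) -> bool:
    cur, ref = passed_count(current), passed_count(reference)
    # Accept the run if we pass at least as many tests as the reference,
    # minus a small suite-specific tolerance for nondeterministic tests.
    ok = cur >= ref - tolerance
    print(f"{current.name}: {cur} passed vs reference {ref} "
          f"(tolerance {tolerance}) -> {'OK' if ok else 'FAIL'}")
    return ok

if __name__ == "__main__":
    # usage: compare.py current.log reference.log
    sys.exit(0 if compare(Path(sys.argv[1]), Path(sys.argv[2]), tolerance=2) else 1)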
We ended up including the external testsuites as part of our CI run, but I'm not sure I would recommend that: they take a significant amount of time to run, and I spent a decent amount of time tracking down test failures that ended up being spurious in various ways. But as a general tool for increasing our confidence it was quite nice. Happy to answer any questions or help!