Publishing nightly builds on test.pypi.org with a time-based retention policy

Is there a standard way to publish timestamped nightly wheels (such as projectname-X.Y.Z.dev0+20200116035252-cpXX-cpXX-win_amd64.whl or projectname-X.Y.Z.dev20200116-cpXX-cpXX-win_amd64.whl) on test.pypi.org and to have a system to automatically delete wheels that are older than a couple of days?
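
For context, a minimal sketch (with a hypothetical helper name and base version) of how such a timestamped PEP 440 dev version could be generated, in either of the two forms above:

```python
from datetime import datetime, timezone

def nightly_version(base: str, local_timestamp: bool = False) -> str:
    """Build a PEP 440 dev version string for a nightly build of `base` (e.g. "0.23.0")."""
    now = datetime.now(timezone.utc)
    if local_timestamp:
        # e.g. 0.23.0.dev0+20200116035252 (timestamp in the local version segment)
        return f"{base}.dev0+{now:%Y%m%d%H%M%S}"
    # e.g. 0.23.0.dev20200116 (date folded into the dev release number)
    return f"{base}.dev{now:%Y%m%d}"

print(nightly_version("0.23.0"))
print(nightly_version("0.23.0", local_timestamp=True))
```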

For scikit-learn we typically build binary wheels for at least 3 Python versions times 3 platforms times ~1.5 (32-bit and 64-bit Python for Windows and Linux). Each wheel is at least 4 MB, so on the order of ~50 MB per day just for scikit-learn, and ~20 GB per year if the old files are not automatically deleted.
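
For reference, the back-of-the-envelope arithmetic behind those numbers (the counts and the per-wheel size are the rough figures above, not measurements):

```python
# ~3 Python versions x 3 platforms x ~1.5 (32-bit and 64-bit on Windows and Linux)
wheels_per_night = 3 * 3 * 1.5      # ~13.5 wheels
mb_per_wheel = 4                    # at least 4 MB each
mb_per_day = wheels_per_night * mb_per_wheel
gb_per_year = mb_per_day * 365 / 1024
print(f"~{mb_per_day:.0f} MB/day, ~{gb_per_year:.0f} GB/year")  # ~54 MB/day, ~19 GB/year
```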

I noticed that tensorflow has 2 ancillary packages for nightly builds: tf-nightly and tf-nightly-gpu. The tf-nightly wheels alone seem to weigh ~2.2 GB per day, which is ~800 GB per year.

This looks very wasteful to me.

If there is no built-in way to set up retention policies for timestamped dev releases on pypi.org or test.pypi.org, one could try to set up a cron job on some CI server to automatically delete older files. However, the Warehouse API does not seem to allow for file deletion: https://warehouse.readthedocs.io/api-reference/
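
The read-only half of such a cron job is already possible with the public JSON API; here is a sketch (hypothetical project name, 2-day cutoff) that only lists the stale files, since there is no endpoint to delete them:

```python
from datetime import datetime, timedelta, timezone
import requests

PROJECT = "projectname"  # hypothetical project name
CUTOFF = datetime.now(timezone.utc) - timedelta(days=2)

resp = requests.get(f"https://test.pypi.org/pypi/{PROJECT}/json")
resp.raise_for_status()
for version, files in resp.json()["releases"].items():
    for f in files:
        uploaded = datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        if uploaded < CUTOFF:
            # No API endpoint exists to delete this file; it would have to be removed manually.
            print("stale:", f["filename"], uploaded.date())
```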

2 Likes

It seems to me that the best answer for temporary releases like nightlies would be to set up your own simple index (PEP 503 has the format you need, it’s not complicated) and direct your users to use that. You can set your own retention policies, etc, without needing to wait for Warehouse to implement anything.
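
To illustrate, a minimal sketch of generating such a PEP 503 index from a directory of wheels (the directory layout and base URL are made up):

```python
from collections import defaultdict
from pathlib import Path

WHEEL_DIR = Path("wheelhouse")  # hypothetical directory containing the nightly wheels
OUT = Path("simple")            # to be served as e.g. https://nightlies.example.org/simple/

# Group wheel files by (roughly normalized) project name, taken from the filename.
projects = defaultdict(list)
for wheel in WHEEL_DIR.glob("*.whl"):
    name = wheel.name.split("-")[0].lower().replace("_", "-")
    projects[name].append(wheel.name)

# Root index: one link per project.
OUT.mkdir(exist_ok=True)
root_links = "\n".join(f'<a href="{name}/">{name}</a><br>' for name in sorted(projects))
(OUT / "index.html").write_text(f"<!DOCTYPE html><html><body>\n{root_links}\n</body></html>")

# Per-project pages: one link per wheel file (here assumed to be served under /wheels/).
for name, files in projects.items():
    (OUT / name).mkdir(exist_ok=True)
    links = "\n".join(f'<a href="../../wheels/{f}">{f}</a><br>' for f in sorted(files))
    (OUT / name / "index.html").write_text(f"<!DOCTYPE html><html><body>\n{links}\n</body></html>")
```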

4 Likes

Thanks Paul.

For a single project that would work fine. But ideally we would like to have a single index shared by several projects whose CI workers upload and download each other's nightly builds, so that each project can run its tests against the master branch of the others without having to re-build all the upstream dependencies each time.
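
Concretely, each project's CI could then consume the others' nightlies from whatever shared index ends up being used, roughly like this (the index URL is a placeholder):

```python
import subprocess
import sys

# Install pre-release (dev) builds of the upstream dependencies from a shared nightly
# index, falling back to pypi.org for everything else. The index URL is a placeholder.
subprocess.run(
    [
        sys.executable, "-m", "pip", "install", "--upgrade", "--pre",
        "--extra-index-url", "https://nightlies.example.org/simple/",
        "numpy", "scipy",
    ],
    check=True,
)
```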

I liked the idea of using test.pypi.org for this as it makes it possible to have a shared index where each project's maintainer team can manage its own upload credentials / tokens: the scikit-learn developers can only upload the scikit-learn wheels and not mess around with the numpy wheels…

In the meantime I think we will use the anaconda cloud service, which can provide a PEP 503-compatible index, but as far as I know it would not provide per-project upload permissions on a shared index.

1 Like

Hmm, you could still have a single index that contains URLs to project-specific areas, surely? Or alternatively, I imagine something like devpi could handle this.

I guess the implied requirement here is “a hosted service that already exists so we don’t have to spend project resources building a publishing solution rather than working on the projects”. But in that case, I don’t think there is such a thing. As you say, PyPI/warehouse is not really designed for large, transient artefacts of the sort you’re describing.

3 Likes

For the record, anaconda.org allows for per-package upload permissions in a shared organization feed. So it sounds like a good solution for our use case.

For the longer term, I still think it would be nice for the wider Python community to have a standard way to publish nightly builds on an official channel, for instance on a nightly.pypi.org instance of Warehouse (with a generic time-based or sequence-based retention policy). This would make it easy for the test automation of all projects to run their tests against the latest development branch of all their dependencies.
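
The retention rule itself would be simple to express; here is a sketch of a generic time- plus count-based policy over (filename, upload time) pairs, assuming the hosting side exposed a hook to apply it:

```python
from datetime import datetime, timedelta, timezone

def files_to_delete(files, max_age_days=7, keep_latest=3):
    """files: iterable of (filename, upload_datetime) pairs.

    Always keep the `keep_latest` most recent uploads, and flag anything
    else older than `max_age_days` for deletion.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    ordered = sorted(files, key=lambda item: item[1], reverse=True)
    return [name for name, uploaded in ordered[keep_latest:] if uploaded < cutoff]
```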

It would even make it easier for CPython itself to test that a new Python release will have quick support from all the major top-level dependencies of the ecosystem.

4 Likes

If you have an Azure Pipelines account, you should be able to set up an Azure Artifacts feed. Unfortunately I don’t think the permissions allow for public read/authenticated upload yet, but it might suit your needs?

1 Like

There is public read for public projects.

It’s possible to create tokens that allow several open source projects by different teams to push their nightly wheels into a shared feed, but:

  • the maximum duration of a token (Personal Access Token) is 1 year, which means that continuous integration systems will have to renew tokens every year;
  • there are no per-package granular permissions (only feed-level permissions): so open-source-project-a can upload a new version of open-source-project-b if they both share a feed.

Anaconda channels are more versatile and granular w.r.t. permissions and tokens. They have per-package upload semantics closer to the main pypi.org server.

1 Like

We at Vaex are thinking about doing nightly releases as well, and I think this would be a really good idea. I’m in favor of using test.pypi.org first and seeing how that goes. And I think it would be really valuable for the nightly CI of these projects to consume each other’s nightly packages, e.g. numpy, scipy, pandas, sklearn. This should help with the stability of the whole ecosystem.

1 Like

Personally I think this seems like a cool idea.

This is a bit trickier. Right now, the only “dangerous” thing people can do via the Warehouse API, to change the state of resources on PyPI, is upload/publish a distribution. I do suggest you file an issue on the Warehouse GitHub issue tracker to suggest the ability for a project maintainer or owner to delete a file via the API – now that we have support for API tokens it’s safer than it used to be to then automate PyPI API interactions off a CI or other server.
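
For reference, the upload side can already be automated from CI with a project-scoped API token, using twine's standard environment variables (the secret name below is a placeholder for whatever the CI system injects):

```python
import glob
import os
import subprocess
import sys

# twine reads credentials and the target repository from environment variables;
# "__token__" is the literal username used with PyPI API tokens. The secret name
# TEST_PYPI_TOKEN is a placeholder.
env = dict(
    os.environ,
    TWINE_USERNAME="__token__",
    TWINE_PASSWORD=os.environ["TEST_PYPI_TOKEN"],
    TWINE_REPOSITORY_URL="https://test.pypi.org/legacy/",
)
subprocess.run(
    [sys.executable, "-m", "twine", "upload", *glob.glob("dist/*.whl")],
    env=env,
    check=True,
)
```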

More money will help us overhaul and improve the PyPI API.

2 Likes

Update: several projects from the scipy ecosystem currently share an anaconda.org-hosted PyPI-compatible index for nightly builds.

1 Like

So four years later, is there any way to automate release or release-asset deletion on test.pypi.org?

Deletion from CI is not possible unless someone implements a GitHub action that uses trusted publishing. Currently, deleting an asset or a release is a manual action for me because of 2FA.