Introducing simple-repository-server: A Standards-Compliant, Extensible Python Package Index Server

TL;DR

Run a standards-compliant, FastAPI-based Python package index/repository server with simple-repository-server. Features:

  • PEP-503 compliant proxy for PyPI and other repositories
  • Local package serving directly from the filesystem
  • Order-based repository merging to mitigate the dependency confusion vulnerability
  • PEP-658 metadata enrichment for upstream repositories that need it, as well as for local wheels (faster package resolution for installers)
  • PEP-691 JSON API support for enhanced repository interactions
  • Easy extensibility: modify and extend the server in Python

It doesn’t provide a project upload capability (e.g. via twine), nor does it serve non-standard repository APIs such as the PyPI-specific JSON API (which is distinct from the standardised PEP-691 JSON API) and the XML-RPC API. There is also no browsing interface beyond the standard PEP-503 pages; for this, you could use simple-repository-browser.

Background

In a post last year on discuss.python.org, I introduced simple-repository-browser, a PyPI-like interface to browse and search packages in any PEP-503 compliant simple repository. The project is built on a core simple-repository library, which has the potential for use in both client and server contexts where package repository interactions are needed. The simple-repository core library is a topic for a different post, but in this one I wanted to share another application built on top of the library: simple-repository-server.

The simple-repository-server project combines the simple-repository library with FastAPI to deliver a standards-compliant package repository over HTTP. The server implements the PEP-503 interface and supports JSON through content negotiation (PEP-691), so it is compatible with standards-compliant repository consumers such as pip and uv. Thanks to built-in PEP-658 metadata enrichment, it can also help speed up dependency resolution for these tools (especially for projects not on PyPI, or those mirrored via bandersnatch).
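To make the content negotiation and metadata points concrete, here is a minimal client-side sketch using only the standard library. The media types come from PEP-691, and the ".metadata" suffix comes from PEP-658. The base URL used in the usage note (host, port and "/simple/" path) is an assumption for illustration, as are the helper names; this is not code from simple-repository-server itself.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Media types defined by PEP-691 for the simple repository API.
PEP691_JSON = "application/vnd.pypi.simple.v1+json"
PEP691_HTML = "application/vnd.pypi.simple.v1+html"


def build_project_request(index_url: str, project: str) -> Request:
    """Build a request for a project page that prefers the JSON form.

    `index_url` is whatever base URL your server is reachable at
    (hypothetical here); `project` should already be PEP-503 normalised.
    """
    url = f"{index_url.rstrip('/')}/{project}/"
    # Prefer JSON, but accept HTML as a fallback for older servers.
    accept = f"{PEP691_JSON}, {PEP691_HTML};q=0.2, text/html;q=0.01"
    return Request(url, headers={"Accept": accept})


def metadata_url(file_url: str) -> str:
    """Return the PEP-658 metadata URL for a distribution file URL.

    PEP-658 serves the metadata at the file's location with
    ".metadata" appended; any "#sha256=..." fragment is dropped first.
    """
    url, _fragment = urldefrag(file_url)
    return url + ".metadata"
```

Passing the result of `build_project_request("http://localhost:8000/simple/", "numpy")` to `urllib.request.urlopen` should return a JSON document when the server honours the `Accept` header; the address shown is an assumption, so substitute the one your instance actually listens on.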

Usage

Out of the box, simply running:

python -m simple_repository_server https://pypi.org/simple/

starts a uvicorn server listening on localhost. The server uses the cache-control headers from PyPI to store responses and minimise network traffic to PyPI. During PyPI outages, the service can continue to operate using the responses it has already seen (though accessing never-seen-before resources will naturally fail).

If you wish to augment this repository with your own set of projects, while ensuring that projects of the same name aren’t accidentally installed from PyPI (a.k.a. dependency confusion), you can run the server with:

python -m simple_repository_server \
    /path/to/local/python/projects/ \
    https://pypi.org/simple/

Here, your local projects should be organised into one directory per project, with each directory named according to PEP-503 normalisation.
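For reference, the PEP-503 normalisation rule and the idea behind order-based merging can be sketched in a few lines of Python. The `normalize` function follows the rule given in PEP-503. The `select_source` helper is purely illustrative and assumes first-match-wins semantics (the first source that knows a project serves it exclusively); it is not the server’s actual implementation, and the source labels are hypothetical.

```python
import re
from typing import Optional, Sequence, Set, Tuple


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP-503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def select_source(
    project: str,
    sources: Sequence[Tuple[str, Set[str]]],
) -> Optional[str]:
    """Return the label of the first source that provides `project`.

    `sources` is an ordered sequence of (label, normalised project names).
    Earlier entries take priority, so a project that exists locally is
    never looked up in a later (e.g. public) source. This ordering is an
    assumption made for illustration.
    """
    wanted = normalize(project)
    for label, projects in sources:
        if wanted in projects:
            return label
    return None


# Hypothetical example data: a local directory listed before PyPI.
sources = [
    ("local", {"my-internal-lib"}),
    ("pypi", {"my-internal-lib", "requests"}),
]
print(select_source("My_Internal.Lib", sources))  # served from "local"
print(select_source("requests", sources))         # falls through to "pypi"
```

With this ordering, a public project that happens to share a name with an internal one is never consulted, which is the essence of the dependency confusion mitigation.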

Real-world experience

The simple-repository-server project has been running in production behind an nginx reverse-proxy for a number of years in the Accelerator Sector at CERN. A single simple-repository-server instance comfortably sustains more than 100k requests per day from hundreds of users.

Initially, we developed the server as a lightweight proxy for our internal Nexus instance in order to address the dependency confusion vulnerability. Subsequently, we wanted to support PEP-658 (metadata) based installer optimisations and other nice-to-haves (such as the JSON API from PEP-691), and so, over time, our Nexus instance moved down the stack and became less critical. The maintenance cost of Nexus, and its inextensibility for a team of non-Java specialists, ultimately led us to remove it from our stack entirely. Browsing of the repository is now handled by simple-repository-browser, and publishing of projects (through twine) by an internal tool tightly coupled to our authentication and metadata QA policies (this tool is not part of simple-repository-server).

The history of simple-repository-server means that it has been designed from the ground up to be highly adaptable to different scenarios, and its lightweight asyncio-based API means that it is realistic for non-experts to extend.

We spent a lot of time scouting the market for alternative approaches that did not require us to write our own server (after all, this is a crowded space). Our key requirements were simple and reliable deployment and updates, adoption of modern packaging standards (PEP-658 as a minimum), minimal lock-in, and the flexibility to introduce changes to our existing production repository gradually. Ideally, we also wanted convenient extensibility and a clear separation of concerns, so that we could apply internal policies (e.g. metadata requirements, project ownership, automatic blacklisting) to our repository. Our search didn’t yield a suitable project (simpleindex comes close, but doesn’t serve all of the standards that we were looking for), and so we created simple-repository-server to meet these needs.

What’s next?

As with the project in my previous post, simple-repository-server is an important internal project for us, and we’ve released the full code under the MIT license on GitHub at simple-repository/simple-repository-server. We are seeking feedback, and want to understand the level of interest from the community, before deciding whether to invest further effort in moving the project towards a full open development & maintenance model [1].

We would love to hear from you if simple-repository-server sounds like a project that fits your needs directly. We are also really interested to know whether there are projects out there that need to serve a standards-based repository API and would be interested in integrating directly with simple-repository-server (in general, this is surprisingly easy to do). More generally, we’re interested in feedback on the tool, and in raising awareness of a project which is solving a need for us today.


  1. Our motivations, as a publicly funded international organisation, are to make the best use of our resources to achieve our scientific mission. We think that, in the long run, simple-repository-server being a well-adopted tool will reduce our overall cost of ownership, and at the same time can have a positive impact for others (including other similar research labs). We also care deeply about disclosing software responsibly, and ensuring that when we release software we do so sustainably.


This looks excellent and fills a gap. Thanks for the contribution to the community!