What's a reasonable rate to access PyPI automatically at?

The Warehouse documentation suggests that, whilst there’s no rate-limiting currently in place for PyPI, making thousands of requests per minute would be a bad idea. I’m using distlib.locators.locate to make requests, so I don’t have a way of putting my contact details in the user agent (as far as I know) – is it reasonable to call that function no more than once per second, say? Could I go faster? (It’ll take a couple of days to do the initial crawl at one request per second, which is fine, but getting it done sooner would be nice.)
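One way to hold a crawl to "no more than once per second" is a small wrapper that sleeps between calls. This is only a minimal sketch: `Throttle` is a hypothetical helper, not part of distlib or Warehouse, and the `clock` and `sleep` parameters exist only so the timing can be swapped out in tests.

```python
import time


class Throttle:
    """Wrap a callable so it runs at most once every `min_interval` seconds."""

    def __init__(self, func, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            # Sleep for whatever remains of the interval since the last call.
            wait = self.min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        return self.func(*args, **kwargs)
```

Assuming distlib is installed, something like `throttled_locate = Throttle(distlib.locators.locate, min_interval=1.0)` could then be called in place of `locate` throughout the crawl; lowering `min_interval` raises the request rate.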


Thanks for asking! Those docs could probably be clearer, but as they say, those numbers are just suggestions. Ninety-something percent of requests are served by our CDN and never reach our backends.

To give you a sense of scale, our CDN usually serves anywhere from 5k to 10k req/s, so as long as you’re not making tens or hundreds of thousands of requests per minute, we probably won’t notice.
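For a rough feel of how these figures compare, here is a back-of-the-envelope sketch. The total request count is an assumption derived from "a couple of days at one request per second" in the question, and the candidate rates are arbitrary examples, not limits stated by PyPI.

```python
def per_minute(req_per_sec):
    """Convert a per-second request rate to requests per minute."""
    return req_per_sec * 60


def crawl_hours(total_requests, req_per_sec):
    """Hours needed to issue `total_requests` at a steady rate."""
    return total_requests / req_per_sec / 3600


# Assumption: "a couple of days" at 1 req/s is roughly two days of requests.
TOTAL_REQUESTS = 2 * 24 * 3600

# Arbitrary example rates, all far below the 5k-10k req/s the CDN serves.
for rate in (1, 10, 50):
    print(f"{rate:>3} req/s = {per_minute(rate):>5} req/min, "
          f"crawl takes about {crawl_hours(TOTAL_REQUESTS, rate):.1f} h")
```

Even the fastest example rate here works out to a few thousand requests per minute, which is below the "tens or hundreds of thousands of requests per minute" mentioned above.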