What's a reasonable rate to access PyPI automatically at?

The Warehouse documentation suggests that, whilst there’s no rate-limiting currently in place for PyPI, making thousands of requests per minute would be a bad idea. I’m using distlib.locators.locate to make requests, so I don’t have a way of putting my contact details in the user agent (as far as I know) – is it reasonable to call that function no more than once per second, say? Could I go faster? (It’ll take a couple of days to do the initial crawl at one request per second, which is fine, but getting it done sooner would be nice.)
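A minimal sketch of the "no more than once per second" idea: wrap whatever lookup function you call in a small throttle that sleeps just long enough between calls. This is a generic pattern, not anything provided by distlib. The `Throttle` class is hypothetical, and the sketch assumes `distlib.locators.locate` is called one package at a time from a single thread.

```python
import time
from typing import Any, Callable, Optional


class Throttle:
    """Hypothetical wrapper: call `func` at most once every `min_interval` seconds."""

    def __init__(self, func: Callable[..., Any], min_interval: float = 1.0) -> None:
        self.func = func
        self.min_interval = min_interval
        # Timestamp (monotonic clock) of the previous call; None before the first call.
        self._last_call: Optional[float] = None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._last_call is not None:
            # Sleep only for the part of the interval that hasn't already elapsed.
            wait = self.min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()
        return self.func(*args, **kwargs)


if __name__ == "__main__":
    # Stand-in for distlib.locators.locate so this runs offline.
    # With distlib installed you would instead do:
    #     from distlib.locators import locate
    #     throttled_locate = Throttle(locate, min_interval=1.0)
    def fake_locate(name: str) -> str:
        return f"located {name}"

    throttled_locate = Throttle(fake_locate, min_interval=1.0)
    for pkg in ["requests", "numpy", "attrs"]:
        print(throttled_locate(pkg))
```

Because the wait is computed from the time of the previous call, time spent inside the lookup itself counts towards the interval, so a slow response doesn't add extra delay on top of the one-second gap.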

Thanks for asking! Those docs could probably be a bit clearer, but as they say, those things are just suggestions. 90-something percent of requests are served by our CDN and never actually reach our backends.

To give you a sense of scale, our CDN is usually serving anywhere from 5-10k req/s, so as long as you're not making tens or hundreds of thousands of requests per minute, we probably won't notice.
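To put those figures side by side, here is some purely illustrative arithmetic that converts a per-minute request rate into requests per second and expresses it as a share of the quoted 5-10k req/s CDN load. The constants come from the numbers above; the function names are hypothetical.

```python
# Illustrative arithmetic only: compare a client's request rate with the
# CDN load quoted above (roughly 5k-10k requests per second).

CDN_LOW = 5_000    # req/s, low end of the quoted range
CDN_HIGH = 10_000  # req/s, high end of the quoted range


def per_minute_to_per_second(per_minute: float) -> float:
    """Convert a requests-per-minute rate to requests per second."""
    return per_minute / 60.0


def share_of_cdn(req_per_sec: float, cdn_req_per_sec: float) -> float:
    """Fraction of total CDN traffic a single client would represent."""
    return req_per_sec / cdn_req_per_sec


if __name__ == "__main__":
    for label, per_min in [
        ("1 req/s crawler", 60),
        ("10k req/min", 10_000),
        ("100k req/min", 100_000),
    ]:
        rps = per_minute_to_per_second(per_min)
        low = share_of_cdn(rps, CDN_HIGH)   # smallest share: busiest CDN
        high = share_of_cdn(rps, CDN_LOW)   # largest share: quietest CDN
        print(f"{label}: {rps:.1f} req/s, {low:.2%} to {high:.2%} of CDN traffic")
```

A crawl at one request per second works out to about 0.01% to 0.02% of that traffic, whereas "tens of thousands per minute" is already in the range of a few hundred requests per second.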