Open Pull Requests by File

Ever wonder which open pull requests have modified your favorite module/file? Wonder no more!

Here’s a silly little repo that creates JSON for all the open pull requests and then gets their related files: https://github.com/csabella/pulls

edit: I added a text file to the repo so it’s easier to see the current results instead of running it all yourself.
https://github.com/csabella/pulls/blob/master/files_to_pull_requests.txt
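The approach is nothing fancy: page through the open PRs, fetch the files for each one, and invert the mapping. Roughly like this (a stripped-down sketch, not the actual script: it assumes the python/cpython repo, skips authentication, and doesn’t paginate the files endpoint):

```python
import json
import requests

API = "https://api.github.com/repos/python/cpython"

def open_pull_requests():
    """Yield every open PR, following GitHub's pagination links."""
    url = f"{API}/pulls?state=open&per_page=100"
    while url:
        resp = requests.get(url)
        resp.raise_for_status()
        yield from resp.json()
        url = resp.links.get("next", {}).get("url")

def files_for(number):
    """Return the paths touched by one PR (one API call per PR)."""
    resp = requests.get(f"{API}/pulls/{number}/files")
    resp.raise_for_status()
    return [changed["filename"] for changed in resp.json()]

def build_file_map():
    """Invert PR -> files into file -> list of PR numbers."""
    file_map = {}
    for pr in open_pull_requests():
        for path in files_for(pr["number"]):
            file_map.setdefault(path, []).append(pr["number"])
    return file_map

if __name__ == "__main__":
    with open("files_to_pull_requests.json", "w") as f:
        json.dump(build_file_map(), f, indent=2)
```

Note that this burns one call per PR plus one per page, so against the 60-requests-per-hour anonymous limit you’d want an API token for anything beyond a toy run.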

I’m sure something better already exists, but I had an itch to scratch, so I created this. I thought someone else might find it interesting. :slight_smile:

FYI, the JSON is 25MB because I decided to just dump everything I looked at.


This is INCREDIBLY useful. This should be a GitHub feature! Until then, I wonder if it can be added to the automated GitHub workflow as a bot or something?!

Thanks @scotchka! I’m glad you find it useful. :slight_smile:

It would obviously need work before it was ready for prime time, but if there’s interest, it’s something I can take a look at. I’d envision a usable product gathering the data asynchronously and probably displaying the results as a dashboard of graphs rather than just pretty-printed text. As I said, I hadn’t really researched whether something already existed before writing this; it was mostly a chance to play with the GitHub API a little.

I took a glance at the GitHub API docs, and it seems like a webhook could be a convenient way to keep the open PR data in sync with the repo.
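Hypothetically, the receiving end could be as small as this (a Flask sketch; the two helpers are placeholders for whatever storage the real service would use):

```python
from flask import Flask, request

app = Flask(__name__)

def update_file_map(number):
    """Placeholder: refetch this one PR's files and update the stored map."""

def remove_pr(number):
    """Placeholder: drop this PR from the stored map."""

@app.route("/webhook", methods=["POST"])
def on_github_event():
    # GitHub identifies the event type in this header.
    if request.headers.get("X-GitHub-Event") != "pull_request":
        return "", 204
    payload = request.get_json()
    number = payload["pull_request"]["number"]
    if payload["action"] in ("opened", "reopened", "synchronize"):
        update_file_map(number)  # one extra API call per PR touch
    elif payload["action"] == "closed":
        remove_pr(number)
    return "", 204
```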

Pinging @Mariatta, in case she has inputs here. =)

Depends on how long this takes to run :slight_smile: and on how many API calls it makes (are we going to hit the rate limit if we run this on demand as a webhook?)

If there is no concern with the rate limit, a webhook will definitely work. We’d want it hosted on Heroku, using Celery or a similar library for the background task. The catch is that Celery isn’t compatible with Python 3.7 yet; that’ll come in Celery 5.0.
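For a sense of scale, the Celery side would be tiny (a sketch; it hard-codes a Redis broker, which on Heroku would come from an environment variable instead, and reuses the `build_file_map` function sketched above):

```python
import json

from celery import Celery

# Sketch only: broker URL hard-coded; on Heroku it would come from config.
app = Celery("pulls", broker="redis://localhost:6379/0")

@app.task
def rebuild_file_map():
    """Rebuild the file -> PR map off the web process."""
    file_map = build_file_map()  # from the earlier sketch
    with open("files_to_pull_requests.json", "w") as f:
        json.dump(file_map, f, indent=2)
```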

It could also work as a cron job, running every hour or so. That way we probably don’t need to worry about rate limiting.
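The hourly cron variant is just a beat schedule on that same task (again a sketch; it assumes the task lives in a `tasks` module):

```python
# Run the rebuild every hour via celery beat, instead of a webhook.
app.conf.beat_schedule = {
    "rebuild-file-map-hourly": {
        "task": "tasks.rebuild_file_map",
        "schedule": 60 * 60,  # seconds
    },
}
```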

I agree with @Mariatta that this would probably be better as a cron job than a webhook. When I created it, it was more to see what was already out there than to get up-to-the-minute information on newly added PRs. I think running it once a day or even once a week would be sufficient, based on my original intent.

However, this was a fun script, not an idea with any actual design behind it. With that in mind, the method and frequency for grabbing the data are certainly open to discussion. If a base dataset is created (with existing PRs), then each touch on a PR would probably result in only one additional GitHub API call, so it shouldn’t have much of an impact on the rate limit.
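And if the rate limit ever does become a question, the job can just ask GitHub where it stands; checking is free:

```python
import requests

# This endpoint doesn't count against the rate limit itself.
core = requests.get("https://api.github.com/rate_limit").json()["resources"]["core"]
print(f"{core['remaining']}/{core['limit']} calls left, resets at {core['reset']}")
```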