I can’t seem to find good data about our development process. While I don’t think we should be “data driven”, being “data informed” could be wise. I think that having easily accessible data in a single place would be a great basis for discussions and decision making.
What I’m thinking of is a system that would collect data from b.p.o., GitHub, the buildbots etc. We’d have a website allowing access to this data, graphing it in various ways, and exporting parts of the data set. It would also be a great place for a general dashboard on the website’s home page, showing things like activity and the number of open/closed issues and PRs in various states.
I’m not necessarily volunteering to build this if it’s thought to be a good idea, but I’d like to at least hear what others think!
What are you specifically after here? Once issues move from bpo to GitHub, people will be able to use GitHub’s API without us having to maintain our own data collection system for that data. Otherwise, a “collect all the data” mentality is a lot of work, especially when one of those systems should be going away for us at some point and the other is already essentially a data store.
Now if someone wanted to somehow collect stats on the buildbot, that wouldn’t be a duplication of effort and might help identify troublesome OSs, tests, and buildbots themselves.
Collecting data would be the back-end side of it; easily making graphs, or even having a single standard dashboard, would be the front-end side. From my experience, just having those readily available and regularly viewed can have an effect.
Trying to think of concrete use cases, I’d like to be able to easily explore questions such as:
How has the number of open PRs behaved over time? Open PRs in various states, e.g. “awaiting review” vs. “awaiting merge” vs. waiting for changes requested by a core dev?
How is that affected by the number of active core devs?
How is that affected by the number of people creating issues / PRs?
How many core devs review and merge PRs? How many do so at a high rate? How much of the PR review is a small group responsible for?
How often do non-core devs review PRs? What percentage of the reviews do these make up?
How many PRs linger for over X months, at various stages?
How many change requests do PRs go through before being merged, closed, or entering a long period of no progress?
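To make a couple of the questions above concrete, here is a minimal sketch of how open-PR counts over time and lingering PRs could be computed once the data has been collected. Everything in it is an assumption for illustration: the record layout (number, opened date, closed date or None), the sample values, and the function names are all hypothetical, and no real API is called.

```python
from datetime import date, timedelta

# Hypothetical sample records: (PR number, opened, closed-or-None).
# In practice these would be collected from an API; these values are invented.
SAMPLE_PRS = [
    (1, date(2021, 1, 1), date(2021, 1, 10)),
    (2, date(2021, 1, 5), None),
    (3, date(2021, 1, 8), date(2021, 1, 9)),
]


def open_pr_count(prs, day):
    """Count PRs opened on or before `day` and not yet closed on `day`."""
    return sum(
        1
        for _, opened, closed in prs
        if opened <= day and (closed is None or closed > day)
    )


def open_pr_series(prs, start, end):
    """Return [(day, open count)] for each day from start to end inclusive."""
    days = (end - start).days
    return [
        (start + timedelta(days=i), open_pr_count(prs, start + timedelta(days=i)))
        for i in range(days + 1)
    ]


def lingering_prs(prs, today, min_age_days):
    """Return the numbers of PRs still open and at least `min_age_days` old."""
    return [
        number
        for number, opened, closed in prs
        if closed is None and (today - opened).days >= min_age_days
    ]


if __name__ == "__main__":
    for day, count in open_pr_series(SAMPLE_PRS, date(2021, 1, 8), date(2021, 1, 10)):
        print(day.isoformat(), count)
    print(lingering_prs(SAMPLE_PRS, date(2021, 3, 1), 30))
```

Splitting the counts by state (“awaiting review” vs. “awaiting merge”) would need an extra field per record and a filter on it; the same series could then feed whatever graphing front end is chosen.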
I could certainly pull data from the GitHub and b.p.o. APIs and work with it locally… but a service that regularly updates its data set, with an off-the-shelf web-based interface, seems like it should be relatively simple to set up these days, and would naturally enable a standard online dashboard, which would have its own merits.