The year is almost over and plenty of internet services are generating their “2020 in review” summaries, so I hope everyone is in the mood for some extra plots: I have generated some “CPython lifetime in review” plots that I hope you enjoy. Generating these plots takes a lot of time because the git blame for every single commit needs to be analyzed and aggregated (a very filesystem-intensive O(n^2) process), but there are some interesting insights and statistics to be found in the results.
This plot shows the total number of lines in the codebase, broken down into cohorts by the year the code was added. Looking at the different colours, you can observe how the code added in a particular year survives over time. (“other” in this plot refers to everything before 1997.)
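As a rough illustration of the cohort idea, here is a minimal sketch that groups the surviving lines of one snapshot by the year they were added. It assumes the year of each line has already been extracted from the blame output; the function name `cohort_sizes` and its parameters are hypothetical and are not taken from the tool used for these plots.

```python
from collections import Counter


def cohort_sizes(line_years, cutoff=1997):
    """Count surviving lines per cohort year.

    line_years: one entry per surviving line, holding the year in which
    that line was added (assumed to come from blame data).
    Years before `cutoff` are merged into a single "other" cohort.
    """
    counts = Counter()
    for year in line_years:
        counts["other" if year < cutoff else year] += 1
    return dict(counts)


# Made-up data: five surviving lines and the years they were added.
print(cohort_sizes([1995, 1997, 2001, 2001, 1990]))
```

Repeating this count for snapshots taken at different points in the history would give one stacked band per cohort.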
Same idea, but broken down by the extensions of the files in the codebase. Here you can see how the mix of file extensions evolves over time.
The same idea again, but broken down by author. It is impossible to plot every author, so the ones displayed here are those who have added, deleted, or modified the most lines cumulatively. Almost everyone ends up in the “other” group.
This curve shows the percentage of lines in a commit that are still present after x years. It aggregates over all commits, no matter when they were made. So for x=0 it includes all commits, whereas for x>0 not all commits are counted (because we would have to look into the future for some of them). The survival curves are estimated using the Kaplan-Meier estimator.
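For readers unfamiliar with Kaplan-Meier, here is a minimal sketch of the estimator in plain Python. It is not the code used to produce the plots. It assumes each line of code contributes a duration (how long it has been observed) and a flag saying whether it was actually removed (`True`) or was still present when observation ended (`False`, i.e. censored). The names `kaplan_meier`, `durations` and `observed` are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one removal occurred.

    durations: time each line was observed (e.g. in years).
    observed:  True if the line was removed at that time,
               False if it was still present (censored).
    """
    removals = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # removals and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = removals.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk lines that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Made-up data: five lines, two of them still present (censored).
for t, s in kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]):
    print(t, round(s, 3))
```

Censored lines still count in the at-risk denominator up to the moment they drop out of observation, which is how the estimator handles commits that are too recent to have been followed for x years.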
These plots have been generated using a modified version of this tool.