Prevalence/Staleness of Stubs Packages in PyPI

yangdanny97 · November 7, 2024, 4:16pm

I did some analysis of stubs packages in PyPI, and thought the data might be interesting to share here.

For the purposes of this analysis, I define a stubs package as a package whose name follows the “-stubs” naming convention from PEP 561. This is probably not a complete listing, since there are other packages that are obviously stubs but do not follow this naming convention, such as boto3-stubs-lite.
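To make the naming-based definition concrete, here is a minimal sketch. It is not the code from the linked script: the function name `main_package_for` is hypothetical, and the name normalization is deliberately simplified.

```python
from typing import Optional

STUBS_SUFFIX = "-stubs"


def main_package_for(name: str) -> Optional[str]:
    """Return the package a "-stubs" package targets, or None if the
    name does not follow the PEP 561 naming convention.

    Simplified normalization (an assumption for this sketch): lowercase
    the name and treat underscores as hyphens.
    """
    normalized = name.lower().replace("_", "-")
    if normalized.endswith(STUBS_SUFFIX) and len(normalized) > len(STUBS_SUFFIX):
        return normalized[: -len(STUBS_SUFFIX)]
    return None
```

Under this rule, `rfc3986-stubs` maps to `rfc3986`, while `boto3-stubs-lite` is not counted because the suffix is not at the end of the name.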

The code to generate the results is in my fork of @lolpack’s type coverage script: GitHub - yangdanny97/type_coverage_py

An updated HTML report of top 2k packages can be viewed here: Git-Forge HTML Preview

How common are stubs packages?

Not that common - of the top 8k downloaded packages, 106 (1.3%) have stubs packages, and 15 (0.18%) are stubs packages. Separate stubs packages are more common in the most widely-used packages - 15 of the top 100 packages have corresponding stubs packages.

How about typeshed? Of the top 2000 packages, 114 have typeshed stubs while only 74 have stubs packages. Interestingly, 10 packages have both typeshed stubs and stubs packages.

Full data here: pastebin

Are stubs packages up to date?

Having stubs in a separate PyPI package with a different release schedule and potentially different maintainers raises the question of whether these stubs are reliable and up-to-date.

To answer this question, I looked up each package with a corresponding stubs package and compared their latest release dates. I also compared the release frequency of packages vs. stubs.

Staleness

We define staleness as the days between the latest release of the main package and the latest release of the stubs package. A negative value means that the stubs package is newer than the main package.

Ideally, we would expect stubs to be updated soon after a main package release, so well-maintained stubs packages should have a small negative staleness value.
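The definition above can be sketched as a small function. This is an illustration under stated assumptions, not the code from the linked script: the function name `staleness_days` is hypothetical, and the dates in the usage note are made up.

```python
from datetime import date


def staleness_days(main_latest: date, stubs_latest: date) -> int:
    """Days between the latest main release and the latest stubs release.

    Positive: the stubs are older than the main package (stale).
    Negative: the stubs were released after the main package.
    """
    return (main_latest - stubs_latest).days
```

For example, `staleness_days(date(2024, 11, 1), date(2024, 1, 26))` gives 280, while stubs released three days after the main package give -3.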

In practice, this ideal scenario is rare. There is a huge variance in staleness figures, but the median stubs package is around 280 days stale. As expected, stubs for more popular packages are more likely to have low staleness.

See the attached chart for a detailed breakdown. Full data here: pastebin

There are a lot of positive and negative outliers here, with a wide variety of reasons:

rfc3986-stubs - The stubs are several years ahead of the main package, because the main package hasn’t had a release in years. Presumably, users are downloading/consuming the latest package source directly.
pyjarowinkler-stubs - The main package hasn’t been updated since 2016, but in 2023 someone made a stubs package as a practice project

Update Frequency

Sometimes, stubs packages are made as one-off practice projects or are unmaintained/poorly maintained compared to the main package. The maintainers of the stubs may not be the same as for the main package, and it’s clear that most stubs are not set up to be automatically bumped when the main package is released.

In contrast, typeshed has stubsabot, which submits PRs to bump typeshed’s stubs when the main package releases, and any API changes are flagged to maintainers by CI.

Some data: the median package in this analysis has 46 releases, while the median stubs package has just 3. Out of the 106 stubs packages I looked at, over a fifth (23) have only a single release and another 19 have only two releases; 81/106 have <10 releases.

To me this raises a lot of questions on how much we can trust standalone stubs packages vs. a centralized stubs repository like typeshed. The latter appears to be far better maintained and is a more popular way to distribute types across existing libraries.

Where should types go?

I’m curious to hear what everyone’s thoughts are on how best to add static typing for popular packages.

For library maintainers who don’t want to annotate their code directly with types, it seems like there are three options.

1. Generate stubs alongside the code & set up CI to ensure they are updated along with the source code.
2. Generate the stubs and ship them as a separate package & set up CI to ensure it is released along with the main package.
3. Generate the stubs in typeshed.

#1 or #2 would probably need to be done on a case-by-case basis and might be a hard sell for some package maintainers. If I were a package maintainer I would probably prefer #1 or adding the types inline over having a separate stub package because it seems like less work.

#3 seems like it would have the most consistent standards since all the infra and CI is already set up & all the typecheckers use it, but if we were to add stubs for, say, 1000 packages into typeshed I feel like the maintainers wouldn’t be able to keep up with updates.

Some possible ways of reducing the maintenance burden for typeshed:

Have some way of marking stubs as fully-generated, and automatically regenerate them each time. This would probably require big improvements to the current stub generation tooling, and the resulting stubs would be lower-quality than handwritten ones.
Have some way of marking ownership of typeshed stubs, so that when a version is bumped a bot automatically opens an issue against the package’s source repo reminding the maintainers to also update typeshed, instead of relying on typeshed maintainers to stay on top of all the updates.

Thoughts?

erictraut · November 7, 2024, 6:12pm

Thanks for doing this analysis.

Did you happen to note which libraries contain inlined type information or bundled stubs? These libraries can be identified by looking for a “py.typed” file marker.
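A minimal sketch of that check, assuming the package has already been unpacked or installed to a local directory (the function name `has_py_typed` is hypothetical). Per PEP 561, the marker file sits in the package's top-level directory.

```python
from pathlib import Path


def has_py_typed(package_dir: Path) -> bool:
    """Return True if the package directory contains a py.typed marker."""
    return (package_dir / "py.typed").is_file()
```

For an installed package, `package_dir` would be the directory containing its `__init__.py`.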

I’m curious to hear what everyone’s thoughts are on how best to add static typing for popular packages.

From the perspective of library consumers, the best way for library maintainers to deliver type information is through inline annotations. Refer to this section of the pyright documentation for the reasoning behind this assertion.

We increasingly see library authors adopt this approach, which is great for the community. The typical path is to first annotate some subset (say, the 20% most-commonly-used classes and methods in the library) and then mark the library as “py.typed”. Type information for the remaining interface surface area is then added over time prioritized by user feedback. Type annotations are often improved through community contributions, which lowers the investment for the core library maintainers.

If inlined type information cannot be provided (for example, because the package is implemented in a language other than Python), then bundled stubs are the best answer. This approach, like inlined type information, reduces the burden for library consumers because it doesn’t involve an extra install and it eliminates problems with version inconsistencies.

If a library maintainer is unwilling to add either inlined type information or bundled stubs, then your list of suggestions makes sense. However, all of these approaches have significant downsides for library consumers, so I would use them only as a fallback.

Library maintainers generally listen to their users. If consumers of a library see value in type information, their feedback can provide the impetus for investing in inlined types or bundled stubs.

yangdanny97 · November 7, 2024, 7:37pm

Out of the top 2k packages, 829 have this file.

I’ve added some code to count this and updated the report.

bwoodsend · November 8, 2024, 7:57pm

As a library maintainer who doesn’t want the burden of maintaining stubs or inline types but also doesn’t enjoy being peppered with false positive bug reports from PyCharm users complaining that their completions are wrong, I’d actually push for the unspoken and probably unpopular option 4 where stale+inaccurate stub packages should be removed (assuming no-one can think of a clever way of versioning stubs so that they’re ignored if they’re stale?).

Jelle · November 9, 2024, 2:48pm

Thanks for writing this up!

I think the ideal is for types to be maintained inline in each package, so that they get updated immediately when the code changes, and they’re maintained by the people who know the code best.

However, not all maintainers will want to maintain typing information, which is understandable: maintaining a package is enough work as is, and not everyone wants to use typing. As the typing community, we should therefore find ways to provide types for such packages in a way that doesn’t burden the package maintainers. That’s what typeshed is for. Some people also maintain stub packages independently from typeshed and from the original package; I think your analysis shows that maintenance of these packages is often not ideal. Typeshed provides a lot of infrastructure (testing, automated update PRs) that is helpful for these packages, so we should encourage users to submit stubs to typeshed instead of building their own packages.

As for your concrete suggestions:

Have some way of marking stubs as fully-generated

If we can generate the stubs from the source code, it might be better to have type checkers just look at the source code of the package so we don’t have to go through a separate stub generation stage.

Have some way of marking ownership of typeshed stubs

As discussed above, I wouldn’t want an issue to be opened on the original package; one of the points of typeshed is to provide types without bothering the original package maintainers. However, typeshed has a small group of maintainers to maintain a huge number of packages. It might be interesting to explore a system where people who aren’t full typeshed maintainers but are interested in a specific stub package get automatically subscribed to PRs related to that package.

yangdanny97 · November 11, 2024, 2:28am

If we can generate the stubs from the source code, it might be better to have type checkers just look at the source code of the package so we don’t have to go through a separate stub generation stage.

Could generated stubs give more consistent behavior for users who use different typecheckers?

Like if the codebase doesn’t have return types annotated, I imagine the typechecking behavior could be different depending on how the user’s typechecker does return type inference. So having the generated stubs could still ensure a consistent experience for users and might reduce the amount of typechecker-related bug reports a project gets.
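A small illustration of the concern, using a hypothetical unannotated function. By default, mypy treats the result of calling an unannotated function as `Any`, while pyright infers a return type from the function body, so the two can report different things for the same call site.

```python
# Hypothetical library function with no annotations.
def parse_port(value):
    if value.isdigit():
        return int(value)
    return None


# A generated stub (.pyi) might pin the signature down for every checker.
# It is shown as a comment here because stubs live in a separate file:
#     def parse_port(value: str) -> int | None: ...
```

With the stub in place, every type checker would see the same declared return type instead of relying on its own inference.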