358 most popular Python packages have wheels

groodt · January 18, 2024, 9:21pm

Vanity metric, but still cool and interesting in my opinion.

358 of the 360 most popular Python packages now have wheels available. https://pythonwheels.com/

The 2 non-wheel packages are:

future
pyspark

I’m not even sure if those could ever be wheels without changing packaging standards.

groodt · January 18, 2024, 9:29pm

“future” appears in quite a lot of dependency closures in my experience.

It’s a bit of a shame that “future” is not a wheel. In theory, you can certainly write a lot of useful software if your dependency closure only includes those top 358 packages.

If all 358 have PEP 658 metadata, it should be possible to do a very fast dependency solve purely using static metadata, without any intermediate package downloads or builds. It could even be done without a Python interpreter, à la rip (GitHub - prefix-dev/rip: Solve and install Python packages quickly with rip (pip in Rust)).
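As a rough sketch of what static metadata buys a resolver: PEP 658 serves a wheel's METADATA file at the wheel's own URL with ".metadata" appended, so dependencies can be read without downloading or building anything. The snippet below parses such a file with the standard library's email parser. The helper names and the sample text are illustrative assumptions, not taken from rip or pip.

```python
from email.parser import Parser


def metadata_url(wheel_url: str) -> str:
    """PEP 658: the metadata file sits next to the wheel, with '.metadata' appended."""
    return wheel_url + ".metadata"


def requires_dist(metadata_text: str) -> list[str]:
    """Return the raw Requires-Dist entries from a core metadata document."""
    msg = Parser().parsestr(metadata_text)
    return msg.get_all("Requires-Dist") or []


# Hypothetical metadata, standing in for a fetched .metadata file.
SAMPLE = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0
Requires-Dist: idna (>=2.5)
Requires-Dist: urllib3 ; python_version >= "3.8"
"""

deps = requires_dist(SAMPLE)
```

A real resolver would parse each entry into a requirement and marker (for example with the packaging library) rather than keep the raw strings.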

pf_moore · January 18, 2024, 10:06pm

It’s annoying that the list of wheels on that page can’t be cut & pasted, otherwise I could relatively quickly check. But I assume that most, probably all of those 358 will have PEP 658 metadata for their latest versions, at least. Unfortunately, for many resolves, looking at older versions is necessary and I don’t think the “backfill” exercise to add metadata for older wheels has been done yet.

But yes, it’s great news that we’re reaching a point where a significant number of real-world installs can be completed with static data, only downloading what needs to be installed.

sinoroc · January 18, 2024, 10:10pm

Maybe this can help: https://pythonwheels.com/results.json

pf_moore · January 18, 2024, 10:35pm

Ah cool. I didn’t spot a mention of that file. There are actually 77 of the 358 which don’t have static metadata - presumably because they haven’t had a release since PyPI started extracting metadata from wheels.

A few spot checks suggest that’s the case.

This is the script I used to get upload times.

import requests

def get_upload_times(pkg):
    # Ask the Simple API for its JSON form instead of the default HTML
    ACCEPT = "application/vnd.pypi.simple.v1+json"
    url = f"https://pypi.org/simple/{pkg}/"
    rsp = requests.get(url, headers={"Accept": ACCEPT})
    data = rsp.json()
    # Keep only the upload times of wheel files
    return [f["upload-time"] for f in data["files"] if f["filename"].endswith(".whl")]

This is the list of packages with no metadata:

adal
aioitertools
aiosignal
appdirs
asn1crypto
asynctest
azure-common
backoff
backports-zoneinfo
cinemagoer
colorama
coloredlogs
contextlib2
crashtest
decorator
entrypoints
et-xmlfile
gast
google-crc32c
google-pasta
h11
httplib2
humanfriendly
imdbpy
iniconfig
installer
isodate
itsdangerous
jeepney
jmespath
matplotlib-inline
mccabe
mdurl
msrest
msrestazure
multidict
mypy-extensions
oauth2client
oauthlib
openpyxl
oscrypto
parso
pkginfo
pkgutil-resolve-name
ply
ptyprocess
py
py4j
pyasn1-modules
pycparser
pynacl
pyproject-hooks
pysocks
python-dateutil
python-dotenv
python-json-logger
pytzdata
requests-aws4auth
requests-file
requests-oauthlib
requests-toolbelt
rfc3339-validator
rsa
scramp
secretstorage
six
sniffio
sortedcontainers
sqlparse
tabulate
toml
tomli
toolz
uritemplate
webencodings
xlrd
xmltodict
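
For anyone wanting to repeat this kind of check, a minimal sketch follows. It assumes the JSON Simple API marks static metadata with a "core-metadata" key on each file entry (the PEP 714 name), falling back to the older "dist-info-metadata" key. The function name and sample data are hypothetical, and this is not necessarily how the list above was produced.

```python
def has_static_metadata(files: list[dict]) -> bool:
    """True if at least one wheel in a Simple API 'files' list advertises metadata."""
    for f in files:
        if not f["filename"].endswith(".whl"):
            continue
        # Assumption: the value is either a bool or a dict of hashes,
        # so a plain truthiness check is enough.
        if f.get("core-metadata") or f.get("dist-info-metadata"):
            return True
    return False


# Hypothetical entries shaped like the "files" array of a JSON Simple API response.
with_metadata = [
    {"filename": "example-1.0.tar.gz"},
    {"filename": "example-1.0-py3-none-any.whl", "core-metadata": {"sha256": "..."}},
]
without_metadata = [
    {"filename": "example-0.9-py3-none-any.whl", "core-metadata": False},
]

print(has_static_metadata(with_metadata), has_static_metadata(without_metadata))
```

The "files" list returned by the JSON request in the script above could be passed straight to has_static_metadata.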

CAM-Gerlach · January 21, 2024, 3:25am

There’s no reason why future, at least, couldn’t be one; there’s an issue and multiple linked PRs open. AFAIK, it is merely due to being mostly unmaintained in the past few years, and usage will continue to drop as its original purpose (making code cross compatible with Python 2 and Python 3) fades away. But if the maintainer pops up again, it seems like it would be a fairly straightforward matter.

PySpark appears to have a non-trivially complex build process that requires building/running against the Spark JARs of the existing Spark version, which may or may not be straightforward to incorporate into a wheel.

sethmlarson · February 22, 2024, 10:14pm

Noticed that the future package now has a wheel, so now only PySpark remains wheel-less.

groodt · February 22, 2024, 11:20pm

I noticed too. They merged my PR to package wheels and have EOL’d future as 1.0.0!

On the topic of “fat” projects like pyspark, I don’t really think it can be realistic for these kinds of packages to publish wheels in the way that they currently think about packaging and distribution. They are really using “setup.py” as a sort of post-install or configuration hook, to link to user provided versions of Hadoop and Spark, as well as to sniff the environment to detect if they’re being installed into Spark as far as I can tell.

I’m not really sure what the guidance for these kinds of projects could realistically be. I guess it would either require them to change their UX in a breaking way, such as pip install pyspark; python -m pyspark-optional-postinstall, or possibly there could be some kind of postinstall hook added to the ecosystem, but that seems like it could become complex or dangerous if code is executed after install (although it’s no different from executing code during installation of an sdist, which happens at the moment, I guess).

woodruffw · February 26, 2024, 5:40pm

Maybe it’s time for a tau version of pythonwheels.com

groodt · February 26, 2024, 6:29pm

Hmmmm… you make a good point.

360 packages is only a drop in the ocean. I wonder what the top 80% of packages would be. Or what the wheel publish ratio looks like for top-500 or top-1000 etc

I don’t think the site is actively maintained at the moment. It also might feel that it has served its purpose or not want to increase the maintenance burden on projects by nagging for wheels.

I still think it’s useful information to have and would like to see the site (or another site) expanded to more packages.

dustin · February 26, 2024, 7:07pm

At this point, it might make more sense to reframe the question “what popular projects have wheels?” as “which projects does PyPI serve the most source distributions, and why?”. The first part can be determined from the PyPI public dataset:

SELECT
  DISTINCT(file.project) AS PROJECT,
  COUNT(*) AS download_count
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  file.project IS NOT NULL
  -- Only query the last 30 days of history
  AND DATE(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND CURRENT_DATE()
  -- Exclude "bdist_wheel" type files
  AND file.type != "bdist_wheel"
GROUP BY
  file.project
ORDER BY
  download_count DESC
LIMIT
  30;

And the results are:

PROJECT	download_count
aiobotocore	120162693
future	30493677
pyspark	29006255
pyyaml	25066204
sagemaker	23172882
psycopg2	14646970
thrift	11544599
docopt	10679116
antlr4-python3-runtime	10310836
pysftp	9810676
pycrypto	8858308
psutil	8471828
protobuf3-to-dict	6946392
gsutil	6628788
ratelimit	6301285
avro-python3	6221579
fire	6113813
unicodecsv	6075228
sklearn	6010426
avro	5936646
starkbank-ecdsa	5608849
mysqlclient	5410320
databricks-cli	5145206
crcmod	4926319
stringcase	4918032
numpy	4806773
markupsafe	4789170
pendulum	4786522
kfp	4738670
wrapt	4530536

The second is a bit harder to determine. For some of these (like aiobotocore), it looks like many folks are still using old releases that don’t have wheels. For others (like pyyaml), it looks like the wheels that are released don’t cover enough platforms/architectures.

But overall, it would probably have a greater impact on the overall ecosystem if we focused on figuring out how to get more users using wheels for these projects, rather than trying to get other projects that are less widely used to generate wheels.

hugovk · February 27, 2024, 6:59am

A quick one-off build with the top 1k: 973/1000 have wheels. https://hugovk.github.io/pythonwheels/

methane · February 27, 2024, 7:29am

Hi, this is the mysqlclient maintainer.

mysqlclient provides only Windows wheels because users can choose between libmysqlclient and libmariadbclient. Users can install these libraries from their Linux distribution’s packages or from MySQL/MariaDB.

Maybe I could create a package like mysqlclient-libmysql that contains the libmysqlclient binary and Provides-Dist: mysqlclient metadata. But that is too complex for me. Both libmysqlclient and libmariadb depend on OpenSSL, so I would need to bundle at least OpenSSL.

I will consider providing macOS/Linux wheels with libmariadb, like the Windows ones, for convenience. Recent libmariadb supports MySQL well.

Anyway, I think many packages don’t provide binary wheels for similar reasons. Bundling dependency libraries is hard for technical or licensing reasons.

CAM-Gerlach · February 27, 2024, 8:22am

As noted in the core metadata spec, Provides-Dist isn’t supported by any mainstream tool and doesn’t have a clear use case on a repository like PyPI.

Probably the simplest approach might be just bundling a “default” library in the wheel and instructing users who want to link to their own system-provided libraries to install with --no-binary. However, users expecting to get the version built from source against their local libraries might be confused why they aren’t getting that, at least at first.

groodt · February 27, 2024, 9:42am

Wow! Thanks! That’s actually a lot higher than I expected! Awesome!

I guess another thing we could do to make the metric more meaningful would be to raise the bar and only count a project as successful if it has wheels for all tier 1 and 2 platforms. Ignoring wasm for now.

CAM-Gerlach · February 27, 2024, 9:53pm

Or maybe just the big three, windows-msvc, apple-darwin and linux-gnu on x86-64. Or break it down into separate numbers by platform, e.g. a lot of projects (like NumPy) have dropped support for i686-pc-windows-msvc, and CI support for macOS arm64 was just introduced so that’s just starting to really get going.

groodt · February 28, 2024, 6:07am

Can that all be sniffed from the Simple API or wheel tags? Those don’t look like familiar platform tags to me. Those look like the CPython platforms?

So which specific metadata fields should we be looking at?

johnthagen · February 28, 2024, 1:35pm

For psutil, I know there are a lot of ARM containers that use it, for which a wheel would reduce those download numbers (Mac ARM Docker use, for example)

[Linux][aarch64] Wheel support for aarch64 Linux · Issue #1972 · giampaolo/psutil · GitHub

CAM-Gerlach · February 28, 2024, 5:31pm

Sorry for any confusion—those are the platform triples for CPython itself pulled from PEP 11, since that’s what you were referencing. Yeah, all the important ones map to wheel tags, and presumably the output would be displayed by wheel tag rather than platform triple. And the wheel tags of artifacts (or at least the wheel filenames which include them) can be retrieved various ways from several PyPI APIs (old JSON, new JSON, HTML simple API, etc).
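
A per-platform breakdown could start from the wheel filenames alone. The sketch below relies on the standard wheel filename layout, where the platform tag is the last dash-separated field and compressed tag sets are joined with dots. The bucket names in os_family and the sample filenames are assumptions made for illustration.

```python
def platform_tags(wheel_filename: str) -> list[str]:
    """Extract the (possibly compressed) platform tags from a wheel filename."""
    stem = wheel_filename[: -len(".whl")]
    # Filename layout: name-version[-build]-python-abi-platform
    platform = stem.split("-")[-1]
    return platform.split(".")


def os_family(tag: str) -> str:
    """Map a platform tag to a coarse OS bucket (bucket names are made up here)."""
    if tag == "any":
        return "any"
    if tag.startswith("win"):
        return "windows"
    if tag.startswith("macosx"):
        return "macos"
    if tag.startswith(("manylinux", "musllinux", "linux")):
        return "linux"
    return "other"


def families(filenames: list[str]) -> set[str]:
    """OS buckets covered by a project's wheel files."""
    return {os_family(t) for name in filenames for t in platform_tags(name)}


# Hypothetical filenames, as they would appear in a Simple API file listing.
sample = [
    "example-1.0-cp312-cp312-win_amd64.whl",
    "example-1.0-cp312-cp312-macosx_11_0_arm64.whl",
    "example-1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]
covered = families(sample)
```

Splitting further on architecture (x86_64, arm64, aarch64 and so on) would follow the same pattern, since the architecture is the trailing part of each platform tag.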

edgarrmondragon · February 28, 2024, 9:50pm

For avro, it seems the maintainers were simply unaware that they weren’t publishing wheels: AVRO-2399: Use Wheels for Python Distribution by kojiromike · Pull Request #766 · apache/avro · GitHub.

They’ll hopefully start doing it soon.