358 most popular Python packages have wheels

Vanity metric, but still cool and interesting in my opinion.

358 out of the 360 most popular Python packages now have wheels available: https://pythonwheels.com/

The 2 non-wheel packages are:

  • future
  • pyspark

I’m not even sure if those could ever be wheels without changing packaging standards.

2 Likes

“future” appears in quite a lot of dependency closures in my experience.

It’s a bit of a shame that “future” is not a wheel. In theory, you can certainly write a lot of useful software if your dependency closure only includes those top 358 packages.

If all 358 have PEP 658 metadata, it should be possible to do a very fast dependency solve purely using static metadata without any intermediate package downloads or builds. It could even be done without a Python interpreter, à la GitHub - prefix-dev/rip: Solve and install Python packages quickly with rip (pip in Rust)
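To illustrate the idea, here's a minimal sketch of my own (not from rip): assuming PyPI serves a wheel's core metadata at the file URL plus ".metadata" per PEP 658, and taking the last-listed wheel as a stand-in for the latest release, Requires-Dist can be read without downloading any wheel at all:

import requests

ACCEPT = "application/vnd.pypi.simple.v1+json"

def requires_dist(pkg):
    # Sketch: read Requires-Dist for the last-listed wheel of `pkg` from the
    # PEP 658 sidecar file (wheel URL + ".metadata") instead of the wheel.
    data = requests.get(f"https://pypi.org/simple/{pkg}/",
                        headers={"Accept": ACCEPT}).json()
    wheels = [f for f in data["files"] if f["filename"].endswith(".whl")]
    meta = requests.get(wheels[-1]["url"] + ".metadata").text
    return [line.split(":", 1)[1].strip()
            for line in meta.splitlines()
            if line.startswith("Requires-Dist:")]

A real resolver would of course need to pick candidates by version and tags rather than just taking the last file, but the point is that only small metadata files cross the wire.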

1 Like

It’s annoying that the list of wheels on that page can’t be cut & pasted, otherwise I could relatively quickly check. But I assume that most, probably all of those 358 will have PEP 658 metadata for their latest versions, at least. Unfortunately, for many resolves, looking at older versions is necessary and I don’t think the “backfill” exercise to add metadata for older wheels has been done yet.

But yes, it’s great news that we’re reaching a point where a significant number of real-world installs can be completed with static data, only downloading what needs to be installed.

3 Likes

Maybe this can help: https://pythonwheels.com/results.json

Ah cool. I didn’t spot a mention of that file. There are actually 77 of the 358 which don’t have static metadata - presumably because they haven’t had a release since PyPI started extracting metadata from wheels.

A few spot checks suggest that’s the case.

This is the script I used to get upload times.

import requests

def get_upload_times(pkg):
    # Query PyPI's JSON Simple API and return the upload times of all wheels.
    ACCEPT = "application/vnd.pypi.simple.v1+json"
    url = f"https://pypi.org/simple/{pkg}/"
    rsp = requests.get(url, headers={"Accept": ACCEPT})
    data = rsp.json()
    return [f["upload-time"] for f in data["files"] if f["filename"].endswith(".whl")]
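And for the static-metadata check itself, a rough sketch along the same lines (reusing the requests import above, and assuming the JSON Simple API exposes the PEP 714 "core-metadata" key, with "dist-info-metadata" as the older PEP 658 spelling):

def has_static_metadata(pkg):
    # True if any wheel of `pkg` advertises statically servable core metadata
    # (PEP 658 / PEP 714) in PyPI's JSON Simple API response.
    ACCEPT = "application/vnd.pypi.simple.v1+json"
    url = f"https://pypi.org/simple/{pkg}/"
    rsp = requests.get(url, headers={"Accept": ACCEPT})
    data = rsp.json()
    return any(
        f.get("core-metadata") or f.get("dist-info-metadata")
        for f in data["files"]
        if f["filename"].endswith(".whl")
    )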

This is the list of packages with no metadata:

adal
aioitertools
aiosignal
appdirs
asn1crypto
asynctest
azure-common
backoff
backports-zoneinfo
cinemagoer
colorama
coloredlogs
contextlib2
crashtest
decorator
entrypoints
et-xmlfile
gast
google-crc32c
google-pasta
h11
httplib2
humanfriendly
imdbpy
iniconfig
installer
isodate
itsdangerous
jeepney
jmespath
matplotlib-inline
mccabe
mdurl
msrest
msrestazure
multidict
mypy-extensions
oauth2client
oauthlib
openpyxl
oscrypto
parso
pkginfo
pkgutil-resolve-name
ply
ptyprocess
py
py4j
pyasn1-modules
pycparser
pynacl
pyproject-hooks
pysocks
python-dateutil
python-dotenv
python-json-logger
pytzdata
requests-aws4auth
requests-file
requests-oauthlib
requests-toolbelt
rfc3339-validator
rsa
scramp
secretstorage
six
sniffio
sortedcontainers
sqlparse
tabulate
toml
tomli
toolz
uritemplate
webencodings
xlrd
xmltodict
1 Like

There’s no reason why future, at least, couldn’t be one; there’s an issue and multiple linked PRs open. AFAIK, it is merely due to the project being mostly unmaintained in the past few years, and usage will continue to drop as its original purpose—making code cross-compatible with Python 2 and Python 3—fades away. But if the maintainer pops up again, it seems like it would be a fairly straightforward matter.

PySpark appears to have a non-trivially complex build process that requires building/running against the Spark JARs of the existing Spark version, which may or may not be straightforward to incorporate into a wheel.

Noticed that the future package now has a wheel, so now only PySpark remains wheel-less.

7 Likes

I noticed too. They merged my PR to package wheels and have EOL’d future with 1.0.0!

On the topic of “fat” projects like pyspark, I don’t really think it’s realistic for these kinds of packages to publish wheels given the way they currently think about packaging and distribution. As far as I can tell, they are really using “setup.py” as a sort of post-install or configuration hook, to link to user-provided versions of Hadoop and Spark, as well as to sniff the environment to detect whether they’re being installed into Spark.

I’m not really sure what the guidance for these kinds of projects could realistically be. I guess it would either require them to change their UX in a breaking way, such as pip install pyspark; python -m pyspark-optional-postinstall, or possibly there could be some kind of post-install hook added to the ecosystem, but that seems like it could become complex or dangerous if code is executed after install (although it’s no different from executing code during installation of an sdist, which happens at the moment, I guess).

Maybe it’s time for a tau version of pythonwheels.com :slightly_smiling_face:

1 Like

Hmmmm… you make a good point.

360 packages is only a drop in the ocean. I wonder what the top 80% of packages would be. Or what the wheel publish ratio looks like for the top 500 or top 1000, etc.
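A rough sketch of how that ratio could be computed, assuming you already have a top-N list of project names (e.g. from the download dataset pythonwheels.com draws on), using just the JSON Simple API:

import requests

ACCEPT = "application/vnd.pypi.simple.v1+json"

def wheel_ratio(packages):
    # `packages` is assumed to be a list of top-N project names.
    with_wheels = sum(
        any(
            f["filename"].endswith(".whl")
            for f in requests.get(f"https://pypi.org/simple/{pkg}/",
                                  headers={"Accept": ACCEPT}).json()["files"]
        )
        for pkg in packages
    )
    return with_wheels, len(packages)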

I don’t think the site is actively maintained at the moment. It also might feel that it has served its purpose or not want to increase the maintenance burden on projects by nagging for wheels.

I still think it’s useful information to have and would like to see the site (or another site) expanded to more packages.

At this point, it might make more sense to reframe the question “what popular projects have wheels?” as “which projects does PyPI serve the most source distributions, and why?”. The first part can be determined from the PyPI public dataset:

SELECT
  DISTINCT(file.project) AS PROJECT,
  COUNT(*) AS download_count
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  file.project IS NOT NULL
  -- Only query the last 30 days of history
  AND DATE(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND CURRENT_DATE()
  -- Exclude "bdist_wheel" type files
  AND file.type != "bdist_wheel"
GROUP BY
  file.project
ORDER BY
  download_count DESC
LIMIT
  30;

And the results are:

PROJECT download_count
aiobotocore 120162693
future 30493677
pyspark 29006255
pyyaml 25066204
sagemaker 23172882
psycopg2 14646970
thrift 11544599
docopt 10679116
antlr4-python3-runtime 10310836
pysftp 9810676
pycrypto 8858308
psutil 8471828
protobuf3-to-dict 6946392
gsutil 6628788
ratelimit 6301285
avro-python3 6221579
fire 6113813
unicodecsv 6075228
sklearn 6010426
avro 5936646
starkbank-ecdsa 5608849
mysqlclient 5410320
databricks-cli 5145206
crcmod 4926319
stringcase 4918032
numpy 4806773
markupsafe 4789170
pendulum 4786522
kfp 4738670
wrapt 4530536

The second is a bit harder to determine. For some of these (like aiobotocore), it looks like many folks are still using old releases that don’t have wheels. For others (like pyyaml), it looks like the wheels that are released don’t cover enough platforms/architectures.

But overall, it would probably have a greater impact on the overall ecosystem if we focused on figuring out how to get more users using wheels for these projects, rather than trying to get other, less widely used projects to generate wheels.

16 Likes

A quick one-off build with the top 1k shows 973/1000 have wheels: https://hugovk.github.io/pythonwheels/

5 Likes

Hi, this is the mysqlclient maintainer.

mysqlclient provides only Windows wheels because users can choose libmysqlclient (or libmariadbclient). Users can install those libraries from their Linux distribution's packages or from MySQL/MariaDB.

Maybe I could create a package like mysqlclient-libmysql that contains the libmysqlclient binary and Provides-Dist: mysqlclient metadata, but that is too complex for me. Both libmysqlclient and libmariadb depend on OpenSSL, so I would need to bundle at least OpenSSL as well.

I will consider providing macOS/Linux wheels bundling libmariadb, like the Windows wheels, for convenience. Recent libmariadb versions support MySQL well.

Anyway, I think many packages don’t provide binary wheels for similar reasons: bundling dependency libraries is hard for technical or licence reasons.

4 Likes

As noted in the core metadata spec, Provides-Dist isn’t supported by any mainstream tool and doesn’t have a clear use case on a repository like PyPI.

Probably the simplest approach might be just bundling a “default” library in the wheel and instructing users who want to link to their own system-provided libraries to install with --no-binary. However, users expecting to get the version built from source against their local libraries might be confused why they aren’t getting that, at least at first.

1 Like

Wow! Thanks! That’s actually a lot higher than I expected! Awesome!

I guess another thing we could do to make the metric more meaningful would be to raise the bar and only count a package as successful if it has wheels for all tier 1 and 2 platforms. Ignoring WASM for now.

1 Like

Or maybe just the big three, windows-msvc, apple-darwin and linux-gnu on x86-64. Or break it down into separate numbers by platform, e.g. a lot of projects (like NumPy) have dropped support for i686-pc-windows-msvc, and CI support for macOS arm64 was just introduced, so that’s just starting to really get going.

4 Likes

Can that all be sniffed from the Simple API or wheel tags? Those don’t look like familiar platform tags to me. Those look like the CPython platforms?

So which specific metadata fields should we be looking at?

For psutil, I know there are a lot of ARM containers that use it, for which a wheel would reduce those download numbers (Mac ARM Docker use, for example).

1 Like

Sorry for any confusion; those are the platform triples for CPython itself, pulled from PEP 11, since that’s what you were referencing. Yeah, all the important ones map to wheel tags, and presumably the output would be displayed by wheel tag rather than platform triple. And the wheel tags of artifacts (or at least the wheel filenames, which include them) can be retrieved in various ways from several PyPI APIs (old JSON, new JSON, HTML Simple API, etc.).
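As a sketch of how that could look: the platform-to-tag-prefix mapping below is my own approximation of the “big three” (manylinux/musllinux lumped together as Linux), and it looks across all uploaded wheels of a project, so it’s only a rough signal.

import requests
from packaging.utils import parse_wheel_filename

ACCEPT = "application/vnd.pypi.simple.v1+json"

# Approximate mapping from platform names to wheel platform-tag prefixes.
PLATFORMS = {
    "windows": ("win_amd64",),
    "macos": ("macosx",),
    "linux": ("manylinux", "musllinux"),
}

def platform_coverage(pkg):
    # Which of the "big three" platforms have at least one wheel for `pkg`,
    # judged purely from the wheel filenames in the JSON Simple API.
    data = requests.get(f"https://pypi.org/simple/{pkg}/",
                        headers={"Accept": ACCEPT}).json()
    covered = {name: False for name in PLATFORMS}
    for f in data["files"]:
        if not f["filename"].endswith(".whl"):
            continue
        _, _, _, tags = parse_wheel_filename(f["filename"])
        for tag in tags:
            if tag.platform == "any":  # pure-Python wheel, runs everywhere
                return {name: True for name in PLATFORMS}
            for name, prefixes in PLATFORMS.items():
                if tag.platform.startswith(prefixes):
                    covered[name] = True
    return covered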

For avro, it seems the maintainers were simply unaware that they weren’t publishing wheels: AVRO-2399: Use Wheels for Python Distribution by kojiromike · Pull Request #766 · apache/avro · GitHub.

They’ll hopefully start doing it soon.

3 Likes