“future” appears in quite a lot of dependency closures in my experience.
It’s a bit of a shame that “future” isn’t published as a wheel. In theory, you can certainly write a lot of useful software if your dependency closure only includes those top 358 packages.
It’s annoying that the list of wheels on that page can’t be copied and pasted, otherwise I could check relatively quickly. But I assume that most, probably all, of those 358 will have PEP 658 metadata for their latest versions, at least. Unfortunately, for many resolves, looking at older versions is necessary, and I don’t think the “backfill” exercise to add metadata for older wheels has been done yet.
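If anyone wants to script that check, here’s a rough sketch against the JSON Simple API; it assumes the per-file flag for static metadata shows up as “core-metadata” (PEP 714) or the older “dist-info-metadata” key (PEP 658/691):

import requests

def wheels_with_static_metadata(pkg):
    # Rough sketch, not a definitive check: count a wheel as having static
    # metadata if the Simple API advertises "core-metadata" (or the older
    # "dist-info-metadata") for that file.
    url = f"https://pypi.org/simple/{pkg}/"
    rsp = requests.get(url, headers={"Accept": "application/vnd.pypi.simple.v1+json"})
    rsp.raise_for_status()
    return [
        f["filename"]
        for f in rsp.json()["files"]
        if f["filename"].endswith(".whl")
        and (f.get("core-metadata") or f.get("dist-info-metadata"))
    ]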
But yes, it’s great news that we’re reaching a point where a significant number of real-world installs can be completed with static data, only downloading what needs to be installed.
Ah cool. I didn’t spot a mention of that file. There are actually 77 of the 358 that don’t have static metadata, presumably because they haven’t had a release since PyPI started extracting metadata from wheels.
A few spot checks suggest that’s the case.
This is the script I used to get upload times:

import requests

def get_upload_times(pkg):
    # Query PyPI's JSON Simple API (PEP 691) and return the upload time of
    # every wheel file for the given package.
    ACCEPT = "application/vnd.pypi.simple.v1+json"
    url = f"https://pypi.org/simple/{pkg}/"
    rsp = requests.get(url, headers={"Accept": ACCEPT})
    data = rsp.json()
    return [f["upload-time"] for f in data["files"] if f["filename"].endswith(".whl")]
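For example, looking at the most recent wheel upload is a quick (if rough) way to see whether a package has had a release since metadata extraction started; “future” here is just an illustrative choice:

# Rough spot check: if the newest wheel predates PyPI's metadata extraction,
# the package won't have static metadata for any of its wheels yet.
print(max(get_upload_times("future")))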
There’s no reason why future, at least, couldn’t be one; there’s an issue and multiple linked PRs open. AFAIK, it’s merely due to the project being mostly unmaintained in the past few years, and usage will continue to drop as its original purpose (making code cross-compatible with Python 2 and Python 3) fades away. But if the maintainer pops up again, it seems like it would be a fairly straightforward matter.
PySpark appears to have a non-trivially complex build process that requires building/running against the Spark JARs of the existing Spark version, which may or may not be straightforward to incorporate into a wheel.
I noticed too. They merged my PR to package wheels and have EOL’d future as 1.0.0!
On the topic of “fat” projects like pyspark, I don’t really think it’s realistic for these kinds of packages to publish wheels given the way they currently think about packaging and distribution. As far as I can tell, they are really using “setup.py” as a sort of post-install or configuration hook: to link to user-provided versions of Hadoop and Spark, and to sniff the environment to detect whether they’re being installed into Spark.
I’m not really sure what realistic guidance for these kinds of projects could be. I guess it would either require them to change their UX in a breaking way, such as pip install pyspark; python -m pyspark-optional-postinstall, or there could be some kind of post-install hook added to the ecosystem. But that seems like it could become complex or dangerous if code is executed after install (although it’s no different from executing code during installation of an sdist, which is what happens at the moment, I guess).
360 packages is only a drop in the ocean. I wonder what the top 80% of packages would be, or what the wheel publish ratio looks like for the top 500 or top 1,000, etc.
I don’t think the site is actively maintained at the moment. It also might feel that it has served its purpose or not want to increase the maintenance burden on projects by nagging for wheels.
I still think it’s useful information to have and would like to see the site (or another site) expanded to more packages.
At this point, it might make more sense to reframe the question “which popular projects have wheels?” as “for which projects does PyPI serve the most source distributions, and why?”. The first part can be determined from the PyPI public dataset:
SELECT
  DISTINCT(file.project) AS PROJECT,
  COUNT(*) AS download_count
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  file.project IS NOT NULL
  -- Only query the last 30 days of history
  AND DATE(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    AND CURRENT_DATE()
  -- Exclude "bdist_wheel" type files
  AND file.type != "bdist_wheel"
GROUP BY
  file.project
ORDER BY
  download_count DESC
LIMIT
  30;
And the results are:
PROJECT                   download_count
aiobotocore               120162693
future                    30493677
pyspark                   29006255
pyyaml                    25066204
sagemaker                 23172882
psycopg2                  14646970
thrift                    11544599
docopt                    10679116
antlr4-python3-runtime    10310836
pysftp                    9810676
pycrypto                  8858308
psutil                    8471828
protobuf3-to-dict         6946392
gsutil                    6628788
ratelimit                 6301285
avro-python3              6221579
fire                      6113813
unicodecsv                6075228
sklearn                   6010426
avro                      5936646
starkbank-ecdsa           5608849
mysqlclient               5410320
databricks-cli            5145206
crcmod                    4926319
stringcase                4918032
numpy                     4806773
markupsafe                4789170
pendulum                  4786522
kfp                       4738670
wrapt                     4530536
The second part is a bit harder to determine. For some of these (like aiobotocore), it looks like many folks are still using old releases that don’t have wheels. For others (like pyyaml), it looks like the wheels that are released don’t cover enough platforms/architectures.
But overall, it would probably have a greater impact on the ecosystem to focus on figuring out how to get more users onto wheels for these projects, rather than trying to get other, less widely used projects to publish wheels.
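As a rough way of testing the “old releases without wheels” theory, a sketch like this against PyPI’s legacy per-project JSON API (https://pypi.org/pypi/<project>/json) lists the releases of a project that ship no wheel at all; aiobotocore here is just an illustrative choice:

import requests

def releases_without_wheels(pkg):
    # Rough sketch: flag every release that has files but no wheel among them.
    # If heavily-downloaded old versions show up here, that would explain the
    # sdist traffic for the project.
    data = requests.get(f"https://pypi.org/pypi/{pkg}/json").json()
    return sorted(
        version
        for version, files in data["releases"].items()
        if files and not any(f["filename"].endswith(".whl") for f in files)
    )

print(releases_without_wheels("aiobotocore"))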
mysqlclient provides only Windows wheels because users can choose libmysqlclient (or libmariadbclient). Users can install those libraries from their Linux distribution’s packages or directly from MySQL/MariaDB.
Maybe I could create a package like mysqlclient-libmysql that contains the libmysqlclient binary and Provides-Dist: mysqlclient metadata, but that is too complex for me. Both libmysqlclient and libmariadb depend on OpenSSL, so I would need to bundle at least OpenSSL as well.
I will consider providing macOS/Linux wheels bundled with libmariadb, like the Windows wheel, for convenience. Recent libmariadb supports MySQL well.
Anyway, I think many packages don’t provide binary wheels for similar reasons: bundling dependency libraries is hard for technical or licensing reasons.
As noted in the core metadata spec, Provides-Dist isn’t supported by any mainstream tool and doesn’t have a clear use case on a repository like PyPI.
Probably the simplest approach would be to bundle a “default” library in the wheel and instruct users who want to link against their own system-provided libraries to install with --no-binary. However, users expecting to get a version built from source against their local libraries might be confused about why they aren’t getting that, at least at first.
Wow! Thanks! That’s actually a lot higher than I expected! Awesome!
I guess another thing we could do to make the metric more meaningful would be to raise the bar and only count a project as successful if it ships wheels for all tier 1 and 2 platforms (ignoring wasm for now).
Or maybe just the big three: windows-msvc, apple-darwin, and linux-gnu on x86-64. Or break it down into separate numbers by platform; e.g. a lot of projects (like NumPy) have dropped support for i686-pc-windows-msvc, and CI support for macOS arm64 was only recently introduced, so that’s just starting to really get going.
For psutil, I know there are a lot of ARM containers that use it, for which a wheel would reduce those download numbers (Mac ARM Docker use, for example).
Sorry for any confusion; those are the platform triples for CPython itself pulled from PEP 11, since that’s what you were referencing. Yeah, all the important ones map to wheel tags, and presumably the output would be displayed by wheel tag rather than platform triple. And the wheel tags of artifacts (or at least the wheel filenames, which include them) can be retrieved in various ways from several PyPI APIs (old JSON, new JSON, HTML Simple API, etc.).
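For instance, here’s a rough sketch that collects the wheel tags a project publishes, parsed out of the wheel filenames returned by the JSON Simple API using packaging’s wheel-filename parser; the mapping from PEP 11 platform triples to these tags would still have to be maintained by hand:

import requests
from packaging.utils import parse_wheel_filename

def published_wheel_tags(pkg):
    # Collect every tag (e.g. cp312-cp312-win_amd64) that appears in the
    # project's wheel filenames on the JSON Simple API (PEP 691).
    url = f"https://pypi.org/simple/{pkg}/"
    rsp = requests.get(url, headers={"Accept": "application/vnd.pypi.simple.v1+json"})
    rsp.raise_for_status()
    tags = set()
    for f in rsp.json()["files"]:
        if f["filename"].endswith(".whl"):
            _, _, _, file_tags = parse_wheel_filename(f["filename"])
            tags.update(str(t) for t in file_tags)
    return sorted(tags)

print(published_wheel_tags("numpy"))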