Use of PyPI as a generic storage platform for binaries

There is an ongoing practice of storing non-Python binaries on PyPI, for example cmake, which can be useful for obtaining an up-to-date CMake when building Python packages on old platforms.

However, it has just been proposed to industrialize this practice with a dedicated tool that transforms Go binaries into PyPI wheels, for the sake of easy local execution with uvx.

I was wondering if this had already been discussed among PyPI maintainers, and if there’s an official stance about this practice? Should it be discouraged?

15 Likes

I’m sure someone more official will jump in, but I remember having this discussion in the past and the general policy was “PyPI is for Python packages and we’ll carefully ignore non-Python tools that are directly used to produce Python packages”. Basically, allowing things like pybind11 while discouraging the use of PyPI for, well, Go.

It shouldn’t be too complex to set up a dedicated index for Go tools, if they want to use them with uvx. But I don’t think that index should be PyPI.


The closest thing I see to a written policy is disallowing “off-topic” content in the acceptable use policy. Which presumably the author of that post knows about, which is why he goes to some length to justify that it’s now on topic (“we can use Go binaries as dependencies for other Python packages now”).

6 Likes

I think it’s an interesting use case we should dig deeper into. In some sense it shows that the model of uvx + wheels is a powerful combination that definitely fills a need in the software distribution story.

We should understand the pros and cons more deeply, and consider whether there are ways we can mitigate the risks, and what kinds of usage patterns we could enable more directly. For example, are PyPI’s size restrictions really useful? There are already ways to work around some of these restrictions, and other proposals have been made[1].


  1. and withdrawn ↩︎

5 Likes

My concern was less about technical limits [1] and more about the human cost of ongoing PyPI maintenance, which is presumably correlated to the number of packages. And the idea of (ab)using a community-maintained resource for unrelated purposes feels disturbing to me.


  1. Though Go binaries might easily be large, by virtue of linking all dependencies statically ↩︎

11 Likes

I don’t know if there is a correlation there, but the PSF/PyPI admins can shed some light on that. Even without a correlation to the number of packages, there is definitely both a human and a financial cost to maintaining the PyPI service. On the operational side, there are at least support costs, storage costs, and bandwidth costs, all of which are borne by the PSF or its sponsors.

There’s also the cost of maintaining and developing the warehouse software that runs PyPI, which has been on my mind lately. I don’t think we are at that point, but much like CPython, warehouse should have a plan for recruiting, mentoring, and promoting folks who want to contribute to the code base, so we never get to a point where we don’t have a healthy crew to maintain it.

2 Likes

I doubt this will really work for more than Rust or Go (maybe C#), with their heavy static linking and single-file native binaries. Given how hard it is to share native libraries or runtimes between wheels (the reason Conda exists), wheels aren’t going to solve distribution of arbitrary software. I’m somewhat surprised that even CMake fits in a wheel.

If Go/Rust users want a free-access repository with a flat namespace, platform tags, and a command-line tool just smart enough to fetch and run zero-dependency applications that fit in zip files, then that should specifically be a Rust/Go effort. We’re not going to solve cross-language, cross-platform package management with this.

2 Likes

I’ve expressed concern in the past with distributing lightweight Python package wrappers around JavaScript libraries too. Some people find it a convenient way to declare JS dependencies in their Python projects and get them installed automatically. The biggest risk I’m concerned about in any of these cases, though, is security updates.

The people maintaining Python package wrappers are often not the same people maintaining the upstream non-Python projects, so if they disappear and stop updating the wrapper packages, then users who relied on those are left in the lurch and may not realize they’re running outdated copies full of known exploitable vulnerabilities (which already happens today with some of the wrapped JS libs).

7 Likes

I don’t think we (PyPI) have an official stance about this practice, but a quick temperature check among us suggested that we were generally aligned on:

If there’s a benefit to the Python community to having something distributed on PyPI, then that’s OK. If there’s not really a benefit to the Python community and someone is just using PyPI as an alternative to Homebrew, then that’s something that would probably be discouraged.

An official stance would likely be a reasonable addition to the AUP, but I think the above gives a pretty good guideline.

23 Likes

In part:

Using PyPI as a distribution platform for Go binaries feels a tiny bit abusive, albeit there is plenty of precedent.

I’ll justify it by pointing out that this means we can use Go binaries as dependencies for other Python packages now.

That’s genuinely useful! It means that any functionality which is available in a cross-platform Go binary can now be subsumed into a Python package. Python is really good at running subprocesses so this opens up a whole world of useful tricks that we can bake into our Python tools.

Using it to install a binary and call it via subprocess.run() does feel more on the “alternative to Homebrew” side, even though it could of course also be beneficial to the Python community if the binary is not otherwise available…
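For what it’s worth, the subprocess pattern being discussed can be sketched roughly like this (the `tool` argument is whatever executable a wrapper wheel ships; the helper name is hypothetical, and wheels like cmake or ripgrep expose their binaries on PATH in just this way):

```python
import shutil
import subprocess


def run_tool(tool, *args):
    """Run a binary that a wheel placed on PATH (e.g. in the venv's bin/)
    and return its stdout. Raises if the wrapper wheel isn't installed."""
    exe = shutil.which(tool)
    if exe is None:
        raise FileNotFoundError(f"{tool} not found; is the wrapper wheel installed?")
    # check=True surfaces non-zero exit codes as CalledProcessError
    result = subprocess.run([exe, *args], capture_output=True, text=True, check=True)
    return result.stdout
```

A Python package could hide such a call behind a friendlier API, which is the “useful tricks” angle from the quoted post.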

2 Likes

For me, there’s also a question over whether the people who generously provide the resources on which PyPI runs (Fastly is the name I know, there are others) are happy with their resources being used for purposes that aren’t in support of providing a thriving Python library ecosystem.

4 Likes

So does ripgrep · PyPI fall under that? It’s in no way Python-specific, but certainly there are a decent chunk of Python developers who use ripgrep.

My guess is the big sponsors are fine with it, as they are probably all Go and Rust users themselves. It’s the mid- to small-sized sponsors who might care.

1 Like

I’m quite surprised by this as it’s almost trivial to install ripgrep on any modern platform, without going through PyPI. :person_shrugging:

6 Likes

I think that, separately from using PyPI to distribute user-facing non-Python tools like ripgrep, there’s a use case for Python packages that have development-only dependencies on non-Python tools. It’s nice to be able to bring in, for example, cpplint as a development dependency when working on pybind11-based projects. The fact that cpplint · PyPI exists makes that quite easy; a contributor can then just run pdm sync, uv sync, or similar in order to work on the package.
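Concretely, that development-only dependency can be declared like any other one, assuming tooling that understands PEP 735 dependency groups (pdm and uv both do); this is a sketch, not taken from any project mentioned above:

```toml
# pyproject.toml (sketch): the tool comes from PyPI like any other
# dev dependency, even though it isn't a library your code imports.
[dependency-groups]
dev = [
    "cpplint",  # C++ style checker, installed into the project venv
]
```

With this in place, `uv sync` or `pdm sync` installs the tool alongside the project’s Python dependencies.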

By comparison, if I want to take a development-only dependency on markdownlint, which is distributed primarily via npm, I don’t have a good way of specifying that dependency in pyproject.toml, such that out-of-band tooling is needed for a contributor to get that up and running.

4 Likes

Between the external dependencies PEP and the wheel variants PEP, even that may not be true forever.

A couple of specific cases I’m aware of where it’s useful to have things not written in Python available via PyPI:

  • uv itself (both as a dev dependency and as a runtime dependency for a project like venvstacks)
  • ziglang as a dependency for build backends (cmake feels like a comparable use case)
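For the build-backend case, the pattern looks something like the sketch below (the backend choice and version pin are illustrative assumptions, not from the posts above): build requirements are resolved from PyPI at build time, so someone building from the sdist never needs a system-wide CMake.

```toml
# pyproject.toml (sketch): cmake and ninja are fetched from PyPI
# when the wheel is built, instead of being system prerequisites.
[build-system]
requires = ["scikit-build-core", "cmake>=3.26", "ninja"]
build-backend = "scikit_build_core.build"
```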

“Written in Python” and “Used by at least some Python developers as part of their Python development activities” is going to cover an awful lot of software. I don’t think it’s actually a bad thing myself, so long as any larger entities encouraging it realise that the service isn’t maintained by magic internet pixies and actively support the PSF in keeping it running.

Getting Started - Warehouse Developer Documentation is in a pretty good place, and the PSF usually has at least one developer in residence actively working on PyPI maintenance & enhancements these days.

5 Likes

Oh yes, thanks for including that link. The warehouse docs are fantastic, and setting up a dev environment[1] was very easy. I highly encourage others to take a look and give it a try.

I’m talking about something more long-term. Think about CPython. If it were just us old-timers maintaining it, I’d be concerned about CPython’s long-term future. But I’m not, because we have an incredible team of developers, a well-defined process of mentorship into that team, and healthy growth of new members.


  1. I use macOS ↩︎

1 Like

If you did pip install markdownlint, would you expect it to download Node.js? Would you expect to be able to lock to a specific version of markdownlint (even though the PyPI package may need a version bump to handle a Node.js issue)? Also, I notice that GitHub - markdownlint/markdownlint: Markdown lint tool exists (which seems to be the original Ruby version, and is more widely packaged than the JS version); if you got that implementation instead, would that be an issue? At what point is it easier just to port the code to Python (that’s likely the origin of the JS version)? I suspect all these answers are going to vary for different people, but it feels like this puts a lot of social cost on the PyPI ecosystem (as opposed to deferring to system packaging tools, which have better tooling and rules for handling cross-ecosystem issues like this).

2 Likes

To be absolutely clear, I’m not suggesting anything, nor laying out concrete expectations for what PyPI or any other Python infrastructure should do. I’m trying to give an example of a use case where PyPI as it currently stands is sometimes useful, and where that utility is more limited.

In particular, part of what I’m getting at is that Python excels at gluing disparate things together, such that Python packages can accumulate dependencies outside of pure Python packages. In some cases, as noted above with CMake, those development dependencies can be brought in by wrapping them as wheels and using PyPI.

That means that when I ask for contributions to my projects, I don’t need to ask contributors to download and install a specific version of CMake: the same Python virtual-environment tooling that makes it easy to rely on mypy or flake8 also allows me to use cmake without adding much more burden on contributors.

I don’t know if that use case is in scope or not, and I make no suggestions as to whether it should be. It’s just a use case that looks different enough from the ripgrep example that I wanted to chime in and offer it for whatever it’s worth.

Insofar as that goes, if obtaining non-Python development dependencies of Python packages isn’t in scope, that pushes me towards using other tools like Docker or Nix to manage those dependencies, such that contributing or building from an sdist alone becomes more difficult.

2 Likes

Either system packaging tools, or cross-platform packaging tools such as conda or pixi which, together with the conda-forge distribution and its commercial predecessor Anaconda, have been designed precisely for these problems.

Learning to use conda or pixi rather than pip or uv should really be a minor one-time annoyance, not a blocker, if you’re into software development.

6 Likes

I’m not entirely sure what the compelling UX aspect here is.

The article linked in the OP was promoting uvx some_tool, which is unusual in that it explicitly doesn’t permanently install the tool. It creates a temporary environment and installs the tool for each use - with caching and uv’s speed to make the cost of this ignorable. (Other tools like pipx and maybe hatch can do much the same, although maybe not as fast).

People talking about installing tools into your development environment, on the other hand, are thinking much more in terms of doing a one-off install. But in that case, the tool isn’t globally available, it’s only available through the activated environment.

I have no idea which model conda and pixi support, or if they have their own UX.

The thing is, installing tools into your development environment seems much more Python-focused, and therefore more acceptable (as in, the sorts of tools you’d distribute this way are more likely to be ones people wouldn’t mind hosting on PyPI). But the uvx model is much more based around general tools, and those aren’t (IMO) as good a fit for PyPI.

When tools are written in Python, using uvx is a really convenient approach for things you use occasionally. I use uvx norwegianblue quite a lot, but not enough to justify installing it permanently (with all the admin involved in keeping my copy up to date, etc). And actually, why should it matter to me whether a tool is written in Python or not? But having said that, where does that end? Do we want people to be able to do uvx vim to start an editor? Do we want PyPI to host all of those tools? And going back to the original point here, do we want people promoting the idea that PyPI is a good place to provide free distribution services for your non-Python tools?

Apart from the matter of people promoting the idea, I think the existing low-key “don’t worry about it unless it becomes a problem” approach is fine. But once people start actively suggesting PyPI as a distribution channel for non-Python applications, I think we probably need to make some form of response - even if it’s only to say that we’re thinking about the question and haven’t yet decided if it’s acceptable.

4 Likes

There isn’t a clear bright line though, as moving from uvx someapp to uv tool install someapp isn’t hard, and whether it’s uv tool install, pipx install, or some other tool that offers a similar capability, that usually comes with a straightforward way to update both the tools themselves and the selection of the Python runtime they’re running against.