The 10+ year view on Python packaging (what's yours?)

[I have not read @BrenBarn’s post, too long, sorry. But I hope it is fair to answer the question in the title anyway.]

One item on my wish list is interoperability between (packaging) ecosystems. For me this means interoperability between, for example, the (packaging) ecosystems of Python, Node.js, and Rust, as well as system packaging ecosystems such as apt, yum, brew, and so on.

When I see initiatives like PyBI, PEP 725, purl, and others, I can’t help but think: what if I could npm install python Django or pip install nodejs reactjs? Wouldn’t it be great if pip could install gcc in order to build C extensions? Ideally there would be installers that work across ecosystems. I am tired of having some things installed with pipx, others with conda, some with apt (I have managed to avoid brew and nix so far), and losing track of what should be updated when and where.

1 Like

There’s ziglang · PyPI. If you do pip install ziglang you get a C/C++ compiler. And cmake is pip-installable as well.
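For anyone curious what that looks like in practice, here is a minimal, hedged sketch of driving the pip-installed toolchain from Python. It assumes the wheel’s python -m ziglang entry point; the hello.c file name and the flags are placeholders and will vary by platform.

    # Hedged sketch: the ziglang wheel ships the Zig toolchain and exposes it
    # via "python -m ziglang"; "zig cc" acts as a clang-compatible C driver.
    # The file name and flags below are placeholders for illustration.
    import subprocess
    import sys

    subprocess.run(
        [sys.executable, "-m", "ziglang", "cc",
         "-shared", "-o", "hello.so", "hello.c"],
        check=True,
    )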

1 Like

True. I have not seen a project that lists ziglang in its build-system.requires to build its C extensions (though I have not looked for one). Does such a project exist? Is it possible?

You might be interested in PEP 725: Specifying external dependencies in pyproject.toml which is about specifying dependencies outside of the Python ecosystem.

3 Likes

Sure. Most of the post is just me trying to bring together what a few different people (including me) thought about the topic, so more is welcome.

Conda can install nodejs, R, etc. It looks like there’s even a conda package for gcc although I haven’t really used that.

However, at least right now, as I understand it, this isn’t really “interoperability”; it’s more just people repackaging things for conda. It’s not that conda can use npm to get npm packages; it’s that you can use conda to install node and then use that npm to install npm packages. It seems to me that having true interop, where different language package managers could actually install one another’s packages, would be quite a heavy lift. Of course, this thread is about dreaming, so yeah, it would be cool. :slight_smile:

That is indeed a pain point with the current situation. It sounds like what you’re describing is a situation where each package “doesn’t care” which manager (conda/apt/etc.) installed it, and the different managers themselves would be largely equivalent. That would be pretty awesome, although, again, a bit of a reach. I feel like getting at least parity among Python-oriented tools would be a start.

Also, I tend to be a little uncertain about a situation where there are multiple tools that have total interop. If (in the imagined future) every tool can do everything the others can do, why are there multiple tools instead of one? And if different ones can do different things, how does interop work between a tool that can do X and one that can’t? It seems that what usually winds up happening in this situation is that some information gets lost in the handoff between tools. I could imagine different tools that just provide different UX layers over the same base functionality, though (sort of like different email clients or web browsers).

I do fear this could lead to a snarl of dependencies that would become even harder for people to unravel when they encounter problems (if I pip install something with a Python that I installed via an npm that I installed with conda, where does that package go??).

A version of this that I think might be somewhat more attainable is for a packaging tool to delegate to a more specific tool: i.e., if all you’re going to do is install a wheel, then delegate that installation to pip [1].

We can sort of see this in the conda ecosystem: many recipes just wrap the PyPI package with conda metadata. conda list will tell you if a package came from pip, but it doesn’t recognize those packages as “already installed”.

I feel like delegation is more attainable than agreeing on an interoperable standard[2], because a tool like pip could define its own delegation interface and let other tools adopt it if they want to [3]. Talking to the relevant parties can’t hurt, but it doesn’t require a consensus to demonstrate value.

Perhaps this is already possible in the interaction between conda and pip, for instance, but it’s not quite how it works right now (and I’m sure there are good reasons why it’s difficult).
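To make the idea concrete, here is a purely hypothetical sketch of what such a delegation interface could look like. None of these names exist in pip, conda, or any PEP today; they are invented for illustration only.

    # Purely hypothetical sketch of a delegation interface; nothing here is a
    # real pip or conda API, and the names are invented for illustration.
    from typing import Protocol


    class Installer(Protocol):
        def can_install(self, requirement: str) -> bool:
            """Return True if this back-end knows how to satisfy the requirement."""
            ...

        def install(self, requirement: str, prefix: str) -> None:
            """Install the requirement into the given environment prefix."""
            ...


    def delegate(requirement: str, backends: list[Installer], prefix: str) -> None:
        # A front-end tool (conda, a future unified tool, ...) walks its list
        # of registered back-ends and hands the requirement to the first one
        # that claims it; a plain wheel install would go straight to a pip
        # back-end, for example.
        for backend in backends:
            if backend.can_install(requirement):
                backend.install(requirement, prefix)
                return
        raise RuntimeError(f"no registered back-end can install {requirement!r}")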


  1. or a hypothetical future Unified PyPA Tool, if one arises ↩︎

  2. there are just so many differences of opinion about How Things Should Be ↩︎

  3. or rather, I should say that a PEP could define this for python packaging ↩︎

1 Like

[Thanks for the follow-up answers on my post. I do not want to discuss here whether or not it is feasible, what it would look like technically and in terms of UX, or why it is the way it is now; I’d rather we do that on a dedicated thread if we really feel like discussing this now. But I still want to add to it…]

I guess my wish in the long term is that Python packaging stops being just about Python. And my intent is not to single out Python; I wish the same from other packaging ecosystems as well. For example, I have things installed with Node.js’s npm and Ruby’s gem that are not kept up to date, because my operating system does not tell me that there are updates available. These are applications that are not available in apt repositories (or as flatpak, appimage, or snap, as far as I can tell). So in the case of Python, I wish in the long term that the packaging story would not stop at “it is installable with pip, our work is done” (tongue in cheek).

That things like purl are being worked on is exciting to me. I am also glad there is something like PackagingCon.

3 Likes

This is a magical project. Recently at work I upgraded all of our code generation to use Pydantic v2, which didn’t have wheels for CentOS 6, so it had to be built from source; but the minimum supported version of Rust required a newer glibc, so I used this (among other things) to target a specific glibc version.

It’s so impressive to me.

Here’s my vision. It matches the OP vision in some parts, but differs in others (particularly when it comes to managing interpreters).

  1. There exists the One True Packaging Tool that is useful for 95+% of use-cases, used by beginners and advanced users alike (and it ships with Python).
  2. The One True Packaging Tool handles tasks of both package authors (creating wheels, uploading to PyPI, installing in development mode, creating packages with C extensions) and package users (installing dependencies, creating lockfiles).
  3. The One True Packaging Tool comes with a npx/pipx-esque helper utility that can run an application written in Python without exposing the Python packaging world to the user.
  4. Something like node_modules/PEP 582 is built into Python. The implementation is designed in a way that lets 90+% of users avoid using virtual environments (e.g. by searching for the __pypackages__ directory in parent directories, allowing a __pypackages__ directory to prevent searching the OS site-packages, ignoring packages currently in site-packages when installing into __pypackages__, and having a simple way to run scripts installed by packages from anywhere in the filesystem); see the sketch after this list for the directory search.
  5. The One True Packaging Tool does not manage Python interpreters (it does not install new interpreters).
  6. The One True Packaging Tool is written in Python, welcoming contributions from a wide audience.
  7. There is a One True Interpreter Tool that manages Pythons. It can be written in Rust or whatever.
  8. Beginners and people who don’t have specific requirements can skip the Interpreter Tool entirely and use installers from python.org or distro packages.
  9. The OTPT can integrate with the OTIT (prompting the user to install a new interpreter). This feature can be disabled or discouraged by distributors or system administrators to avoid having multiple random Pythons on a system. (You wouldn’t want anything to install extra interpreters into the python:3.11 Docker image, for example.)
  10. Neither of those two tools supports installing gcc, nodejs, or react. The Python tools are responsible for Python only. Installing gcc/node should be done via apt or brew or nix, and installing react should be done via npm.
  11. Third-party tools (such as conda) can support installing gcc and numpy with the same user experience and within the same project context. Their Python support should be built on top of the One True Packaging Tool.
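As a rough illustration of the __pypackages__ lookup in item 4, here is a speculative sketch of parent-directory discovery. PEP 582 was rejected, so nothing here describes shipping behaviour; the per-version lib layout follows the PEP 582 draft and is only an assumption.

    # Speculative sketch of __pypackages__ discovery; PEP 582 was rejected,
    # so this illustrates the idea rather than any implemented behaviour.
    import sys
    from pathlib import Path


    def find_pypackages(start: Path) -> Path | None:
        # Walk upward from the starting directory, .git-style, until a
        # __pypackages__ directory is found or the filesystem root is reached.
        for directory in (start, *start.parents):
            candidate = directory / "__pypackages__"
            if candidate.is_dir():
                return candidate
        return None


    found = find_pypackages(Path.cwd())
    if found is not None:
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
        # The per-version "lib" subdirectory mirrors the PEP 582 draft; a real
        # implementation would also have to decide how to treat site-packages.
        sys.path.insert(0, str(found / version / "lib"))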
3 Likes

I’m not going to invest too much time in this thread, and I’ve expanded on this idea many times before, but my “10+ year hope” is that package consumers can move towards generic package managers and repositories (similar to a Debian or a conda-forge), while both package developers and those responsible for creating the repositories (usually “redistributors”) move toward the language-specific tooling.

That makes a service like PyPI more about being the canonical source for redistributors to get the correctly tagged source code in a way that lets them rebuild it and distribute reliable/compatible builds to the final users.

It makes a service like PyPI less about distributing ready-to-use binaries for every possible platform, and reduces the burden on package developers to produce binaries for a million different operating system/configuration combinations.

And it gives users their choice of tooling for managing their entire project, without forcing them to cobble together working environments from a hand-crafted mix of platform specific and language specific tools.

8 Likes

This isn’t a disagreement, just a data point. On a project I work on, we recently got told that one Linux distro had made a policy decision to no longer pull sdists from PyPI for packaging purposes. They will pull a source tarball from GitHub or other designated official download site for the project.

I don’t know if this indicates a trend away from the direction you list, or if it’s an outlier; or if it would change back if the state “improves” via the work being done by this community.

3 Likes

Seems to me more like a 2-3 year view (5 years max). It seems like nearly everything mentioned already exists, has already been tried and failed, or is well underway (whether or not this view is the one that gets standardized and/or becomes mainstream is a different question).

1 Like

Yeah, I was aware of this. My hope would be that we’d develop the language-specific tooling that encourages them to prefer PyPI (e.g. making it easy to build sdists in their existing context rather than re-downloading dependencies in a separate environment outside of their control; also having strong provenance between the upstream code repositories and the sdist, so that redistributors trust that they’re getting the same files).

2 Likes

That, to me, is the key reason why, as someone often choosing sources for a project to be rebuilt in conda-forge, I prefer GitHub. The only downside is that git(hub) does not populate submodules in its archives, but it avoids a potential supply-chain attack vector (it’s currently completely opaque how sdists get formed) and spares me from dealing with some of the non-standard transformations people come up with (mostly only relevant when there’s something to patch).

I agree with @steve.dower’s overall direction, BTW.[1]


  1. I haven’t had time to write a proper response for this thread. Perhaps later ↩︎

I agree that some kind of delegation seems promising. This was mentioned on some of the other packaging threads. For instance, pip and/or conda could talk to each other and effectively ask “what dependencies would you need to install if I asked you to install this package?”
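Something in this spirit is already partially possible on the pip side: recent pip versions can report what they would install without installing anything, via --dry-run and --report. A hedged sketch (the report’s JSON layout may vary between pip versions, and “requests” is just an example requirement):

    # Hedged sketch: ask pip what it would install, without installing anything.
    # Requires a pip new enough to support --dry-run/--report; the JSON layout
    # of the report may differ between pip versions.
    import json
    import subprocess
    import sys

    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--quiet",
         "--dry-run", "--report", "-", "requests"],
        check=True,
        capture_output=True,
        text=True,
    )
    report = json.loads(result.stdout)
    for item in report.get("install", []):
        metadata = item.get("metadata", {})
        print(metadata.get("name"), metadata.get("version"))

A cross-tool version of that question-and-answer is the part that does not exist yet.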

What do you see as the advantage of separating these?

Most of the other parts of your vision make sense to me and (as you noted) are similar to mine and the ones I quoted in the initial post.

The node-like/__pypackages__ model is a tough one for me. On the one hand, a lot of people seem to want it. On the other hand, I think some aspects of it are better suited to JS than Python just because of how the Python import mechanism works (i.e., you import with a bare name, which has to be an index into some kind of global import path). Back on the one hand again, it might be nice if some aspects of the Python import mechanism could be tweaked to make this less painful (e.g., the infamous “relative imports don’t work when running a script” problem).

Back on the other hand, I think the __pypackages__ approach, if it is good, is only good for a particular development context: one where you’re working on a “project” that lives within a single directory, and you’re okay with not sharing an environment across multiple “projects”. This is a common case, to be sure, but it does leave out the common academic/data-science situation where you have a more-or-less persistent environment that may be used for many different projects. I also think the devil is in the details as far as how __pypackages__ would work (e.g., I think some kind of parent-directory search would be needed for people to feel like they really didn’t need a venv).

Another point listed in both your and @jeanas’s lists has to do with project management. It seems to me that a big reason for Poetry’s success is the way it links environment management with project management, so a single poetry add does two things: it installs the library into the environment, and it also updates your pyproject.toml to record the fact that your project needs that library installed. This particular task isn’t handled directly by pip or conda. (Personally, though, it still seems to me that it would be easier to add such a feature to a conda-like tool than to add some of conda’s more powerful features to a pip- or poetry-like tool.)

This sounds great insofar as it reduces the pain for package consumers. :slight_smile: That said, one practical problem with Debian and conda-forge is the relatively long wait time for packages to be added. Perhaps this is the price to pay for having a more “curated” set of packages, though (in the sense that, e.g., you know an installed package will not have irremediably broken metadata).

Also, relatedly, there seems to be a slight trend in the Linux world towards things like Flatpaks which try to circumvent the distro-packager part of the process. In part this is because of just what you said, people not wanting to produce binaries for every combination of configurations. But it’s happening even when the application authors aren’t the ones who have to do that. Even when the responsibility for packaging is pushed to the distro maintainers, that can rebound on users if it means they have to wait for the distro maintainers to react to a new source release by the app author.

Theoretically, I guess, the solution to this is automation — no one has to wait very long for anything if all the code is just getting auto-built for each package manager and not waiting on a human to repackage it. But of course that requires resources.

Package management is an end-user/developer-level task performed often. Interpreter management is a sysadmin-level task [1] performed rarely.

Package management is needed ~everywhere Python code is run. Interpreter management is not needed in Docker containers, or in environments where a single Python version provided by the OS or by python.org installers is good enough (or is covered by an expensive RHEL support contract). The interpreter manager might be useless on niche platforms which require special build steps and which don’t have binary packages available.

Package managers can (should) be written in Python. Interpreter managers can’t be written in Python (or else you need a different tool to bootstrap the first Python interpreter). Package managers written in Python can leverage metadata and configuration exposed by the interpreter, and can have more contributors than things written in Rust. (And frankly, if Python needs a Rust program to manage its packages, is Python a good choice for your code?)

I agree, __pypackages__ needs .git-style recursive search to be usable.


  1. I realise most developers work on single-user machines with root access, and I also realise interpreter managers can install to the home directory, but it’s still an administrative task. ↩︎

As a developer I often have to test and reproduce issues with different versions of Python, and I rely on conda’s ability to manage interpreter versions nearly daily, as do many of my colleagues.

And frankly, if Python needs a Rust program to manage its packages, is Python a good choice for your code?

This is hardly a compelling argument. Do you regard NumPy (written mostly in C) as a reason Python is not a good choice for numerical code?

10 Likes

NumPy is written in C for performance reasons. NumPy needs C to access SIMD instructions and other low-level things. NumPy is a specialist library that performs CPU-bound operations and that is used multiple times in your average program. You couldn’t write NumPy in Python (at least not if you care about any sort of performance).

A package manager is a console application. All it needs to do is download stuff from the Internet, parse some serialized data, copy some files around, and perform other simple tasks like this. This is generally IO-bound, not CPU-bound. Those are simple and common tasks that should be doable in any programming language. pip, npm, cargo, rubygems, maven, gradle, and nuget are all written in the languages they manage packages for[1]. Choosing to write the One True Python Package Manager in Rust would suggest that Python cannot be used for a simple console application performing basic tasks.


  1. Or at least in the languages for the VM they are primarily used with (Gradle is Groovy+Java+Kotlin and can help with other JVM languages too). ↩︎

1 Like

Dependency constraint solvers are written in C and Rust for performance reasons. [1]


  1. Conda had a pure-Python constraint solver in 2012, until it didn’t scale ↩︎

7 Likes

You may be underestimating what a package manager has to do. In particular, if it’s going to do dependency resolution (and people will want it to), then it may have to solve rather difficult mathematical problems that will indeed be CPU-bound.
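To illustrate: even a toy version of dependency resolution is a backtracking search over candidate versions, and the search space grows combinatorially as constraints interact. The sketch below is deliberately naive and is nothing like pip’s real resolvelib-based resolver or conda’s solvers; it only shows the shape of the problem.

    # Deliberately naive sketch of why resolution gets CPU-bound: pinning one
    # version per package under constraints is a backtracking search, and real
    # resolvers exist precisely because this explodes on real-world metadata.
    def resolve(pending, chosen, candidates, conflicts):
        # pending: packages still to pin; chosen: {package: version} so far;
        # candidates: {package: [versions]}; conflicts: set of incompatible
        # ((pkg, ver), (pkg, ver)) pairs standing in for real version constraints.
        if not pending:
            return chosen
        package, *rest = pending
        for version in candidates[package]:
            pick = (package, version)
            if any((pick, other) in conflicts or (other, pick) in conflicts
                   for other in chosen.items()):
                continue  # conflicts with an earlier choice: try the next version
            solution = resolve(rest, {**chosen, package: version}, candidates, conflicts)
            if solution is not None:
                return solution
        return None  # nothing works given the choices made so far: backtrack


    candidates = {"a": ["2.0", "1.0"], "b": ["2.0", "1.0"]}
    conflicts = {(("a", "2.0"), ("b", "2.0"))}
    print(resolve(["a", "b"], {}, candidates, conflicts))  # {'a': '2.0', 'b': '1.0'}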

I don’t think that is true. The reason is simple: many, many end-users/developers will eventually be in a situation where they have an old project using Python 3.X, but also start working on a new project that needs (or uses a library that is new enough to need) a later Python 3.Y. And they don’t want to upgrade the Python version in the old project, because that will require updating a bunch of libraries as well, and they don’t want to disturb the working configuration they have.

That situation is unsolvable without Python version management. Moreover, even if you don’t absolutely need to have the later version for the second project, you might still want to get it, or just to try something out, without having to screw up your carefully-prepared dependency stack in existing environments.

It’s the same reason people use venvs in the first place: they want to have independent environments, so they can, e.g., try upgrading a library in one environment to test it with a given project, or because different projects have incompatible requirements. All of those situations apply just as well to the Python version.

I see it as much better in the long run to shift to regarding Python as “just another dependency” — because, well, it is a dependency. When you have a piece of Python code, Python is part of what it needs to run, and there are constraints on what versions of Python will work, just as there are for other libraries. It’s only the limitations of current tools that make us accustomed to managing Python separately from all the other things in the environment.

It may not be needed in such cases, but having it available does no harm. Moreover, in the long run, I don’t see any reason why the Python provided by the OS, or by python.org, should not actually be a manager that manages Python. In the case of Linux distributions, for instance, I think this would go a good way towards alleviating the problems we currently have with “the system Python”. If “the system Python” is just one environment among many, then it becomes much easier to explain to people that they should create an environment for their own work that is separate from it, and it becomes easier for them to do that as well, because the tool to do so is already built in.

That is potentially true, and I agree that that might be the rare case where the manager might not be used. However, as you say, such platforms are niche. The vast majority of cases will not require such special handling. Plus, people who have to deal with such situations will usually already know they’re likely to have to do some extra work to get things working (e.g., compiling Python themselves).

3 Likes