The 10+ year view on Python packaging (what's yours?)

In a couple of recent threads, the idea came up of considering where we hope Python packaging will be in 10 years[1]:

Unsurprisingly, I agree that this is the core of the issue. Personally I think it is the core of every packaging issue. :slight_smile: So I wanted to pull together some of what a couple other people said on this topic and make a new thread where hopefully we can discuss this and its implications for packaging-related decisions in the present and future, but without it being perceived as an intrusion on more specific discussions. I’m very interested to hear what other people’s 10-year (or 20-year or whatever) visions are for Python packaging, or if they think it doesn’t even make sense to worry about such things.

Like I’ve said before, I think this big picture view is needed and that incremental progress through PEPs considered in isolation will likely not produce the kind of qualitative change that many Python users want. However, I do want to clarify one thing: I don’t mean that incremental changes in themselves are pointless. Rather, what I mean is that, if we ever want to solve the fragmentation problem, we must consider incremental changes in terms of how they move us toward where we want to be, not just how they move us away from where we currently are, and not just whether they move us to a situation that is slightly better than where we were before. What is important is not that changes be big vs. small or sweeping vs. incremental, but that they be coherently directed towards a target.[2] Of course, to do that we need to know where we want to be, and I’m hoping in this discussion we can share our views on that.

To begin with, @jeanas gave a concise vision in the same post:

My own vision is similar.

This also has some similarities to @johnthagen’s post several months ago which spawned a wide-ranging discussion about packaging “vision”. That was more of an extended “user story” than a bullet-point list, so I’ll just quote a small section here:

So this post similarly described a single tool that handled environment creation, package install, project management, publishing, etc. It did separate that hypothetical tool from another that would install Python versions, which differs from @jeanas’s (and my own) vision, in which a single tool would handle those tasks as well.

What interests me here are the conceptions people have of where we want to be in 10 (or however many) years, what is similar and different among them, and whether we can synthesize the opinions of multiple people into something that could become an overall goal for Python packaging (perhaps for the proposed Packaging Council).

I realize everyone is busy with concrete matters, and some perceive talking about this kind of overall trajectory as a waste of time. But what I would be grateful to hear from anyone and everyone involved in all these packaging discussions is:

  • Do you agree with the above goals? What are areas of disagreement?
  • Do you think it is valuable, when evaluating proposed changes to the packaging landscape, to consider how they do or do not move us toward such a situation? Why or why not?

For something more specific:

Not by a long shot, as conda has done this for many years. Conda can currently handle most of the tasks on the list, with the exception of managing projects[3] and lock files.[4] There are a couple of gray areas (e.g., conda itself does not run the REPL; rather, you install Python and use that to run the REPL, although you could in theory use conda run instead to get a REPL if you want).

As I’ve mentioned before, I see conda as much closer to my eventual vision of what Python packaging would look like than the PyPA packaging ecosystem[5], largely because it does combine so much functionality into a single tool. In previous discussions the main problems people seemed to have with conda were:

  1. it doesn’t use PyPI
  2. it relies on activating environments
  3. it “takes control” of the environment so cannot be used with a Python you get from somewhere else

As to #1, I’m very curious what substantive advantages people think PyPI has over alternative package repositories, other than the fact that it exists and a lot of people use it. As I’ve mentioned on other threads, I think many users (especially those who lament the state of Python packaging) have no particular attachment to PyPI and would be fine with something else as long as it provided the packages they need, and as long as the transition process wasn’t too arduous.

#2 is only partially true, as conda run allows running a program inside an environment without activating it. It’s fair to say this functionality has had some bugs, and may still have some, but it’s a long way from nothing.
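
For concreteness, here is a minimal sketch of what that looks like; the environment name py311 and the use of a Python subprocess call (rather than typing the command in a shell) are just for illustration:

```python
# Run a command inside a named conda environment without activating it first.
# The environment name "py311" is only an example.
import subprocess

subprocess.run(
    ["conda", "run", "-n", "py311", "python", "-c", "import sys; print(sys.prefix)"],
    check=True,
)
```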

#3 is maybe the most interesting to me since, as I’ve said repeatedly, I consider this an advantage, not a disadvantage. In my mind the only way we’re going to get to the world @jeanas described (in particular, “install Python and manage Python versions”) is if all Python use happens via a singular tool[6]. The fact that Python can be launched from so many different launchpads, so to speak, is part of what makes it hard for users to navigate (e.g., the various snafus with Debian or with setting the “default Python” on Windows). It would be easier if there were a single way in, so that once you’re in, you know that all Python-related tasks will be performed in exactly the same manner.

So again, I’m very curious as to what people’s perspectives are on this. What does the “one tool” of the future look like? Is it similar to pip? Is it similar to conda? Is it different from everything we have now?

Is there an irreconcilable difference on point 3, between those who want to keep environment management “inside” a particular Python install and those who want to put Python inside the environment, or can the gap somehow be bridged? Are there actual features of PyPI that are desirable[7], or do we just want to use it because it’s called PyPI and that’s what we’ve been using? Are there other problems people have with the “conda way” of doing things? Are there ways to combine the best aspects of multiple worlds? What other desiderata do people have for a tool that would meet a broader subset (dare I say all?) of their needs than the existing ones?

On another thread, @pf_moore addressed different aspects of the future of packaging[8]:

This is an appealing vision too, where packaging is transparent. It reminds me of what we’ve seen in other kinds of software: there used to be a much sharper boundary between what happened on your local machine and a point where you had to “go to the internet” to get something. But nowadays a lot of software happily pulls from local and remote alike as needed, without the user having to control or even be aware of that. I haven’t thought about this much with regard to Python and I’m not sure what form it would take.

Paul also said:

Again I’m curious what others think about this. My own view is that it’s potentially compatible with @jeanas’s outline earlier.[9] It just seems to me that @pf_moore is describing a higher-level overview, in which the “one tool” @jeanas describes might be a particular “low-level implementation detail” that underpins the “workflow tools”. So perhaps these are two views of the same future landscape from two vantage points.

I’m not entirely sure, however, what it would be like to have a single “the workflow” which everyone uses the same way despite using different tools. It’s possible that such tools will inevitably tend to diverge more substantially and in effect create distinct workflows. Even in the world of editors, although they all perform essentially the same function, the differences can become relevant at times (e.g., a couple of mentions in the PEP 722 discussion about whether editors can block-comment an entire section).

I would love to hear from others about how they compare these two visions — along with their own preferred vision, of course. :slight_smile: Is it necessary to have “one tool” at a low level to facilitate compatibility between a broader range of higher-level tools? Is it possible to get that level of seamless transition between tools solely via protocols? What would be the common elements of the “workflow” and which would differ among tools?

This brings us back to what I said at the beginning of this long post. No doubt we need to try things. My perspective is just that, before trying anything, we should evaluate it not only as a step in itself, but also based on how we foresee it fitting into a unified vision of the Python packaging landscape as a whole. Without doing that we risk improving things gradually and thinking progress is being made, yet not making contact with the deeper problems. It’s sort of like, imagine I’m on a beach and I have some tools and materials to build things. Maybe I can build some superb bicycles that allow me to travel along the coast, perhaps exploring some peninsulas stretching out into the sea. But if the place I’m trying to get to is an island offshore, it doesn’t matter how good my bicycles are, I need to decide to work on boats if I ever want to get there. So when I’m evaluating something I built or am considering building, the first and most important question is not “is this well-built” or “can I get somewhere with this” but “can this travel across water”.

Of course, whether you think you need to build a boat depends on whether you think the destination is on the same land mass or a different one. That’s why I really hope some others will share their own such visions on this thread, because whether we communicate those views or not, I have no doubt that they do implicitly influence our stances on each proposal that comes up.[10] It’s just easier to talk about these things if we know where everyone is coming from.


  1. The quotes I’ve included here came from other discussions, which in some cases means they have a little connective text at the beginning or end that may seem out of place, along the lines of “This is off-topic but…” or “but apart from that…”. I’ve tried to keep the main intent of the quotes clear. ↩︎

  2. It’s my own fault if I’ve been misunderstood on this. ↩︎

  3. I’m assuming this means something like Poetry where you can install a library and simultaneously update your pyproject.toml to list it among the dependencies ↩︎

  4. There is a library called conda-lock that supposedly manages lockfiles, although I haven’t explored it myself. With recent conda plugin developments it’s possible something like this could be integrated as a conda subcommand. ↩︎

  5. basically the pip/venv combination, with PyPI as the package repository, together with something like pyenv for managing versions ↩︎

  6. with the possible exception of things like embedded Python, which already often require steps beyond what an average user would ever contemplate (such as compiling your own Python) ↩︎

  7. in the sense that, if we were now deciding from scratch between PyPI and an alternative repository architecture, we would choose PyPI because of those features ↩︎

  8. I’ve only quoted a few excerpts here and the original post has more detail. ↩︎

  9. To be clear, these two “visions” were in different threads, so they were not framed in direct response to one another. ↩︎

  10. For instance, the current PyPA model assumes that environments are created by Python (e.g., with venv) and packages are installed by Python (via pip), which precludes managing the Python version as part of the environment. I see that as an ocean, or at least some very large body of water, which needs to be crossed, and this influences my tendency to see any proposal that continues to build on that model as something that won’t hold up in the long run. I gather that other people don’t see that as such a big problem, and so they are more comfortable with proposals that maintain that assumption. ↩︎

5 Likes

[I have not read @BrenBarn’s post, too long, sorry. But I hope it is fair to answer the question in the title anyway.]

One item on my wish list is interoperability between (packaging) ecosystems. For me this means interoperability between for example the (packaging) ecosystems of Python, Node.js, Rust, as well as system packaging ecosystems such as apt, yum, brew and so on.

When I see initiatives like PyBI, PEP 725, purl, and others, I can’t help but think what if I could npm install python Django or pip install nodejs reactjs? Wouldn’t it be great if pip could install gcc in order to build C extensions? Ideally there would be installers that work across ecosystems. I am tired of having things installed with pipx, others with conda, some with apt (I have managed to avoid brew and nix so far) and losing track of what should be updated when and where.

1 Like

There’s ziglang · PyPI. If you do pip install ziglang you get a C/C++ compiler. And cmake is pip installable as well.
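
To illustrate (a hedged sketch; hello.c is just a placeholder file, and python -m ziglang is the entry point the wheel documents):

```python
# Compile a C file with the pip-installed zig toolchain, driving it from Python.
# Equivalent to running "python -m ziglang cc -o hello hello.c" in a shell.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "ziglang", "cc", "-o", "hello", "hello.c"],
    check=True,
)
```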

1 Like

True. I have not seen a project that lists ziglang in its build-system.requires to build its C extension (I have not looked for it). Does it exist? Is it possible?

You might be interested in PEP 725: Specifying external dependencies in pyproject.toml which is about specifying dependencies outside of the Python ecosystem.

3 Likes

Sure. Most of the post is just me trying to bring together what a few different people (including me) thought about the topic, so more is welcome.

Conda can install nodejs, R, etc. It looks like there’s even a conda package for gcc although I haven’t really used that.

However, at least right now, as I understand it, this isn’t really “interoperability”; it’s more that people are repackaging things for conda. It’s not that conda can use npm to get npm packages; it’s just that you can use conda to install node and then use that npm to install npm packages. It seems to me that having true interop, where different language package managers could actually install one another’s packages, would be quite a heavy lift. Of course, this thread is about dreaming, so yeah, it would be cool. :slight_smile:

That is indeed a pain point with the current situation. It sounds like what you’re describing is a situation where each package “doesn’t care” which manager (conda/apt/etc.) installed it, and the different managers themselves would be largely equivalent. That would be pretty awesome although again, a bit of a reach. I feel like to at least get parity among Python-oriented tools would be a start.

Also, I tend to be a little uncertain about a situation where there are multiple tools that have total interop. If (in the imagined future) every tool can do everything the others can do, why are there multiple tools instead of one? And if different ones can do different things, how does interop work between a tool that can do X and one that can’t? It seems that what usually winds up happening in this situation is that some information gets lost in the transfer between tools. I could imagine different tools that just provide different UX layers over the same base functionality, though (sort of like different email clients or web browsers).

I do fear this could lead to a snarl of dependencies that would become even harder for people to unravel when they encounter problems (if I pip install with python I installed via an npm I installed with conda where does that package go??).

A version of this I think might be somewhat more attainable is if a packaging tool could delegate to a more specific tool–i.e. if all you’re going to do is install a wheel, then delegate that installation to pip [1].

We can sort of see this in the conda ecosystem–many recipes just wrap the PyPI package with metadata. conda list will tell you if a package came from pip, but conda doesn’t recognize those packages as “already installed” when solving.

I feel like delegation is more attainable than agreeing on an interoperable standard[2], because a tool like pip could define its own delegation interface and let other tools adopt it if they want to [3]. Talking to the relevant parties can’t hurt, but it doesn’t require a consensus to demonstrate value.

Perhaps this is already possible in the interaction between conda and pip, for instance, but it’s not quite how it works right now (and I’m sure there are good reasons why it’s difficult).
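
To make the delegation idea concrete, here is a purely hypothetical sketch: the function names and the decision logic are invented, and the only real interface used is running pip as a subprocess.

```python
# Hypothetical outer tool: handle what it knows natively, hand plain wheels to pip.
import subprocess
import sys

def available_natively(requirement: str) -> bool:
    # Placeholder for the outer tool's own lookup (e.g. a conda channel search).
    return False

def install(requirement: str) -> None:
    if available_natively(requirement):
        ...  # install via the outer tool's own machinery
    else:
        # Delegate the plain-wheel case to pip in the current environment.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", requirement],
            check=True,
        )

install("requests")
```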


  1. or a hypothetical future Unified PyPA Tool, if one arises ↩︎

  2. there are just so many differences of opinion about How Things Should Be ↩︎

  3. or rather, I should say that a PEP could define this for python packaging ↩︎

1 Like

[Thanks for the follow-up answers on my post. I do not want to discuss here whether or not it is feasible, what it would look like technically and in terms of UX, or why it is the way it is now; I’d rather we do that on a dedicated thread if we really feel like discussing this now. But I still want to add to it…]

I guess my wish in the long term is that Python packaging stops being just about Python. And my intent is not to single out Python; I wish the same from other packaging ecosystems as well. For example, I have things installed with Node.js npm and Ruby gem that are not kept up to date because my operating system does not tell me that there are updates available. These are applications that are not available in apt repositories (or flatpak or appimage or snap, as far as I can tell). So in the case of Python, I wish in the long term that the packaging story would not stop at “it is installable with pip, our work is done” (tongue in cheek).

That things like purl are being worked on is exciting to me. I am also glad there is something like PackagingCon.

3 Likes

This is a magical project. Recently at work I upgraded all of our code generation to use Pydantic v2, which didn’t have wheels for CentOS 6, so it had to build from source; but the minimum supported version of Rust required a newer version of glibc, so I used this (among other things) to target a specific version of glibc.

It’s so impressive to me.
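
As a hedged sketch of the underlying mechanism (not of the exact build setup described above): zig’s cc frontend accepts a target triple that pins a glibc version, along these lines.

```python
# Compile against an older glibc by pinning it in the target triple.
# The 2.17 version and hello.c are placeholders, and wiring this into a real
# Rust/maturin build (CC, linker settings, etc.) involves more than shown here.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "ziglang", "cc",
     "-target", "x86_64-linux-gnu.2.17",
     "-o", "hello", "hello.c"],
    check=True,
)
```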

Here’s my vision. It matches the OP vision in some parts, but differs in others (particularly when it comes to managing interpreters).

  1. There exists the One True Packaging Tool that is useful for 95+% of use-cases, used by beginners and advanced users alike (and it ships with Python).
  2. The One True Packaging Tool handles tasks of both package authors (creating wheels, uploading to PyPI, installing in development mode, creating packages with C extensions) and package users (installing dependencies, creating lockfiles).
  3. The One True Packaging Tool comes with a npx/pipx-esque helper utility that can run an application written in Python without exposing the Python packaging world to the user.
  4. Something like node_modules/PEP 582 is built into Python. The implementation is built in a way that lets 90+% of users avoid using virtual environments (e.g. by searching for the __pypackages__ directory in parent directories, allowing a __pypackages__ directory to prevent searching the OS site-packages, ignoring packages currently in site-packages when installing into __pypackages__, having a simple way to run scripts installed by packages from anywhere in the filesystem).
  5. The One True Packaging Tool does not manage Python interpreters (it does not install new interpreters).
  6. The One True Packaging Tool is written in Python, welcoming contributions from a wide audience.
  7. There is a One True Interpreter Tool that manages Pythons. It can be written in Rust or whatever.
  8. Beginners and people who don’t have specific requirements can skip the Interpreter Tool entirely and use installers from python.org or distro packages.
  9. The OTPT can integrate with the OTIT (prompting the user to install a new interpreter). This feature can be disabled or discouraged by distributors or system administrators to avoid having multiple random Pythons on a system. (You wouldn’t want anything to install extra interpreters into the python:3.11 Docker image, for example.)
  10. Neither of those two tools supports installing gcc, nodejs, or react. The Python tools are responsible for Python only. Installing gcc/node should be done via apt or brew or nix, and installing react should be done via npm.
  11. Third-party tools (such as conda) can support installing gcc and numpy with the same user experience and within the same project context. Their Python support should be built on top of the One True Packaging Tool.
3 Likes

I’m not going to invest too much time on this thread, and I’ve expanded on this idea many times before, but my “10+ year hope” is that package consumers can move towards generic package managers and repositories (similar to a Debian or a conda-forge). Both package developers and those responsible for creating the repositories (usually “redistributors”) should move toward the language-specific tooling.

That makes a service like PyPI more about being the canonical source for redistributors to get the correctly tagged source code in a way that lets them rebuild it and distribute reliable/compatible builds to the final users.

It makes a service like PyPI less about distributing ready-to-use binaries for every possible platform, and reduces the burden on package developers to produce binaries for a million different operating system/configuration combinations.

And it gives users their choice of tooling for managing their entire project, without forcing them to cobble together working environments from a hand-crafted mix of platform specific and language specific tools.

8 Likes

This isn’t a disagreement, just a data point. On a project I work on, we recently got told that one Linux distro had made a policy decision to no longer pull sdists from PyPI for packaging purposes. They will pull a source tarball from GitHub or other designated official download site for the project.

I don’t know if this indicates a trend away from the direction you list, or if it’s an outlier; or if it would change back if the state “improves” via the work being done by this community.

3 Likes

Seems to me more like a 2-3 year view (5 years max). Seems like nearly everything mentioned already exists, has already been tried but failed, or is well underway (whether or not this view is the one that gets standardized and/or becomes mainstream is a different question).

1 Like

Yeah, I was aware of this. My hope would be that we’d develop the language-specific tooling that encourages them to prefer PyPI (e.g. making it easy to build sdists in their existing context rather than re-downloading dependencies in a separate environment outside of their control; also having strong provenance between the upstream code repositories and the sdist, so that redistributors trust that they’re getting the same files).

2 Likes

That to me is the key reason why – as someone often choosing sources for a project to be rebuilt in conda-forge – I prefer github. The only downside is that git(hub) does not populate submodules in the archives, but it avoids a potential supply chain attack vector (it’s currently completely opaque how the sdists get formed), and spares me from dealing with some non-standard transformations people come up with (mostly only relevant when there’s something to patch).

I agree with @steve.dower’s overall direction BTW[1]


  1. I haven’t had time to write a proper response for this thread. Perhaps later ↩︎

I agree that some kind of delegation seems promising. This was mentioned on some of the other packaging threads. For instance, pip and/or conda could talk to each other and effectively ask “what dependencies would you need to install if I asked you to install this package?”
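
Something close to that query is already possible on the pip side, if I understand its options correctly; here is a hedged sketch using --dry-run with --report (available in reasonably recent pip versions; requests is just an example package):

```python
# Ask pip what it *would* install for a requirement, without installing anything.
import json
import pathlib
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    report_path = pathlib.Path(tmp) / "report.json"
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--dry-run",
         "--report", str(report_path), "requests"],
        check=True,
    )
    report = json.loads(report_path.read_text())

# The report lists every distribution the resolver decided it would install.
print([item["metadata"]["name"] for item in report.get("install", [])])
```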

What do you see as the advantage of separating these?

Most of the other parts of your vision make sense to me and (as you noted) are similar to mine and the ones I quoted in the initial post.

The node-like/__pypackages__ model is a tough one for me. On the one hand, a lot of people seem to want it. On the other hand, I think some aspects of it are better suited to JS than Python just because of how the Python import mechanism works (i.e., you import with a bare name, which has to be an index into some kind of global import path). Back on the one hand again, it might be nice if some aspects of the Python import mechanism could be tweaked to make this less painful (e.g., the infamous “relative imports don’t work when running a script” problem).

Back on the other hand, I think the __pypackages__ approach, if it is good, is only good for a particular development context, which is where you’re working on a “project” that is within a single directory, and you’re okay with not sharing an environment across multiple “projects”. This is a common case, to be sure, but it does leave out the common academic/data-science situation where you have a more-or-less persistent environment that may be used for many different projects. I also think the devil is in the details as far as how __pypackages__ would work (e.g., I think some kind of parent-directory search would be needed for people to feel like they really didn’t need a venv).
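
To make the parent-directory-search point concrete, here is a rough sketch of the kind of lookup I mean, loosely following the __pypackages__/<X.Y>/lib layout from the PEP 582 draft; nothing like this is built into the interpreter today.

```python
# Walk up from the current directory looking for a __pypackages__ tree and, if
# found, put its lib directory on sys.path, roughly like git searching for .git.
import sys
from pathlib import Path

def find_pypackages(start: Path | None = None) -> Path | None:
    here = (start or Path.cwd()).resolve()
    version = f"{sys.version_info[0]}.{sys.version_info[1]}"
    for directory in (here, *here.parents):
        candidate = directory / "__pypackages__" / version / "lib"
        if candidate.is_dir():
            return candidate
    return None

lib = find_pypackages()
if lib is not None:
    sys.path.insert(0, str(lib))
```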

Another point listed in both your and @jeanas’s lists has to do with project management. It seems to me that a big reason for Poetry’s success is the way it links environment management with project management, so a single poetry add does two things: it installs the library into the environment, and it also updates your pyproject.toml to record the fact that your project needs that library installed. This particular task isn’t handled directly by pip or conda. (Personally, though, it still seems to me that it would be easier to add such a feature to a conda-like tool than to add some of conda’s more powerful features to a pip- or poetry-like tool.)
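
As a minimal sketch of what that “add” step involves for a hypothetical tool: the function name is invented, pip stands in for whatever installer the tool uses, and tomlkit is assumed because it preserves comments and formatting when rewriting pyproject.toml.

```python
# Install a package and record it under [project] dependencies in pyproject.toml.
import subprocess
import sys

import tomlkit  # third-party; round-trips TOML without losing comments/formatting

def add_dependency(name: str, pyproject: str = "pyproject.toml") -> None:
    # Step 1: install into the current environment.
    subprocess.run([sys.executable, "-m", "pip", "install", name], check=True)

    # Step 2: record the dependency in the project metadata.
    with open(pyproject, "r", encoding="utf-8") as f:
        doc = tomlkit.parse(f.read())
    deps = doc["project"].get("dependencies")
    if deps is None:
        deps = tomlkit.array()
        doc["project"]["dependencies"] = deps
    if name not in deps:
        deps.append(name)
    with open(pyproject, "w", encoding="utf-8") as f:
        f.write(tomlkit.dumps(doc))

add_dependency("requests")
```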

This sounds great insofar as it reduces the pain for package consumers. :slight_smile: That said, one practical problem with Debian and conda-forge is the relatively long wait time for packages to be added. Perhaps that is the price to pay for having a more “curated” set of packages, though (in the sense that, e.g., you know an installed package will not have irremediably broken metadata).

Also, relatedly, there seems to be a slight trend in the Linux world towards things like Flatpaks which try to circumvent the distro-packager part of the process. In part this is because of just what you said, people not wanting to produce binaries for every combination of configurations. But it’s happening even when the application authors aren’t the ones who have to do that. Even when the responsibility for packaging is pushed to the distro maintainers, that can rebound on users if it means they have to wait for the distro maintainers to react to a new source release by the app author.

Theoretically, I guess, the solution to this is automation — no one has to wait very long for anything if all the code is just getting auto-built for each package manager and not waiting on a human to repackage it. But of course that requires resources.

Package management is a common end-user/developer-level task performed often. Interpreter management is a sysadmin-level task [1] performed rarely.

Package management is needed ~everywhere Python code is run. Interpreter management is not needed in Docker containers, or in environments where a single Python version provided by the OS or by python.org installers is good enough (or is covered by an expensive RHEL support contract). The interpreter manager might be useless on niche platforms which require special build steps and which don’t have binary packages available.

Package managers can (should) be written in Python. Interpreter managers can’t be written in Python (or else you need a different tool to bootstrap the first Python interpreter). Package managers written in Python can leverage metadata and configuration exposed by the interpreter, and can have more contributors than things written in Rust. (And frankly, if Python needs a Rust program to manage its packages, is Python a good choice for your code?)
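
A small illustration of the point about leveraging interpreter metadata: a package manager running inside the target interpreter can query its configuration directly, with no guessing about paths or tags.

```python
# Introspect the running interpreter the way an in-process installer could.
import sys
import sysconfig

print(sysconfig.get_paths()["purelib"])               # where pure-Python packages go
print(sys.implementation.name, sys.version_info[:3])  # implementation and version
```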

I agree, __pypackages__ needs .git-style recursive search to be usable.


  1. I realise most developers work on single-user machines with root access, and I also realise interpreter managers can install to the home directory, but it’s still an administrative task. ↩︎

As a developer I often have to test and reproduce issues with different versions of Python, and I rely on conda’s ability to manage interpreter versions nearly daily, as do many of my colleagues.

And frankly, if Python needs a Rust program to manage its packages, is Python a good choice for your code?

This is hardly a compelling argument. Do you regard NumPy (written mostly in C) as a reason Python is not a good choice for numerical code?

10 Likes

NumPy is written in C for performance reasons. NumPy needs C to access SIMD instructions and other low-level things. NumPy is a specialist library that performs CPU-bound operations and that is used multiple times in your average program. You couldn’t write NumPy in Python (at least not if you care about any sort of performance).

A package manager is a console application. All it needs to do is download stuff from the Internet, parse some serialized data, copy some files around, and other simple tasks like this. This is generally IO-bound, not CPU-bound. Those are simple and common tasks that should be doable in any programming language. pip, npm, cargo, rubygems, maven, gradle, nuget are all written in the languages they are managing packages for[1]. Choosing to write the One True Python Package Manager in Rust would suggest that Python cannot be used for a simple console application performing basic tasks.


  1. Or at least in the languages for the VM they are primarily used with (Gradle is Groovy+Java+Kotlin and can help with other JVM languages too). ↩︎

1 Like

Dependency constraint solvers are written in C and Rust for performance reasons. [1]


  1. Conda had a pure-Python constraint solver in 2012, until it didn’t scale ↩︎

7 Likes