Python Packaging Strategy Discussion - Part 1

As I see it, certain areas are currently out of scope for pip, including creating sdists (use build), publishing (twine), setting up environments (venv etc.), or expanding a list of desired requirements into something like a lock file (pip-tools). Perhaps there’s more discussion internally about these, but not being a pip maintainer, my impression is that pip maintainers have largely said a firm no to such things.

I think that has made a lot of sense in the context we have (and I like working on tools with limited scope), but if you announced that pip’s scope was opening up to cover a set of these things that we currently point to separate single-purpose tools for, even if it was going to take a couple of years to flesh that out, I don’t think it would be seen as ‘doing nothing’.

Of course, forging a consensus on what set of things belonged in the new, larger scope would be hard, but it always is, and I don’t think we can address these concerns without some kind of consensus.

6 Likes

As someone who wears that hat, and has spent a not-insignificant amount of time in pip’s issue tracker, I am ~sure that various maintainers at various stages of “history” of pip have said that they are open to the idea but that we need a broader discussion about it. :slight_smile:

In line with that, basically all maintainers who’ve spoken here are open to/supportive of the idea of expanding scope for pip. FWIW, I have a draft blog post with quotes from discussions here and on the issue tracker to that end as well.[1]


  1. Perfectionist tendencies have kicked in, and it’s got a larger scope than what it started with (i.e. response on here). ↩︎

5 Likes

I suspect if you asked them, they’d all say “right now” :slight_smile: but I also think that most of them are reasonable, and if they see movement towards it, they’d be happy and see that we’re making progress. I don’t think there is a world where they get it in 6mos no matter what we do if they’re not already happy with the status quo options.

Yea I may have brushed over them to some degree, but that’s largely because I view the challenges of evolving pip to that point to be largely technical challenges, and in my opinion, technical challenges are more tractable than political ones.

I think the beauty of the idea is that it sort of is the “do nothing” option. It’s not really doing nothing, because we’re expanding pip’s scope and adding the features people say they want in the tool… but that’s something we’ve been doing all along. It’s really just becoming a bit more aggressive about expanding those features to arrive at the destination that people want.

How to communicate that out is always a struggle for us. We could write a PEP on it if we wanted, or put up a page on packaging.p.o, or even just put it into the pip release notes, or just start doing it and let people notice that pip is gaining these features over time.

2 Likes

Maybe (and this is a serious comment) the key takeaway from this discussion is that we need some sort of formal publicity / user liaison team, who are explicitly tasked with this?

1 Like

Yes, getting the message out is key. I think one low-cost way is pinning GitHub issues describing the intent and high-level roadmap, and/or placeholder issues for planned milestones.

I will also add that when pip embarked on another very difficult piece of work (the new resolver), there was visibility and comms that made it out there. Sure, perhaps it didn’t reach everyone, but it certainly was heard. Perhaps a similar approach, or something learned from that experience, could be applied to this new set of challenges.

One thing that’s unclear to me in this discussion is, what exactly do we want this hypothetical new system to do? I really liked @johnthagen’s example in another thread laying out some concrete examples of things that people might do, but it seems this thread has moved in a more abstract direction.

I think it is not so important to users whether the hypothetical unification is technically “one tool” or several, as long as they are designed coherently as an integrated set. As far as messaging, personally I think one thing that would go a long way toward making users feel like their concerns have been heard and that pip (or whatever set of tools) is responsive to them is a definitive section within the python.org docs that clearly lays out how to accomplish concrete tasks, clearly states that the way it lays out is the official way, and backs that up by clearly demonstrating how to accomplish all the tasks that people want to accomplish but that currently require navigating a labyrinth of conflicting toolsets. (In effect this would be a flowchart that includes choices like “does your project include code written in a language other than Python? if so, then do blah blah”, although it wouldn’t have to be structured like a graphical flowchart.)

3 Likes

To be honest, I do not believe the “expand pip” option. At least, not today. If the pip team gets together, makes an appeal for more help to push in this direction, and comes back in 6-12 months having shown that they’ve made progress and it can get there, then perhaps I would. But the weight of history, the complex and legacy code, the backlog of issues and difficulty of working on pip, and the important lower-level role as a pure installer it already fulfills are all stacked against this idea imho. So by all means try if you want, but it’s a poor conclusion to decide now that this is the way.

Poetry/PDM/Hatch are much more along the lines of what users seem to want. Each has its own problems and isn’t complete enough, however if you’d take the best features of each you’d have about the right thing. Now that has the same problem as for the pip direction above: we cannot agree to pick one here, nor can we wave a magic wand and merge them. But again, that is not needed. It’s encouraging to see that the authors of each of these tools have shown up here (much appreciated @sdispater, @neersighted, @ofek and @frostming!). If you as authors of these projects would get together and work out a strategy, I’d believe in that working out well in a reasonable time frame. And make both Python users and the average package maintainer enthusiastic about this.

I’m not intimately familiar with each of Poetry, PDM and Hatch, but here is what I understand some of the key differences and issues to be:

  • Team: Poetry seems to have a good size team (multiple maintainers; 6 folks with >100 commits who were active in the last year). PDM and Hatch are both single-author projects.
  • User base: Poetry seems to have the most users by some distance, Hatch seems to be gaining quite a few new users, and PDM seems to be struggling a bit.
  • Build backend support: PDM supports build backends (PEP 517) the expected way, it seems that Poetry can be made to work by plugging an install command like pip install . into build.py (so Poetry’s approach is even more general than build backends, but a bit clumsier by default), and Hatch has no support at all for anything but its own pure Python build backend (and new plans do not look good).
  • Conda/mamba support: Hatch and PDM seem to have support (not sure how feature complete), Poetry does not.
    • Other package manager support isn’t present anywhere, but if conda support can be added, that leaves the door open for future other package managers (e.g. if enough people care to maintain Spack or Nix support, then why not?)
    • Virtual environment handling seems to me to be a subset of this, or another angle on it. There’s no good default, as the discussion on PEP 704 shows. PEP 582 (__pypackages__) is an entirely reasonable approach. PEP 704 doesn’t add much, but as long as there’s different ways of doing things it probably doesn’t hurt either. The important thing here seems to be to hide complexity by default from especially the beginning user, while allowing multiple approaches.
  • Plugin system: all three projects seem to have a plugin system and a good amount of plugins.
  • Lock file support & workflows: Poetry and PDM have lock file support, Hatch does not. There’s an issue around standardization here, TBD how that will work out. Poetry and PDM seem to conversely have issues with the “package author” workflow, which should not lock and not put upper bounds on dependencies by default. Looks like that needs more thought in projects and something like two workflows/modes: application development and package development.
  • Completeness of commands offered: it seems pretty complete for all projects, although a standardized test command at least would be a nice addition for all projects.
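For reference on the build-backend point above, the “expected way” is the standard PEP 517/518 [build-system] table in pyproject.toml, which front ends like pip and build invoke to produce sdists and wheels. A minimal sketch (hatchling is just one possible backend here):

```toml
# Standard PEP 517/518 build backend declaration in pyproject.toml.
# The backend named here is what installers and builders call into;
# any compliant backend (hatchling, flit-core, setuptools, ...) fits.
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```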

There’s probably more to compare here, but I think it’s safe to say that if these projects would join forces, we’d have something very promising and worth recommending as the workflow & Python project management tool. And it doesn’t require a 200+ message thread with everyone involved in packaging agreeing - if a handful of authors would agree to do this and make it happen, we’d be good here and could “bless” it after the fact.

8 Likes

If I understand Dependencies for build backends, and debundling issues - #29 by ofek correctly, that’s changing soon for Hatch.

I think you’re alluding to them automatically generating a lock file and then continuing to use that file going forward while it exists, regardless of whether it’s been checked into VCS?

Trying to standardize that has been brought up before and typically has been shot down (e.g. Providing a way to specify how to run tests (and docs?) ). I think the issue typically comes down to choosing a scope (API or just shell command), and then what to do for OS variances. But I’m a supporter for something like this, so I’m the wrong person to ask why we shouldn’t do it. :wink:

1 Like

I also had in mind the ^ operator for Poetry, and IIRC both Poetry and PDM propagating an upper bound in python_requires in an undesirable way.

The exact details don’t matter much here though, whole blog posts have been written about this with more detailed comparisons. I could have gotten a couple more details wrong. The main point is that these tools are on the right track, and having them converge/merge/cooperate is the most viable way I see to resolve this “what workflow tool to standardize on” question.

3 Likes

Small point – pdm doesn’t do this by default; it uses a minimum for the lower bound, i.e. package >= X.Y.Z. I know this because I wrote the patch, but the docs don’t reflect it, so I’m glad you pointed it out (I probably missed them!)

Gah, this quoted the wrong part of your comment, sorry.

4 Likes

I would prefer a boilerplate test script in tool.pdm.scripts which can be changed or removed later, rather than a standardized test command, which implies a unified test framework like Go and Cargo have. Thanks to the PDM script shortcuts, pdm test is equivalent to pdm run test.
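For illustration, such a boilerplate could be a few lines in pyproject.toml; the script name and the choice of pytest as the runner here are just examples, not anything built into the tool:

```toml
# Hypothetical boilerplate: a "test" entry under PDM's scripts table.
# Users can edit or delete it freely; no test framework is implied
# by PDM itself.
[tool.pdm.scripts]
test = "pytest"
```

With something like this in place, pdm test resolves to pdm run test via the script shortcuts mentioned above.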

Not for PDM. PDM is using the minimum(>=) strategy by default.

Yes, but you seem to quote the wrong part.

3 Likes

As I’d promised ~200 posts ago, here’s my thoughts on the status quo of the ecosystem as a blog post:

Frankly, I feel like I should post this in a separate thread, but I guess we’re going to aim for a record on discuss.python.org with this one (this topic is already the 5th longest in d.p.o history). I’ve trimmed my whole thing on Conda/native dependencies; coz we’ve talked a ton about that already here.

If you spot something wrong, do let me know outside of this thread (discuss.python.org has DMs)! Right now, I need to go to sleep. :slight_smile:

13 Likes

It’s probably unsurprising that I agree with pretty much everything written in @pradyunsg’s excellent post. In fact, the GitHub comment of mine he dug up from 2019 could have easily been posted word for word in this discussion and still made complete sense.

One thing I will disagree on is this statement:

We still don’t have agreement that this is the direction that we, as a community, want pip to go.

If we include the wider Python community, I think there is heavy agreement on this given that:

  • The vast bulk of people are still using pip even though tools like poetry, pdm, hatch exist.
  • By far the number one ask for like 5-10 years now has been to provide a more unified experience.

I can’t find a way to reconcile those two facts with each other without coming to the conclusion that most of our users are already in agreement that this is the direction they want pip [1] to go.

I think the disconnect is that the people working on these tools can’t come to agreement that this is the direction we want to go. I think there are a lot of reasons for that: we’re best positioned to understand the nuances of the various tools and to take advantage of the power of the Unix philosophy; we’re all working on different tools within the ecosystem and have a vested interest in what’s best for our own projects; and dictating ecosystem-level, non-technical decisions is hard and we’ve been loath to do that.

Ultimately, I think what it comes down to is we’ve gotten pretty good at using the PEP process to make technical decisions about interoperability standards [2], but we’re terrible at using it for anything else and we don’t have a real alternative for making decisions otherwise. That means that anytime we have this discussion, we largely just sit around talking about what we could or should do with no actionable items on how we’re going to come to an agreement.

So here’s what I’m going to propose:

Interested parties write a PEP on how they think we should solve the “unification” problem within some time frame, all of these PEPs will have the same set of PEP-Delegates, the various proposals will be discussed just like any other PEP, then the PEP-Delegates will pick one and that’s the direction we’ll go in [3]. If they are unable to come to an agreement, then it will get kicked up to the SC to make a choice.

My recommendation is that we do something a little unorthodox and instead of having a singular PEP-Delegate, we instead have a team of 3 of them, who will together select the direction we go in. My rationale here is that this is our first time making a decision quite like this, it’s going to be a decision with a large impact, and there is no one singular person who could make this decision who isn’t biased in some way.

Unfortunately, since we don’t have a good way to make these decisions, we also don’t have a good way to make a decision on how to make a decision. So I’m going to propose that we wait two weeks, and if there isn’t a large outcry, then we just move forward with this idea.


  1. I don’t think they actually care if it’s pip or not; what they actually care about is that the tool that comes with Python should go in this direction. Obviously that tool is currently pip, so by that nature they have agreement that it should be pip. Of course another option is that we get agreement from Python Core that another tool should be bundled within Python. ↩︎

  2. Also for large decisions for PyPI itself, but that’s not related. ↩︎

  3. Obviously we have no real enforcement mechanism here, if some project disagrees with the ultimate choice made, they’re free to go their own way-- and that’s a good thing. We’re not looking to create the one true tool to rule them, just to provide the default tool. Other tools may still exist and people can still use them, but defaults matter. ↩︎

10 Likes

If maintainers and users want to evolve pip into a tool that does more things, I think a good first step would be to make its interface pluggable. There’s definitely something that appeals to new users about having the same top-level command.

I think this would allow some degree of UI experimentation while still having a single command that users interact with. There’s been comparisons to cargo in this thread; cargo is pluggable and seemingly simple core workflow commands like cargo add did not come in-built with cargo until very recently.

1 Like

AIUI cargo isn’t actually pluggable; it just has a small shim so that if you run cargo foo, it will look for a binary named cargo-foo, which isn’t meaningfully different than just running cargo-foo. There’s no real extension point other than that, as far as I’m aware.
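That shim mechanism is simple enough to sketch. A minimal Python version of the same dispatch idea (the function name and error messages here are hypothetical, purely for illustration) might look like:

```python
import shutil
import subprocess
import sys


def dispatch(tool: str, argv: list[str]) -> int:
    """Cargo-style subcommand shim: `tool foo args...` runs an
    external executable named `tool-foo` if one exists on PATH."""
    if not argv:
        print(f"usage: {tool} <command> [args...]", file=sys.stderr)
        return 2
    external = shutil.which(f"{tool}-{argv[0]}")
    if external is None:
        print(f"error: no such command: {argv[0]}", file=sys.stderr)
        return 1
    # Hand off to the external binary, forwarding the remaining arguments.
    return subprocess.call([external, *argv[1:]])
```

As noted above, this buys a unified top-level command but no deeper integration: the external binary gets no access to the host tool’s internals, only its own argv.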

2 Likes

From a PEP Editor perspective, if there’s enough consensus in this approach and interest in submitting proposals, we could perhaps consider allocating a series of PEP numbers (like the 3000 and 3100 series for Python 3, and the 8000 and 8100 series for Python governance), with a Process meta-PEP documenting and agreeing upon the process and listing the individual proposals added as PEPs after that one.

Having a council of 3-5 PEP-Delegates certainly sounds ideal to me, if there is a desire to go ahead with this. The only difficulty would be deciding who would comprise the committee, and how we’d decide that. The PEP-Delegate selection process, as of our latest revision of PEP 1, is simply that people self-nominate and the SC approves, or makes the decision if there are multiple candidates; beyond that there’s no formal process, so we’d have to invent one.

In this case, we could just have people here nominate candidates, with their approval, and then the SC approve the 3-5 based on our feedback here, or we could have a more formal vote on the thread, or in the PyPA committers group e.g. by single transferable vote (or similar multiple-selection method) (and have the SC formally confirm that, if needed).

IMO, the ideal candidates would be those who:

  • Are well-respected in the packaging community
  • Collectively possess a diversity of backgrounds
  • Lack an existing strong vested interest/established opinion about a particular approach.

That latter point is probably the most difficult, since pretty much the who’s who of packaging have commented in this thread, but at the minimum we can say that anyone accepting a nomination as a PEP-Delegate cannot be an author of any of the PEPs for consideration, and must generally recuse themselves from actively advocating a strong stance for or against any of them (as is the general principle for PEP-Delegates currently). This should hopefully help self-select those who are committed to taking on the role of stepping back to help select that which best captures the consensus of the broader packaging community and serves the needs of Python’s userbase, versus the equally valuable one of being an author or champion of a specific proposal.

I’d personally like to see at least one Delegate with a scientific Python/PyData background, given that somewhere around half of Python’s userbase falls into this demographic (including me). At least in my experience bridging both worlds, it’s among the scientific community that I’ve heard the loudest and most consistent calls for packaging unification from people of all Python experience levels; they’re also hit hardest by the lack thereof, as they are by and large not professional programmers and are most lacking in the background, experience, rationale and institutional knowledge/resources to be able to successfully navigate the current landscape.

Finally, as a PEP editor with experience in packaging specs, logistics/organizing, technical writing, and in equal measure the PyData and core Python realms, who’s been closely following this discussion from the beginning, is strongly invested in seeing some form of packaging unification come about, but is equally strongly neutral so far on the most workable approach to do so (and accordingly has just been listening), and who has no direct ties to any of the tools or proposals involved, I offer any help I can give in an administrative/procedural capacity.

4 Likes

Honestly, I don’t mind just doing this.

Given that “plugins to pip” means either “let me hook into how pip does things” or “let me write pip compile rather than pip-compile”, and the latter is repeatedly suggested as a good-enough extension model, just doing that might be good enough TBH.

5 Likes

Given that “plugins to pip” means either “let me hook into how pip does things” or “let me write pip compile rather than pip-compile”, and the latter is repeatedly suggested as a good-enough extension model, just doing that might be good enough TBH.

This is what we have in the Jupyter ecosystem, and it seemingly works well for us there!

1 Like

Yes, that’s my understanding as well (obviously there’s room for more extensibility beyond that). My claim is just that despite being simple minded, for whatever reasons, this kind of thing has brought value to users in other ecosystems and may allow for experimentation on the way to making pip a complete workflow tool.

Connecting this to Pradyun’s post:

implementing vital workflow improvements is now blocked on an exhaustingly long process of a non-iterative, waterfall-style design process

Extensibility is one possible way to allow for more iteration and for workflow extensions to prove their usefulness. I think other ecosystems demonstrate that this is viable, even for simple or core commands.

Unintended competition

I think even extensibility changes where things fall on the “Competitive Spectrum” Pradyun defines. If I work on a pip-audit subcommand I have a “cordial” relationship with the rest of pip and likely a “collaborative” relationship if I work to get it upstreamed. There’s less mutual exclusion when it comes to users.

Some/all of the “workflow tools” that exist today because the “default” tooling did not cover more of the user’s workflow with a single piece […] the folks who created these tools are not fools who like to create work for themselves or enjoy reinventing the wheel

There may be less fighting over users, since all the users are nominally pip users. There might be less wasted effort if contributors can focus on the parts of workflow they feel they can impact the most.

I don’t think they actually care if it’s pip or not; what they actually care about is that the tool that comes with Python should go in this direction. Obviously that tool is currently pip, so by that nature they have agreement that it should be pip. Of course another option is that we get agreement from Python Core that another tool should be bundled within Python.

I think you are right that users don’t really care if it’s pip. I think it’s a bit of a stretch to say that they agree it should be pip because that’s the tool that comes with Python. I would rephrase your statement to say that, rather than users thinking that pip should go in any particular direction, they think Python should go in the direction of including a universal toolset. Many users do not even think of pip as separate from Python; pip is just a magic word for “how you install Python packages”, and what they want is a world where, when you get Python, you get a unified working system. The important thing is that the whole system comes “batteries included” with Python.

That also has to include clear and comprehensive docs explaining how to use the tool to accomplish the tasks users care about. For instance, the current situation with the packaging docs having four separate tabs to show how to do the same thing with four different tools is the opposite of what we want (especially because they provide no guidance on why you’d use one or another of those tools).

I think this bit of your Github comment is especially cogent:

This is probably a larger question better suited for discourse, but honestly I think we need to take a holistic view of what our ideal tool looks like. I would try to put the current set of tools out of mind for now, and try to design an experience up front.

One thing that worries me a bit about this thread is that I still don’t see that much discussion of that. There is a lot of talk about hatchling this and pip that but not so much about “these are the tasks that the tool needs to accomplish”. For that reason, I have a bit of trepidation about your proposed solution of competing PEPs, because I think there needs to be some higher-level guard against a decision being made that just results in another tool coming into existence without the requisite oomph behind it to clear the field.

For the record, here are two concrete requirements I would want to see as part of a standard Python installation:

  • Environment management and package installation need to be designed together, as part of the same system. Whether they are the same CLI tool is not really important, but right now the two-dimensional choice space of environment managers and package installers is a big source of pain.
  • Environment management needs to include the ability to manage the version of Python itself as part of the environment. Again, currently, the need to do these separately (e.g., managing Python versions with pyenv but package versions with virtualenv) creates a two-dimensional space of tool choices that increases the sense of confusion.

Personally, I would go further and suggest that in the end we need to consider a conceptual shift, where what you get when you “get Python” (e.g., from Python.org) is not actually “just Python” but rather a meta-tool for managing Python. On Windows the Python launcher has some of this, and there’s been interesting talk in some of these threads about a similar launcher for Unix, but again these tools are not integrated with things like virtualenv as part of a coherent system.

9 Likes