Python Packaging Strategy Discussion - Part 1

(Maintainer of PDM here)
I’ve actually been following this discussion for a long time, but I’ve been hesitant to respond, unsure of what position to speak from and how to organize my thoughts.

An interesting fact: among the tools mentioned in this discussion, PDM and Poetry are the only ones not under the PyPA umbrella and whose maintainers haven’t shown up here, and they are also the tools that offer the most functionality, which should make them a good starting point for a unified package manager. Admittedly, Poetry’s package metadata format does not adhere to PEP 621 (hopefully it will), but PDM might be among the first few, if not the first, to support PEP 621, shortly after it was accepted. Although PDM began its life as the only package manager to support PEP 582, it is a bit frustrating that people continue to resist adopting it because of that incomplete draft PEP, even though PDM has defaulted to venv since 2.0. In my opinion, PyPA seems to favor single-purpose tools that do one thing well, rather than all-in-one tools; the most adopted package managers (or build backends) inside PyPA are flit and hatch.

So let me promote PDM a little here: it provides all the features mentioned in this thread, and its CLI is directly inspired by npm. I am also rewriting the build backend to provide an extensible interface similar to hatchling’s. That said, PDM itself isn’t tied to any specific build backend, and users can choose whichever they like. The next step is to add workspace support, similar to Cargo’s workspaces.

17 Likes

I got the impression one missing part is this:

This prompted me to think. I don’t want to focus on the Hatch vs PDM vs Poetry[1] debate here, but let’s suppose for a moment that someone waves a magic wand and we get consensus on one tool. What would we then actually do to make that consensus effective?

I don’t actually have a good answer to this. But it feels like the user community expects the PyPA to have some sort of influence (or authority) over this, and our biggest issue is basically that we don’t…


  1. vs any other tool with a similar scope that I’ve missed… ↩︎

2 Likes

Well, the sky’s the limit really, but off the top of my head:

  • enforcing metadata consistency / accuracy
  • using that metadata to do useful things in the Python ecosystem
    • for example: regularly run CPython main against the ecosystem (compare Rust’s crater runs or Scala’s community build), identify regressions early, make them release blockers
  • solving the “I have to run arbitrary untrusted code to install a package” problem
  • preventing very confusing splits like venv vs. virtualenv
  • removing major installation footguns with the same urgency as other big UX problems (like pip not having a resolver for years, or how easy it is to mess up your base Python install)
  • using “happy path” defaults that solve the 90% case, leaving expert mode to explicit opt-in
  • bringing the PyPA and the SC much closer together
  • etc. etc. etc.

In the absence of all that, you get small groups of volunteers trying their hardest for their use cases / user base / niche, but anyone who’s exposed to several use cases / user bases / niches will run into the chasms between the various insular tools.

Some interoperability bridges have been built over time, but frankly, I don’t find it fair to put the responsibility for “Step 0” of installing/using/distributing something written in Python on a free-floating set of volunteers and hope they coalesce on a common strategy.

Yes, all of this is hard (especially since Python inherits a large part of the “lack of language integration” problem from C/C++ for code it wraps), and it’s not as sexy as a shiny new library or performance improvements, and everyone’s busy, and almost everyone’s working on this in their free time, etc. I’m aware there’s no panacea, but fixing these things should start with a commitment / plan / direction on the language level, because that’s the only place where common goals can really be set.

PS. For a particularly egregious illustration of the combined effects of bad metadata, arbitrary code execution & lack of tooling consistency (i.e. compiler flags), see this blog post.

PPS. Just saw @pf_moore’s comment

Addressing this is exactly what I meant by the above.

3 Likes

I should qualify my answers in the light of this. As an end user, my answer would be “YES” to everything (with the usual proviso that I’d be unhappy if I didn’t personally like the chosen unification :wink:)

My answers were very much from the perspective of a packaging specialist knowing the trade-offs. And maybe that’s actually the wrong way of looking at this[1]? I feel as though we’re resisting what is in fact a very clear message from the users, and the reason is that we simply don’t have a good answer to @smm’s follow-up question “how do we go about doing it”, so we’re chipping away at the edges of what we feel we can manage - which from an end user perspective is disappointingly little.


  1. Actually there’s no question - it is the wrong way. ↩︎

5 Likes

Methodology aside, that is an interesting blog post, thanks! I don’t agree with many details (including some rather opinionated conclusions), but I think overall it’d be fair to say that it makes the same point as the survey respondents (“way too many tools!” / “unify or bust!”), quite forcefully.

It also discusses this very thread, plus there’s pretty active discussions about the article on Hacker News and Reddit, but – perhaps unsurprisingly – the tone there is less polite than on DPO, so maybe don’t click if this thread is already exhausting to you.

Still, if there’s something to be gleaned from the usual internet chaos, it provides a window into just how unhappy people are with Python packaging. Which is not helpful for most people in this thread who have poured their free time into making things better, but my take-away from reading those comments is that it underscores the point that average users[1] overwhelmingly prefer homogeneity (even if enforced) over the tooling diversity / freedom / innovation that has been the MO so far.

Here’s one of the more cogent examples:


  1. as opposed to a more maintainer-heavy audience here ↩︎

7 Likes

I notice that that table lists “packaging C extensions” as missing (or partially missing) functionality. I’m not entirely sure what that means, but no tool can “unify” the others if it does not support a major packaging use case, because there will always be a need for some separate tool that does handle that case. Perhaps that could look more like pdm or poetry (or hatch) plus some backend extension or something, though, so that at least a single tool could be used for all the other tasks.

There should be some consideration, though, of what people who are working with C extensions are supposed to do. It’s not just a case of maintainers packaging things for PyPI/conda, but also the bigger group of people who would install from source.

I’ve read this whole thread, and it seems no conclusion/action item was generated in 2 weeks/146 replies. I think this is indicative of the problem. And there are other previous mega-threads.

I think Python is the only major language where environment activation is widespread (while not truly obligatory). Maybe the discussion should start from the basics and establish a minimal set of hard requirements for the “common solution” that users are asking for. One big question is: should activation be used or not? A yes/no should be derived, maybe with SC involvement. Otherwise the discussion goes in circles.

2 Likes

Can you describe what you mean by “environment activation”? Also, to be clear, if you’re talking about sourcing “activate” from a venv, you’re aware that’s optional, right? (Or at least it is on the platforms I use, being Linux distributions and Unix derivatives.) You can just call entrypoints directly inside a venv without any prior “activation” step; it’s my primary way of running things with Python. I honestly can’t remember the last time I sourced an activate stub from any of the many venvs on my systems.
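
To make that concrete, a minimal illustration, assuming a venv at ./.venv with at least one console script installed (pip is used here only because venvs normally ship it):

```python
# Run a venv's console script by path, without sourcing any activate stub.
import os
import subprocess
from pathlib import Path

bindir = Path(".venv") / ("Scripts" if os.name == "nt" else "bin")
# pip is just a convenient example of an installed entrypoint.
subprocess.run([str(bindir / "pip"), "--version"], check=True)
```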

Personally I would be happy to just be told what the new tool is and have clear documentation for how to use it. If PyPA said “we now recommend tool X” and there was clear documentation for using it and migrating to it then I would be happy to go along with that. I think that’s where most users are at.

The discussion here shows that it’s unlikely that a tool could cover all use cases, but I expect that any of hatch, pdm or poetry[1] would be sufficient for the bulk of what most “users” would want to do. If these can replace a whole bunch of different things like pip, setuptools, venv, etc. with a single tool and a single frontend, then that would be a significant improvement.

Where this potentially goes wrong for me, though, is if I start trying to use the tool and it turns out that there are significant problems with it and it does not actually do the things that I need; the big one here is building native code extensions. This would not be a complete showstopper for me, though, if:

  1. The tool does do some of what I need i.e. it can at least be a useful part of my workflow in replacement for some combination of tools that I currently use.
  2. There are ways to use whatever else I need to use (e.g. to build C extensions) in combination with the tool so I can at least use it right now even if it’s still a bit awkward for some things.
  3. There are plans to eventually have better support for important use cases like building C extensions even if that part is actually not completely unified for now.

  1. I haven’t used any of these and they all sound similar to me. ↩︎

10 Likes

If your entrypoint needs another entrypoint, or needs any other modified environment variables, then you must activate the environment. This is more common than you may think. One use case is calling the command-line cythonize from an entrypoint script. You can work around it by very carefully modifying the environment when calling subprocess.Popen to control which cythonize is being used, but more likely you will end up calling some cythonize other than the one inside your venv.
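
To illustrate the kind of workaround being described, here is a rough sketch of pinning subprocess lookups to a particular venv; the ./.venv location and the example.pyx file are assumptions for the example, not anything prescribed in the thread:

```python
import os
import subprocess
from pathlib import Path

# Assumed layout for this sketch: the project venv lives at ./.venv
venv_dir = Path(".venv").resolve()
venv_bin = venv_dir / ("Scripts" if os.name == "nt" else "bin")

env = os.environ.copy()
# Prepend the venv's script directory so "cythonize" (and anything it
# spawns) resolves to the copy installed in this venv, not a global one.
env["PATH"] = str(venv_bin) + os.pathsep + env.get("PATH", "")
env["VIRTUAL_ENV"] = str(venv_dir)

# example.pyx is a placeholder; check=True surfaces failures immediately.
subprocess.run(["cythonize", "-i", "example.pyx"], env=env, check=True)
```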

1 Like

I’m not sure what you’re referring to but… to the best of my knowledge, this isn’t a CPython concern. We do need to enforce this elsewhere in the toolchain, though.

Outside of the scale issue with trying to run/test CPython against the ecosystem[1], this is work that redistributors like Red Hat already do, as well as maintainers of various projects (e.g. Cython), and issues are treated as release blockers as and when appropriate. I don’t see how adding more work onto the volunteers who maintain CPython is going to meaningfully change things.

This isn’t a CPython concern, it’s an installer one: Speculative: --only-binary by default? · Issue #9140 · pypa/pip · GitHub

This goes, somewhat, to the core of what we’re supposed to be discussing here: unifying tools/terminology, etc.

This would be nice – if CPython decides to go down the route of trying to have a single “official” cross-platform installation + invocation experience, that’ll make some of our problems less bad… but it won’t fix many of the issues we see.

That’s unrelated to CPython and, honestly, I’d prefer to have this be the case and consistently document this.

Would be nice, but also, not going to go into this to avoid derailing the discussion. x-ref Adopting the concept of "Teams" (from PEP 8015)


On the CPython + PyPA relationship, I noted down my understanding over the weekend. I suggest that this conversation (about the CPython / PyPA relationship) be split into its own discussion.


  1. And the various challenges like not having a consistent test runner experience. ↩︎

2 Likes

(NixOS developer here; I maintain some Python packages in nixpkgs and know the issues we classically encounter in the Python packaging ecosystem with respect to native dependencies. I don’t represent the NixOS project or community.)

I read the whole thread and found the remarks of @steve.dower very interesting. Firstly, I have to say we know that the Nix ecosystem is probably a small subset of Python users, and probably the one for which Python packaging works more or less the best, within our walled garden. We have native cross-compilation support to a certain extent, and I can even trivially run stuff on RISC-V without thinking too much about what needs to be done.

The concept of a key infrastructure, letting distributions & system integrators provide packages for users, is compelling for us, as this is 99% of what you need to do on NixOS. (Of course, we have escape hatches, but this is very much a desired state from my perspective.)

Currently, we have around 5,700 Python packages (not all of them working), and we do not support all versions. It’s already a lot of effort to manage this set of packages, notably because of those native dependencies and various intricacies caused by otherwise pleasant features that do not work that well in our model (setuptools-scm, for example).

Sometimes we have posts such as https://discourse.nixos.org/t/nixpkgss-current-development-workflow-is-not-sustainable/18741 which show the difficulty of doing this at scale for particularly challenging pieces of software, e.g. TensorFlow.

There have been many attempts to automatically bring Python packages into our Nix ecosystem; I won’t list them all, but one that works really well is poetry2nix (which in fact has support for multiple build backends: https://github.com/nix-community/poetry2nix/blob/master/overrides/build-systems.json).

Unfortunately, this labor of love (IMHO) requires careful overrides to supply much of the information that is lacking in the packaging metadata: poetry2nix/default.nix at master · nix-community/poetry2nix · GitHub; cryptography is one of the most difficult cases: https://github.com/nix-community/poetry2nix/blob/master/overrides/default.nix#L373-L421, where we need to manually pin the vendor hashes of the Cargo dependencies for each release.

So even if pip, let’s say, were to use our Nix expressions to download our binaries or kick off some “from source” compilation using the right environment, it would still require some level of effort on the maintainer side and may not provide what a user expects, i.e. all versions of a PyPI package.

Given the current discussion, I am not certain what will be done to address such things, or whether current policies will force everyone to find solutions downstream.
More specifically, multiple people talked about more standardization; for example, I wonder whether Python has considered integrating CUDF formats into its core, and whether there are reasons not to adopt this.
Building on the SBOM standardization process, if there were a way to specify native dependencies using more or less unique identifiers, I think it would help the ecosystem drastically.

Also, I have to add that the Nix community is probably very interested in all of this and in helping and collaborating however it can, though we probably don’t have anything for Windows (except through WSL2, I suppose, or if someone finishes a Nix for Windows at some point).

BTW, I see that other language ecosystems were mentioned; I have to say that I’m not exactly sure that Node.js or Java is any better at all. As distribution maintainers, we sometimes have worse problems there than with the Python ecosystem, which has provided things like site and the like to enable our use cases.

Thank you for your time and the interesting exchanges here.


2 Likes

How would the PyPA say that? That’s a serious question - I genuinely don’t know how the average user distinguishes between a formal recommendation from the PyPA and a bunch of random documentation they found on the internet. Is packaging.python.org genuinely that influential?

I say that very much aware that if you take packaging.python.org as definitive, the PyPA recommendation is already to use hatch. It’s a pretty mild recommendation, because there was a lot of agonising when we did the rewrite about not supporting people who chose other tools, or repeating the fiasco of pipenv, but it’s definitely there. And yet the survey results suggest that the vast majority of users aren’t following that recommendation.

As regards “clear documentation”, we can write documents on packaging.python.org, but ultimately a lot of it is going to be down to the tool maintainer(s), not the PyPA.

Yeah. So setuptools, then. People are working on alternatives (although they are mostly working on the hard problems in that area, not so much on simpler “I wrote an accelerator in C” use cases), but right now, native code means setuptools, along with unpleasant problems like fighting with gcc vs MSVC on Windows.

We simply don’t have a better answer for native code extensions yet. So does that mean we offer no answer to the user complaints about complexity, or do we offer an answer now, with the qualification that it doesn’t cover native code very well, so for that you need to fall back on other, well-supported and stable but not “unified” solutions?

Again, that’s a genuine question that I don’t know the answer to. The survey didn’t (as far as I know) break respondents down by whether they care about native code, so we have no information there. Nor did it distinguish between people who are happy to just consume prebuilt binaries, and people who want (need) to build from source.

I suspect for most users, “preinstalled with the python.org installer and on PATH by default” is what it would take. (And yes, I know that’s not a PyPA thing. But users don’t, and that’s the point.)

8 Likes

Agreed that this should be a separate discussion, but I think that what people are maybe thinking about is a higher-level strategic policy-setting group. Maybe this is what the PSF should be, or is? Or maybe nobody does this, but we both (core devs and PyPA) think that “the other lot” handle that without us…

I know I hoped that the “Packaging Project Manager” role would cover more of that higher-level strategic decision making aspect, across language and tools. But sadly, I think that role is probably just as much a cat-herding exercise as anything else to do with packaging :wink:

That’s not how any of it reads to me. Under “get started” there are 5 links to pages:

I just opened all 5 pages and searched for “hatch”. The only mention of hatch as a tool is as one tool among a range of possible alternatives. There is very little information given for me to be able to evaluate which of the alternatives might be appropriate for me, and certainly no judgemental statement about which tool is “recommended”.

There are no examples of how to do anything with hatch and plenty of examples of doing things with other tools instead. There is an example of how to configure hatchling in pyproject.toml as a build system. Note that hatchling is the part of hatch that I definitely can’t use because it’s a build system that doesn’t build native code.

2 Likes

Precisely. So that’s something we need closer links with core Python over. The ensurepip model is a good start, but maybe it needs to be superseded by a more general bootstrap_packaging mechanism? That would be a very good discussion to have, but I don’t think it can be covered by the packaging community in isolation. The core devs have concerns (for example, we bundle pip because offline installation is key in some situations, and I assume “always installing the same version of pip” matters more than getting an up to date pip when an install happens with online access) which need to be represented, and various redistributors will want to have their say, and will almost certainly be more responsive to proposals seen as coming from core Python / the SC than from the PyPA.

That’s probably a good, self-contained action - modernise the ensurepip mechanism to provide a more flexible means to distribute “the official packaging and workflow tool” as part of core Python.
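
For context, this is roughly what the existing mechanism looks like from Python today; a more general hook along the lines of the bootstrap_packaging idea above could presumably expose something similar for whichever tool ends up being distributed:

```python
# A minimal sketch of today's stdlib bootstrap, for context: ensurepip
# installs the pip wheel that ships inside CPython, so it works offline.
# (The command-line equivalent is "python -m ensurepip --upgrade".)
import ensurepip

print(ensurepip.version())         # version of the bundled pip wheel
ensurepip.bootstrap(upgrade=True)  # install/upgrade pip into the current environment
```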

Now all we have to do is agree what tool to distribute :wink:

2 Likes

Node.js did just that:

Corepack is an experimental tool to help with managing versions of your package managers. It exposes binary proxies for each supported package manager that, when called, will identify whatever package manager is configured for the current project, transparently install it if needed, and finally run it without requiring explicit user interactions.

I guess I’m speaking more from the core dev side here, but I personally would/will apply the same logic as I’ve been using for the oft-requested changes to how PATH works on Windows:

We have one chance to break everyone’s defaults, so we need to get it right.

Replacing pip as the default install is such a break, so if we were to go down that path, we’d better be really sure that we’re improving things for users enough to justify the breakage.

I mean, this really is PEP 517, as far as what we can do on our side goes. No “normal” user is going to want to touch anything called “bootstrap”. They’re going to want their IDE to have an opinion about the installer/environment manager.

We keep conflating package install tools with package creation tools, and while both have issues, I don’t think they need to be resolved with each other.

  • Should I use venv/virtualenv/pipenv/poetry/PDM/etc.?
  • Should I use setuptools/hatch/flit/meson/etc.?

We can answer each one separately, and my gut feel is that most complaints are about the first question, not the second. (It’s not helped when tools try to do both, but they’re not the PyPA ones, so outside our scope.)

FWIW, I’ve never had any colleagues complain about having to use pip. It’s usually the virtual environments that cause the angst, not installation. Similarly with the educators I’ve spoken to (and hence why I put up PEP 582 in the first place, where the pip integration is crucial and supplanting today’s virtual environments is the point). Even conda environments are what people don’t like, not the install command.

I’m not sure how best to validate this with users, but it does seem that fixing “I want to install these packages just for this project” would go the longest way to helping.
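
To make “install these packages just for this project” concrete, here is a rough illustration of the PEP 582 idea mentioned above: packages live in a project-local __pypackages__ directory and get picked up without any activation step. This is only a sketch of the concept; the directory layout and the manual sys.path handling are stand-ins for what the draft PEP has the interpreter (or installer integration) do itself:

```python
# Sketch only: emulate the PEP 582-style lookup by hand. The real proposal
# has the interpreter (or a wrapper) add the project-local directory itself.
import sys
from pathlib import Path

pkgs = (
    Path.cwd()
    / "__pypackages__"
    / f"{sys.version_info.major}.{sys.version_info.minor}"
    / "lib"
)
if pkgs.is_dir():
    # Project-local packages win over anything installed globally.
    sys.path.insert(0, str(pkgs))
```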

12 Likes