Python Packaging Strategy Discussion - Part 1

Not necessarily. And doing that directly with, e.g., pip seems difficult indeed. There are multiple levels of integration possible, I'd say. For example, a hook mechanism where other package managers can register with Python installers: the Python installer then passes that hook the full list of packages for each thing it wants to install, and the other package manager returns a list of which ones it knows about. Whether that then auto-invokes the other package manager or gives the user a list back to deal with themselves is a choice. I'd suspect that pip would do the latter, and a higher-level tool the former.
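To make the shape of that idea concrete, here's a purely illustrative sketch - every name in it is invented for the example, and nothing like this exists in pip or any other installer today:

```python
# Hypothetical hook interface between a Python installer and other package
# managers; all names here are made up for illustration.
from typing import Protocol


class ExternalPackageManagerHook(Protocol):
    name: str  # e.g. "conda", "spack", "apt" - whoever registered the hook

    def claim(self, requirements: list[str]) -> list[str]:
        """Return the subset of `requirements` this package manager can provide."""
        ...


def plan_install(requirements: list[str], hooks: list[ExternalPackageManagerHook]):
    """Ask each registered hook which requirements it knows about.

    A low-level installer like pip might simply report the split back to the
    user; a higher-level workflow tool might auto-invoke the other manager.
    """
    remaining = list(requirements)
    claimed: dict[str, list[str]] = {}
    for hook in hooks:
        known = hook.claim(remaining)
        claimed[hook.name] = known
        remaining = [r for r in remaining if r not in known]
    return claimed, remaining  # `remaining` falls through to wheels/sdists from PyPI
```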

The exact way in which you phrased it implies: system Python + virtualenv + wheels from PyPI. That works today and can of course still be made to work unchanged in the future if one wants. It’s just not the best/recommended setup, and it will not improve with any of this.

If you have another package manager, it has its own environment concept, not a venv/virtualenv. So if your other package manager is conda, it’s a conda env. Spack, a spack env. Nix, a Nix env. Linux distro - the system environment (there’s only one, no virtualenv-like concept there). Fedora’s wheel rebuilds that @encukou just pointed out are cool - and they seem like a great extra option if a distro wants to do that. But it’s a ton of work, so I’d expect most distros not to offer that, and it seems questionable to me to require it. [1]

You can still use pyenv plus manylinux wheels, exactly like today. Or you can use the system versions in a smoother way - but limited to the versions of packages the system has. Or you can use another distributor which has a concept of environments with create/activate/deactivate/destroy.

For the average user, and certainly when starting out, it’s safe to say that they do not really understand the tradeoffs. If python.org and the most prominent documentation resources focus primarily or exclusively on PyPI/pip/wheels, that’s what they’ll go with. For the people that do know and can make an informed choice, no regression is proposed. If you use python.org Python or pyenv by informed choice and are on a mainstream platform, you’re already fine and very little changes.


  1. And again, rebuilding PyPI as wheels doesn’t solve the hardest problems, the ones for which there’s nothing on PyPI - you want the distro’s packages for that. Having one working version of something from the system is better than zero versions of that something (@oscarbenjamin has given a very concrete example here with gmp and fmpr). ↩︎

3 Likes

@rgommers The idea that you are presenting here, is it only with Python in mind?

[It seems to me like this is a rather huge undertaking wherein the Python-specific parts are comparably small. And if such a thing is built where a Python installer can hook into other package managers, I think I would expect this interface to be useful in other ecosystems as well (Node.js, and so on).]

Continuing the discussion from Python Packaging Strategy Discussion - Part 1:

If you’re committed to that OS already, then you’re committed to its key infrastructure packages. You won’t want to install “latest and greatest” binaries that use different versions of those packages, newer or older. You’ll want binaries that match what you have.

Only an OS-specific index (not necessarily owned by the OS vendor) is going to be able to provide those. (Or an index that also provides the key infrastructure packages.)

So… are we offering to be system integrators/distributors? Or not? It seemed earlier that we didn’t want to be doing that, and now you’re suggesting that we are?

The lack of alternative distributors is definitely a problem, I accept that. (I don’t accept the “essentially no influence” point, and I regularly challenge users to pretend that they do have influence and ask the question… guess how it turns out :slight_smile: They usually do have influence.) Unfortunately, right now and for a number of years, the message going out has been that the python.org distros are The One True Distribution and nobody should bother setting up an alternative.[1] That message is something we can choose to change, if we want to, and in five+ years we might actually have thriving alternatives to whatever-build-the-dev-cobbled-together-in-CI.


  1. That’s not a hypothetical either. We had people at Microsoft give exactly that response a couple of years ago when it was proposed that we make a pre-built distro of Python packages for Windows. It’s also very strongly implicit in virtually every “Debian unbundling vs. Python” argument that comes out from our (Python) side. ↩︎

1 Like

Frankly, at this point I don’t know. The discussion seems to be switching back and forth each time I read a comment :slightly_frowning_face:

Taking the end user view here:

  1. I currently use PyPI and wheels. I don’t think of it as a “distribution” as such, just “the way you get Python packages by default”. We can argue over why we think like that (PEP 453 is certainly a big part of it) but that’s not the point here.
  2. I’ve been told repeatedly that I shouldn’t expect the availability of wheels on PyPI to get worse.
  3. I’ve never had an issue installing a wheel from PyPI. I haven’t had a problem with binary compatibility, architecture-specific issues, or any of the problems being discussed here. Mostly that’s because pip picks up binaries by default, so I’m relying on the efforts of the developers, but (as an end user) so what? It works.
  4. Even when I’ve had to build from source, I very rarely have issues - pure Python code builds fine, and I have a version of Visual Studio installed, so even simple C extensions aren’t a problem.
  5. I always get the latest version of whatever I use (constraints and the dependency resolver permitting). Sometimes I have to wait for binaries for a new Python release, but that’s typically not a major delay.

If that’s not a distribution of Python, then what do I call it? And why shouldn’t I recommend it to people as a good (arguably the best) way of getting Python? Certainly if they have specialist needs (like advanced ML or scientific workloads, or custom embedding/non-Python language integration) I might point them at something else, but it’s extremely likely they already know more than me about the options in that case.

Yes, I’m talking purely about Windows here. Yes, I know it’s not this simple on Linux/macOS. I’m fine with the discussion taking a different perspective for those platforms.

But honestly, this seems like a major digression anyway. The subject of this discussion was supposed to be, how do we address the user feedback that there’s too much uncertainty and too many options when working with Python packaging. I don’t see how this helps us come to a conclusion around that question (although it’s quite possible I’ve lost track of the point here, if so then my apologies).

5 Likes

That’s exactly what I was describing, that’s why I said:

immediately after it.

In conda, uploading a new package with corrected metadata is not enough to fix a dependency problem, because the solver can still find the old one. Suppose build 0 requires “dependency > 3”. “dependency 4” is released and breaks backwards compatibility, so you add build 1 requiring “dependency > 3, dependency < 4”.
The solver can still prefer the old build 0 over build 1 if dependency==4 is installed. So it is necessary to patch build 0’s dependencies. Those are taken from repodata.json and not from the archive file, so the archive file doesn’t need to be changed.
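A simplified illustration of that flow (package names, filenames and fields are made up, and real repodata.json entries contain much more):

```python
# Before patching: build 0's index entry mirrors what is inside the archive,
# so the solver may still pair it with dependency==4.
repodata_before = {
    "packages": {
        "mypkg-1.0-0.tar.bz2": {"depends": ["dependency >3"]},     # build 0
        "mypkg-1.0-1.tar.bz2": {"depends": ["dependency >3,<4"]},  # build 1
    }
}

# After a hosted metadata patch: only repodata.json changes, the archive file
# for build 0 is left untouched, and the broken combination is ruled out.
repodata_after = {
    "packages": {
        "mypkg-1.0-0.tar.bz2": {"depends": ["dependency >3,<4"]},
        "mypkg-1.0-1.tar.bz2": {"depends": ["dependency >3,<4"]},
    }
}
```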

While we are talking about metadata and how to improve the UX…

I think installers should offer a practical way for the user to override package metadata (mainly in order to influence the dependency resolver). My impression is that however hard we try, there will always be some metadata that is either incorrect or plays against what the user wants. Of course, being able to fix the metadata in the repositories is probably a good idea, but first I doubt it will always be possible, and second I know there are cases where the user wants to override the package metadata anyway.
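As a purely hypothetical sketch of what I mean (no installer offers this today, and every name below is invented), the override could be as small as a user-maintained table that the resolver consults before trusting the published metadata:

```python
# Hypothetical user-side metadata overrides, keyed by (package, version).
OVERRIDES: dict[tuple[str, str], list[str]] = {
    ("somepkg", "1.2.0"): ["dependency>=3,<5"],  # relax an over-strict upper bound
    ("otherpkg", "0.9"): [],                     # drop a dependency the user knows is not needed here
}


def effective_dependencies(name: str, version: str, declared: list[str]) -> list[str]:
    """Return the dependency list the resolver should actually use."""
    return OVERRIDES.get((name, version), declared)
```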

4 Likes

You undersell yourself here :slight_smile: You are way more informed than the vast majority of users, including those who came in through a slightly different path, landed in Conda-land, and never saw a need to look further (for exactly the same reasons you gave for looking no further than PyPI).

The reason not to recommend PyPI-only is, indeed, “specialist needs”. But that’s the same reason to not recommend anything “only”. Specialist needs are why Conda users also use pip, and why Debian users also install from source, and so on.

And what we’ve seen from surveys and feedback is that most people don’t think they have specialist needs. So you can’t ask them and then suggest a path forward, because they can’t tell you. They just think that the tool is broken because it doesn’t do whatever perfectly normal thing they need (e.g. installing a database driver, or replicating the environment on deployment, or bypassing IT’s enterprise grade proxy :upside_down_face:).

Our discussions right now seem to be threading between two sets of specialised needs that are not well served today:

  • packages with unspoken/unlabelled ABI requirements
  • packages with insufficient/outdated metadata

(Hmm… when I spell them like that, they seem kind of similar :wink: )

As with all specialised needs, nobody thinks they need them until it turns out they need them. Nobody could have told you at the start that they were going to need them. And now that they know, some are going to be frustrated that we didn’t tell them up front, or that we aren’t doing anything about them now.

1 Like

I don’t buy the equivalence of that commitment. People have many reasons for choosing an OS - or rather, for not changing it, because the number of things they’d have to change is so large.

But that doesn’t make them (in my view) “committed” to being bound to the OS’ infrastructure packages, especially if something else can provide them with a high enough degree of isolation that it works and doesn’t threaten the system’s stability[1].

The macOS footnote I had illustrates this well IMO - people on EOL macOS versions still want to be able to install the latest and greatest packages, even though in many cases they haven’t even upgraded past the OS version that prevents them from doing so. Same goes for people on corporate RHEL installs.


  1. which is admittedly nontrivial, but not impossible. ↩︎

Sure, and this is exactly what Conda and Nix (and probably others) do. They bring all of the key infrastructure packages with them and isolate them from the OS. Implicit in my response was the assumption that the user is using “whatever Python was preinstalled” and they refuse to change, which has been an explicit statement a few times in this thread.

If the user is willing to change to a different install of Python from a different source, then they can be bound to the infrastructure packages used by that one instead. But you can’t get out of having to use the same infrastructure throughout your stack, no matter where you get it from (unless you build it all yourself, which makes you the distributor, and you can update as frequently as your distributor wants to :wink: )

2 Likes

You misunderstood my point. Someone who knows they have specialist needs has already got enough knowledge to be able to deal with the choices involved. The question I’m asking is what’s unreasonable about recommending PyPI/wheels to people who don’t know or believe they need anything special? Apart from anything else, they then won’t be confused by the vast amount of documentation that already exists assuming you’re using that option.

I reiterate, for clarity - I’m only talking about cases where PyPI/wheels is a good basic option - specifically on Windows, because that’s the platform I know about.

So why not document (as best we can) the boundaries of what the standard PyPI/wheel ecosystem offers, and then present it as the “standard solution”, unifying on that? We can’t do anything about people not knowing what they might need in the future, but we can at least warn them of the limits so they know when they are approaching them. As you say, everything fails to meet some level of specialised need, so “it’s not perfect” isn’t a compelling argument here.

But in practice, @smm said

The topic here is tools, not the underlying PyPI/wheel infrastructure. Maybe there’s value in debating how we can integrate other package distribution ecosystems with the PyPI/wheel system - but it should be a separate discussion IMO.

I’m not sure we can read the minds of the survey respondents like that. What I hear from the comments is that “everything should be replaced with something simple & unified” – I’m not saying that’s realistic, but I strongly doubt that many users of that mindset care particularly about the infrastructure or binary formats behind their UX, much less about the existing ones.

(I help to maintain setuptools, but I don’t speak for the project, these are my personal opinions)

The following is a comment on how I think we should approach a unification:

If we ever unify the tools for Python packaging, I think it is important for this hypothetical tool to be able to handle legacy:

  • It is not fair to users to say “now we have this single tool that everyone is supposed to use”, but then, if they need an older package, expect them to ignore all the most recent documentation on the topic and go figure out how to maintain something different.
  • It is not fair to any tool developer to discourage the use of their tool while at the same time placing the burden of maintaining the ecosystem on their shoulders.

This is not easy and requires a lot of work. Of course there is also another way: declare everything that is not compatible with this hypothetical tool unsupported, as mentioned by Paul:




The following is not an argument against or in favour of the proposal, just me expressing feelings and thoughts that have been puzzling me. Hindsight bias, I know, but it is not easy to ignore…

It is important to recognize that this problem of “too many tools” (if indeed this is really a problem) is partially a problem of our own making. Years ago there were some “de-facto” standards in the ecosystem, and the packaging community invested a lot of effort to create standards (with a more or less declared goal of removing the monopoly of such tools). My opinion is that it was a noble goal: it incentivised openness and created opportunities to handle niche requirements or to experiment and try new things.

Going back to “there is only one blessed tool” feels like a throwback… If this had been the goal from the beginning, the community could have saved time/money/energy by working together to fix/improve/rebuild existing tools instead of splitting the effort.

Is it respectful of the work people have put in? (I don’t speak only of setuptools, but in general.) The maintainers have put love and hours of work into trying to make the ecosystem better by creating and complying with interoperability PEPs… If they had known from the beginning that eventually the PSF would endorse only one tool and “would not recommend X”, would they have invested the same amount of love/effort? Was it worth complying with interoperability PEPs? Would it have been better instead if we had all worked together towards making a new tool and moving all the packages to it?

Moreover, is it respectful of the work the users had to put in to adapt? This process of standardisation was the source of a lot of “growing pains” imposed on the community, because at that point in time it was deemed necessary that no single tool had preferential treatment in the Python packaging ecosystem. But now we are talking about giving a single tool preferential treatment…

3 Likes

My experience is more along the lines of Steve’s point here. A great many Python users do not realise that their own usage of Python (and “third party” packages) might be considered “specialist” or “niche” by someone else. I speak to many people who wouldn’t dream of using a language that didn’t have, say, multidimensional arrays, so the idea that someone could use Python without NumPy would seem extremely strange to them. You really can’t expect all people to understand that “Python” is used for a wide array of things unrelated to their own use cases, and that this is indirectly why people have wildly different ways of setting up and installing things. You might imagine that people doing AI/ML must be more proficient in the basics of programming and software engineering than people using Python for other things, and would therefore have a clearer understanding of the Venn diagram of Python ecosystems, but that’s absolutely not the case.

4 Likes

I mentioned this in the other thread:

To my mind, it is not a question of lack of respect for the respective maintainers, who’ve done a fantastic job in very challenging conditions[1]. It may be my biased view, but given the scope of the problems to solve, as well as the lack of deeper language integration of packaging, the interoperability PEPs were the only halfway realistic path forward – no single project could reasonably hope to take on the responsibility of serving the entirety of the Python ecosystem by itself (without systematic support, i.e. language commitment).

On the one hand, having competing solutions is great for innovation, but horrible in terms of duplication of work. And as the survey shows, users don’t exactly appreciate that decentralized and fragmented approach. We may yet get to have our cake and eat it too, if we do manage to hide all those different tools behind a unified interface, and I think that would be a large improvement, even though I doubt we can avoid those interfaces leaking implementation details of the backends quite heavily.

In any case, if there were a drive towards a more centralised solution, I certainly would not see this as disrespectful towards those who have gotten us as far as we are now. I get the emotional investment in something one has spent a long time working on[2], but ideally, we should be able to uncouple the design decisions going forward from previous efforts (especially if we can agree to remove/lift/change some constraints[3] that all-but-forced certain decisions at the time).


  1. huge amount of responsibility for a thankless task that makes people scream loudly if anything breaks ↩︎

  2. and I certainly won’t claim that I don’t occasionally fall prey to that as well ↩︎

  3. talking generally, not alluding to any specific one here ↩︎

Thanks @abravalheri for expressing that point of view. I have similar feelings around respecting maintainers’ time - both what they’ve done in the past, and what we may be asking of them in the future. And I think the standardization of metadata in pyproject.toml and of build interfaces (PEP 517 & co) is one of the success stories of Python packaging. No need to turn our backs on that and aim for unification of build tools, imho.

Overall I think we still are trying to figure out what is feasible and a good idea to unify, or not. The message from users is that the current state of things is still not great, and to please unify something - but it’s very much unclear what that something is.

@pradyunsg asked that question pretty explicitly. I’ll repeat it here, with my answers:

  • Unification of PyPI/conda models → NO
  • Unification of the consumer-facing tooling → NO
  • Unification of the publisher-facing tooling → NO
  • Unification of the workflow setups/tooling → YES
  • Unification/Consistency in the deployment processes → NO
  • Unification/Consistency in “Python” installation/management experience → NO
  • Unification of the interface of tools → YES (as much as possible)

It’d be great to see others’ answers to this.

Regarding some of the other topics in this thread, I think they come in because there’s a number of inter-related things here. Because if you say something should be unified, you should at least have some level of confidence that it’s a good idea to pursue that unification and that there are no hard blockers.

I wrote a blog post with a comprehensive possible future direction; however, the content in there all follows from a few things: the what-to-unify questions above, Steve’s question on system integrators, and the assumption that major breaking changes have to be avoided. I’d really like to get a better sense of whether others have a similar understanding at this highest level.

2 Likes

OK, here’s my answers.

  • Unification of PyPI/conda models: NO.
  • Unification of the consumer-facing tooling → NO, with a caveat. I don’t think we should try to force maintainers to work on a single tool, but if competition between tools results in users choosing a clear winner, I think we should accept that.
  • Unification of the publisher-facing tooling → NO. I assume this means things like build backends.
  • Unification of the workflow setups/tooling → PARTIALLY. I very definitely don’t think that (like cargo) we should mandate that every time anyone uses Python, they should create a directory containing a src subdirectory and a pyproject.toml. The workflow of writing a simple script (with dependencies) in a scratch directory full of “other stuff” is an entirely reasonable workflow that we should support. Having said that, I support unified workflows for the tasks of “write a Python package” and “write a Python application” (although I think the latter is something we’ve traditionally ignored in favour of “write a package with a script entry point”).
  • Unification/Consistency in the deployment processes → NO. Although I’m not 100% sure what this entails. It shouldn’t be user-facing, though, which is why I say “no”.
  • Unification/Consistency in “Python” installation/management experience → NO. Although I think we should accept that this is not under our control, and like it or not, the main Python website is where people go for advice on where to get Python from. So we should work with the guidance given there, not fight against it.
  • Unification of the interface of tools → YES (but see below).

I’m not sure I understand the difference between “consumer-facing tooling” and “workflow setups/tooling” though. For the purposes of the above, I’ve taken the former as meaning the actual software, and the latter as meaning the processes. So we can have hatch and PDM, but they should manage the same project layout, expect tests and documentation to be laid out in the same way, etc.

As regards “interface”, there are two aspects - low level details such as the names of options, configuration files, etc., and higher level concerns like what subcommands a tool supports. For example, I’d love to see a shared configuration mechanism, so that users can set their network details and preferred index once. And I’d like a common set of workflow commands (things like “run an interpreter in the project environment”, “run the tests”, “build the docs”, “build the project artifacts”, “publish the project”, …) But I don’t want this to be an excuse to argue endlessly over whether an option should be called --config or -C. And I definitely don’t want it to override questions of backward compatibility for individual tools (which should very much be the tool maintainer’s choice).
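To sketch just the shared-configuration half of that (the file location and key names below are entirely hypothetical, not an existing standard), any of the tools could read something like this before touching the network:

```python
# Hypothetical shared configuration that hatch, PDM, pip, etc. could all honour;
# the path and the schema are invented for illustration.
import tomllib  # stdlib, Python 3.11+
from pathlib import Path

SHARED_CONFIG = Path.home() / ".config" / "python-packaging" / "config.toml"  # made-up location


def load_shared_settings() -> dict:
    """Return common settings (index URL, proxy, ...) if the shared file exists."""
    if SHARED_CONFIG.exists():
        with SHARED_CONFIG.open("rb") as f:
            return tomllib.load(f)
    return {}


settings = load_shared_settings()
index_url = settings.get("network", {}).get("index-url", "https://pypi.org/simple")
proxy = settings.get("network", {}).get("proxy")  # the corporate proxy case mentioned earlier
```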

Regarding the other discussions in the thread, I support better integration with, and support for, other system distributors/integrators. But I strongly disagree with any suggestion that PyPI and wheels should no longer be considered the primary route by which (most) users get Python packages[1]. Having said that, I think such support needs to be a two-way street, and if “other system integrators” want to be supported, they need to engage with the community as a whole and get involved with this process - otherwise, we should accept that what support and integration we provide will, of necessity, be limited (e.g., I don’t think we should try to write a “how to use apt to install Python packages” page in the Python packaging documentation, but we could link to a Debian “how to use apt for Python users” page if they provided a suitable link that was written for the same audience that we are addressing).

I also don’t think it’s at all clear from what I’ve heard of the survey results, what the users are asking for in terms of the above questions. And I think that’s a far more important question than what we think[2].


  1. Maybe it’s a case of “worse is better”, but I strongly believe that without PyPI and wheels, Python would never have achieved the popularity it has. ↩︎

  2. Although if the users are asking for (for example) “cargo for Python”, then my response is “great, I hope someone writes it for them”. Just because the users want it, doesn’t mean that’s where I’ll personally devote my open source (volunteer) time. ↩︎

4 Likes

Language integration has been mentioned a couple of times, but without explanation of what that would be.
I don’t see what people mean. The language has an import system, with sys.path supporting multiple sources, and the site module handling site-packages and user-packages locations. Then separate installer tools can look for distributions and install them in the right locations. What else could the language do?
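For reference, the pieces the interpreter itself already provides are easy to see from a REPL - nothing hypothetical here, just the stdlib:

```python
import site
import sys

print(sys.path)                     # ordered list of locations the import system searches
print(site.getsitepackages())       # environment-wide site-packages directories
print(site.getusersitepackages())   # the per-user location (PEP 370)
print(sys.prefix, sys.base_prefix)  # these differ inside a virtual environment
```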

1 Like

I’m going to refrain from answering because every “no” makes my day job harder, but that doesn’t mean a “no” doesn’t make more sense for the community.

Same here since we are talking about programmers installing stuff to program. Is the difference, “I’m using Python code, but not writing any” (e.g. running some CLI app) versus “I’m writing Python code myself that requires installing something”?

And what does “workflow setups/tooling” mean? Would trying to standardize where environments are created, named, and stored make sense in this scenario (which I’ve been asking the community about over on Mastodon lately)? Is this standardizing on Nox and/or tox? Or is this more about src/ layout and how to specify your development dependencies separate from your installation dependencies?

2 Likes

I’m the maintainer (and only user, other than former colleagues inheriting my code) of snoap, which admittedly is not a packaging tool, but it does do packaging and deployment via Poetry and pip.

The main reason I devised snoap is to address this distinction in a way that was appropriate for my work environment at the time:

This thread has gone very deep into packaging with native and/or compiled dependencies. I think that’s totally valid, because if you pull at any string in a Python environment, you’ll probably hit that stuff sooner or later. However, I’m not sure the users who answered the survey were really thinking in those terms. I’d guess (and it is a guess, could be way out) that most respondents are not wrestling with those issues on a day-to-day basis. They are probably more bothered by nagging doubts like “if I use Hatch and my colleague uses Poetry, can we collaborate on a project?”, or “I want to share this useful utility that I wrote in Python with my non-technical colleagues, but it has external dependencies and I’m dreading having to talk them through how to create a virtual environment”, or “is it best practice to set a maximum version of dependencies in pyproject.toml or not?”, or “I really feel like I should know what setup.py is as I see it all over the place, but I’ve never needed one”.

This isn’t to say low-level packaging is not the root of some of these issues, but it would be useful to have a clearer breakdown of exactly what user experience is driving the desire for unification, what those users mean by unification, and how they think it would solve their problems.

10 Likes