Python Packaging Strategy Discussion - Part 1

johnthagen · January 16, 2023, 4:48pm

I think one thing most of us can agree on is that a unification will require a significant amount of effort (whether we’re talking about a new rustup/Cargo-like experience, unifying on a tool such as Poetry/hatch/PDM and adding native build extension support, etc.).

Would a good outcome of this discussion be a proposal to the PSF to fund this? Clearly many users care about this and many of them work at companies, so perhaps we should give commercial companies the opportunity to fund this effort to help make tangible progress.

theacodes · January 16, 2023, 4:53pm

so perhaps we should give commercial companies the opportunity to fund this effort to help make tangible progress.

A lot of bigcos are circumventing community packaging tools by way of their own complex build tools like Bazel & Buck. However, platform companies do have customers that often have to deal with Python packaging tools when building and deploying applications. I think that could make the case for funding more relevant to some of the folks with deep pockets.

pf_moore · January 16, 2023, 5:16pm

I wasn’t necessarily suggesting we replace pip. Still having pip in whatever suite of things gets installed by ensurepackaging (if you don’t like the “bootstrap” term) seems entirely feasible to me, if only to maintain support for existing workflows.

Like with ensurepip, I wasn’t expecting users to invoke this, it’s a mechanism for the installer (and venv) to populate environments with the “default 3rd party packaging tools”.

Maybe some people are, but I’m trying to focus on user workflow tools (as that’s what I believe the complaints are about). So it’s not about build backends or environment managers, so much as the higher level question “what command do I use to manage my project?” A project can be a library that you’re building (involving a build backend) or an application (maybe involving an app bundler) or a data analysis project (involving Jupyter, some analysis libraries and some notebooks) or even just a simple script (in a scratch directory). To that end, I’m considering how we’d deploy that “project workflow” tool, but I’m agnostic on lower level details (a tool could have a plugin mechanism that allowed the user to install backends, pyproject backend meson).

That’s my personal experience as well. That and the related issue of project management (where managing a project usually involves managing the project’s environment, but includes more, like initialising a new project).

steve.dower · January 16, 2023, 5:25pm

Agreed.

And in light of the “bigco funding” question, aren’t we lucky that @brettcannon is desperate to build this tool on his company time, if we can just agree that it should exist.

(Or as I keep encouraging him, just do it and ship it in VS Code and let it happen. But he wants everyone here to agree first, presumably so that it’s somewhat usable outside of the IDE.)

theacodes · January 16, 2023, 5:53pm

Very happy that Brett puts his precious energy into this corner of the developer experience, but I sincerely hope for more comprehensive and sustainable funding and resources than one or two developers that are a re-org or two away from not being able to work on it anymore.

steve.dower · January 16, 2023, 5:56pm

To clarify, Brett and his team, but agreed.

And if something does come from one specific team like that, it’s likely to be “owned” by the team, rather than automatically being a community governed project. Hence why it’s better for it to start out here and then be supported by those who find it useful, rather than being built by bigco out of necessity.

cgdae · January 16, 2023, 8:20pm

[I look after the Python bindings for the MuPDF library; the build involves a customised build system for the C code, then C++ and Python code generation using clang-python and SWIG.]

I don’t pretend to have a view on the overall packaging landscape, but i wonder whether the difficulties of building native code packages could be fairly easily reduced. I’ve spent quite a lot of time looking for what’s available but apologies in advance if i’ve missed something.

Until we have a definitive everything-for-everyone system, there are some fairly simple low-level functions that, if provided and documented in default Python installations, would make it relatively easy to write one’s own setup.py for an arbitrary native-code package when distutils/setuptools is not usable.

Something basic - provide some standard functions that return the tags that make up a wheel’s filename. For example avoiding the need to call distutils.util.get_platform().replace('-', '_').replace('.', '_'), which is apparently going to be removed and anyway doesn’t match cibuildwheel’s platform tags.
Functions to verify the format of package names and version strings etc (instead of the regexes described in Core metadata specifications - Python Packaging User Guide and PEP 440 – Version Identification and Dependency Specification | peps.python.org).
A function that creates METADATA files in the correct format from a dict of key=value pairs (or perhaps a class that enforces Core metadata specifications - Python Packaging User Guide). Doing this by hand, and finding out the hard way that pypi.org doesn’t seem to support the | multiline format, is not fun!
Provide a function that returns where files should be installed. This might be as simple as returning sysconfig.get_path('platlib') or sysconfig.get_path('purelib'), but hopefully with some explanation about why.
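As a rough illustration of how small some of these helpers could be, here is a stdlib-only sketch. The function names (is_valid_name, format_metadata, install_dir) are made up for this example and are not an existing API; the name pattern is the one given in the core metadata specification, and the long description is placed in the message body so that no multi-line header value is needed. It does not validate version strings or cover the full metadata spec.

```python
import re
import sysconfig

# Project-name pattern from the core metadata specification
# (matched case-insensitively, against the whole string).
_NAME_RE = re.compile(r"([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])", re.IGNORECASE)


def is_valid_name(name):
    """Return True if `name` is a syntactically valid project name."""
    return _NAME_RE.fullmatch(name) is not None


def format_metadata(fields, description=""):
    """Render METADATA/PKG-INFO text from a dict of header fields.

    A list value is written as one header line per item (for multiple-use
    fields such as Classifier). The long description goes in the body.
    """
    if not is_valid_name(fields["Name"]):
        raise ValueError(f"invalid project name: {fields['Name']!r}")
    lines = ["Metadata-Version: 2.1"]
    for key, value in fields.items():
        for item in (value if isinstance(value, list) else [value]):
            item = str(item)
            if "\n" in item:
                raise ValueError(f"{key} must be a single line")
            lines.append(f"{key}: {item}")
    # A blank line separates the headers from the description body.
    return "\n".join(lines) + "\n\n" + description


def install_dir(has_native_code):
    """Pick the site-packages directory for the running interpreter."""
    # platlib is for platform-specific files (extension modules), purelib
    # for pure-Python files. Many layouts point both at the same directory.
    return sysconfig.get_path("platlib" if has_native_code else "purelib")
```

For example, format_metadata({"Name": "demo-pkg", "Version": "1.0"}, "Long description") yields three header lines, a blank line, and then the description.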

Building on these:

Functions that take a list of annotated filenames and a metadata dict, and build them into a wheel and/or install them into the correct installation directory.
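A minimal sketch of such a wheel-building function, using only the standard library, might look like the following. The name write_wheel and its parameters are hypothetical, not taken from pipcl.py or any existing library. It assumes a single, uncompressed tag such as "py3-none-any", takes file contents as bytes, and ignores data directories, entry points and signing.

```python
import base64
import csv
import hashlib
import io
import os
import re
import zipfile


def write_wheel(out_dir, name, version, tag, files, metadata_text, purelib=True):
    """Write a wheel into out_dir and return its path.

    files: iterable of (path_inside_wheel, bytes) pairs.
    tag:   one "interpreter-abi-platform" tag, e.g. "py3-none-any".
    """
    # Wheel filenames use the name with runs of "-", "_" and "." turned
    # into a single underscore.
    stem = f"{re.sub(r'[-_.]+', '_', name).lower()}-{version}"
    dist_info = f"{stem}.dist-info"
    wheel_text = (
        "Wheel-Version: 1.0\n"
        "Generator: sketch\n"
        f"Root-Is-Purelib: {'true' if purelib else 'false'}\n"
        f"Tag: {tag}\n"
    )
    entries = list(files) + [
        (f"{dist_info}/METADATA", metadata_text.encode("utf-8")),
        (f"{dist_info}/WHEEL", wheel_text.encode("utf-8")),
    ]

    # RECORD is a CSV of path, "sha256=<urlsafe base64, no padding>", size.
    record = io.StringIO()
    writer = csv.writer(record, lineterminator="\n")
    for path, data in entries:
        digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
        writer.writerow([path, "sha256=" + digest.rstrip(b"=").decode("ascii"), len(data)])
    # RECORD lists itself with an empty hash and size.
    writer.writerow([f"{dist_info}/RECORD", "", ""])

    wheel_path = os.path.join(out_dir, f"{stem}-{tag}.whl")
    with zipfile.ZipFile(wheel_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, data in entries:
            zf.writestr(path, data)
        zf.writestr(f"{dist_info}/RECORD", record.getvalue())
    return wheel_path
```

Installing from the same list would then be a matter of writing each entry under the directory returned by sysconfig.get_path('platlib') or sysconfig.get_path('purelib'), though a real installer also has to handle scripts, data files and uninstall records.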

A bit more tricky:

Could we extract distutils/setuptools’ code that finds Windows’ compilers and linkers and have a function that returns (cl.exe, link.exe)? This would allow build_wheel() to be fairly easily written to support arbitrary (i.e. too non-standard for distutils/setuptools) SWIG-style projects on Windows.
Definitive documentation about what command-line args pip will pass to setup.py so that one knows what to support.

One other thing - PEP-517 is great, but requiring that build_wheel() knows how to create a wheel seems to cause unnecessary work. Could we also allow it to return a list of annotated filenames that the caller then turns into a wheel? Similarly, it would be helpful if build_sdist() could return a list of filenames.
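PEP 517 itself does not allow a hook to return a file list, but a thin adapter can get close to that today. The sketch below wraps a file-listing callable into a conforming build_sdist hook; make_build_sdist and list_files are hypothetical names invented for this example, and the project name, version and PKG-INFO text are assumed to be supplied by the caller.

```python
import io
import os
import tarfile


def make_build_sdist(name, version, list_files, pkg_info_text):
    """Return a PEP 517 build_sdist hook built from a file-listing callable.

    list_files: callable returning (relative_path, bytes) pairs; it should
    include pyproject.toml, which an sdist is expected to contain.
    """

    def build_sdist(sdist_directory, config_settings=None):
        base = f"{name}-{version}"
        filename = f"{base}.tar.gz"
        entries = list(list_files()) + [("PKG-INFO", pkg_info_text.encode("utf-8"))]
        # PEP 517 asks for a gzipped pax-format tarball with a single
        # top-level "{name}-{version}" directory.
        with tarfile.open(
            os.path.join(sdist_directory, filename), "w:gz", format=tarfile.PAX_FORMAT
        ) as tar:
            for rel_path, data in entries:
                info = tarfile.TarInfo(f"{base}/{rel_path}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        # The hook returns the basename of the file it created.
        return filename

    return build_sdist
```

A backend module could then expose the hook with a single assignment such as build_sdist = make_build_sdist(...), keeping the project-specific part down to the function that lists files.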

Regarding documentation, for me it would really help if, when a PEP is officially accepted, the central documentation is updated so that there is a definitive location for the new content. Trawling through PEPs and coping with some being out of date or superseded, and following links etc, is quite frustrating.

Apologies if this seems like a wish list for others to carry out for me. I’d be very happy to try to help with any of the above. I have a fairly small module (1,200 lines, experimental version is at: git.ghostscript.com Git - user/julian/mupdfpy.git/blob - pipcl.py) that implements most of the above, which i think shows that this sort of limited-scope approach is practical, and might even be fairly easy to standardise.

pf_moore · January 16, 2023, 8:35pm

Many of these are available in 3rd party libraries like packaging, or will be available from wheel once that project releases a version with a programmatic API (which is in progress). Any missing ones could be added (or new 3rd party libraries created).

Is that sufficient for what you’re asking, or is it essential for you that this functionality is in the Python standard library? Because that would be a much more difficult thing to achieve, and would require the core developers and the steering council to agree.

Ignoring out of date, soon to be removed, code paths, pip doesn’t call setup.py at all these days. All calls to the build backend are via PEP 517 hooks. So that one’s easy to achieve.
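To make the hook mechanism concrete, here is a simplified sketch of what a frontend does once it has read the build-backend string from pyproject.toml. The helper names load_backend and build_wheel_via_hook are invented for this example. Real frontends such as pip additionally install the declared build requirements into an isolated environment and run the hooks in a subprocess, which this sketch leaves out.

```python
import importlib


def load_backend(build_backend):
    """Resolve a build-backend string such as "pkg.module:object.attr"."""
    module_name, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_name)
    # The optional part after ":" names an object inside the module.
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend


def build_wheel_via_hook(build_backend, wheel_directory, config_settings=None):
    """Call the mandatory build_wheel hook and return the wheel's basename."""
    backend = load_backend(build_backend)
    # Hook signature from PEP 517:
    # build_wheel(wheel_directory, config_settings=None, metadata_directory=None)
    return backend.build_wheel(wheel_directory, config_settings, None)
```

So a backend only has to provide build_wheel and build_sdist callables with these signatures; no setup.py command line is involved on this path.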

pradyunsg · January 16, 2023, 8:48pm

Have you seen https://pip.pypa.io/en/stable/reference/build-system/setup-py/? Or are you looking for more details there?

cgdae · January 16, 2023, 10:44pm

Paul Moore:

cgdae:

Until we have a definitive everything-for-everyone system, there are some fairly simple low-level functions that, if provided and documented in default Python installations, would make it relatively easy to write one’s own setup.py for an arbitrary native-code package when distutils/setuptools is not usable.

Many of these are available in 3rd party libraries like packaging, or will be available from wheel once that project releases a version with a programmatic API (which is in progress). Any missing ones could be added (or new 3rd party libraries created).

Is that sufficient for what you’re asking, or is it essential for you that this functionality is in the Python standard library? Because that would be a much more difficult thing to achieve, and would require the core developers and the steering council to agree.

Thanks for responding.

I’ll have to take whatever is available i guess, though in practise my pipcl.py library already does most of what i need, with what i think is a fairly simple API, so there’s a fairly high barrier to move to something else if it’s not in the standard library.

In the mean time, i’d be interested to look at what is being planned for wheel; i hope that it will be a superset of the functionality i’ve talked about here.

But… while i appreciate getting core developers and the steering council to agree on things like this is not trivial, i have to confess to some confusion here.

Packaging is surely fundamental to Python, so why is the implementation of packaging left to 3rd parties? Python doesn’t leave 3rd parties to provide differing interfaces and implementations to allow running of commands, instead it provides the subprocess module in the standard library. So why rely on 3rd parties for something that is of comparable importance?

Yes, i understand that. The trouble is, right now, some versions of pip do use the command line, so i need to support it. Maybe pip install --upgrade pip will always fix this, but i’m not yet sure that i should mandate it to make my sdist and wheels accessible to users.

cgdae · January 16, 2023, 10:51pm

Ah, thanks for this, i hadn’t come across this page, it explains some of what i found from trial-and-error when developing pipcl.

h-vetinari · January 16, 2023, 11:13pm

Thanks for the response @pradyunsg. I seem to have failed to get my point across (so I won’t respond to all points individually), because my premise is very simple: Every tool necessary to install/run/use for the 90% case is in scope of the language UX.

So everything is a CPython concern (as far as CPython == Python the language spec), unless it’s sufficiently niche that users don’t commonly tend to run into it. For example, creating the tools/infra etc. for having decent package metadata is something the SC could choose to enforce, e.g. gate new PyPI uploads on correctly formatted metadata.

Sure, documentation is good, but have a look what else would be possible if we wanted to do it:

It would be completely fair game for Python the language to have such a dialogue when setting up anything, with a choice between applications, libraries, contributing to CPython, scratch folders for scripts, etc., with opinionated (but overridable) defaults that get people started quickly and with minimal friction.

So in short: thinking in terms of existing demarcation lines is not helpful IMO when trying to solve the problem of a lack of cohesion.

The issue is that many of the larger questions here cannot be solved by PyPA alone (even @pf_moore isn’t sure how far the “authority” in PyPA extends; perhaps even the SC isn’t either!). So unless PyPA is willing to assert much more influence over anything from distribution channels to PyPI to tooling homogeneity (“all things packaging”), and have that be tolerated by the SC, the only conclusion is that this responsibility falls to the SC.

My point is:

there needs to be a clear answer as to who, or which body, is responsible for such questions
that body needs to be much more closely involved in the process of finding solutions (than past SCs, assuming the authority lies there) – it’s not enough IMO to decide between PEP X and PEP Y, because there’s a gordian knot of many different intertwined problems, which won’t be solved unless people with decision making power start channeling efforts in some direction. [1]

I understand that it would be much nicer if such consensus would arise organically, as taking a decision also means taking responsibility and comes with a lot of exposure. But despite the incremental improvements of the last several years, I don’t see how we’ll move the needle on satisfying these user demands for a more coherent UX with the current detachment of packaging issues from the rest of the language and the effective decision making authority.

[1] This is in stark contrast with other parts of the language, where the core foundation of what makes Python Python is really not in question (imagine how unthinkable changing very visible things like list comprehensions or dicts in any major way would be), and improvements can be scoped into much more reasonable chunks.

dstufft · January 16, 2023, 11:47pm

I think, practically speaking, there isn’t a good way for the PyPA to bless a singular tool in a way that people will actually recognize it as “the” tool to use. Mechanisms I can think of for doing so:

Bundle it with Python, deprecate the bundled pip.
Create a marketing campaign on PyPI, mailing lists, Twitter, etc promoting the use of some hypothetical tool.
Something else?

More fundamentally though, I don’t think that such a tool exists currently. There are a number of tools that could maybe be it, but nothing out of the box. To be honest though, I’m not even sure that such a tool could exist.

This is a lot more complex of a problem for Python than any of the “better” ecosystems, because Python’s packaging toolchain currently attempts to solve a much larger and harder problem than those “better” ecosystems do.

I also think that the survey isn’t a great way to determine if this is what we should be focusing on either, since it doesn’t provide any context or trade offs involved in achieving that goal. I suspect that 100% of the users that want a unified tool just blindly assumed that their preferred workflow, or something like it, would of course be included in that tool, and they don’t consider that they might have to make drastic changes to their workflow to get it. But somebody is going to have to make drastic changes, because the reality is that the workflows that exist now in the world are so varied that a singular tool can’t possibly solve them all IMO.

brettcannon · January 16, 2023, 11:47pm

It’s true cross-platform.

I’ve brought it up before, but no one has stepped forward to want to concretely tackle it. Usually the mention of bundling OpenSSL is enough to scare people away.

I can say we have not, and the reason not to adopt it is, I suspect, that most of us don’t know what “CUDF” is (yet).

It’s not what the PSF is. In general, think of the PSF as the place that handles funding stuff for the Python community and general community outreach (there’s other stuff like IP/legal stuff, but that’s not important here). The key thing is the PSF isn’t in a position to make that sort of decision for us. As an example, the PSF got funding to hire Shamika as a packaging PM, but she isn’t making decisions on our behalf.

If we wanted to see a policy-setting group, we would have to set up that group ourselves (e.g. a packaging SC). If that’s a route we want to go, I’m happy to share Python core’s experience with setting up the SC, etc.

I think so, but more from the perspective that we came to an agreement of what we want to see happen than the proposal itself.

Uh …

To be clear about what Steve is talking about, we have discussed having the Python extension for VS Code rely on the Python Launcher (or at least its environment/interpreter code) to find what’s installed/created. (We also moved our tool support out to separate extensions using LSP, so that other editors can use that same LSP code for that tool instead of having to reinvent their own support from scratch because we believe anything that can live outside of VS Code should, but that’s off-topic).

Wanting consensus is also because I don’t want to be in charge of declaring a “winner” in this situation purely based on VS Code’s reach. It’s a key reason I keep pushing for standards for things, so that we are just doing tool integration at the end of the day instead of having to invent some solution or decide which approach is “best”.

But admittedly, I regularly get pressure at work to come up with an “opinionated” flow of how things should work by default in VS Code (while making sure other workflows are still possible inside the editor, so no one panic please). And as somewhat pointed out already by others, environments – virtual or conda-based – are the biggest issue we are trying to simplify and regularly have to deal with (after that is the lack of a lock file format; pyproject.toml is obviously great). And the difficulty is honestly from the myriad of ways people create and manage virtual environments (hence Classifying Python virtual environment workflows).
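As a small illustration of why environment handling is fiddly for tools, here is a heuristic sketch for telling the two environment kinds apart by their on-disk markers. The function names are made up for this example, and this is not how the VS Code extension or the Python Launcher actually detects environments. It relies on venv-style environments having a pyvenv.cfg file at their root and conda environments having a conda-meta directory.

```python
import os
import sys


def classify_environment(prefix):
    """Guess what kind of environment lives at the directory `prefix`."""
    # venv-style environments keep a pyvenv.cfg file at their root.
    if os.path.isfile(os.path.join(prefix, "pyvenv.cfg")):
        return "venv"
    # conda environments keep package records in a conda-meta directory.
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        return "conda"
    return "unknown"


def running_in_venv():
    """Return True if the current interpreter runs inside a virtual environment."""
    # In a virtual environment sys.prefix points at the environment, while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix
```

Markers like these only say what an environment is. They say nothing about where a given workflow tool chose to create it, which is the part that varies so much between tools.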

https://packaging.pypa.io/en/stable/tags.html#packaging.tags.parse_tag
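For readers unfamiliar with what tag parsing involves: a compressed tag set such as py2.py3-none-any stands for every combination of its dot-separated parts. The sketch below is a stdlib-only illustration of that expansion with made-up function names; it is not the packaging library’s implementation, and the packaging.tags module linked above is the maintained way to do this.

```python
import itertools


def expand_tag(compressed):
    """Expand a compressed tag set into (interpreter, abi, platform) triples."""
    interpreters, abis, platforms = compressed.split("-")
    return set(
        itertools.product(interpreters.split("."), abis.split("."), platforms.split("."))
    )


def tags_from_wheel_filename(filename):
    """Return the tag triples encoded in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    # The tag is always the last three dash-separated fields; an optional
    # build number sits before it, so take the fields from the right.
    return expand_tag("-".join(stem.split("-")[-3:]))
```

For instance, expand_tag("py2.py3-none-any") gives the two triples ("py2", "none", "any") and ("py3", "none", "any").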

Utilities - Packaging, Utilities - Packaging

Working on it (although a bit more structured than from a dict).

Historically, because packaging didn’t interest Guido. Granted, it took quite a while after Python’s initial release before the need for a package manager arose, as everyone downloaded zip files in the '90s. But once people started to want/need a package manager, other folks in the community took the initiative, and they weren’t usually core developers, so it just happened to be developed separately.

brettcannon · January 16, 2023, 11:52pm

PyPI doesn’t fall under the SC; it’s a PSF-provided resource. And I believe PyPI wants to use what’s planned for packaging.metadata to validate PKG-INFO/METADATA files at upload time.

PEP 609 – Python Packaging Authority (PyPA) Governance | peps.python.org outlines how things are structured and links to the standing delegation. So the SC is aware as we approved PEP 609.

pradyunsg · January 16, 2023, 11:54pm

Well, as far as I can tell, we’ve not wanted to assert influence/specify how people should do things.

Since at least I have been involved, we’ve focused on interoperability and enabling diverse tooling. PyPA Goals — PyPA documentation does a decent job of explaining the direction we’ve gone in over the last ~5-8 years.

TBH, IMO this is because when this was attempted with pipenv, the experience wasn’t great and it did result in active folks stepping away from contributing to Python packaging (and pipenv) so… there’s a level of caution around this which has led to a communication vacuum that hasn’t been filled by any authoritative voices.

dstufft · January 16, 2023, 11:56pm

Without speaking for the other PyPI admins, I’m pretty sure we want PyPI to be as strict as possible on uploads in terms of validating correct packages are being uploaded… but “as possible” is carrying a lot of weight there.

Specifically to Metadata, yes PyPI should validate those on upload, it just requires time and effort

pf_moore · January 17, 2023, 12:08am

I think this is a good point. Many of the questions we’re struggling with ultimately require someone (or some group) to just make a decision. There is no consensus, there is no “obvious answer”, there’s just a choice to be made. And no-one is willing or able to make that choice and make it stick.

If I said “PDM is the official packaging workflow tool, poetry, hatch, conda and any other tools operating in this area are now obsolete”, no-one would listen. And I’m about the nearest thing we have in the packaging community to an “ultimate authority” - I could even claim that choosing a tool is an “interoperability matter” so that it’s within my delegation from the SC.

So if I can’t make such a decision and make it stick, and we never achieve consensus, what option do we have other than asking the SC? And of course, they will (entirely reasonably) say that they have far less expertise in this matter than we do, so how can they choose?

What I will say is that if the community want someone to make an arbitrary decision, I’m willing to do so (although I would immediately submit any decision I made to the SC for ratification - I wouldn’t even consider this without explicitly ensuring the SC was OK with it). If the community can deliver a strictly limited number of proposals, each saying “choose X as the PyPA’s official recommended project workflow tool”, and agree to support whichever proposal gets chosen, I’ll make that decision. But I do think that’s a last resort - and I’m not sure people are ready (yet?) for a “just pick something and be done with this” solution. As @pradyunsg noted, the pipenv experience has burned a lot of us here, and no-one wants to go through that again.

h-vetinari · January 17, 2023, 3:45am

Fully agreed, except that…

… such a choice needs a process to bring people on board, so they don’t just feel railroaded but part of the effort. This is arguably more painful / difficult than even the decision itself, which is why I said that it needs involvement along the way of whoever actually has the decision making power.

If you hand down a decree from the heavens, a substantial part of users will indeed not listen (principally those whose workflow you just broke in some way). But if people had a chance to get involved, if ideas were hashed out sufficiently that people can empathise with and relate to the decision and its trade-offs, I think the percentage of “unconvincables” would be much, much lower, and the large majority of people would just move on and start focusing their efforts on building/improving that new world.

abessman · January 17, 2023, 7:44am

Brett Cannon:

To be clear about what Steve is talking about, we have discussed having the Python extension for VS Code rely on the Python Launcher (or at least its environment/interpreter code) to find what’s installed/created. (We also moved our tool support out to separate extensions using LSP, so that other editors can use that same LSP code for that tool instead of having to reinvent their own support from scratch because we believe anything that can live outside of VS Code should, but that’s off-topic).

Wanting consensus is also because I don’t want to be in charge of declaring a “winner” in this situation purely based on VS Code’s reach. It’s a key reason I keep pushing for standards for things, so that we are just doing tool integration at the end of the day instead of having to invent some solution or decide which approach is “best”.

If VSCode were to spearhead the rollout of a new, unified packaging toolchain, a significant number of people won’t touch it based only on the (perceived) source; such a move would be interpreted as step two of Embrace, Extend, Extinguish. Whether that interpretation is fair or correct (I’m sure it’s not) is beside the point.