(I’m a regular contributor to conda-forge, a large(ly) parallel ecosystem of python packages that’s used widely in data-, ML- and science-heavy python projects. This does not make me a maintainer of the actual tools – conda, mamba, etc. – that are the UI for this ecosystem.)
@smm, have you seen this thread by @rgommers? It introduces a resource page that’s intentionally solution-free, in order to establish a baseline understanding of the different problems/needs/constraints in the python packaging ecosystem, particularly for projects involving “native” code (i.e. code wrapped by, but not written in, Python).
With great respect for Hatch, “almost there already” is a big stretch IMO, given all the problems pointed out with (e.g.) native code. This is not Hatch’s fault (nor responsibility), but we’re (IMHO) emphatically not close to declaring success.
As long as the ecosystem currently being served by conda cannot be folded back into “the One True Way”, we have not actually healed the schisms in python packaging (i.e. reached a state where everyone can just use the same tool). Note that this is not some zealous attachment to conda as a tool or philosophy, but about not regressing the capabilities that are necessary to solve large classes of problems for the “data science persona” at scale. My impression is that this pragmatism is shared by many if not most in conda-land.
Indeed, it is a blessing and a curse, but now we have to deal with it.
To this point, from my POV, the uncomfortable “math” here is to either:
1. solve most of the problems outlined in https://pypackaging-native.github.io/ (a gigantic undertaking), or
2. define large parts of the data science ecosystem as out of scope (…)
Almost certainly, 2. won’t fly for the SC (who would want to define ~half their user base out of existence?), and the wider PyPA community has consistently declared 1. as out of scope (unsurprisingly, given the monumental complexity relative to the available resources).
Both points are understandable for the respective stakeholders, but they are at odds with each other, and (IMO) the fundamental tension underlying the lack of tooling homogeneity.
As painful as 2. looks from a language governance POV, this is effectively what’s already happening in the pockets of the ecosystem most affected by these problems (e.g. the geospatial stack), where installation instructions commonly recommend an alternate (non-PyPA) installer, and wheels etc. are not provided.
Hopefully this can be mitigated with things like PEP 668, which would make it less “all-or-nothing” to use other package managers, and could more or less gracefully hand off installation of too-complicated packages from pip/wheels/PyPA to another package manager where necessary. But even achieving that is still a far cry from the “unification” that the survey comments cited in the OP are calling for.
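For concreteness: the mechanism PEP 668 specifies is an `EXTERNALLY-MANAGED` marker file that a distributor places in the interpreter’s stdlib directory, which tells pip (and other PEP-668-aware installers) to refuse to install into that environment unless explicitly overridden. A rough sketch of what a conda-provided marker could look like (the error text here is invented for illustration, not taken from any actual distribution):

```
# Placed by the environment's provider at
# <sysconfig stdlib path>/EXTERNALLY-MANAGED
[externally-managed]
Error=This Python environment is externally managed by conda.
 To install packages, use `conda install <package>`,
 or create and activate a virtual environment first.
```

Installers that honor the marker would surface the `Error` text instead of proceeding, which is what makes the hand-off from pip to the managing package manager “less all-or-nothing”.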