Wanting a singular packaging tool/vision

(I’m new to this site, please let me know if this is not the place for this comment or if there’s anything else wrong with it)

I’ve thought about Python packaging before, and it’s a mess right now. It really scared me off when I first wanted to make a Python package: there are way too many choices, and there isn’t one simple way to make a package. Looking at how other Python projects do it, I still see lots of different approaches: Poetry, setup.py, etc.

I thought about how it could be simpler, and IMO Python packaging should just start from scratch and build one single tool and format to manage everything.

There are multiple advantages:

  • You don’t need to support other/old things
  • It can be way simpler and more intuitive to create, distribute and install a package, for example:
    • By default, it could use src/ as the directory for packages
    • With a single option in the config file, scripts/entry points/executables in bin could be created
  • Backwards incompatible changes can be made, like:
    • Requiring every package to use semantic versioning (which a lot of packages already use, and enforcing it could make dependency management a lot simpler)
  • Changes can be made with the benefit of hindsight, for example:
    • Exposing metadata through the index API, so packages no longer need to be downloaded just for their dependency information

Personally, I think it would be a hard sell, since for conda, Python is just another package that it expects to have been consistently built and available. E.g. if you’re asking to be able to point conda at an existing python.org Python installation to satisfy the requirement, that would be contrary to its purpose: to have a consistent build and runtime environment (e.g. matching ABIs).

That is the much more interesting question of the two, and I think I’d like to say yes, while knowing that the attempts so far (conda-press, conda-build-wheel) were not fully thought out. I’d rather see an evolution of the wheel format to optionally include conda-style features and make hacks (like the recent nodejs-bin package on PyPI) unnecessary. But maybe it would be enough to create a “conda” platform tag for conda?

One idea I discussed briefly with @henryiii was adding another backend to cibuildwheel that would (via a Docker image) provide the same build environment that conda packages are built in, and adding a PEP 517 backend to conda-build to produce wheel files out of conda recipes. If it were implemented in conda-build directly, it would be able to handle package name validation etc. more easily.

Maintainers could continue to use cibuildwheel (which is an automated way of running the build tool) to build their wheel files shipped to PyPI, but ALSO create wheel files that would fit in with conda environments. Whether the latter wheel files would eventually live on PyPI or elsewhere is a different question, though.

In other words, it’s an appealing idea, but I don’t think we are there yet.

That’s the big question for sure. I think so far, due to the way the PyPA is governed, there wasn’t a need to limit the work to one ecosystem alone, and the spec work has a positive impact on the ecosystem, looking at the number of tools other than pip/setuptools.

Given the difficulties with coordinating the work with Linux distros over the years, I think it’s fair to say that there are different opinions on how Python packaging should be done. I feel conda is also just another of those opinions, albeit one from another part of the Python community originally. As such, I’m quite excited about @rgommers’ upcoming project to document the main problems of non-PyPA users with wheels, which he pre-announced above.

2 Likes

That doesn’t work in the direction I was thinking, which is to make conda builds of packages like numpy available to non-conda Pythons, without the numpy maintainers having to build extra binaries.

To be clear, as a user who does not use conda’s Python, my personal interest is in making it as easy as possible for maintainers to make binaries available for non-conda systems. As a packaging community member, my interest is in not splitting the ecosystem (in a user-visible way) - which means ensuring that users will not be in a position where they want to use 2 packages but can’t, because they exist in different “worlds”.

So your interest (if I’m reading this right) is to allow maintainers who currently publish wheels to be able to publish for conda as well? In effect, skipping the need for conda-forge to repackage things? But you’re not expecting builds to flow “in the other direction”?

Hi, welcome to discuss.python.org! This isn’t a bad place for your comment. However…

I think everyone involved agrees that the wealth of overlapping tools and concepts makes things more complex than they should be. But ‘start from scratch’ is simply not an option we have.

There are a lot of people using those different things. People who’ve built knowledge and scripts and personal/internal tooling around virtualenv, or pyenv, conda, pip-tools, poetry, hatch, whatever. People who think conda has it 99% nailed, and others who avoid it like the plague. From individuals up to massive international companies. There are only two ways I can see to get from there to everyone using one single tool:

  1. Make something better for all those use cases than everything already out there, so that everyone is convinced to switch to it - good luck figuring out how to do that!
  2. Break all the other tools - this isn’t really possible in our open-source, standards-driven world: tools people like will just adapt to any reasonable change you can make (e.g. to the PyPI APIs).

Anything else… well, I’ll let this frequently-posted XKCD comic say it:

[xkcd comic]

6 Likes

I think everyone involved agrees that the wealth of overlapping tools and concepts makes things more complex than they should be. But ‘start from scratch’ is simply not an option we have.

I understand it would be really hard to do something like that, and it would obviously not be an immediate replacement for everything.

Make something better for all those use cases than everything already out there, so that everyone is convinced to switch to it - good luck figuring out how to do that!

Yes, this is hard, but it doesn’t necessarily need to be “better”; it needs to be “at least as good”.
It’s not a good idea to just break everything, but if the tool is successful and can replace other tools completely, they can be deprecated.

I agree that this is not at all optimal, but IMO it’s so messed up at this point that it would be worth starting over, using pieces of already existing tools to bundle everything together into a single tool.

I wrote part of an application some time ago, before I learnt best practices etc. When I came back to it later, I rewrote the whole application so I wouldn’t have to deal with badly written/designed code when extending it later.
I don’t think pip or any other tool is badly designed, but it’s way easier to do everything right when you have a clean start.

I fully understand if you don’t agree with me, I mostly commented this to suggest an alternative way to deal with this all.

Realistically, it needs to be better, because the existing tools have an ‘unfair’ advantage: people are already used to them, there are already StackOverflow answers and blog posts and so on about them. Especially for packaging, a lot of people will stick with what they already know unless there’s a good reason to do something new.

But even making something as good as the existing tools for everyone they serve is probably verging on impossible. What people need (and want) varies a lot, and the tools that can handle complex needs tend to be awkward & over-complicated for the simple use cases, so lighter, easier tools spring up that do everything some people need.

There’s no central council that can ‘deprecate’ other tools (PyPA sounds like that, but it really isn’t). So if you want to replace everything else, you have to convince the people who maintain all of those separate projects to deprecate them. And of course, the people who maintain Poetry, or any project, tend to be the people who know everything that it can do and really like its design. :wink:

I totally agree with you on this! It’s massively easier, and this is part of why we’re so jealous of Rust’s packaging tools. But we can’t have a clean start - all that history, all those existing tools are out there, and anything we do now lands on top of all that. There’s no reset button!

3 Likes

FWIW, there was some work done on the idea of “conda package” → “pip wheel” as a part of

I’m not fully sure what happened on that, though I imagine folks who contributed to that would know. :slight_smile:

Possibly, but I’m not aware of them. I first heard the idea in the mid-90s, maybe it’s even older than that. The gist is that every version of every package is installed in a completely isolated directory, with its own entire “usr/local” hierarchy underneath. Then “creating an environment” means making a directory <envname> with an empty “usr/local” hierarchy, and linking in all the files from the relevant package version hierarchies there. Now “activating an environment” means “point your PATH at <envname>/bin”.
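
A minimal sketch of that linking step, assuming a made-up directory layout and package names (no real tool implied):

# Hypothetical sketch of the linkfarm idea: each package version lives in its
# own isolated tree, and an "environment" is a directory of symlinks pointing
# back into those trees. All paths and package names here are made up.
from pathlib import Path

PACKAGES = Path("/opt/pkgs")       # e.g. /opt/pkgs/numpy-1.26.4/usr/local/...
ENV = Path("/opt/envs/myenv")      # the environment being created

def link_package(name_version: str) -> None:
    root = PACKAGES / name_version / "usr" / "local"
    for source in root.rglob("*"):
        if source.is_file():
            target = ENV / "usr" / "local" / source.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():
                target.symlink_to(source)

for pkg in ("python-3.12.1", "numpy-1.26.4"):
    link_package(pkg)

# "Activating" the environment is then just putting its bin directory first:
#   export PATH=/opt/envs/myenv/usr/local/bin:$PATH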

1 Like

I remember the SCO Unix operating system used linkfarms in its package manager, in order to facilitate easy rollback. Granted, I try not to think about my days as a SCO sysadmin more than once a decade, so now I’ve used up my quota until the 2030s roll around.

It would probably make more sense as an ABI tag rather than a platform tag (or more of a conda_win_amd64 type platform tag), but the principle makes sense. If Conda also searched PyPI for packages, this would mean packagers would just have to publish a few additional wheels that:

  • don’t vendor things available as conda packages
  • do include additional dependencies for those things
  • link against the import libraries/headers/options used for the matching Conda builds of dependencies

Those three points are the critical ones that make sharing builds between Conda and PyPI impossible (or at least, against the design) regardless of direction.

Numpy installed through PyPI needs to vendor anything that can’t be assumed to be on the system. Numpy installed through Conda must not vendor it, because it should be using the same shared library as everything else in the environment. This can only realistically be reconciled with multiple builds and separate packages (or very clever packages).

But of course, the majority of packages aren’t in this category, so could be shared just fine. And those that are in this category probably have good coverage already, because any Conda user will have discovered that they don’t just work and gotten them into conda-forge.
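
To make the tag idea concrete, here is a hypothetical sketch using the packaging library; the conda_linux_x86_64 platform string is invented for illustration and is not a real tag today:

# Hypothetical sketch: what a conda-aware installer's tag priority could look
# like. "conda_linux_x86_64" is an invented platform tag, not an existing one.
from packaging.tags import Tag, sys_tags

conda_tags = [
    Tag("cp312", "cp312", "conda_linux_x86_64"),  # CPython 3.12 built for the conda ABI
    Tag("py3", "none", "conda_linux_x86_64"),     # pure-Python wheel, conda environment
]

# Prefer conda-specific wheels, then fall back to the normal tag set so that
# existing manylinux/pure wheels keep working.
supported = conda_tags + list(sys_tags())
for tag in supported[:4]:
    print(tag)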


To add my hot-take in here: the best thing that the PyPI side of the community can do is to be more honest about how limited it is at solving the “compatible packages” problem. Our version dependencies assume a pure-Python ABI, and have no way for builds to require specific builds (very little of our tooling lets you do things like “sdist requires spam>1.0 but wheel requires spam==1.7”). But we let people go out there and declare “wheels are the solution to everything” or some such thing, while most of us deep in the system aren’t doing that, because we know.

We know that:

  • you need specific builds of Python
  • you need specific package builds to match those specific builds
  • preinstalled Pythons probably don’t match PyPI built packages
  • native dependencies need to be installed separately, and you probably have to read the package docs to figure out what you need
  • you might have to add additional constraints to get a working environment, not one that merely satisfies the metadata dependencies

I have seen professional software engineers mess up all of these (and have messed them all up myself at times). Why are we surprised that regular users look at what comes preinstalled and assume it can handle these, and then get frustrated when they find out what we always knew? (And get even more frustrated when they find out that we already knew and haven’t fixed it or warned them!)

From my POV, the minimum we’d need to do for Conda to be able to use wheels is:

  • define “native requirements” metadata (even if pip ignores it)
  • allow (encourage) wheels with binaries to have tighter dependencies than their sdists
  • encode and expose more information about ABI in package requirements
  • define some set of “conda” platform or ABI tags

Everything else can be done on the conda side, integrated into their own builds of CPython. But the viability of Conda depends on the above features, and we just don’t have equivalents on the wheel/sdist side right now.

6 Likes

It is a hard problem to reconcile conda and wheels. Suppose your conda distribution expects to provide the C compiler, libc, the Python interpreter and other fundamental parts of the ABI. Wheels, by contrast, expect to run on “an old version of Linux” or “the typical Windows or Mac environment” and statically link their dependencies. It would be neat if we started defining wheel ABI tags for conda distributions.

We did not have the option to start over in Python packaging. Now the problem is making a choice between tools, rather than figuring out whether setuptools or distutils works for you; not that different from figuring out which library to use, or whether to use one at all, for any other task. Hopefully you can make a reasonable choice and quickly change if it isn’t working for you.

Let me restate: if you can figure out the hard problem of ABI compatibility between wheels and conda environments, please do!

2 Likes

Could probably take a lesson from pipx run and put the environment in /tmp, reusing it if it happens to be there and reconstructing it if it isn’t.

Problem 1: mapping import names to project names. :wink:

To be clear, that’s to launch the interpreter directly from the base environment. There isn’t any environment activation, which can be crucial (e.g. Python 3.10.0 builds were unable to be run outside of an activated environment, including base, for a month after the initial release). I will say the activated-environment requirement for conda is a general issue, as packages can install arbitrary shell scripts that are run during activation.

What OS are you using? It comes with the Windows installer; otherwise you have to install my Unix version yourself. See GitHub - brettcannon/python-launcher: Python launcher for Unix for package manager support and pre-built binaries (or just use cargo).

No one ships it in-box, but it is available via some package managers, e.g. Fedora, Arch, Linuxbrew, if you don’t want to download a pre-built binary or build using cargo.

At the moment it is somewhat like that. As the README says, it’s really just a developer tool and not considered a canonical way to do anything. Some people find it really handy for day-to-day use, but I have purposefully not put anything into it which necessitates its usage.

And always selecting the newest Python version installed in all other cases. Otherwise I have some ideas, but nothing that’s gotten past the design/discussion stage yet. Probably the biggest one is a way to specify how to execute a Python interpreter/environment (because it differs between virtual and conda environments, if you use a WASI build, etc.).

Nope, that’s only in the Unix version I wrote, hence why mine is developed externally and is clearly labelled as unofficial. :grin:

My understanding is that Anthony stopped working on conda-press because people gave him grief for working on it, as if he were a traitor to conda or something.

I actually raised Allow running scripts with dependencies using pipx · Issue #913 · pypa/pipx · GitHub to get this feature added to pipx run

3 Likes

Working with scripts that have dependencies is indeed a fascinating problem. I often exclude non-stdlib dependencies from my simple scripts to avoid this problem, but that leaves me with less powerful scripts. Alternatives:

  • create a temp virtual environment (as you mentioned)
  • install dependencies “globally” (which conda and virtualenvwrapper can do, but it can leave you with a messy top-level virt. env.)
  • make a full-blown package (far from “simple script”)

I suppose another lesser-known option might be to vendor dependencies into a zipapp? Then you can additionally solve the distribution problem. So in the first “pyup”, Rust-like example above, maybe you’d have a pyup zipapp option or some related name. Dependencies get installed/added into a zipapp-like archive and you run it like a regular script, e.g. > ./simple_script
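
A rough sketch of that workflow using only pip and the standard zipapp module; the directory name, dependency and entry point are just examples:

# Hypothetical sketch: vendor a script's dependencies into a zipapp.
# Assumes the script lives at simple_script/__main__.py; "requests" is only
# an example dependency.
import subprocess
import sys
import zipapp

# Install the dependencies directly into the directory that will be zipped.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--target", "simple_script", "requests"],
    check=True,
)

# Bundle the script and its vendored dependencies into one executable archive.
zipapp.create_archive(
    "simple_script",
    target="simple_script.pyz",
    interpreter="/usr/bin/env python3",
)
# Now it runs like a regular script:  ./simple_script.pyz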

Of course, I may be completely misunderstanding the situation here. Again, that points to a lack of understanding of each other’s positions, which again I’d like to see us try to address.

I think this is a great observation. Mutual misunderstandings can pile too much sand onto budding, brilliant ideas. Well done on attempting to bridge the gap.

The Miniconda download site only lists Python up to 3.9

In practice, you would download the latest conda/miniconda installer, install it, add channels with updated packages and create new envs with the updated python versions. Example:

# Add updated channels
> conda config --add channels conda-forge
# Create new env
> conda create --name daily_driver python=3.12
# Use new env
> conda activate daily_driver
(daily_driver) > conda list | grep python
python                      ...3.12 ...

I like this discussion, although there are many branches that deviate from the main idea. I thought the example given of a Rust-like experience was constructive (Wanting a singular packaging tool/vision - #25 by johnthagen). I hope to see more discussion around what users would want in a practical “vision”, focusing on the benefits that pip and conda have rather than debating what they appear to lack, and perhaps adding those benefits to the overall vision and extending it further.

1 Like

Thanks for the great summary @steve.dower, I agree with pretty much everything you said.

For the first bullet point, there’s also the related but broader point of mapping build and runtime dependencies. That requires some kind of mapping mechanism that is generic (e.g., taking PyPI package names as canonical and allowing conda and other package managers to map those to their own names). I believe GitHub - conda/grayskull: Grayskull - Recipe generator for Conda guesses this mapping in an ad-hoc fashion and gets it right about 90% of the time, and that’s very helpful for recipe creation - but not enough. If there were a registry, that would allow pip to not re-install already-present dependencies.

That said, being able to declare native dependencies would be a great start. I’d be interested in writing a PEP on this topic and helping move it forward.

To give an example of how much goes missing, here are the build dependencies of SciPy:

  • dependencies declared in pyproject.toml:
    • numpy (with still incomplete versioning, due to Steve’s 2nd point)
    • meson-python
    • Cython
    • pybind11
    • pythran
    • wheel
  • dependencies that cannot be declared:
    • C/C++ compilers
    • Fortran compiler
    • BLAS and LAPACK libraries
    • -dev packages for Python, BLAS and LAPACK, if headers are packaged separately
    • pkg-config or system CMake (for dependency resolution of BLAS/LAPACK)

And SciPy is still simple compared to other cases, like GPU or distributed libraries. Right now we just start a build when someone types pip install scipy and there’s no wheel. And then fail halfway through with a hopefully somewhat clear error message. And then users get to read the html docs to figure out what they are missing. At that point, even a “system dependencies” list that pip can only show as an informative error message at the end would be a big help.
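
As a sketch of what that last step could look like, assuming a hypothetical [external] table in pyproject.toml (the table name and its keys are invented here, not an existing standard):

# Hypothetical sketch: surface declared system build dependencies in a build
# failure message. The [external] table and its keys are invented for
# illustration; nothing like this is standardized today.
import tomllib  # Python 3.11+

with open("pyproject.toml", "rb") as f:
    pyproject = tomllib.load(f)

external = pyproject.get("external", {})
build_requires = external.get("build-requires", [])
# e.g. ["virtual:compiler/c", "virtual:compiler/fortran", "pkgconfig:openblas"]

if build_requires:
    print("This package also needs system (non-PyPI) build dependencies:")
    for requirement in build_requires:
        print(f"  - {requirement}")
    print("Install them with your system package manager and retry the build.")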

4 Likes

pip looks for site-packages/*.dist-info dirs to figure out what’s installed, so it should handle these correctly already?
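
For reference, those .dist-info directories are the same ones importlib.metadata reads, regardless of which installer wrote them:

# List the installed-distribution metadata visible to the current interpreter;
# pip consults the same .dist-info directories.
from importlib.metadata import distributions

for dist in distributions():
    print(dist.metadata["Name"], dist.version)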

I faintly remember some off-and-on discussion of creating such a registry (perhaps at the packaging summit, as well as scattered in some threads on here?) mapping package names between PyPI, Conda, and Linux distros (and I imagine Homebrew as well, and maybe others). In fact, if I recall correctly, the context for these discussions was primarily declaring non-Python dependencies, since you’d need such a registry as a prerequisite for declaring any of the sorts of dependencies in your second list (otherwise, there’s no reliable, non-distribution-specific way to specify them). I can’t remember the specific details off the top of my head; maybe someone else does?

I’d be interested in sponsoring the PEP and helping support it, if necessary. I do think it might be good to first standardize extension module build configuration in general as a soft prerequisite, to make this more broadly useful in a standards-based world, per @ofek and @henryiii’s proposal at the most recent packaging summit (which I believe has been further discussed since).

To create such a registry, for Python packages, I’d imagine you could get pretty close by parsing each distribution’s recipes/spec files/etc. to extract the upstream source URL, which for most Python packages is, by most ecosystems’ conventions, the PyPI sdist. Since it is quite rare, AFAIK, for downstream packages to map to multiple upstream PyPI packages (sometimes packages are split per distro policies, but not combined), you can unambiguously map each distro package to its upstream PyPI package.
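
A rough sketch of that extraction, assuming a locally checked-out conda-forge feedstock and the usual pypi.io/pypi.org source-URL convention (both are assumptions about the recipe, not guarantees):

# Hypothetical sketch: recover the upstream PyPI project name from a feedstock
# recipe via its source URL. The recipe path and URL convention are assumed;
# real recipes vary and some will need manual mapping.
import re
from pathlib import Path

recipe = Path("scipy-feedstock/recipe/meta.yaml").read_text()

# Typical form: https://pypi.io/packages/source/s/scipy/scipy-{{ version }}.tar.gz
match = re.search(r"pypi\.(?:io|org)/packages/source/\w/([A-Za-z0-9._-]+)/", recipe)
if match:
    print("Upstream PyPI project:", match.group(1))
else:
    print("No PyPI source URL found; this one needs manual mapping.")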

Multiple distro packages that map to one upstream package are not a huge problem for most use cases (just install all of them), but since these generally follow a few per-distribution standardized conventions (e.g. -dev/-devel, -doc, etc for Linux distros, -base for Conda, etc.) it probably wouldn’t be hard to add a bit of per-distro logic if needed for finer-grained mapping.

You could do the same thing for non-Python packages similarly sourced from an upstream index (Rubygems, CRAN, CPAN, npm, etc). For C, Fortran, etc. libraries, as well as system libraries and tools like compilers and build systems, you could get at least a decent chunk of the way there by matching on the upstream/repo URL, and do the rest by hand (perhaps aided by name matching and heuristics), since the relevant sample space is presumably a lot smaller and more focused.

1 Like

So the key point here is to be able to check in advance whether a build would fail, and at a minimum report “You need X, Y and Z, please install them before proceeding”, or, for more capable installers (essentially ones that are capable of installing non-Python dependencies), actually install the dependencies and then proceed with the build?

That would be great, and I’d be very much in favour of it - even for pip, having better errors and not starting builds that are guaranteed to fail would be a great step forward.

Excuse my ignorance here, but I assume the first thing we’d need is some means of naming such dependencies in a platform and installer independent way? Is that correct? For example, “g++” isn’t a very good name for a dependency on a C++ compiler for a Windows user, who would typically be using Visual C++. Is that the sort of thing that you’d expect to cover in the PEP?

2 Likes

That would be great indeed, yes. I think that would be a second step/phase, only to be enabled once another package manager actively opts in to this system, so pip knows it can do this kind of querying. Because it’s entirely possible that a user does have the right dependencies installed, but pip has no way of knowing that yet.

I’d imagine that the first step, as an incremental improvement, would be:

  1. Still start the build, just like right now.
  2. If it fails, pip already prints an error message (pretty confusing right now, but it ends with “This is an issue with the package mentioned above, not pip.”). At that point, also show something like “this package has system dependencies in order to build; these are: …”.

Yes exactly, that naming is what I was getting at with the mapping names part. g++ indeed doesn’t work. The dependency in this particular case is something that can be fulfilled in multiple ways, so I’d imagine it would be something like <virtual> c-compiler or compiler('c') (excuse the lack of valid toml, that’s too detailed right now). There aren’t many of those virtual dependencies, and they can probably be explicitly enumerated. Most other dependencies that can’t be expressed are regular names for libraries or tools.

I’d think that is best left to distros and other package managers themselves. Python packaging can provide them an interface to register it. Otherwise we create a significant amount of work in some central place that a few unlucky individuals get to maintain.

Maybe, not sure yet. Mapping between two language-specific package managers is qualitatively different, since it’s a lot less clear that PyPI names are canonical and that dependencies will only go one way (with system packages, those are always a base that Python packages come on top of). I’m aware there’s a lot of history here, and I believe one previous attempt at a PEP. I have a list of raw notes somewhere. This is the kind of thing to address in the scope part of the PEP.

Ah yes, good point. I think actually that conda already does this - it’s not fully robust, but typically works. I’m not sure if it scales to all package managers like that though. Nor does it address name mapping for system dependencies, or possibly working towards getting all non-pure packages from the other package manager, even if they’re not already installed. But yes, .dist-info can be a relevant communication mechanism here.

IMO, for build dependencies you can get pretty far with a namespace system and a few initial namespaces, e.g.:

  • pkgconfig:libssl for a C library (on Unix)
  • command:cc for a C compiler (on Unix)
  • conda:clang for a Conda package
  • crate:pyo3 for a Rust crate

Perhaps allow ORing them together (command:cc or conda:clang or something-for-windows), and warn (fail later) if none of the namespaces in a group are recognized/checkable/installable.

For run-time dependencies, specify names of the actual shared libraries the extension needs, and names of the commands it calls.
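
A quick sketch of how an installer might check such specifiers, handling just two of the namespaces above; the OR-group semantics and the checking logic are only illustrative:

# Hypothetical sketch: check whether namespaced external requirements are
# satisfiable on the current machine. Only two namespaces are handled here;
# anything unrecognized is reported as unsatisfied so the installer can warn.
import shutil
import subprocess

def is_satisfied(spec: str) -> bool:
    namespace, _, name = spec.partition(":")
    if namespace == "command":
        # e.g. "command:cc" -- is the command on PATH?
        return shutil.which(name) is not None
    if namespace == "pkgconfig":
        # e.g. "pkgconfig:libssl" -- does pkg-config know about it?
        try:
            result = subprocess.run(["pkg-config", "--exists", name])
        except FileNotFoundError:  # pkg-config itself is missing
            return False
        return result.returncode == 0
    return False  # unrecognized namespace: cannot be checked here

# An OR group is satisfied if any of its alternatives is.
group = ["command:cc", "conda:clang"]
print("satisfied" if any(is_satisfied(spec) for spec in group) else "unsatisfied")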

3 Likes