Wanting a singular packaging tool/vision

I think everyone involved agrees that the wealth of overlapping tools and concepts makes things more complex than they should be. But ‘start from scratch’ is simply not an option we have.

I understand it would be really hard to do something like that, and it would obviously not be an immediate replacement for everything.

Make something better for all those use cases than everything already out there, so that everyone is convinced to switch to it - good luck figuring out how to do that!

Yes, this is hard, but it doesn’t necessarily need to be “better”, it needs to be “at least as good”.
It’s not a good idea to just break everything, but if the tool is successful and can replace other tools completely, they can be deprecated.

I agree that this is not at all optimal, but IMO it’s so messed up at this point that it would be worth starting over, using pieces of already existing tools to bundle everything together into a single tool.

I wrote part of an application some time ago, before I learnt best practices etc. When I came back to it later, I rewrote the whole application, so I wouldn’t have to deal with badly written/designed code when extending it later.
I don’t think pip or any other tool is badly designed, but it’s way easier to do everything right when you have a clean start.

I fully understand if you don’t agree with me; I mostly commented to suggest an alternative way to deal with all of this.

Realistically, it needs to be better, because the existing tools have an ‘unfair’ advantage: people are already used to them, there are already StackOverflow answers and blog posts and so on about them. Especially for packaging, a lot of people will stick with what they already know unless there’s a good reason to do something new.

But even making something as good as the existing tools for everyone they serve is probably verging on impossible. What people need (and want) varies a lot, and the tools that can handle complex needs tend to be awkward & over-complicated for the simple use cases, so lighter, easier tools spring up that do everything some people need.

There’s no central council that can ‘deprecate’ other tools (PyPA sounds like that, but it really isn’t). So if you want to replace everything else, you have to convince the people who maintain all of those separate projects to deprecate them. And of course, the people who maintain Poetry, or any project, tend to be the people who know everything that it can do and really like its design. :wink:

I totally agree with you on this! It’s massively easier, and this is part of why we’re so jealous of Rust’s packaging tools. But we can’t have a clean start - all that history, all those existing tools are out there, and anything we do now lands on top of all that. There’s no reset button!

3 Likes

FWIW, there was some work done on the idea of “conda package” → “pip wheel” as a part of conda-press.

I’m not fully sure what happened on that, though I imagine folks who contributed to that would know. :slight_smile:

Possibly, but I’m not aware of them. I first heard the idea in the mid-90s, maybe it’s even older than that. The gist is that every version of every package is installed in a completely isolated directory, with its own entire “usr/local” hierarchy underneath. Then “creating an environment” means making a directory <envname> with an empty “usr/local” hierarchy, and linking in all the files from the relevant package version hierarchies there. Now “activating an environment” means “point your PATH at <envname>/bin”.
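A rough sketch of that scheme in Python, assuming each package version already lives under its own prefix like store/<name>-<version>/usr/local (all paths and names here are made up for illustration):

import os
from pathlib import Path

STORE = Path("store")  # one fully isolated usr/local tree per package version

def create_environment(env_name: str, packages: list[str]) -> Path:
    """Link every file of the requested package versions into <envname>/usr/local."""
    env_root = Path(env_name) / "usr" / "local"
    for pkg in packages:  # e.g. "foo-1.2.3"
        pkg_root = STORE / pkg / "usr" / "local"
        for src in pkg_root.rglob("*"):
            if src.is_dir():
                continue
            dest = env_root / src.relative_to(pkg_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(src.resolve(), dest)  # hard links would work too
    return env_root

# "Activating" the environment is then just pointing PATH at it:
#   export PATH=<envname>/usr/local/bin:$PATH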

1 Like

I remember the SCO Unix operating system used linkfarms in its package manager, in order to facilitate easy rollback. Granted, I try not to think about my days as a SCO sysadmin more than once a decade, so now I’ve used up my quota until the 2030s roll around.

It would probably make more sense as an ABI tag rather than a platform tag (or more of a conda_win_amd64 type platform tag), but the principle makes sense. If Conda also searched PyPI for packages, this would mean packagers would just have to publish a few additional wheels that:

  • don’t vendor things available as conda packages
  • do include additional dependencies for those things
  • link against the import libraries/headers/options used for the matching Conda builds of dependencies

Those three points are the critical ones that make sharing builds between Conda and PyPI impossible (or at least, against the design) regardless of direction.

Numpy installed through PyPI needs to vendor anything that can’t be assumed to be on the system. Numpy installed through Conda must not vendor it, because it should be using the same shared library as everything else in the environment. This can only realistically be reconciled with multiple builds and separate packages (or very clever packages).

But of course, the majority of packages aren’t in this category, so they could be shared just fine. And those that are in it probably have good coverage already, because any Conda user will have discovered that they don’t just work and gotten them into conda-forge.
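To make the tag idea a bit more concrete: a hypothetical conda_win_amd64 platform tag would still fit the existing wheel filename format, so existing tooling could at least parse it (the tag itself is invented here and nothing recognizes it today):

from packaging.utils import parse_wheel_filename

# A made-up wheel filename carrying a hypothetical conda-specific platform tag.
name, version, build, tags = parse_wheel_filename(
    "numpy-1.26.4-cp312-cp312-conda_win_amd64.whl"
)
for tag in tags:
    print(tag.interpreter, tag.abi, tag.platform)  # cp312 cp312 conda_win_amd64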


To add my hot take here: the best thing that the PyPI side of the community can do is to be more honest about how limited it is at solving the “compatible packages” problem. Our version dependencies assume a pure-Python ABI, and have no way for builds to require specific builds (very little of our tooling lets you do things like “sdist requires spam>1.0 but wheel requires spam==1.7”). But we let people go out there and declare “wheels are the solution to everything” or some such thing, while most of us deep in the system aren’t doing that, because we know.

We know that:

  • you need specific builds of Python
  • you need specific package builds to match those specific builds
  • preinstalled Pythons probably don’t match PyPI built packages
  • native dependencies need to be installed separately, and you probably have to read the package docs to figure out what you need
  • you might have to add additional constraints to get a working environment, not one that merely satisfies the metadata dependencies

I have seen professional software engineers mess up all of these (and have messed them all up myself at times). Why are we surprised that regular users look at what comes preinstalled and assume it can handle these, and then get frustrated when they find out what we always knew? (And get even more frustrated when they find out that we already knew and haven’t fixed it or warned them!)

From my POV, the minimum we’d need to do for Conda to be able to use wheels is:

  • define “native requirements” metadata (even if pip ignores it)
  • allow (encourage) wheels with binaries to have tighter dependencies than their sdists
  • encode and expose more information about ABI in package requirements
  • define some set of “conda” platform or ABI tags

Everything else can be done on the conda side, integrated into their own builds of CPython. But the viability of Conda using wheels depends on the above features, and we just don’t have equivalents on the wheel/sdist side right now.
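Purely as an illustration of the first bullet above, and not any existing standard: “native requirements” metadata could look something like the following, shown here being read with the stdlib TOML parser (every key name below is invented, and pip would be free to ignore the whole table):

import tomllib  # stdlib in Python 3.11+

# Hypothetical metadata - no PEP defines these keys today.
pyproject = tomllib.loads("""
[external]                 # invented table name
build-requires = ["c-compiler", "fortran-compiler"]
host-requires  = ["openblas"]
""")
print(pyproject["external"]["build-requires"])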

6 Likes

It is a hard problem to reconcile conda and wheels. Suppose your conda distribution expects to provide the C compiler, libc, the Python interpreter and other fundamental parts of the ABI. Wheels expect to run on “an old version of Linux” or “the typical Windows or Mac environment” and statically link. It would be neat if we started defining wheel ABI tags for conda distributions.

We did not have the option to start over in Python packaging. Now the problem is making a choice between tools, instead of figuring out whether setuptools or distutils works for you - not that different from figuring out which library to use (or whether to use one at all) for any other task. Hopefully you can make a reasonable choice and quickly change if it isn’t working for you.

Let me restate: if you can figure out the hard problem of ABI compatibility between wheels and conda environments, please do!

2 Likes

Could probably take a lesson from pipx run and put the environment in /tmp and reuse it if it happens to be there, reconstruct it if it isn’t.
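A rough sketch of that caching approach, assuming the script’s dependencies are already known and a POSIX venv layout (the directory naming and key scheme are invented, not how pipx actually does it):

import hashlib
import subprocess
import tempfile
import venv
from pathlib import Path

def cached_env(requirements: list[str]) -> Path:
    """Create (or reuse) a virtual environment in /tmp keyed by the requirement list."""
    key = hashlib.sha256("\n".join(sorted(requirements)).encode()).hexdigest()[:16]
    env_dir = Path(tempfile.gettempdir()) / f"script-env-{key}"
    if not env_dir.exists():
        venv.create(env_dir, with_pip=True)
        subprocess.run(
            [str(env_dir / "bin" / "python"), "-m", "pip", "install", *requirements],
            check=True,
        )
    return env_dir

# Run a script against the cached environment (rebuilt automatically if /tmp was cleaned).
env = cached_env(["requests>=2"])
subprocess.run([str(env / "bin" / "python"), "my_script.py"], check=True)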

Problem 1: mapping import names to project names. :wink:

To be clear, that’s to launch the interpreter directly from the base environment. There isn’t any environment activation, which can be crucial (e.g. Python 3.10.0 builds were unable to be run outside of an activated environment, including base, for a month after the initial release). I will say the activated-environment requirement for conda is a general issue, as packages can install arbitrary shell scripts that are run during activation.

What OS are you using? It comes with the Windows installer; otherwise you have to install my Unix version yourself. See GitHub - brettcannon/python-launcher: Python launcher for Unix for package manager support and pre-built binaries (or just use cargo).

No one ships it in-box, but it is available via some package managers, e.g. Fedora, Arch, Linuxbrew, if you don’t want to download a pre-built binary or build using cargo.

At the moment it is somewhat like that. As the README says, it’s really just a developer tool and not considered a canonical way to do anything. Some people find it really handy for day-to-day use, but I have purposefully not put anything into it which necessitates its usage.

And always selecting the newest Python version installed in all other cases. Otherwise I have some ideas, but nothing that’s gotten past the design/discussion stage yet. Probably the biggest one is a way to specify how to execute a Python interpreter/environment (because it differs between virtual and conda environments, if you use a WASI build, etc.).

Nope, that’s only in the Unix version I wrote, which is why mine is developed externally and is clearly labelled as unofficial. :grin:

My understanding is Anthony stopped working on conda-press because people gave him grief for working on it as if he was a traitor to conda or something.

I actually raised Allow running scripts with dependencies using pipx · Issue #913 · pypa/pipx · GitHub to get this feature added to pipx run.

3 Likes

Working with scripts that have dependencies is indeed a fascinating problem. I often exclude non-stdlib dependencies from my simple scripts to avoid this problem, but that leaves me with less powerful scripts. Alternatives:

  • create a temp virtual environment (as you mentioned)
  • install dependencies “globally” (which conda and virtualenvwrapper can do, but it can leave you with a messy top-level virt. env.)
  • make a full-blown package (far from “simple script”)

I suppose another lesser-known option might be to vendor dependencies into a zipapp? Then you can additionally solve the distribution problem. So in the first “pyup”, rust-like example above, maybe you’d have a pyup zipapp option or some related name. Dependencies get installed/added into a zipapp-like archive and you run it like a regular script, e.g. > ./simple_script
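For example, something along these lines with the stdlib zipapp module plus pip’s --target option (file and package names are just placeholders):

import subprocess
import sys
import zipapp

# Vendor the dependencies (plus your own __main__.py) into one build directory...
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--target", "build", "requests"],
    check=True,
)
# ...then pack that directory into a single executable archive.
zipapp.create_archive(
    "build",                             # must contain a __main__.py entry point
    target="simple_script",
    interpreter="/usr/bin/env python3",  # written as the shebang line
)
# Afterwards: ./simple_script runs with its dependencies bundled inside.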

Of course, I may be completely misunderstanding the situation here. Again, that points to a lack of understanding each others’ positions, which again I’d like to see us try to address.

I think this is a great observation. Mutual misunderstandings can pile too much sand onto budding, brilliant ideas. Well done on attempting to bridge the gap.

The Miniconda download site only lists Python up to 3.9.

In practice, you would download the latest conda/miniconda installer, install it, add channels with updated packages and create new envs with the updated python versions. Example:

# Add updated channels
> conda config --add channels conda-forge
# Create new env
> conda create --name daily_driver python=3.12
# Use new env
> conda activate daily_driver
(daily_driver) > conda list | grep python
python                      ...3.12 ...

I like this discussion, although there are many branches that deviate from the main idea. I thought the example given on a rust-like experience was constructive (Wanting a singular packaging tool/vision - #25 by johnthagen). I hope to see more discussion around what users would want to see in a practical “vision”, focusing on the benefits that pip and conda have rather than debating what they appear to lack, perhaps adding those benefits to the overall vision and extending it further.

1 Like

Thanks for the great summary @steve.dower, I agree with pretty much everything you said.

For the first bullet point, there’s also the related but broader point of mapping build and runtime dependencies. That requires some kind of mapping mechanism that is generic (e.g., take PyPI package names as canonical, and allow conda and other package managers to map those to their names). I believe GitHub - conda-incubator/grayskull: Grayskull - Recipe generator for Conda guesses this mapping in an ad-hoc fashion and gets it right about 90% of the time, and that’s very helpful for recipe creation - but not enough. If there were a registry, that’d allow pip to not re-install already-present dependencies.
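A minimal sketch of what consuming such a registry could look like, with PyPI names as the canonical keys (the data and the lookup function are entirely hypothetical):

# Hypothetical registry entries: canonical PyPI name -> per-ecosystem package name(s).
REGISTRY = {
    "numpy": {"conda": ["numpy"], "debian": ["python3-numpy"]},
    "pyyaml": {"conda": ["pyyaml"], "debian": ["python3-yaml"], "fedora": ["python3-pyyaml"]},
}

def map_name(pypi_name: str, ecosystem: str) -> list[str]:
    """Return the package name(s) providing this PyPI project in the target ecosystem."""
    return REGISTRY.get(pypi_name.lower(), {}).get(ecosystem, [])

print(map_name("PyYAML", "debian"))  # ['python3-yaml']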

That said, being able to declare native dependencies would be a great start. I’d be interested in writing a PEP on this topic and helping move it forward.

To give an example of how much goes missing, here are the build dependencies of SciPy:

  • dependencies declared in pyproject.toml:
    • numpy (with still incomplete versioning, due to Steve’s 2nd point)
    • meson-python
    • Cython
    • pybind11
    • pythran
    • wheel
  • dependencies that cannot be declared:
    • C/C++ compilers
    • Fortran compiler
    • BLAS and LAPACK libraries
    • -dev packages for Python, BLAS and LAPACK, if headers are packaged separately
    • pkg-config or system CMake (for dependency resolution of BLAS/LAPACK)

And SciPy is still simple compared to other cases, like GPU or distributed libraries. Right now we just start a build when someone types pip install scipy and there’s no wheel. And then fail halfway through with a hopefully somewhat clear error message. And then users get to read the html docs to figure out what they are missing. At that point, even a “system dependencies” list that pip can only show as an informative error message at the end would be a big help.

4 Likes

pip looks for site-packages/*.dist-info dirs to figure out what’s installed, so it should handle these correctly already?
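That’s the same discovery importlib.metadata does (walking sys.path for *.dist-info and *.egg-info directories), so anything that writes a .dist-info should show up, whichever tool installed it:

from importlib import metadata

# Every installed distribution discovered via its .dist-info / .egg-info directory,
# regardless of which installer put it there.
for dist in metadata.distributions():
    print(dist.metadata["Name"], dist.version)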

I faintly remember some off-and-on discussion of creating such a registry (perhaps at the packaging summit, as well as scattered in some threads on here?) mapping package names between PyPI, Conda, and Linux distros (and I imagine Homebrew as well, and maybe others). In fact, if I recall correctly, the context for those discussions was primarily declaring non-Python dependencies, since you’d need such a registry as a necessary prerequisite for declaring any of the sorts of dependencies in your second list (otherwise, there’s no reliable, non-distribution-specific way to specify them). I can’t remember the specific details off the top of my head; maybe someone else does?

I’d be interested in sponsoring the PEP and helping support it, if necessary. I do think it might be good to first standardize extension module build configuration in general as a soft prerequisite, to make this more broadly useful in a standards-based world, per @ofek and @henryiii’s proposal at the most recent packaging summit (which I believe has been further discussed since).

To create such a registry, for Python packages, I’d imagine you could get pretty close by parsing the recipes/spec files/etc. of each distribution to extract the upstream source URL, which for most Python packages is, by most ecosystems’ conventions, the PyPI sdist. Since it is quite rare, AFAIK, for downstream packages to map to multiple upstream PyPI packages (sometimes packages are split per distro policies, but not combined), you can unambiguously map each distro package to its upstream PyPI package.

Multiple distro packages that map to one upstream package are not a huge problem for most use cases (just install all of them), but since these generally follow a few per-distribution standardized conventions (e.g. -dev/-devel, -doc, etc for Linux distros, -base for Conda, etc.) it probably wouldn’t be hard to add a bit of per-distro logic if needed for finer-grained mapping.

You could do the same thing for non-Python packages similarly sourced from an upstream index (Rubygems, CRAN, CPAN, npm, etc). For C, Fortran, etc. libraries, as well as system libraries and tools like compilers and build systems, you could get at least a decent chunk of the way matching based on the upstream/repo URL, and do the rest by hand (perhaps aided by name matching and heuristics), since the relevant sample space is presumably a lot smaller and more focused.
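As a rough sketch of that matching step, assuming a recipe or spec file whose source URL follows the common PyPI sdist layout (the pattern is simplified and would need hardening for real use):

import re

# Match the conventional PyPI sdist URL layout: .../packages/source/<initial>/<name>/<file>
SDIST_URL = re.compile(
    r"https://(?:files\.pythonhosted\.org|pypi\.io|pypi\.org)"
    r"/packages/source/[^/]/(?P<name>[^/]+)/"
)

def pypi_name_from_source_url(url: str) -> str | None:
    """Guess the upstream PyPI project name from a recipe's source download URL."""
    match = SDIST_URL.match(url)
    return match.group("name") if match else None

print(pypi_name_from_source_url(
    "https://pypi.io/packages/source/r/requests/requests-2.31.0.tar.gz"
))  # requests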

1 Like

So the key point here is to be able to check in advance whether a build would fail, and at a minimum report “You need X, Y and Z, please install them before proceeding”, or, for more capable installers (essentially ones that are capable of installing non-Python dependencies), actually install the dependencies and then proceed with the build?

That would be great, and I’d be very much in favour of it - even for pip, having better errors and not starting builds that are guaranteed to fail would be a great step forward.

Excuse my ignorance here, but I assume the first thing we’d need is some means of naming such dependencies in a platform- and installer-independent way? Is that correct? For example, “g++” isn’t a very good name for a dependency on a C++ compiler for a Windows user, who would typically be using Visual C++. Is that the sort of thing that you’d expect to cover in the PEP?

2 Likes

That would be great indeed, yes. I think that would be a second step/phase, only to be enabled once another package manager actively opts in to this system, so pip knows it can do this kind of querying. Because it’s quite possible that a user does have the right dependencies installed, but pip has no way of knowing that yet.

I’d imagine that the first step, as an incremental improvement, would be:

  1. Still start the build, just like right now.
  2. If it fails, pip already prints an error message (pretty confusing right now, but it ends with “This is an issue with the package mentioned above, not pip.”). At that point, also show something like “this package has system dependencies in order to build, these are: …”.

Yes exactly, that naming is what I was getting at with the mapping names part. g++ indeed doesn’t work. The dependency in this particular case is something that can be fulfilled in multiple ways, so I’d imagine it would be something like <virtual> c-compiler or compiler('c') (excuse the lack of valid toml, that’s too detailed right now). There aren’t many of those virtual dependencies, and they can probably be explicitly enumerated. Most other dependencies that can’t be expressed are regular names for libraries or tools.

I’d think that is best left to distros and other package managers themselves; Python packaging can provide them an interface to register it. Otherwise we create a significant amount of work in some central place that a few unlucky individuals get to maintain.

Maybe, not sure yet. Mapping between two language-specific package managers is qualitatively different, since it’s a lot less clear that PyPI names are canonical and that dependencies will only go one way (with system packages, those are always a base that Python packages come on top of). I’m aware there’s a lot of history here, and I believe one previous attempt at a PEP. I have a list of raw notes somewhere. This is the kind of thing to address in the scope part of the PEP.

Ah yes, good point. I think actually that conda already does this - it’s not fully robust, but typically works. I’m not sure if it scales to all package managers like that though. Nor does it address name mapping for system dependencies, or possibly working towards getting all non-pure packages from the other package manager, even if they’re not already installed. But yes, .dist-info can be a relevant communication mechanism here.

IMO, for build dependencies you can get pretty far with a namespace system and a few initial namespaces, e.g.:

  • pkgconfig:libssl for a C library (on Unix)
  • command:cc for a C compiler (on Unix)
  • conda:clang for a Conda package
  • crate:pyo3 for a Rust crate

Perhaps allow ORing them together (command:cc or conda:clang or something-for-windows), and warn (failing later) if none of the namespaces in a group is recognized/checkable/installable.

For run-time dependencies, specify names of the actual shared libraries the extension needs, and names of the commands it calls.
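A minimal sketch of how an installer might check such namespaced build requirements on Unix, treating namespaces it can’t verify as “don’t know” (the namespaces follow the list above; everything else is invented):

import shutil
import subprocess

def is_satisfied(requirement: str) -> bool | None:
    """True/False if the namespace is checkable here, None if another tool must decide."""
    namespace, _, name = requirement.partition(":")
    if namespace == "command":            # e.g. command:cc
        return shutil.which(name) is not None
    if namespace == "pkgconfig":          # e.g. pkgconfig:libssl
        if shutil.which("pkg-config") is None:
            return None
        result = subprocess.run(["pkg-config", "--exists", name])
        return result.returncode == 0
    return None                           # conda:, crate:, ... not ours to check

# An OR group is satisfied if any checkable alternative is present.
group = ["command:cc", "conda:clang"]
print(any(is_satisfied(req) for req in group))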

3 Likes

A problem to be sure, but not intractable. It does get tricky because the mapping from import name to package name may be one-to-many.

In fact, years ago we started to work on something like this to bridge the gap between the Fedora and Debian ecosystems, but I don’t remember whatever happened to that, if anything.

Thanks for the suggestion Petr. I suspect this would get one started, but it won’t work well in general, because packages can come from many packaging systems (e.g., any Linux distro may package a Rust crate) and pkg-config tends to be one of multiple supported dependency providers (e.g., the Fedora maintainer refuses to ship pkg-config files for OpenBLAS for no good reason, so we have to fall back to CMake or manual scanning of prefix/lib - and I wish I’d made that one up, but it’s a real issue right now).

It should be at least one of the alternative design options discussed though.

1 Like

Mapping import package names to distribution package names (and vice versa) will in fact be a many-to-many relationship, because some distribution packages provide multiple import packages, but there are also multiple distribution packages which provide identically named import packages as well.
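For what’s installed locally, importlib.metadata (Python 3.10+) already exposes that relationship; the values are lists precisely because one import name can be provided by more than one distribution:

from importlib.metadata import packages_distributions

# Top-level import name -> list of distribution names that provide it,
# e.g. {'yaml': ['PyYAML'], ...} in an environment with PyYAML installed.
mapping = packages_distributions()
print(mapping.get("yaml"))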

2 Likes

(1574517 – Package config files should be provided to link in FlexiBLAS by default, for anyone else who was wondering about that - sounds incredibly frustrating to have to deal with.)