PEP 711: PyBI: a standard format for distributing Python Binaries

It’d be a bit awkward for venv to support this, since venv would be part of a pybi that contains a pybi (to create a new env), which in turn contains its own copy of venv: matryoshka dolls. Not impossible, but conceptually and likely practically awkward. I can see virtualenv supporting this, though.

2 Likes

I didn’t mean for venv to bootstrap from its own pre-existing pybi; I meant a command-line option to tell it a pybi file to use (or maybe a URI to grab one, or a specification for looking one up in a repository…).

I’ve never really been clear on why virtualenv is a separate product that still exists, honestly.

I figure I should weigh in here since I’ve “solved” similar problems that PyBI is attempting to solve with python-build-standalone.

Foremost, excellent work, Nathaniel! I’ve long wanted to see a PEP to formalize standalone Python distributions. You’ve beaten me to it, and this PEP is off to a great start!

Apologies for the wall of randomly ordered thoughts below.

The technological purist in me doesn’t like the choice of zip files, because they yield sub-optimal compression: (a) they use a 40+ year old compression algorithm (deflate/zlib), and (b) compressing each file individually means repeated segments across files can’t be shared, so the overall archive is larger. The big benefit of zip is that you get a file index and can address/decompress individual files. Since you’ll likely need to extract all archive members to get a usable Python distribution, the choice of zip is not ideal. But discarding the precedent of wheels being zips and having to reinvent the wheel (har har) is also not ideal. Zips are fine, I guess. But I wish they were tars using a modern compression algorithm like zstd (or lzma, since that’s in the stdlib).
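
To make the trade-off concrete, here’s a rough stdlib-only sketch (the source directory name is a placeholder for an unpacked Python installation) that builds both kinds of archive from the same file tree; the tar + lzma archive will generally come out smaller because the compressor can exploit redundancy across files:

```python
# Illustrative size comparison of zip+deflate vs tar+lzma on the same tree.
# Not part of the PEP; "cpython-install" is a placeholder directory name.
import os
import tarfile
import zipfile

SOURCE_DIR = "cpython-install"

with zipfile.ZipFile("dist.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for root, _dirs, files in os.walk(SOURCE_DIR):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, os.path.relpath(path, SOURCE_DIR))

with tarfile.open("dist.tar.xz", "w:xz") as tf:
    tf.add(SOURCE_DIR, arcname=".")

print("zip + deflate:", os.path.getsize("dist.zip"))
print("tar + lzma:   ", os.path.getsize("dist.tar.xz"))
```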

One of the potential use cases for PyBI is to facilitate more turnkey distribution of Python-based applications. There’s a lot of value in being able to take a pre-built Python distribution off the shelf and integrate it into a larger application. As I learned writing PyOxidizer, you can end up needing a lot of metadata about the distribution to pull this off. See Distribution Archives — python-build-standalone documentation for all the metadata I ended up adding. Much of it was added to facilitate cross-compilation: when cross-compiling, you can’t just run the interpreter to resolve things like the bytecode cache tag, the path to the site-packages directory, or the compiler flags used to build the distribution. I detailed this at What information is useful to know statically about an interpreter? - #7 by indygreg.

The metadata currently in PEP 711 is inadequate for doing some of these more advanced things. I recognize that defining all this metadata is arguably scope bloat. But if we add a few more missing pieces, there might be enough here to allow me to delete the python-build-standalone project, or refactor it to produce PyBIs. At the very least I’d love for PyOxidizer to consume PyBIs: if that happens, it means others can write tools like PyOxidizer without having to solve the build-your-own-Python-distribution problem, which is non-trivial.
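
To give a sense of what “run the interpreter to resolve things” means in practice, here are the sorts of values a packaging tool would normally query at runtime, and would instead need to find in static PyBI metadata when cross-compiling (a stdlib-only sketch, not an exhaustive list):

```python
# Values a tool normally gets by running the target interpreter; when
# cross-compiling they have to come from static distribution metadata instead.
import sys
import sysconfig

print(sys.implementation.cache_tag)            # bytecode cache tag, e.g. "cpython-311"
print(sysconfig.get_path("purelib"))           # site-packages location
print(sysconfig.get_config_var("EXT_SUFFIX"))  # extension module filename suffix
print(sysconfig.get_config_var("CC"))          # compiler used to build CPython (None on Windows)
print(sysconfig.get_config_var("CFLAGS"))      # compiler flags used for the build
```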

On the theme of distribution metadata, python-build-standalone’s full distributions contain the raw object files used to link libpython and extensions and JSON metadata describing them all. PyOxidizer can take this metadata and produce a custom libpython with only the components an application needs. Or it can link a single file binary embedding Python. Powerful functionality. Not something you can currently do with PyBI. Probably way too complicated for what you want PyBI to do (since you ruled out PyBI sdists as out of scope). But I thought I’d mention it as a possible future extension of this work.

Also, as noted in the other thread, there’s the question of licensing metadata. PyOxidizer can strip copyleft components out of a Python distribution and emit licensing info for all included components, to make it easier for downstream customers to satisfy legal distribution requirements. It would be amazing to have licensing annotations in PyBI. At the very least, I think you need to include the license texts in the PyBI to satisfy legal requirements for distribution. CPython currently contains licenses for components in the CPython source repo. But 3rd party components like OpenSSL, liblzma, tcl/tk, etc. need to have their own license texts distributed if those libraries are in the PyBI.

One thing that both current PyBI and python-build-standalone distributions fail to distribute is the terminfo database. readline/libedit encode a path to the terminfo database at build time. If you copy the distribution to a machine or environment where the terminfo database isn’t at the same path as on the build machine, readline doesn’t work and the Python REPL behaves poorly. Users complain. PyOxidizer works around this by having Rust code sniff for terminfo files in well-known locations at run-time, before the interpreter is initialized. But the correct solution is to build this sniffing into CPython and bundle a copy of the terminfo database with the Python distribution in case one cannot be found.
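
A rough Python-level sketch of the kind of sniffing PyOxidizer does in Rust; the candidate directories are common locations rather than a definitive list, it relies on ncurses honoring the TERMINFO / TERMINFO_DIRS environment variables, and it would have to run before readline/curses are first used:

```python
# Sketch: point ncurses at a usable terminfo database before readline loads,
# falling back to a (hypothetical) copy bundled with the distribution.
import os

CANDIDATE_DIRS = [
    "/usr/share/terminfo",   # most Linux distros
    "/lib/terminfo",         # Debian/Ubuntu also ship entries here
    "/etc/terminfo",
    os.path.join(os.path.dirname(__file__), "bundled-terminfo"),  # hypothetical bundled copy
]

if "TERMINFO" not in os.environ and "TERMINFO_DIRS" not in os.environ:
    for candidate in CANDIDATE_DIRS:
        if os.path.isdir(candidate):
            os.environ["TERMINFO"] = candidate
            break
```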

Everything I just said about the terminfo database arguably applies to the trusted certificate authorities list as well. On Windows and macOS you should always have the OS database to use. (Can’t remember if CPython supports this out-of-the-box yet - it should.) On Linux, you may get unlucky and not have one (common in container environments).
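
The same sniffing idea, sketched for certificates on Linux: check the usual CA bundle locations and fall back to a copy shipped with the distribution (the bundled path is hypothetical, and the list of system paths is illustrative rather than complete):

```python
# Sketch: prefer a system CA bundle, fall back to one shipped with the
# distribution if the system doesn't provide any.
import os
import ssl

SYSTEM_CA_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu/Alpine
    "/etc/pki/tls/certs/ca-bundle.crt",    # RedHat/Fedora
    "/etc/ssl/ca-bundle.pem",              # SUSE
    "/etc/ssl/cert.pem",                   # various others
]
BUNDLED_CA_FILE = "/opt/pybi/share/cacert.pem"  # hypothetical bundled fallback

def make_client_context() -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies certs and hostnames
    for cafile in SYSTEM_CA_BUNDLES + [BUNDLED_CA_FILE]:
        if os.path.isfile(cafile):
            ctx.load_verify_locations(cafile=cafile)
            break
    return ctx
```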

Another limitation with PyBI will be that references to the build toolchain and config are baked into the sysconfig data and read by distutils, pip, etc. to compile extension modules. (I think I mentioned this in the topic when Nathaniel first introduced PyBI.) There’s a non-negligible chance that the compiler and flags used on the build machine won’t work on the running machine. So if people attempt to e.g. pip install using a PyBI interpreter and there isn’t a binary wheel available, chances are high they’ll get a compiler error. To solve this problem you either need to do the logical equivalent of reinventing autoconf (distutils kinda sorta does aspects of this) or you need to distribute your own compiler toolchain and use it. Hello, scope bloat! You may want to have interpreters advertise that their sysconfig metadata for the compiler came from an incompatible machine, so downstream tools like pip can fail more gracefully.

Note that this is an existing problem, but it will get much worse with PyBI. Many people today just install a python-dev[el] system package to pull in dependencies, and that just works because the Python interpreter was built with the same toolchain used by your OS / distro. PyBI opens us up to e.g. RedHat vs Debian, gcc vs clang, msvc vs gnu, etc. toolchain config differences. I think the path of least resistance is distributing your own toolchains, since otherwise you’ll be debugging compatibility with random toolchains on users’ machines. Fortunately Python already has a mostly working solution here in the form of the quay.io/pypa/manylinux* container images and projects like cibuildwheel that automatically use them. But you might want to start pushing these toolchains’ use in build tools like distutils and pip.
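
As one example of what “fail more gracefully” could look like, here’s a sketch of a check a build front-end could run before attempting a from-source build; this isn’t anything pip actually does today, and it only checks that the recorded compiler exists, not that its flags are usable:

```python
# Sketch: warn when the compiler recorded in the interpreter's sysconfig data
# isn't available on the machine it's currently running on.
import shutil
import sysconfig

def build_toolchain_available() -> bool:
    cc = sysconfig.get_config_var("CC")  # e.g. "gcc -pthread" on a typical Linux build
    if not cc:
        return False
    return shutil.which(cc.split()[0]) is not None

if not build_toolchain_available():
    print("warning: this interpreter was built with a toolchain that isn't "
          "installed here; building extension modules from sdists will likely fail")
```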

It looks like your current PyBIs strip debug symbols, presumably for size savings. Debug symbols are useful. People like me, who work on enabling [performance] debugging at scale for engineering organizations, like having debug symbols readily available. (Having the ability to get meaningful flamegraphs for any process running in your fleet is life changing.) It’s fine to ship PyBIs without debug symbols to cut down on size, but there needs to be a way to get the debug symbols: either PyBI variants with them unstripped, or supplemental PyBI-like archives containing just the debug symbols (similar to how the Linux packaging ecosystem does it), or maybe support for a symbol server. The location of the debug symbols may need to be built into the PyBI metadata, and/or tools consuming PyBIs may need to be aware of PyBI variants with debug symbols so users can prefer to fetch them by default.

(This problem already exists for binary wheels, and I’m unsure if there are any discussions or PEPs about it. Please remember that CPython has its own debug build / ABI settings that are different from debug symbols, so debug symbols exist outside the Python platform tags. For some reason a lot of people seem to not understand that debug symbols and compiler optimizations are independent, and that it is fully valid to have a PGO+LTO+BOLT binary with debug symbols; probably because lots of build systems strip debug symbols when building in optimized mode.)

To be pedantic, this stuff is defined by the Linux Standard Base (LSB) specifications. See LSB Specifications

Requirements lists all the libraries that are mandated to exist in the specification. These should all exist in every Linux distribution. So in theory if your ELF binaries only depend on libraries and symbols listed in the LSB Core Specification, they should be able to run on any Linux install, including bare bones container OS images. Python’s manylinux platform tags are kinda/sorta redefinitions/reinventions of the LSB.
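
To illustrate the “depend only on an allowed baseline” idea, here’s a rough sketch of the kind of check tools like auditwheel perform, using ldd output and a deliberately truncated allowlist (the real LSB/manylinux policies enumerate more libraries and also pin symbol versions):

```python
# Sketch: list the shared libraries a binary depends on and flag anything
# outside an allowed baseline set (truncated here for illustration).
import subprocess
import sys

ALLOWED = {
    "linux-vdso.so.1", "libc.so.6", "libm.so.6", "libdl.so.2",
    "libpthread.so.0", "librt.so.1", "libutil.so.1",
    "libgcc_s.so.1", "libstdc++.so.6",
    # ...the real policies are longer and also constrain symbol versions
}

def disallowed_deps(binary_path: str) -> set[str]:
    out = subprocess.run(["ldd", binary_path], capture_output=True, text=True).stdout
    names = {line.split()[0].rsplit("/", 1)[-1] for line in out.splitlines() if line.strip()}
    return {n for n in names if n not in ALLOWED and not n.startswith("ld-linux")}

print(disallowed_deps(sys.executable))  # anything printed here limits portability
```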

But as I learned from python-build-standalone, not all Linux distributions can conform with the LSB specifications! See Fedora 35(x64), error while loading shared libraries: libcrypt.so.1 · Issue #113 · indygreg/python-build-standalone · GitHub and 2055953 – Lack of libcrypt.so.1 in default distribution violates LSB Core Specification for an example of how distros under the RedHat umbrella failed to ship/install libcrypt.so.1 and were out of compliance with the LSB for several months!

Fortunately macOS and Windows are a bit easier to support. But Apple has historically had bugs in the macOS SDK where it allowed not-yet-defined symbols to be used when targeting older macOS versions. And CPython doesn’t have a great track record of using weak references/linking and run-time availability guards correctly either.

I highly advise against doing this. If you allow external libraries to take precedence over your own, you are assuming the external library will have complete ABI and other logical compatibility. This may just work 99% of the time. But as soon as some OS/distro or user inevitably messes up and breaks ABI or logical compat, users will be encountering crashes or other bugs and pointing the finger at you. The most reliable solution is to bundle and use your own copies of everything outside the core OS install (the LSB specification on Linux) by default. Some cohorts will complain and insist on e.g. using the system libssl/libcrypto. Provide the configuration knob and allow them to footgun themselves. But leave this off by default unless you want to impose a significant support burden upon yourself.

As I wrote in the other thread, there are several *.test / */test/ packages / directories that also need to be accounted for.

While the justifications for eliding them may remain, I think you’ll find the size bloat is likely offset by ditching zip + deflate for tar + <modern compression>.

I’ll note that a compelling reason to include the test modules in the PyBI is that it enables end-users to run the stdlib test harness. This can make it vastly easier to debug machine-dependent failures, as you can ask users to run the stdlib tests as a way to assess how broken a Python interpreter is. That’s why python-build-standalone includes the test modules and PyOxidizer filters them out during packaging.

7 Likes

It’s not clear to me why the standard should have to specify whether or not the test folders are included. Obviously official distributions would want to include them; on the other hand, it seems clear to me that Sir Robin (maintainer of the hypothetical “minimal” CPython) wouldn’t. Sir Robin’s distribution, after all, is for people who have done this sort of thing before, have a simple setup unlikely to cause machine-dependent failures (especially given the other things that were excluded), and want to prioritize disk space. Similarly for .pyc files. Those continue to take up space after unpacking, after all (and I might imagine Sir Robin’s clients are the sort to disable bytecode caching).

As for compression, I don’t see why this format should have to do things the same way wheels do simply because the idea is inspired by wheels. Especially given, as you say, lzma is in the standard library. (But is it actually that much better than deflate? .tgz is still a thing, right?)

My gut feel is that if we want to handle these, we should treat them more like wheels? In fact I think you could handle them as wheels already, though you need some significant infrastructure to make it practical. See:

To me the key difference between pybis and wheels is just that each environment has exactly one pybi, and any number of wheels.

In order to keep the scope under control, I’m trying to keep PEP 711 restricted to things that are different between wheels and interpreters. I agree that zip files have a lot of downsides (personally I’d love to see something like a zip file, but where entries are compressed with zstd, and each entry can optionally refer to a shared zstd dictionary, so you get the best of both worlds for random access + high compression ratio). But if we’re going to make a better archive format, we should make an orthogonal PEP and define it for both wheels and pybis simultaneously :-).
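
To illustrate the shared-dictionary idea (not a proposal for an actual archive format), here’s a sketch using the third-party zstandard package: every entry is compressed independently, so random access still works, but they all share one dictionary so cross-file redundancy isn’t thrown away. A real format would probably train the dictionary on the archive’s contents with zstandard.train_dictionary(); the raw-content dictionary and the placeholder file data here just keep the sketch small:

```python
# Sketch: per-entry zstd compression with one shared dictionary, so each entry
# stays individually decompressible but still benefits from cross-file
# redundancy. Requires the third-party "zstandard" package; the "files" below
# are stand-ins for archive members.
import zstandard

files = {
    "os.py": b"import sys\nimport abc\nfrom stat import ST_MODE\n" * 100,
    "ntpath.py": b"import os\nimport sys\nimport stat\nimport genericpath\n" * 100,
    "posixpath.py": b"import os\nimport sys\nimport stat\nimport genericpath\n" * 100,
}

# Shared dictionary built from content common to many entries (placeholder data).
shared_dict = zstandard.ZstdCompressionDict(
    b"import os\nimport sys\nimport stat\nimport genericpath\n" * 100,
    dict_type=zstandard.DICT_TYPE_RAWCONTENT,
)

compressor = zstandard.ZstdCompressor(dict_data=shared_dict)
decompressor = zstandard.ZstdDecompressor(dict_data=shared_dict)

entries = {name: compressor.compress(data) for name, data in files.items()}

# Random access: decompress one entry without touching the others.
assert decompressor.decompress(entries["ntpath.py"]) == files["ntpath.py"]
```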

Already replied in the other thread, but for posterity: yeah, totally agreed.

IIRC the stdlib ssl module would need adjustments to its public API before it could use the system trust store (in particular, the system APIs are blocking, and ssl’s non-blocking API assumes that cert checking doesn’t block).

…this is also a fantastic idea. But it also bumps into my scope rule about avoiding anything that applies equally to wheels :-). I’d love to see a PEP adding core metadata fields for “here’s where you download symbols” or “here’s the url for a symbol server”, though.

Yeah, unfortunately LSB was a nice aspiration, but it never really took off and is de facto dead. Manylinux takes the opposite approach of adapting to how the world is, rather than specifying how the world ought to be.

Yeah, it doesn’t have to. In the future I’d like to see us start splitting up the current python distribution into a “core” + a set of preinstalled wheels, and maybe we’d want extensions to the pybi format for that? But that’s very much a future work thing.

I have mixed feelings about this.

First, it’s very cool that you’ve been able to create a self-contained binary Python. It’s something that several people mentioned as a desideratum on one of the other threads. It also seems like it could potentially be a step towards a manager-first ecosystem for official Python releases[1], since a manager could draw on these builds to populate the environments it manages. And it could be great for distributing self-contained applications that don’t need to assume an existing Python install. So in terms of what can potentially be done with it, it seems good.

On the other hand, the given rationale seems focused in a different direction, particularly on doubling down on the existing PyPI model, which I see as largely a hindrance rather than a help to the improvement of the pypackaging world.

So basically I agree with you that PyBI may potentially be very useful, but I think my reasons for thinking that are not the same as yours. :slight_smile:

The most concrete question I have in this regard is: right now, as far as I know, everything available on PyPI is meant to be installed by Python (specifically, by pip). How does PyBI fit into this, when it is by nature something that has to be (or at least may need to be) installed before Python can install anything? What tool would someone use to install Python from a PyBI? What is the gain in leveraging the wheel format for this, when its role in the installed “stack” is going to be so different?

My other comments are really about the non-normative sections of the PEP, because they’re more about the conceptual path on which you (or others) see this PEP as a step.

To be frank, this strikes me as a very weak rationale. It’s something I’ve heard on these threads before, and I’m still puzzled by it. Again and again it is mentioned that conda solves problems that people have, but the response is “well I don’t want to use conda”, with a generic justification like “I had issues with it”, followed by some attempt to re-implement or re-conceive what conda does. What irks me about this is that there does not seem to be corresponding sympathy or uptake for those who said “well I don’t want to use pip because I had issues with it”. I hope by that I make clear that what I’m concerned with is not the use of a tool spelled c-o-n-d-a but the actual functionality provided by each tool and what needs it does or does not serve. As I mentioned above, I can see that having a self-contained Python binary is useful; I do not see how tying that into the existing PyPI ecosystem is more useful than using it to create a better ecosystem.

As I’ve mentioned on other threads, my view is that the main and in some cases the only reasons people use PyPI are:

  1. it is the default repository for the default installer that comes with Python
  2. it has a lot of packages people want

If a different installer came with Python that used a different repository but still had the packages people want, they would use that instead. So my view is that there is no need to hew closely to the way things have heretofore been done on PyPI, because many people will happily switch to a new system if it is better.

Moreover, as discussed on pypacking-native, the multiple usage of PyPI as a source for end-users pip-installing things as well as for distro maintainers, plugin-architecture programmers, etc., is in some ways actually a problem with PyPI, not an advantage. Probably those functions should be separated.

I don’t think that is special, or insofar as it is special, some of its specialness is of a negative kind. The PyPI dependency mechanism is, for instance, “special” in that it precludes non-Python dependencies, which I see as a disadvantage. As mentioned occasionally on other threads, perhaps most cogently by @steve.dower here, wheels are not the solution to everything and the PyPI ecosystem has major limitations.

In my view, the main reason PyPI and the wheel format are special, the main reason that people use them, is, like Mount Everest, because they’re there. It is not because of any wonderful qualities they have; they are simply the most convenient vehicle for access to the wonderful qualities of Python and various libraries written in Python. For me the utility of something like PyBI would be to move away from the existing limitations, down a different path.

So, like I said, I think from a technical perspective PyBI is interesting, and actually I think it has the potential to improve the Python packaging world. But I wouldn’t say that crafting a wheel-like format to fit into PyPI is the way to achieve that improvement. Rather it would be to use PyBI as the kernel of an alternative packaging ecosystem, in which a manager installs Python by using a PyBI. This would allow moving all the dependency resolution, environment solving, etc., out of Python-level libraries like pip and into the manager. The main question for me is just whether PyBI would be a more effective starting point for this than conda.


  1. which as I’ve mentioned repeatedly is what I’d like to eventually see ↩︎

1 Like

Haven’t read the spec thoroughly yet, but as you know I’m +1 on the general idea and exactly 0 (no +/-) on whether it will turn out to actually be practical.

And in case it’s not clear, the Nuget distribution is literally the same files that go into the python.org installer and the embeddable distro. They all get packaged up and published as part of the same automatic/scripted build - the only difference is that Nuget uploads must be attached to a username, and since it’s my API key right now, they’re on my name.

FWIW, if this gets going, I’ll be putting up the embeddable distro as a PyBI for sure. A more convenient way to <command> install python-embeddable==3.10.* into your project (e.g. Blender) than curl’ing from a URL you had to construct yourself would be great.

And what I’ve done with the embeddable distro is strip it down as much as is physically possible without losing core functionality (if you want to strip it down further, you can delete extension modules). The definition of “core” functionality is vague right now, but I expect if there’s an ecosystem of people trying to share slimmed down Python installs then we’ll figure out which bits aren’t actually that important - right now it’s a bit “what I say goes”, so it’s not really been explored.

I’m also excited about this (it’s certainly what I had in mind for the Nuget packages). Very grateful you’ve done the work (along with others) to make it more feasible cross-platform.

FWIW, I’d be interested in taking part in that discussion.

Depends if you have to also deal with packages and thus are going to have to handle both situations or not. I’m in the latter camp. :wink:

Or some mechanism to know where to check the file system for the metadata once you have located the path to the interpreter.

  • Backwards-compatibility (it can create them the “old” way or just call out to venv)
  • Speed (virtualenv can create a virtual environment and then get pip and setuptools in there quickly, I think by symlinks)

It’s also a question of compatibility. For instance, on Windows you can just rename a .whl file to .zip and then unzip it straight from Windows Explorer; can’t do that with a tarball.

Have to pick your battles. :wink:

I suspect it won’t in the end, and whatever comes from python.org will contain everything in the stdlib. We can leave it to the community to customize things for their needs, as we can’t easily guess what exactly people will not want included.

Nathaniel created posy which is written in Rust. I would probably look at adding support to the Python Launcher for Unix which is also written in Rust. There’s a myriad of ways that tooling could be built around downloading a PyBI.

That’s up to those who feel that way to speak up. But the key thing is the people who do feel that conda doesn’t meet their needs are speaking up and doing the work.

Momentum. Trying to create an entirely new packaging ecosystem is a massively hard undertaking. If someone wants to attempt it and try to convince the community to shift then they are obviously welcome to. But as I said above, the people doing the work don’t want to go that route, and so they are taking the approaches they are where PyPI still plays a part.

8 Likes

I want to voice my strong support for this type of proposal. While I’m not sure about the particular details about it, I am very much looking forward to something of that nature.

I have been using the PyOxidizer builds of Python (GitHub - indygreg/python-build-standalone: Produce redistributable builds of Python) as the primary Python builds for my development machine for a while now, via my experimental rye tool. While those builds are not perfect for development (for instance, they use libedit instead of readline for GPL licensing reasons), they are still so incredibly comfortable because I don’t have to compile anything on my own and can switch effortlessly between Python versions.

5 Likes

I feel that, in its current form, this proposal completely ignores the last couple of decades of prior art. There is not a single mention of how distributing Python works on downstream distributions or how this would be an improvement.

Mind you, I’m not saying that this is not an improvement; I’m pointing out that, if it is, there is no mention of why this would be an improvement in this PEP because there is no mention of the status quo at all.


From the Motivation section above:

It becomes quick and easy to try Python prereleases, pin Python versions in CI, make a temporary environment to reproduce a bug report that only happens on a specific Python point release, etc.

All these environments already have a mechanism for downloading and installing Python, so a few unanswered questions come to mind: How is a new delivery mechanism an improvement over the existing ones? Why is a new format required? Is a new package manager required to handle this package format? How does this interoperate with existing distribution package managers? How are runtime dependencies resolved?

I understand that quite a few distributions don’t have a working, vanilla (i.e., unpatched) Python. I’ll assume that the goal here is to address those particular distributions, but there’s an obvious question that remains unanswered: why is packaging in this new package format better than shipping a native package for those distributions?


Finally, I see here a proposal for a new package format (which will then require supporting tooling around it), and I cannot help thinking that this is a perfect example of xkcd927.

1 Like

The point is to have a consistent way to get any version of Python on any system. Linux distributions generally only offer maybe 2 Python point releases at a time, usually not the latest, and there’s a bewildering variety of ways to install them – never mind cross-building environments for system X when running on system Y, or bundling up a python environment to send to a friend!

Basically the situation is exactly analogous to wheels. And I know some people wish that pip install didn’t exist and instead everyone was forced to get python packages through their distribution, but I don’t feel like I need to spend a lot of words explaining why pip install is useful :slight_smile:

9 Likes

I think this is a bit unfair; in the recent discussions there have been a number of people who have raised concerns about the wheel approach. https://pypackaging-native.github.io is the result of discussions here [I know many people reading this are well aware :wink: ], and while looking for a link to the PEP 704 discussion I came across a question about how to share .so files between wheels.

Many people are choosing to disengage, are using tools more suitable to their problem (system packaging, conda, containers), or are quietly finding workarounds to put non-Python dependencies in wheels.


I want to co-sign basically everything @BrenBarn said above

I also suspect that if you go down @indygreg’s suggestion of shipping your own compilers and start having non-Python software in wheels, there will be pressure to put shared libraries into their own wheels (e.g. packaging libhdf5 for h5py, pytables, and netcdf to depend on), and then you are most of the way to re-writing conda.


I would say the sdists uploaded to PyPI are the backbone of the Python ecosystem, not the wheels. Putting sdists and wheels “at the same level” is not correct. As I said in another thread, sdists are the point of truth for what a release “is”, and wheels are binary artifacts, derived from the sdist, for one (of many) binary package managers (which, by historical path, happen to be hosted adjacent to the sdists).


In all of these discussions I am not sure I have a clear idea of what it is about conda that does not serve people well. Among the reasons I think I have heard:

  • wall time to get from a tag to <tool> install package working with conda-forge. But that is a cost of a central build farm [ok, public CI] and can be solved by a local channel on top of conda-forge
  • does not work with python.org binaries. But that is because conda provides its own Python that is built consistently with all of the C extensions
  • the solver is slow. But that is due to trying to be “correct”, some choices about what versions to (continue to) expose, and they just switched to using a faster solver implementation
  • it is controlled by a company. But that is not true anymore
  • conda envs have to be activated. But that is because you can install scripts / env variables to be set on activate. Some of this is basically direnv but for environments instead of paths, and some of it is about getting C libraries to behave correctly in all cases

From my comments, I am not particularly persuaded by these arguments. Are there others I am missing or am I not giving these issue enough weight?

7 Likes

It depends on the segment of the user base you are considering. For the users[1] I deal with personally (predominantly Windows users, with no compiler, who found Python from python.org[2]), the availability of binary wheels for numpy, pandas, matplotlib, etc. is the core factor. Sdists are useless for anything but pure Python libraries.


  1. And myself for that matter! ↩︎

  2. Some of whom tried conda because I mentioned it as an alternative, and abandoned it because they hated it (their words, not mine) ↩︎

2 Likes

From the feedback I’ve had, and my personal experience:

  • The vast majority of Python documentation and tutorials describe using pip, venv, and similar tools. Using conda means independently learning how to translate such instructions, and accepting the risk that you get something wrong in the process.
  • This may be out of date, but my recollection is that conda didn’t come pre-configured with conda-forge active. So the “out of the box” experience (important for all the people who don’t read the manual!) is suboptimal and frustrating.
  • Very subjectively, some people simply don’t like the conda UX. Personally, I don’t like activating environments (and if I do, I prefer to start a subshell with the environment activated, rather than modifying my existing shell). The conda concept of channels is obscure and frustrating for some people.

All of this can be argued as subjective, but again that’s the whole point. We cannot force a whole section of the Python community to use a new tool that they don’t like, and expect it to be a good experience. Languages like go and rust got to make that decision because they were starting from nothing. We don’t have that luxury.

Conda is great - many people swear by it and there’s no reason to believe they are wrong. But it’s not for everyone, and trying to insist that it is will fail. Creating a new tool that appeals to everyone might be possible, but I’m not sure - people are rather set in their preferences by now. And it’s a lot of work for an uncertain result. Why not just accept that multiple tools can co-exist, and work on making them work well together, and on guiding new users through the process of picking the tool that works best for them?

5 Likes

Speaking for myself, this is one thing I’m not too enthused about with Conda. It seems that if I want to make my package installable “the way everyone expects” (i.e., from conda-forge), and especially if I make this my main method of distribution, I need to tie myself to the conda-forge community and infrastructure. Speaking as the author of the thread about python-poppler-qt5 that you linked to, I might eventually contribute a python-poppler-qt5 recipe to conda-forge, but for the time being I find learning about a separate repository, getting a recipe reviewed and merged and then moved from “staging” to conda-forge, and committing to later update it with releases (and, as you say, waiting a certain delay after a release) a bit daunting.

1 Like

It’s very easy (and free!) to get your own channel at anaconda.org, and then your package is available as conda install -c <your channel name> <your package name>. You can even reupload builds of other packages that you need, and your builds will be preferred by anyone installing your package.

There is a general lack of options for Conda repositories, compared to PyPI-style repositories. But there’s certainly no reason you need to get your package into conda-forge. In my experience, packages aren’t even more discoverable in conda-forge - someone is more likely to discover new packages by browsing a smaller channel they heard about, such as pytorch or nvidia.


And I know this is getting off topic, but it seems inevitable that a (successful) PyBI approach would lead to similar patterns. The need to share packages that can trace up to a matching PyBI package is going to require certain scenarios to set up their own complete or near-complete indexes - PyTorch is probably a good example here. So I think the description of how an existing equivalent currently works is useful proof that it can work.

2 Likes

This is of course true, but this will also apply to any future “official” tools that get created. It even applies to the current tools. There are still websites out there talking about running setup.py install. People sometimes still google stuff and somehow get sent to docs pages for Python 2! Any changes that are made to Python packaging will always require doc updates and directing people away from outdated info. (And maybe, as I’ll say below, doc updates should happen even if there are no changes to the official tools.)

I won’t derail this thread by getting into the details here, but I’m very interested in them! I really would like to have that discussion about how a tool should work, and I have the sense (to my frustration) that in these various threads people are often shying away from directly stating their preferences about such matters. So I appreciate you stating yours. :slight_smile:

That said, plenty of people don’t like the UX of the existing official tools either, which is why they use alternatives like conda or poetry. So even now the existing tools do not force anyone to do anything — which means a different set of official tools would also not force anyone to do anything. The question to me is what is the feature set that will be most beneficial to the widest swath of users, and should that feature set then be adopted in the official tools. (It was mentioned in one of the threads that future surveys might get into this, and I really hope that pans out.)

No doubt, but not all the differences are subjective. Things like “conda can manage the version of Python in the environment and pip cannot” are not subjective; they are genuine, factual differences. Of course that doesn’t mean they automatically override the subjective considerations. But again, to me the question is how can we determine the best set of features, taking into account both objective differences between the tools and subjective matters about their UI style. For instance, if there were a tool that combined the pip/venv UX you like with the additional ability to manage the python version in an environment, wouldn’t that be clearly better (according to your own subjective tastes) than the current scenario?

Sounds good to me. In fact, as I’ve mentioned on other threads, I think a good deal of the problem with the current setup is the official docs don’t do that: they basically just say “pip is the tool you should use”, when in fact, as you say, for many people that is not good advice.

This also gets back to what I see as a fundamental difference in viewpoint lurking in the corners of all these discussions, namely the relative importance of tools being “official” (in some sense or other). As I’ve mentioned before, my own belief is that a huge number of people who currently use pip/venv/etc. do not actually “like them” as such; what they like is having documentation available on python.org and having a tool that comes automatically with Python. If a different set of tools appeared in those official channels, many people would happily use those. From this perspective it is natural that multiple tools would co-exist, and the question is just which of them (perhaps more than one) gets foregrounded in the official documentation and/or is a transparent part of the Python install process.

I personally don’t use conda for a few reasons:

  • Every time I’ve tried to use conda in the past, the shenanigans it does to create environments have caused subtle breakages. I assume that’s not the general experience for people using Conda, so there’s something different about my use that’s exposing it… but I’ve rarely had issues with the “standard” Python tools (and when I have, I can normally resolve them by forcing a problematic package to build from source).
  • A non-trivial portion of the time, I already have a Python environment that is being managed by something else, and I need to install things into that environment. AFAIK Conda does not provide any mechanism for doing so.
  • AFAICT it’s pretty common for people using Conda to still need to use pip (or similar) to install into their Conda environments, so I’d still end up using the non-Conda tools anyway.

Perhaps an important thing is that I’ve never in my life installed a Numpy or a Scipy or a Pytorch for any reason other than to test a behavior of packaging tools. Pretty much all of the hard-to-build/distribute Python packages are ones I’ve never personally had a reason to use, so the benefits that Conda brings for them simply don’t matter to me.

1 Like