Wanting a singular packaging tool/vision

Sigh. I think you’re still missing my point. One further try.

I don’t have (or want) a pyproject.toml for this use case. That’s *exactly* my point. I’m not even in a Python project, or writing one. If it helps, assume I’m writing my script in /tmp.

… and I don’t want to manage a virtual environment associated with my script. I want the system to do that for me.

Which is great, but they still need me to manage the environment in the sense that I have to pick a name for it, delete it when I’m done, and remember that that environment is associated with my (probably throwaway, but I’m keeping it “just in case”) script.

My ideal here is for scripts which depend on 3rd party libraries to be just as easy to write and use as scripts that only rely on the stdlib. And crucially, to be usable in all the situations that pure-stdlib scripts can be used in.

The nearest I’ve found is pip-run, which lets you say __requires__ = ['requests'] in your script, and it will then install requests in a temporary environment when you run your script. Its main disadvantages are that it re-creates the environment every run (slow if you have complex dependencies) and that it has a somewhat clumsy command line syntax. But integrate that functionality with something like hatch run and you have pretty close to what I’m talking about.
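To make that concrete, here’s roughly what such a script looks like (a minimal sketch; the URL is just an example):

```python
# toy.py - a throwaway script; pip-run parses __requires__ and installs
# the listed distributions into a temporary environment before running it.
__requires__ = ['requests']

import requests

print(requests.get('https://pypi.org/simple/').status_code)
```

You’d then run it with something like pip-run -- toy.py, the -- separating pip’s arguments from the script’s being part of the clumsy command line syntax mentioned above.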

Seems like pyflow is close enough (I have not tried). It seems to have support for __pypackages__ and __requires__.

I haven’t tried pyflow either, but this is where it seems like __pypackages__ would really come in handy, especially if built into Python (perhaps opt-in). Just brainstorming here:

  1. I write a little toy.py script and it imports, say, requests
  2. I run python3 -M toy.py (I’m just picking -M for “magic”)
  3. Python reaches the import, sees the missing requests dependency, goes out to PyPI and installs requests into __pypackages__, satisfying the import (a rough sketch of this follows the list)
  4. Python merrily and magically (there’s that -M again!) continues to execute toy.py
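To make step 3 slightly less hand-wavy, here’s a purely illustrative sketch of what such a hook could look like. None of this exists, and it naively assumes the import name equals the PyPI project name:

```python
# Illustrative only: a meta path finder appended to the END of
# sys.meta_path, so it fires only after every normal finder has failed.
# It pip-installs the missing distribution into __pypackages__ next to
# the script, then retries the import.
import subprocess
import sys
from importlib import util
from pathlib import Path

PKG_DIR = Path(sys.argv[0]).resolve().parent / "__pypackages__"

class MagicFinder:
    def __init__(self):
        self._in_progress = set()

    def find_spec(self, name, path=None, target=None):
        # Only handle top-level imports, and guard against re-entry.
        if path is not None or "." in name or name in self._in_progress:
            return None
        self._in_progress.add(name)
        try:
            try:
                subprocess.run(
                    [sys.executable, "-m", "pip", "install",
                     "--quiet", "--target", str(PKG_DIR), name],
                    check=True,
                )
            except subprocess.CalledProcessError:
                return None  # not on PyPI either; the ImportError stands
            if str(PKG_DIR) not in sys.path:
                sys.path.insert(0, str(PKG_DIR))
            # The regular path finders can now resolve it from PKG_DIR.
            return util.find_spec(name)
        finally:
            self._in_progress.discard(name)

sys.meta_path.append(MagicFinder())
```

The name mapping, version pinning, caching and security questions are exactly the complexities glossed over below.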

I’m not papering over all the complexities, security, metadata, etc. issues here. Well, maybe I am, but deliberately so, to give some feel for what the happy path to very simple, built-in, magical scripts would look like. TPI? [1] Yeah, probably.


  1. Terrible Python Idea ↩︎

2 Likes

Reminds me of David Beazley’s autoinstall :laughing:, https://www.youtube.com/watch?v=0oTh1CXRaQ0&t=9565s

1 Like

That’s not really true, e.g. here.

It’s a chicken-and-egg problem in the sense that as long as pip / PyPA consider installing non-python dependencies out of scope, yet installing all possible binaries that python projects might bring along in scope[1], then it leaves basically zero room for conda to contribute, because that stance effectively defines the problems that conda is solving out of existence.

So the 100,000ft view is that, to find a common path (and have it be pertinent for conda people to contribute), pip has to either:

  • expand its mandate to also cover non-python dependencies, effectively becoming a full-stack, cross-platform installer
  • restrict its purview, and allow (or rely on) plugging in other tools to fill the required gaps in installing binaries

Finally, while I don’t speak for anyone but myself, I’m spending the lion’s share of my FOSS time curating conda-forge, and know that ecosystem (and many involved people) quite well. Feel free to tag me on DPO for anything conda[-forge]-related.


  1. more in-depth discussion in the link above ↩︎

Part of the issue with any of these discussions is that:

  1. The actual problems (related to compiler toolchains, ABIs, distributing packages with compiled code in them, being able to express dependencies on non-Python libraries and tools, etc.) are quite complex,
  2. Most people here don’t have those problems as package authors, and in many cases they don’t have them as users either (simpler packages with some C/Cython code work fine as wheels),
  3. The solutions to those problems do necessitate some overhead, which makes them hard to accept for folks who don’t have the problems,
  4. The scientific computing and ML/AI community hasn’t always explained the problems in enough detail. Often it’s a long email/discourse thread about one specific topic, and folks talk past each other because possible solutions are brought up before the problem is very clearly explained.

That makes it difficult to get anywhere with this conversation.

I would also say that it’s not only Conda that solves these problems. PyPI has quite fundamental problems when dealing with complex packages/dependencies with C/C++/CUDA/Fortran/etc. code. Those kinds of problems are solved (mostly) by Conda, but also by Spack, Nix, Homebrew, Linux distros, etc. Of those, Conda, Spack and Nix all have the concept of environments that you can activate/deactivate.

I’ll do a little pre-announcement here: I’m making a serious attempt at comprehensively describing the key problems scientific, ML/AI and other native-code-using folks have with PyPI, wheels and Python packaging. It will be a standalone website (first release by the end of the year), kept up to date and aimed to serve as a reference, so we hopefully stop talking past each other. It will not have proposed solutions - this Discourse is the place for that. At most (for now) it will have some outgoing links to key PEPs and discussions around potential solutions to some of the issues.

I’ll reach out to invite you to participate in helping shape the content on some of the topics that you’ve brought up.

20 Likes

One nice thing is that the standard py launcher can apparently now use a Conda/Anaconda/Miniconda Python via py -V:Anaconda. :rocket:

Do I understand correctly that the main problems Conda solves are related to ensuring low-level ABI / binary version compatibility for shared, non-Python dependencies?
And that the downsides are a more limited set of available packages (e.g. no Python 3.10+ yet; only popular libraries that have been specifically packaged for Conda), due to the additional effort needed to make these packages available?

How does it compare to Christoph Gohlke’s wheel binaries, which can be used with normal Python and pip? Why can they not be on PyPI?

Maybe all this will be explained on that new website?

It seems the other / original topics here are largely orthogonal, right?

  • install Python versions easily (“pyup”)
  • lock files
  • simple scripts with dependencies but without having to deal with environments
  • cross platform standardization (py vs python3)

As the py launcher can now already “plug in” external Conda, and is the bundled user-facing top-level tool, maybe it would be the natural place to integrate more of these as subcommands via plugins?

I’m not sure what you mean here. I may not be part of the “user facing” target, but I’ve never used the py launcher in my whole life (and py doesn’t seem to exist on my system, even though I have multiple Python versions around). python or python3 is what I type.

Also, Conda is a package manager, not a Python runtime.

Huh? Basically everything that’s compatible with it upstream has had Python 3.10 packages for 6-12 months now, and Python 3.11 was out the day of the upstream release, with about half of Conda-Forge rebuilt for it within a few days to a week or so (IIRC), and about 2/3rds of packages fully compatible right now, a fair portion of the remainder being packages that don’t yet fully support it upstream. I’d imagine that’s fairly consistent with the proportion of PyPI packages that are testing on it and have released compatible wheels.

Well, I guess it depends on how you define popular; I can generally find all but fairly niche packages on CF, and it’s not really that hard for anyone to use the mostly-automated tooling to generate a new recipe for any package they want and create a CF feedstock for it.

The py launcher is only on Windows, and AFAIK only ships with the Python.org distribution, not with Anaconda, or (I believe) the Windows Store Python, the nuget package, etc. Also, FWIW, despite primarily using Windows myself - maybe I’m just not experienced enough, or not in the target audience - I’ve never had any reason to use it (I just invoke the Python I want directly, typically activating the desired env first) or to recommend others do so (as it’s only installed on one particular platform and distribution, whereas I typically try to make documentation as broadly applicable as practical).

1 Like

Is there a conda-forge API that can be used to query what’s available (an equivalent of the PyPI simple and JSON APIs)? I’d love to write some comparison scripts, but I’m not keen on trying to scrape websites for the information (not least because I don’t even know what website I should scrape). I couldn’t find anything on the conda-forge docs describing an API.

Edit: Just to be clear, I’m after an API that works with a python.org distribution of Python, and can be used from my own code. So using the query facilities in conda isn’t the answer I’m looking for.

  • cross platform standardization (py vs python3)

I think all platforms should standardize on py. It’s one of the pieces that “just work” for me. Maybe it’s just me, no idea.

The Miniconda download site only lists Python up to 3.9 for me. And it installed its own bundled Python here, not just a package manager. I assume it cannot manage packages for a “normal” Python. :person_shrugging:

You can just use the same API that Conda itself uses to get the package info; it should be pretty much ideally suited for your use case. Each of the top-level subdirectories (linux-64, win-64, noarch, etc.) of the main channel page has a current_repodata.json listing the names and key metadata (version, build number, build identifier, license, arch/python, size, build timestamp, etc.) of the current versions of every package, and a repodata.json with every version of every package (for example, here’s the current_repodata.json of noarch).

You can request this and convert it to a common format (dict, pandas df, etc.) in 1-3 lines of code; for example, requests.get('https://conda.anaconda.org/conda-forge/noarch/current_repodata.json').json()["packages"].keys() will list the filename of every current Conda-Forge noarch package artifact, with the package name, version and other metadata inside each entry. Of course, you’ll need to perform PEP 503 normalization to match the package names semi-reliably, and it still won’t be 100% due to differences in naming conventions and base/sub/meta packages between the platforms, but that should be a decent starting point, possibly with some heuristics depending on your needs.
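To flesh that out slightly, here’s a sketch that pulls proper names and versions out of the index (assuming the layout described above; newer .conda-format artifacts live under a separate packages.conda key):

```python
# Build a {package name: latest version} mapping from conda-forge's noarch
# index. The keys under "packages" are artifact filenames; the actual
# name/version live inside each entry's metadata.
import requests

url = "https://conda.anaconda.org/conda-forge/noarch/current_repodata.json"
repodata = requests.get(url).json()

versions = {}
for key in ("packages", "packages.conda"):
    for entry in repodata.get(key, {}).values():
        versions[entry["name"]] = entry["version"]

print(len(versions), "noarch packages; requests ->", versions.get("requests"))
```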

1 Like

Thanks. Is the structure of the JSON documented anywhere? In particular, I’m unclear as to what all the packages listed in https://conda.anaconda.org/conda-forge/current_repodata.json are, as there is no architecture component to the URL. Is that “everything”, with the “subdirs” value for each package indicating which architectures are available?

The initial question I’d like to answer is “what packages/binaries are on PyPI that aren’t available on conda-forge?” and I really don’t want to misrepresent conda-forge because I’m ignorant of how it holds its data.

One point I would make in the “conda vs PyPA” debate is that because the PyPA is focused on standardising the data and APIs we support, there’s a clear demarcation between what tools can rely on as standard vs what’s implementation-defined and subject to unexpected change - is there any equivalent “standard APIs” idea for conda?

With regard to the py launcher, I’m unfortunately quite ignorant due to not being a Windows or Mac user. None of the various Linux distributions nor Unix derivatives I use seem to come with it. I also install CPython from source in order to have a variety of local minor versions for developing against and testing, but none (not even my 3.12.0a2 build) seems to add a py executable. Is it supposed to be included from a different source tree, or is there a compile-time option I’m missing to create it?

And more importantly, what about py makes it a standard?

2 Likes

The closest I’ve found is here, though it’s a little light on the details. As very much a non-expert, I personally found the format pretty self-explanatory; the one key clue I had was remembering seeing repodata.json / current_repodata.json mentioned in Conda output, and I found what you were looking for with a bit of poking around.

There isn’t such a URL, AFAIK; as mentioned the package data (current_repodata.json/repodata.json) is found in each of the subdirs that have an arch (or noarch) component, and the arch is also included in the JSON at the top level and for each package.

If a package is listed in the current_repodata.json of an arch subdir (or noarch), it is available for that arch.

As mentioned, you can get an iterable of every package filename under a given arch by running the one-liner requests.get(f'https://conda.anaconda.org/conda-forge/{arch}/current_repodata.json').json()["packages"].keys(). For practical purposes, taking the union of noarch and linux-64 is likely to get you the vast majority of unique packages.

As noted, the main challenge is just matching the names fairly reliably, but at least going from PyPI names to Conda names, PEP 503 normalization plus a few basic heuristics (e.g. stripping a python- prefix, IIRC) is likely to get you the great majority of the way there, if not 100% perfect.

It’s probably most useful to consider the top 5000 or so PyPI packages, since there’s always going to be a very long tail of uncommon packages that aren’t on CF.
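Putting those pieces together, a rough sketch of that comparison (the PyPI list here is a stand-in for whatever top-N list you use, and the python- prefix rule is just one heuristic):

```python
# Which PyPI project names have no obvious conda-forge counterpart?
# Uses PEP 503 normalization plus one naive heuristic; real matching
# between the two ecosystems needs more care than this.
import re
import requests

def normalize(name: str) -> str:
    # PEP 503 name normalization
    return re.sub(r"[-_.]+", "-", name).lower()

cf_names = set()
for arch in ("noarch", "linux-64"):
    url = f"https://conda.anaconda.org/conda-forge/{arch}/current_repodata.json"
    data = requests.get(url).json()
    for key in ("packages", "packages.conda"):
        for entry in data.get(key, {}).values():
            cf_names.add(normalize(entry["name"]))

pypi_projects = ["requests", "NumPy", "typing_extensions"]  # stand-in list
missing = [
    p for p in pypi_projects
    if normalize(p) not in cf_names
    and f"python-{normalize(p)}" not in cf_names
]
print("not found on conda-forge:", missing)
```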

I’m not really sure what the specific guarantees are; you’d have to ask someone more specifically involved in that. But FWIW, there is at least one very popular third-party implementation (mamba, plus conda forks and possibly others), and I’ve had no problems using versions of conda that are many years old with current CF. Basically, the only things that could change are the endpoint URL and the keys/structure of the repodata.json file (and a few other ancillary data files, like channeldata.json), and they couldn’t change those in a backward-incompatible way without many years of notice, since it would break existing condas.

PEP 397 standardises the py launcher for Windows, but it is not distributed (or standardised) on Unix. @brettcannon has created a Unix equivalent (https://github.com/brettcannon/python-launcher), but it’s not “official” at this point.

2 Likes

Hmm, are you saying here that a package foo on PyPI (built from a project which names the project foo in its metadata) may not be called foo (up to normalisation) in conda? That’s a rather severe failure in terms of conforming to the agreed standards - and while the whole point of this exercise (from my perspective at least) is to understand where and why the conda ecosystem makes different choices than the PyPA standards, I’m struggling to imagine a reason why conda would not use the name that the project developer chose for their project…

This is where I don’t feel the conda project has engaged much with the wider packaging community - if there is a reason that the existing standard on project naming (which has been around for much longer than the PyPA) doesn’t suit conda’s needs, why would they not bring that up and ask for a discussion on how we could change the standard to take into account their use case? It’s not as if the rest of the community is going to know to ask, after all…

Of course, I may be completely misunderstanding the situation here. Again, that points to a lack of understanding each others’ positions, which again I’d like to see us try to address.

You’re probably going to want to present a bit more detailed rationale than “it just works for me” for pushing forward a major change to Python’s user-facing interface that you seem to be suggesting… :slightly_smiling_face:

That’s just the default version of Python that happens to ship with the base environment, which is basically meaningless for most practical purposes, since it can easily be updated to whatever version you like (conda install python=3.11), and per standard best practice the base environment is only used for conda itself (and maybe some common tools, if you’re pushing it). Otherwise, for installing/using packages you pretty much always want to use environments, which get seeded with the latest version of Python by default, or whatever version you specify on creation. For those reasons, those installers don’t need to get updated too often; note the timestamp on that page - nearly a year old.

If you want to see the status of the CF migration to a new Python version, check the status page I linked above; to check what versions of Python itself or other specific packages are available on CF, defaults or other channels, search them on Anaconda.org and look under the Files tab (or you can use conda info, etc.).

It can’t manage packages for a non-conda-based Python distribution, but it can manage Python itself as just another package. The bundled Python is just the one used by conda itself (which is written in Python); if not for that, it wouldn’t even need to come with Python, as it also supports R, JS and other languages and can install them too.

I see, so there is progress on standardizing py for the various ways Python is distributed on Windows, and one of the (many) proposals for a vision here includes making all other platforms work more like Windows. The readme for Brett’s python-launcher project provides a decent rationale, though at the back of my mind I can’t help but think it sounds a bit like “the problem is there are multiple competing names for the interpreter, so the solution is to add yet another” (a la XKCD 927).

The main distinguishing feature of python-launcher seems to be its ability to autodetect the existence of a .venv subdirectory in the current working directory and use that instead of the global system environment; have I got that right? Is that the same thing py does on Windows, then?

I think this nicely articulates the repressed pain that most intermediate Python users who want to share their work with others live with. I explicitly try to avoid non-stdlib packages in anything I consider a ‘script’; as soon as there are third-party packages, it becomes an ‘application’ in my mind and I need to worry about how to distribute it in such a way that non-programmers can use it without having to manually create a venv, activate it, install deps, etc.

my application workflow

I generally build a wheel and provide a shell/batch script to install it into a venv.

I declare a script (confusing second use of this term) in pyproject.toml which creates an executable for my users to run the software with no need to know about the venv.
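For anyone unfamiliar, that declaration is just a couple of lines in pyproject.toml (the names here are placeholders):

```toml
[project.scripts]
# Installing the wheel puts a `mytool` command on PATH that invokes
# mypackage.cli:main from whichever environment the wheel landed in.
mytool = "mypackage.cli:main"
```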

The same can be achieved with pipx (which I love) but sometimes it’s not appropriate to assume the user can install pipx.

2 Likes