The proposed workflow is familiar and common in conda. I’m ok with it.
I think there are two missing workflows:
- Simple Scripts
- Publishing
Simple Scripts
Writing simple .py scripts is a common workflow. @pf_moore once posted about this: sometimes you just have scripts lying around in a scratch folder. However, the moment you add a third-party dependency, the complexity spikes, because you now have to think about dependency management, virtual environments, and even making full packages. Can posy help alleviate this overhead? I’d suggest some manner of standardizing how third-party dependencies are declared in simple scripts. Example: hatch appears to have a solution to this problem.
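To make the ask concrete, here’s a purely illustrative sketch of what inline dependency metadata in a throwaway script could look like. The comment syntax is invented for illustration (it isn’t an existing standard), and the packages are just examples of the kind of third-party deps a scratch script tends to grow:

```python
# scratch/fetch_peps.py
#
# Hypothetical inline metadata block that a tool could read in order to
# build a throwaway environment for this one script (syntax made up for
# illustration only):
#
# Script-Requires:
#     requests >= 2.28
#     rich

import requests
from rich import print

resp = requests.get("https://peps.python.org/api/peps.json", timeout=10)
print(f"{len(resp.json())} PEPs fetched")
```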
Publishing
Publishing packages is a common intermediate workflow. I believe posy is staying clear of the “publish to PyPI” features seen in tools like poetry and flit. However, it may be worth considering standardizing a common UI that integrates with publishing backends based on existing standards. Example: posy publish could be a common command that flit/poetry/etc. publishing tools hook into. Additional args could be passed to the backend tool (in a similar manner to how pipx can pass extra args to the underlying pip backend).
I think including these two workflows could broaden the spectrum of users posy can reach.
Since it seems like a few people (understandably!) missed it buried in the long thread above, I’ll just quote this again:
I’m not opposed to supporting these at some point, though I don’t think they’re necessary for an MVP. Publishing in particular would be pretty simple: posy already knows how to speak PEP 517, so it could use that to invoke your package’s build system and generate sdist/wheel artifacts, and then it’d just have to post those to PyPI’s upload API. For the “simple scripts” case I’m not sure what people actually want, e.g.: is it so bad to put some metadata in your scratch folder? Don’t you still want lock files so your simple scripts don’t bitrot when you’re not looking? Where do you put the lock file if it’s just one script? But if there’s a simple workflow then sure, why not.
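For the publishing half, here’s a rough sketch of what such a command could do under the hood using existing tools rather than anything posy-specific: drive a PEP 517 backend via the `build` library, then hand the artifacts to twine, which already speaks PyPI’s upload API. This is illustrative only, not how posy actually does (or would do) it:

```python
# Sketch: build sdist + wheel through the project's PEP 517 backend, then
# upload to PyPI. A real frontend might reimplement the upload step itself
# instead of shelling out to twine.
import glob
import subprocess

from build import ProjectBuilder

builder = ProjectBuilder(".")          # directory containing pyproject.toml
builder.build("sdist", "dist/")
builder.build("wheel", "dist/")

subprocess.run(["twine", "upload", *glob.glob("dist/*")], check=True)
```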
I’m interested in the Pybi portion of this because we make use of pre-built, portable Python distributions as part of Python + Bazel. Is there a meaningful difference between Pybi and @indygreg’s python-build-standalone, or are they both accomplishing the same thing?
This might be unrelated, but there is some prior art to making CPython installable via wheels if you haven’t seen this before: https://github.com/jjhelmus/give-me-python. (Although I guess as-is, this doesn’t really sidestep the stated issue of “wheels currently need a pre-existing Python environment”.)
There are two things here: the “pybi” container format, and the pybi artifacts that I’m currently hosting at https://pybi.vorpus.org.
The container format isn’t particularly novel; it’s a zip file with metadata. The main “new thing” is the attempt to standardize it so we can have official pre-built CPython distributions on PyPI, make it easy for folks to try out alternative interpreters like PyPy/Cinder/CPython-nogil, build an ecosystem of tools that consume them, etc. So that part’s orthogonal to python-build-standalone: you could stick the python-build-standalone builds inside a pybi if you wanted to distribute them that way.
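For anyone curious what “a zip file with metadata” looks like in practice, a pybi can be poked at with nothing but the standard zipfile module. A minimal sketch, assuming a locally downloaded artifact (the filename is just an example, and the pybi-info/ metadata directory is per the draft pybi spec):

```python
import zipfile

# Example filename only; any .pybi downloaded from https://pybi.vorpus.org
# can be inspected the same way.
PYBI = "cpython_unofficial-3.11.1-manylinux_2_17_x86_64.pybi"

with zipfile.ZipFile(PYBI) as zf:
    names = zf.namelist()
    # An interpreter tree plus a metadata directory, roughly analogous to a
    # wheel's .dist-info/.
    print([n for n in names if n.startswith("pybi-info/")])
    print(len(names), "files total")
```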
As for the builds at https://pybi.vorpus.org: they’re as vanilla and boring as I could make them. The Windows and macOS pybis are official binaries from the CPython release team that have been repackaged, and the Linux pybis re-use the infrastructure we built for the manylinux docker image and manylinux wheels. The python-build-standalone project OTOH is a bit more opinionated about tweaking things to support their goals, which produces various “quirks”. So my guess is that my pybis are more universally compatible with regular Python workflows, but obviously less useful as an input to pyoxidizer single-file python distributions and so on.
If you just want a python and don’t care too much about the details, and are already happy with python-build-standalone, then I don’t think there’s much reason to switch to my builds.
There is some tweaking in python-build-standalone to support the goal of being re-linked into a downstream application, yes. But the “normal” distributions (not the Windows static or Linux musl ones) don’t have as many quirks as one might think. The biggest technical difference IMO is aggressively building extensions as built-ins, including statically linking library dependencies. But once you are distributing all the third-party libraries yourself, static vs dynamic linking is effectively an implementation detail. (At the end of the day CPython just calls a PyInit_* function to initialize an extension; it doesn’t really matter whether that symbol is statically linked or loaded from a dynamic library.)
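From the importing side the two cases really do look the same; the module spec is where the difference shows up. A small illustration (which modules land in which bucket varies between distributions, e.g. PBS builds many extensions as built-ins):

```python
import importlib.util
import sys

# "built-in" extensions report origin "built-in"; dynamically loaded ones
# report the path to their shared library. Import behaviour is identical.
for name in ("math", "zlib", "_sqlite3"):
    spec = importlib.util.find_spec(name)
    if spec is None:
        print(f"{name}: not available in this build")
        continue
    kind = "built-in" if name in sys.builtin_module_names else "dynamically loaded"
    print(f"{name}: {kind} (origin={spec.origin})")
```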
I think the biggest difference between python-build-standalone and pybi is that PBS was purpose-built for PyOxidizer. It was always assumed that distributions needed to be re-linked and there would be a [Rust] runtime component controlling the embedded Python interpreter. This allowed me to cut some corners with the usability of the distributions. Then some folks realized the distributions were generally useful and started using them. So I made the distributions more user friendly when people asked. It sounds like pybi started on the other side of the spectrum and therefore avoided some of the more user-hostile quirks that continue to plague PBS.
Looking at the pybi distributions, modifying python-build-standalone to produce a similar filesystem layout with a matching set of “quirks” seems very viable. I could easily see myself doing that if pybi gains traction.
IMO the biggest limitation with any distributed Python distribution is the sysconfigdata problem. A lot of build-time state gets baked into the _sysconfigdata_<platform>.py file, which is read by sysconfig and various build tools (like distutils) to figure out important things like how to compile extension modules. This means that extension module building likely blows up if the run-time environment doesn’t match the build environment. You either have to (a) allow people to reproduce the build environment (e.g. container images), (b) make everyone use wheels so they never have to build extensions, or (c) dynamically derive extension-building configs on the host machine. Option (c) is effectively “reimplement large parts of CPython’s autoconf.” python-build-standalone and these pybis both suffer from this problem. Fortunately, Windows and macOS are largely immune due to platform homogeneity. But it is a nightmare for Linux and similar OSes.
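A quick way to see the scale of the problem is to dump a few of the baked-in values; on a relocated prebuilt Python these describe the build machine, not the machine you’re actually running on (illustrative snippet only):

```python
import sysconfig

# These values come from _sysconfigdata_*.py, generated at build time.
# distutils/setuptools consult them when compiling extensions, so a mismatch
# with the runtime machine (different compiler, paths, libc) is exactly where
# extension builds start to blow up.
for key in ("CC", "CFLAGS", "LDSHARED", "INCLUDEPY", "LIBDIR"):
    print(f"{key} = {sysconfig.get_config_var(key)}")
```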
While I’m here and we’re talking about bootstrapping Python, I’d like to remind people of the existence of PyOxy, a single-file Python distribution [built with PyOxidizer]. If you rename the pyoxy binary to python, it behaves as you’d expect. It even contains pip out of the box. So it is technically viable to extend a Posy binary to contain a Python interpreter and be self-bootstrapping. Or you could build all of Posy’s functionality in Python and reuse the tried-and-true Python packaging tools without having to reinvent wheels [in Rust]. Just pointing this out so people don’t get too hung up on the Rust implementation detail and lose sight of more important matters for end-users, namely ease of use.
Yeah, but unfortunately the weird way ELF does symbol lookup means that you really have to be distributing all the libraries yourself :-/. ELF has a global namespace, which is all the symbols in the main executable, and local namespaces, which are created for each dlopen’ed extension module. But, unlike every other namespace system, symbol lookup checks the global namespace first, and then the local namespace. So if you want to avoid namespace collisions and segfaults when loading extension modules, you need to minimize how much stuff is linked into the main executable. (That’s why you have problems with PyQt: statically linking tkinter to libX11 would be fine on its own; it’s making tkinter a built-in module that breaks things.)
Of course none of this is a problem for pyoxidizer users who are making standalone binaries, and don’t need to worry about users trying to install random wheels they found lying on the sidewalk!
The nice thing about re-using the manylinux infra is that it’s composed of like 90% scar tissue for this kind of stupid esoteric issue. It also handles the libcrypt issue, for example.
Yeah, I already ran into this:
Posy’s PEP 517 frontend has a gross hack to work around it for now, but hopefully it should be pretty easy to fix on the setuptools side.
Dealing with that is enough to get at least simple extension modules to build, but I 100% believe you that there are other issues lurking :-). Any chance you have a list?
PyPy works around the _sysconfigdata problem in its portable builds, so this is not insurmountable. I don’t want to derail this thread with the details; feel free to open an issue on the relevant repo and tag me (mattip) if help is needed.
Bit of a tangent, but I bring it back to relevance below: this notion of an excessively long import path causing problems also comes from the pre-3.3 import machinery, which didn’t cache the import directory listings, so the performance of each import was linear in the number of path entries, with the per-entry overhead being multiple failing stat calls.
The caching reduced the per-import overhead enormously, and the introduction of scandir made it faster to populate the cache in the first place.
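The caches themselves are easy to see from Python if you want to poke at the modern machinery; a minimal illustration:

```python
import importlib
import sys

# Each sys.path entry gets a cached finder (usually a FileFinder that keeps
# its own directory-contents cache), so repeat imports don't re-stat every
# path entry the way the pre-3.3 machinery did.
print(len(sys.path), "sys.path entries")
print(list(sys.path_importer_cache.items())[:3])

# If files appear on disk after the caches were populated (e.g. on a slow
# network mount), the caches have to be refreshed explicitly:
importlib.invalidate_caches()
```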
The part of this history that is still relevant: even with the old import scheme, it wasn’t really local path entries that caused massive problems (even on spinning HDDs). It was NFS mounted drives where each stat call incurred network latency. Even with the modern caching, lots of network accesses can really slow down the process of populating the caches.
I think it’s fine to deem that a “later” problem though - flattening a posy environment into something more easily shared via a network mount wouldn’t be particularly hard to implement, after all.
Hate to disappoint, but importlib still uses listdir, and when I tried it, scandir didn’t actually make any improvement (unsurprising on Linux, a bit surprising on Windows, but it checked out).
Still, an importer that knows it’s dealing with a big set of paths could optimise in ways that the default does not. So it’s a perfectly good “later” problem.