The proposed workflow is familiar and common in conda. I’m ok with it.
I think there are two missing workflows:
- Simple Scripts
- Publishing
Simple Scripts
Writing simple .py scripts is a common workflow. @pf_moore once posted about this: sometimes you just have scripts lying around in a scratch folder. However, the moment you add a third-party dependency, the complexity spikes, because you now have to think about dependency management, virtual environments, and even making full packages. Can posy help alleviate this overhead? I’d suggest some manner of standardizing how third-party dependencies are declared in simple scripts. Example: hatch appears to have a solution to this problem.
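To make the ask concrete, here’s a purely illustrative sketch of what inline dependency metadata in a throwaway script could look like. The comment syntax is invented for illustration (it isn’t an existing standard), and the packages are just examples of the kind of third-party deps a scratch script tends to grow:

```python
# scratch/fetch_peps.py
#
# Hypothetical inline metadata block that a tool could read in order to
# build a throwaway environment for this one script (syntax made up for
# illustration only):
#
# Script-Requires:
#     requests >= 2.28
#     rich

import requests
from rich import print

resp = requests.get("https://peps.python.org/api/peps.json", timeout=10)
print(f"{len(resp.json())} PEPs fetched")
```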
Publishing
Publishing packages is a common intermediate workflow. I believe posy is staying clear of the “publish to PyPI” features seen in tools like poetry and flit. However, it may be worth considering standardizing a common UI that integrates with publishing backends based on existing standards. Example: posy publish could be a common command that flit/poetry/etc. publishing tools hook into. Additional args could be passed to the backend tool (in a similar manner to how pipx can pass extra args to the underlying pip backend).
I think including these two workflows could broaden the spectrum of users posy can reach.
Since it seems like a few people (understandably!) missed it buried in the long thread above, I’ll just quote this again:
I’m not opposed to supporting these at some point, though I don’t think they’re necessary for an MVP. Publishing in particular would be pretty simple: posy already knows how to speak PEP 517, so it could use that to invoke your package’s build system and generate sdist/wheel artifacts, and then it’d just have to post those to PyPI’s upload API. For the “simple scripts” case I’m not sure what people actually want, e.g.: is it so bad to put some metadata in your scratch folder? Don’t you still want lock files so your simple scripts don’t bitrot when you’re not looking? Where do you put the lock file if it’s just one script? But if there’s a simple workflow then sure, why not.
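For the publishing half, here’s a rough sketch of what such a command could do under the hood using existing tools rather than anything posy-specific: drive a PEP 517 backend via the `build` library, then hand the artifacts to twine, which already speaks PyPI’s upload API. This is illustrative only, not how posy actually does (or would do) it:

```python
# Sketch: build sdist + wheel through the project's PEP 517 backend, then
# upload to PyPI. A real frontend might reimplement the upload step itself
# instead of shelling out to twine.
import glob
import subprocess

from build import ProjectBuilder

builder = ProjectBuilder(".")          # directory containing pyproject.toml
builder.build("sdist", "dist/")
builder.build("wheel", "dist/")

subprocess.run(["twine", "upload", *glob.glob("dist/*")], check=True)
```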
I’m interested in the Pybi portion of this because we make use of pre-built, portable Python distributions as part of Python + Bazel. Is there a meaningful difference between Pybi and @indygreg’s python-build-standalone, or are they both accomplishing the same thing?
This might be unrelated, but there is some prior art to making CPython installable via wheels if you haven’t seen this before: https://github.com/jjhelmus/give-me-python. (Although I guess as-is, this doesn’t really sidestep the stated issue of “wheels currently need a pre-existing Python environment”.)
There are two things here: the “pybi” container format, and the pybi artifacts that I’m currently hosting at https://pybi.vorpus.org.
The container format isn’t particularly novel; it’s a zip file with metadata. The main “new thing” is the attempt to standardize it so we can have official pre-built CPython distributions on PyPI, make it easy for folks to try out alternative interpreters like PyPy/Cinder/CPython-nogil, build an ecosystem of tools that consume them, etc. So that part’s orthogonal to python-build-standalone: you could stick the python-build-standalone builds inside a pybi if you wanted to distribute them that way.
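For anyone curious what “a zip file with metadata” looks like in practice, a pybi can be poked at with nothing but the standard zipfile module. A minimal sketch, assuming a locally downloaded artifact (the filename is just an example, and the pybi-info/ metadata directory is per the draft pybi spec):

```python
import zipfile

# Example filename only; any .pybi downloaded from https://pybi.vorpus.org
# can be inspected the same way.
PYBI = "cpython_unofficial-3.11.1-manylinux_2_17_x86_64.pybi"

with zipfile.ZipFile(PYBI) as zf:
    names = zf.namelist()
    # An interpreter tree plus a metadata directory, roughly analogous to a
    # wheel's .dist-info/.
    print([n for n in names if n.startswith("pybi-info/")])
    print(len(names), "files total")
```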
As for the builds at https://pybi.vorpus.org: they’re as vanilla and boring as I could make them. The Windows and macOS pybis are official binaries from the CPython release team that have been repackaged, and the Linux pybis re-use the infrastructure we built for the manylinux docker image and manylinux wheels. The python-build-standalone project OTOH is a bit more opinionated about tweaking things to support their goals, which produces various “quirks”. So my guess is that my pybis are more universally compatible with regular Python workflows, but obviously less useful as an input to pyoxidizer single-file python distributions and so on.
If you just want a python and don’t care too much about the details, and are already happy with python-build-standalone, then I don’t think there’s much reason to switch to my builds.
There is some tweaking in python-build-standalone to support the goal of being re-linked into a downstream application, yes. But the “normal” distributions (not the Windows static or Linux musl ones) don’t have as many quirks as one might think. The biggest technical difference IMO is aggressively building extensions as built-ins, including statically linking library dependencies. But once you are distributing all the third-party libraries yourself, static vs dynamic linking is effectively an implementation detail. (At the end of the day CPython just calls a PyInit_* function to initialize an extension; it doesn’t really matter whether that symbol is statically linked or loaded from a dynamic library.)
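From the importing side the two cases really do look the same; the module spec is where the difference shows up. A small illustration (which modules land in which bucket varies between distributions, e.g. PBS builds many extensions as built-ins):

```python
import importlib.util
import sys

# "built-in" extensions report origin "built-in"; dynamically loaded ones
# report the path to their shared library. Import behaviour is identical.
for name in ("math", "zlib", "_sqlite3"):
    spec = importlib.util.find_spec(name)
    if spec is None:
        print(f"{name}: not available in this build")
        continue
    kind = "built-in" if name in sys.builtin_module_names else "dynamically loaded"
    print(f"{name}: {kind} (origin={spec.origin})")
```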
I think the biggest difference between python-build-standalone and pybi is that PBS was purpose-built for PyOxidizer. It was always assumed that distributions needed to be re-linked and there would be a [Rust] runtime component controlling the embedded Python interpreter. This allowed me to cut some corners with the usability of the distributions. Then some folks realized the distributions were generally useful and started using them. So I made the distributions more user friendly when people asked. It sounds like pybi started on the other side of the spectrum and therefore avoided some of the more user-hostile quirks that continue to plague PBS.
Looking at the pybi distributions, modifying python-build-standalone to produce a similar filesystem layout with a matching set of “quirks” seems very viable. I could easily see myself doing that if pybi gains traction.
IMO the biggest limitation with any distributed Python distribution is the sysconfigdata problem. A lot of build-time state gets baked into the _sysconfigdata_<platform>.py file, which is read by sysconfig and various build tools (like distutils) to figure out important things like how to compile extension modules. This means that extension module building likely blows up if the run-time environment doesn’t match the build environment. You either have to (a) allow people to reproduce the build environment (e.g. container images), (b) make everyone use wheels so they never have to build extensions, or (c) dynamically derive extension-building configs on the host machine. Option (c) is effectively “reimplement large parts of CPython’s autoconf.” python-build-standalone and these pybis both suffer from this problem. Fortunately, Windows and macOS are largely immune due to platform homogeneity. But it is a nightmare for Linux and similar OSes.
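A quick way to see the scale of the problem is to dump a few of the baked-in values; on a relocated prebuilt Python these describe the build machine, not the machine you’re actually running on (illustrative snippet only):

```python
import sysconfig

# These values come from _sysconfigdata_*.py, generated at build time.
# distutils/setuptools consult them when compiling extensions, so a mismatch
# with the runtime machine (different compiler, paths, libc) is exactly where
# extension builds start to blow up.
for key in ("CC", "CFLAGS", "LDSHARED", "INCLUDEPY", "LIBDIR"):
    print(f"{key} = {sysconfig.get_config_var(key)}")
```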
While I’m here and we’re talking about bootstrapping Python, I’d like to remind people of the existence of PyOxy, a single-file Python distribution [built with PyOxidizer]. If you rename the pyoxy binary to python, it behaves as you’d expect. It even contains pip out of the box. So it is technically viable to extend a Posy binary to contain a Python interpreter and be self-bootstrapping. Or you could build all of Posy’s functionality in Python and reuse the tried-and-true Python packaging tools without having to reinvent wheels [in Rust]. Just pointing this out so people don’t get too hung up on the Rust implementation detail and lose sight of more important matters for end-users, namely ease of use.
Yeah, but unfortunately the weird way ELF does symbol lookup means that you really have to be distributing all the libraries yourself :-/. ELF has a global namespace, which is all the symbols in the main executable, and local namespaces, which are created for each dlopen’ed extension module. But, unlike every other namespace system, symbol lookup checks the global namespace first, and then the local namespace. So if you want to avoid namespace collisions and segfaults when loading extension modules, you need to minimize how much stuff is linked into the main executable. (That’s why you have problems with PyQt: statically linking tkinter to libX11 would be fine on its own; it’s making tkinter a built-in module that breaks things.)
Of course none of this is a problem for pyoxidizer users who are making standalone binaries, and don’t need to worry about users trying to install random wheels they found lying on the sidewalk!
The nice thing about re-using the manylinux infra is that it’s composed of like 90% scar tissue for this kind of stupid esoteric issue. It also handles the libcrypt issue, for example.
Yeah, I already ran into this:
Posy’s PEP 517 frontend has a gross hack to work around it for now, but hopefully it should be pretty easy to fix on the setuptools side.
Dealing with that is enough to get at least simple extension modules to build, but I 100% believe you that there are other issues lurking :-). Any chance you have a list?
PyPy works around the _sysconfigdata problem in its portable builds, so this is not insurmountable. I don’t want to derail this thread with the details; feel free to open an issue on the relevant repo and tag me (mattip) if help is needed.
Bit of a tangent, but I bring it back to relevance below: this notion of an excessively long import path causing problems also comes from the pre-3.3 import machinery, which didn’t cache the import directory listings, so the performance of each import was linear in the number of path entries, with the per-entry overhead being multiple failing stat calls.
The caching reduced the per-import overhead enormously, and the introduction of scandir made it faster to populate the cache in the first place.
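The caches themselves are easy to see from Python if you want to poke at the modern machinery; a minimal illustration:

```python
import importlib
import sys

# Each sys.path entry gets a cached finder (usually a FileFinder that keeps
# its own directory-contents cache), so repeat imports don't re-stat every
# path entry the way the pre-3.3 machinery did.
print(len(sys.path), "sys.path entries")
print(list(sys.path_importer_cache.items())[:3])

# If files appear on disk after the caches were populated (e.g. on a slow
# network mount), the caches have to be refreshed explicitly:
importlib.invalidate_caches()
```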
The part of this history that is still relevant: even with the old import scheme, it wasn’t really local path entries that caused massive problems (even on spinning HDDs). It was NFS mounted drives where each stat call incurred network latency. Even with the modern caching, lots of network accesses can really slow down the process of populating the caches.
I think it’s fine to deem that a “later” problem though - flattening a posy environment into something more easily shared via a network mount wouldn’t be particularly hard to implement, after all.
Hate to disappoint, but importlib still uses listdir, and when I tried it, scandir didn’t actually make any improvement (unsurprising on Linux, a bit surprising on Windows, but it checked out).
Still, an importer that knows it’s dealing with a big set of paths could optimise in ways that the default does not. So it’s a perfectly good “later” problem.