Installer creation based on distributions

I hope it’s okay that I post this in Packaging in spite of it being a user help question, though it may inspire tool improvements, who knows. (This was originally at https://discuss.python.org/t/tool-recommendation-for-creating-installers-with-contents-taken-from-wheels/32754.)

In a nutshell, the situation is: I have a complex application with several dependencies, including extension modules. I have worked on packaging those dependencies that I control, so that all dynamic libraries are included, wheels contain exactly the needed data files, etc. And now I want to build an installer, so I try PyInstaller. The trouble is that it insists on running its own dependency-discovery logic by scanning imports in the source, and has its own peculiar way of declaring data files and hidden dependencies, ignoring all the painstakingly crafted packaging metadata.

I have raised this with PyInstaller: Automatically include data-files from unzipped eggs and wheels · Issue #2717 · pyinstaller/pyinstaller · GitHub. The maintainers do not seem very enthusiastic so far.

Now, I know how to use importlib.metadata to write hooks that do this, but I’d rather avoid maintaining them if possible. My question is: do you know a tool that will not try to be smart and will simply (well, “simply”) take a package, resolve dependencies like pip, and bundle the contents of all wheels (with the Python interpreter plus an installer icon and such)?
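
For concreteness, the kind of hook logic I have in mind is only a few lines of importlib.metadata. Here is a minimal sketch (the helper name is mine, not a PyInstaller API) that derives a PyInstaller-style list of (source, destination) data-file pairs for one installed distribution from its metadata alone:

    # Minimal sketch, not a PyInstaller API: build a `datas`-style list of
    # (source file, destination directory) pairs for one installed
    # distribution, using only its packaging metadata.
    from importlib.metadata import distribution

    def data_files_for(dist_name):
        dist = distribution(dist_name)
        pairs = []
        for f in dist.files or []:
            if f.suffix in {".py", ".pyc", ".so", ".pyd"}:
                continue  # modules and extensions are bundled separately
            src = dist.locate_file(f)                # absolute path on disk
            pairs.append((str(src), str(f.parent)))  # keep the relative layout
        return pairs

    # e.g. data_files_for("certifi") would include certifi/cacert.pem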

Is it sufficient for your purposes to build a wheel (say, with build, or with Poetry or any other toolchain), and then have users install it with pipx?

No, my users are largely non-technical people who probably don’t know what a terminal is, and I need to give them a “normal installer”. Plus, pipx doesn’t install a shortcut for the app with an icon, file associations, etc.

Unfortunately, I don’t. This is why I’m so enthusiastic about the idea that “packaging should include creating standalone applications” - the import-chasing machinery used by PyInstaller and other installer makers seems quite fragile, and rooted in a time when package dependency data was very much in its infancy and nowhere near as reliable as it is now. So there’s a big cultural gap between the people writing tools to build application installers and the packaging community.

I suspect there’s also a minimalist “optimisation” motivation involved here - import chasing can strip out unneeded chunks of a large library, reducing the final application size significantly (think something like pywin32, where only a tiny fraction of the functionality might be used by any given application). But it also increases fragility if the application uses tricks like lazy loading.

This is precisely the audience that isn’t supported well by the existing console-entry-point-based solutions used in “packaging” (as opposed to tools like PyInstaller). So yes, I think you have a valid use case that’s currently badly served by the Python packaging ecosystem. Sorry there’s no better news to offer here.

(One thought: BeeWare are apparently doing interesting things in the application-building area. I’ve not looked into what they do myself, and it’s possible they’ve also adopted the import-chasing approach, but it might be worth checking them out.)

I’ve never used an all-in-one GUI-based installer (except sometimes for installing Python itself on macOS!), so I also don’t know what would work. But there are alternatives to PyInstaller. Perhaps https://pyoxidizer.readthedocs.io/en/stable/ would be more suitable/usable?

See also: https://pyoxidizer.readthedocs.io/en/stable/pyoxidizer_comparisons.html

Thank you for confirming this.

Reading Frequently Asked Questions - Briefcase 0.3.16:

“Yes! Briefcase uses pip to install third-party packages into your app bundle. As long as the package is available on PyPI, or you can provide a wheel file for the package, it can be added to the requires declaration in your pyproject.toml file and used by your app at runtime.”

That sounds like it might be a candidate! I’ll try it, thanks again.

On the comparison page, I read:

A current difference between the tools is that PyInstaller generally has better support for binary dependencies. PyInstaller knows how to find runtime dependencies and allows a lot of not-easy-to-build packages like PyQT to work out of the box. With PyOxidizer, you could need to add sufficient complexity to its configuration files to get things to work.

This is a PyQt app, so…

Out of curiosity, I did try to understand how data files work in PyOxidizer, but I’m having trouble wrapping my head around the documentation. My understanding is that you have to specify them manually, but I may be wrong. However, from what I understand of Packaging Files Instead of In-Memory Resources — PyOxidizer 0.23.0 documentation, PyOxidizer tries to turn the app into a single executable and does not do runtime unpacking of files, temporary filesystems or the like, which prevents packages from accessing their data files through the filesystem. This too would cause endless trouble for the app in question.

Ok - that’s disappointing. Yeah - I took a quick look at those pages, and even though they might have workarounds, having to learn the configurations seems pretty tedious. So then, why not just do it from scratch: make a tarball of the complete virtual Python env (a minimal env, either venv or conda, including the interpreter), and just provide two minimalistic GUIs, one for unpacking that tarball (or you could have some self-extracting zip file or the like) and one for kicking off the Python interpreter + app?
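
(For the second of those, I’m imagining nothing fancier than this kind of stub; all the paths and the myapp module name here are made up:

    # Tiny launcher sketch: run the unpacked environment's own interpreter
    # against the application's entry module. The "env" directory layout and
    # the "myapp" module name are hypothetical.
    import subprocess
    import sys
    from pathlib import Path

    here = Path(__file__).resolve().parent
    env_python = here / "env" / "bin" / "python"   # interpreter unpacked from the tarball
    result = subprocess.run([str(env_python), "-m", "myapp"])
    sys.exit(result.returncode)

plus whatever minimal GUI you want to wrap around it.)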

Actually, I see that I missed that line about Briefcase - well, hopefully that works :rofl:

FWIW, I see the PyBI proposal as useful for this case, but clearly not PEP 722 etc. The future I see is one where the developer builds wheels for some specific Python targets, which can be communicated to a tool somehow (maybe this is something that pipx could incorporate?). Then the tool would create something like a self-extracting archive (it can’t be a .pyz or anything else like that - by my reading, for this use case, we can’t assume Python is present on the client at all!), which (roughly sketched in code after the list):

  • unpacks a wheel and a PyBI specification from its own archive
  • uses the PyBI specification to obtain Python and create an isolated environment (or perhaps the tool reads such a specification ahead of time and directly includes it in the archive? After all, the result needs to be platform-specific anyway, as it has to be a self-hosting executable on varying platforms)
  • installs the wheel in that environment
  • possibly creates shortcuts, .desktop files on Linux, or other wrappers, which invoke entry points of the installed wheel.
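
Concretely, I picture the post-extraction part doing roughly this (a rough sketch only; the bundled-interpreter path, wheel name and target directory are all hypothetical, and real PyBI handling would be more involved):

    # Rough sketch of the installer's post-extraction steps. All paths are
    # hypothetical, and shortcut creation is left as a platform-specific stub.
    import subprocess
    from pathlib import Path

    payload = Path(__file__).resolve().parent
    bundled_python = payload / "python" / "bin" / "python3"   # unpacked from the PyBI
    wheel = next(payload.glob("*.whl"))                       # the application's wheel
    env_dir = Path.home() / ".local" / "share" / "myapp-env"  # hypothetical install target

    # 1. Create an isolated environment with the bundled interpreter.
    subprocess.check_call([str(bundled_python), "-m", "venv", str(env_dir)])

    # 2. Install the wheel into it, addressing pip via the environment's own
    #    interpreter by fully qualified path.
    env_python = env_dir / "bin" / "python"
    subprocess.check_call([str(env_python), "-m", "pip", "install", str(wheel)])

    # 3. Create shortcuts / .desktop files pointing at the wheel's entry points.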

I don’t know whether it would be beneficial to create .deb, .rpm, .msi etc. files, or whether they (each considered individually) could be set up to follow this process.

[Edit: I looked around a bit and it seems like makeself could prepare an ordinary shell script with embedded binary data to solve the problem on Linux. I think formats like .deb and .rpm are meant for non-“portable” installations (i.e. actually putting stuff in /usr/bin), which might not be desirable.]

I guess it would be neat if such a tool, in the same breath, could also create a zipapp; when that gets run from any arbitrary Python, the main script could then determine by some logic whether the currently running Python is a suitable place to install a wheel contained within the zip (and just shell out to pip to do so), or whether it needs to create a new environment, or just what. (Of course, the subprocess.call, os.system or whatever invocation would explicitly invoke a Python executable for pip by fully qualified path, and not rely on how the shell resolves python or pip.)
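
A hypothetical __main__.py for such a zipapp might look roughly like this (the “suitable place” test below is just a stand-in for whatever that logic ends up being):

    # Hypothetical zipapp __main__.py: check whether the running interpreter
    # looks suitable, then install the wheel embedded in the archive, always
    # invoking pip through the interpreter's fully qualified path.
    import subprocess
    import sys
    import tempfile
    import zipfile
    from pathlib import Path

    def main():
        archive = Path(__file__).resolve().parent  # inside a zipapp, this is the .pyz itself
        # Stand-in "logic": require a virtual environment on a recent Python.
        if sys.prefix == sys.base_prefix or sys.version_info < (3, 8):
            raise SystemExit("Please run this from a suitable virtual environment "
                             "(creating one automatically is left out of this sketch).")
        with zipfile.ZipFile(archive) as zf:
            wheel_name = next(n for n in zf.namelist() if n.endswith(".whl"))
            target = Path(tempfile.mkdtemp()) / Path(wheel_name).name
            target.write_bytes(zf.read(wheel_name))
        subprocess.check_call([sys.executable, "-m", "pip", "install", str(target)])

    if __name__ == "__main__":
        main()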

After the next minor release of Hatch I will publicly announce this (I was waiting because the installers that the release will introduce use it): GitHub - ofek/pyapp: Build self-bootstrapped Python applications

The next feature I plan to add is embedding dependencies as you state.

There are options to fall back to file system loading or only use file system loading. In these cases it’s basically a site-packages with a different structure.

Whether this sort of “self-extracting archive” is what Joe User wants depends on the platform. On Windows it is more or less what you want: installation programs are usually executables that run a configuration assistant (which, IIUC, changes the Windows registry to install file associations). On macOS, it’s not – what users expect is a .dmg (basically a glorified archive format) that contains a .app bundle. The app bundle is not an executable; it’s a folder that follows a standard structure with pesky rules, with a metadata file describing file associations and such. On Linux, there is not really a standard for self-contained apps except AppImage (which hasn’t really gotten popular, AFAIK), but we have Flatpak these days, so it’s not necessary, at least in my use case. (And Flatpak is so easy compared to setting up your own infrastructure for building installers and worrying about binary compatibility.)

TL;DR: creating self-contained executables and creating installers are related but different use cases. Some tools cater for one of them, some both.

pynsist and briefcase (mentioned above, the BeeWare one) are my usual recommendations, at least for Windows users.

Though personally I think it’s worth investing in building it into your own project’s installer (I don’t have a good public example, but there’s at least one Store app from Microsoft that has a secret copy of the embeddable Python distro inside :wink: ). The main trick is replacing python.exe (source) with your own executable, or else you get weird behaviour (e.g. search/Start menu and taskbar don’t behave properly).

I don’t have any particular recommendation for other platforms. They all allow self-extracting shell scripts though, so that seems to be the way people go.

Unfortunately, IMX the end-user experience with Flatpak does not live up to expectations. While it’s supposed to reuse dependencies, in practice I found that it would duplicate major packages (think all of KDE) because of trivial version-number differences specified in the requirements (and these could be tiny applications requesting to run in KDE when they might not even really need it). An isolated Python environment has a cost too, but nowhere near that much.

Just FYI, because I messed up the installer for Hatch and for tools at work until last week: users want a DMG file only for GUIs, which then show up in one’s Applications. For CLIs you absolutely DO NOT want that, because there is no mechanism to add the executable to PATH. In that case, you would use a flat package (.pkg) installer.

[PyInstaller maintainer here]

Whilst, given a blank slate, I don’t think we’d ever go down the route of import scanning again, don’t underestimate just how much size this can knock off. PyQt6 is ~200MB on Linux, but most applications only use QtWidgets, QtGui and QtCore, which import scanning can filter down to ~80MB. SciPy is another huge package from which you only ever need one or two algorithms at a time. Then there’s stuff that shouldn’t be in packages at all, such as numpy.testing, which can be removed too. On the small-applications side, not needing to bundle tkinter or openssl adds up to quite some savings too. I’m not convinced it’s worth it, but now that people have it, it’s hard to tell them that we’ll be ~doubling their application sizes just because we wanted some cleanliness. If the philosophy of one package doing one thing well became better appreciated and PyPI wheels were free of test suites and examples, then that ratio would reduce and I imagine that we would be able to justify the switch more easily.

I also want to point out that I don’t think dependency discovery is actually the biggest problem in packaging. Worse, I’d say, is creating a valid, codesignable and notarisable macOS .app bundle (for which package contents need to be dismantled and regrouped by type, which is the opposite of how Python packages are laid out, with .py files and data files and .dylibs all happily mixed together). Creating a launcher is also a much underestimated challenge – making one that appears to work isn’t too hard, but making one that’s truly immune to locales, properly handles OS signals and Apple events, disables all of Python’s environment-variable-controlled modes (e.g. PYTHONDEVMODE), and sets up LD_LIBRARY_PATH properly on UNIX (which is harder than it sounds because processes cannot change their own LD_LIBRARY_PATH – only subprocesses) – that’s a much more significant challenge.
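
To illustrate that last point with a pure-Python stand-in (real launchers are native executables, so treat this only as a sketch of the constraint): setting the variable in os.environ after startup changes nothing for the already-running dynamic linker, so the usual recourse is to re-exec with the environment already modified.

    # Sketch of the re-exec workaround for the LD_LIBRARY_PATH constraint
    # described above (a pure-Python stand-in for what a native launcher does).
    import os
    import sys

    def ensure_library_path(lib_dir):
        current = os.environ.get("LD_LIBRARY_PATH", "")
        if lib_dir in current.split(os.pathsep):
            return  # a previous exec already set it; carry on with the app
        os.environ["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
        # Only a *new* process picks the variable up, so replace ourselves
        # with a fresh interpreter that inherits the modified environment.
        os.execv(sys.executable, [sys.executable] + sys.argv)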

It’s easy to think that, now that Python packages have proper metadata, a standalone app should just be a case of compiling Python with the magic relocatable flags set, running some form of pip install --target=relocatable/python's/site-packages/directory your dependencies, then bunging the whole thing in a tarball, but that’s not an accurate assumption.

Welcome! It’s really nice to hear from the PyInstaller side of the ecosystem, thanks for taking the time to comment.

Thanks for the context. I wasn’t suggesting that import chasing was worthless - far from it. But it’s not always going to give those sorts of savings, and it does get tricky (as you’ll know all too well!) when dynamic loading or plugins get involved. What I regret is that there isn’t an option to say “don’t bother doing all that, I’m fine with a bit of bloat, just bundle up click, requests and rich (and their dependencies) and let me have that”.

And yes, I’m extremely conscious that building a standalone app once you have the Python code is the real problem here. That’s why I don’t want to try to reinvent all that - there’s a lot of knowledge embedded in PyInstaller, and other “standalone app” solutions, and it would be nice if we could reuse that in a broader range of “how do we collect the raw code together” scenarios.

Hopefully, we can find some way of getting a bit of discussion going at some point. I’m keen that the proposed “packaging council” try to look more at the whole area of building standalone, distributable apps, and if that happens, it would be great to learn from each other.

In the meantime, I don’t know the details of why PyInstaller isn’t suitable for the application @jeanas is developing. Presumably he’s tried, and couldn’t get things to work, and whatever advice he’s been able to get didn’t help. I think the conclusion here is that no, there isn’t really anything that does what PyInstaller does but starts from a naive “just bundle these packages up as they stand” perspective. As you say, it’s not as easy as it might seem at first, and I guess no one has wanted to go down that route.

PyInstaller’s *.spec file (its build configuration script, which is written in Python syntax – no relationship to the spec files used by RPM-based Linux distributions) has an a = Analysis(...) section, which is the bit that does the scanning and produces the various lists of files to include. There’s nothing I’m aware of really stopping someone from deleting that bit and replacing it with some importlib.metadata logic to generate those lists instead. Then you’d get the best of both worlds – dependencies derived from package metadata, but all the other parts of packaging remain PyInstaller’s problem.

That logic could be pushed into its own function taking a list of package names and, if we (PyInstaller) decide that we don’t want to be the ones to look after it (bear in mind that we have only two maintainers), then that importlib.metadata-based function could be stuck in its own package on PyPI – a PyInstaller plugin, to an extent.
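
For the record, here’s a hand-wavy sketch of what that function might look like (the name is illustrative, not an existing PyInstaller or third-party API, and extras/environment markers are ignored):

    # Hand-wavy sketch of a metadata-driven replacement for import scanning:
    # walk Requires-Dist recursively and build `datas`/`hiddenimports` lists
    # that a spec file could hand to Analysis(...). Illustrative only.
    import re
    from importlib.metadata import PackageNotFoundError, distribution, requires

    def lists_from_metadata(root_distributions):
        datas, hiddenimports, seen = [], set(), set()
        stack = list(root_distributions)
        while stack:
            name = stack.pop()
            if name.lower() in seen:
                continue
            seen.add(name.lower())
            try:
                dist = distribution(name)
            except PackageNotFoundError:
                continue  # e.g. a dependency only needed on another platform
            # Importable top-level names become "hidden imports" (no scanning).
            hiddenimports.update((dist.read_text("top_level.txt") or "").split())
            # Everything that isn't a module becomes a data file, layout preserved.
            for f in dist.files or []:
                if f.suffix not in {".py", ".pyc", ".so", ".pyd"}:
                    datas.append((str(dist.locate_file(f)), str(f.parent)))
            # Follow declared dependencies instead of chasing imports
            # (extras and environment markers are ignored in this sketch).
            for req in requires(name) or []:
                match = re.match(r"[A-Za-z0-9._-]+", req)
                if match:
                    stack.append(match.group(0))
        return datas, sorted(hiddenimports)

    # In the spec file: datas, hiddenimports = lists_from_metadata(["myapp"])
    # then pass them to Analysis(..., datas=datas, hiddenimports=hiddenimports).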

I can also tentatively offer polycotylus to the suggestion pile. It does use project metadata like you want, but, owing to the fact that it produces Linux packages (e.g. .rpms), it supports Linux only by definition (and not even all the popular Linux distributions). Its main reason to exist is that you can have system dependencies as well as PyPI ones, and it gets you past most of the ABI-compatibility woes that cross-distribution Linux packaging is so full of. :slightly_smiling_face:
