Packaging complex software in a portable way?

I’m rather new to Python packaging. There is an initiative in the Prolog world by XSB and SWI-Prolog to come to a de-facto standard Python interface, where Python may run embedded in Prolog or Prolog may run embedded in Python. That is progressing nicely, but installing the Python package requires Prolog to be installed. So, ideally we also provide a package for Prolog itself, such that a single pip install gives you a working Python package that embeds Prolog on any (major) platform, i.e., Linux, Windows and MacOS. But Prolog systems are complex beasts to build: they have a lot of dependencies on external C libraries, require a lot of configuration and a build process that interleaves C and Prolog build steps. For SWI-Prolog this is all orchestrated using cmake.

I have a few questions on best practices

  • Ideally, we probably have a way to connect to an installed Prolog system or build Prolog. How should we deal with that?
  • We may build from source, or we may use pip to download and install some installer. On some platforms (e.g., Linux), building from source is attractive. Notably on Windows it is not, in particular because MSVC produces way slower binaries than MinGW. MacOS is in the middle. Can we support both transparently to the user, i.e., using an installer on Windows and build on Linux?
  • If we build from source, how do we get the dependencies? On some Linux distributions you can get all of them as system packages, but you may need to install them using apt, dnf, … On MacOS you can build them yourself or use Homebrew or MacPorts. On Windows some come with MinGW, etc. It seems to be an option to have these available as pip packages. I see little evidence of C libraries being available as pip packages without the intent to provide a Python interface, though.

Does anyone have ideas on this? Even better if someone with a Python background is willing to help make SWI and XSB (and hopefully more) Prolog systems much more easily available to the Python community.

What do you mean by “connect”?

PyPI and other package indices accept both source distributions (an sdist is basically a .tar.gz of the source code with some metadata added) and wheels (a wheel is a precompiled package with compatibility tags that indicate things like “works on any platform”, “works on Windows”, etc.). You could publish an sdist and a Windows wheel.
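
Those compatibility tags live in the wheel filename itself. As a minimal stdlib-only sketch (the filenames below are just illustrative examples):

```python
# Parse the tag triple out of a wheel filename.
# Per the wheel spec, the format is: name-version(-build)-python-abi-platform.whl
def wheel_tags(filename: str) -> dict:
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    # The last three dash-separated fields are always the tag triple.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"python": python_tag, "abi": abi_tag, "platform": platform_tag}

# A pure-Python wheel that works anywhere:
print(wheel_tags("example_pkg-1.0-py3-none-any.whl"))
# A compiled wheel restricted to CPython 3.11 on 64-bit Windows:
print(wheel_tags("example_pkg-1.0-cp311-cp311-win_amd64.whl"))
```

pip picks whichever uploaded wheel matches the running interpreter and platform, and falls back to the sdist (building from source) when none matches.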

pip is not meant as a package manager for C libraries that other packages build against (e.g., the wheels you find on PyPI often won’t include C headers). The Conda ecosystem is more like this, if that is what you prefer.

For PyPI, the usual approach is to get the dependencies from a package manager when you build the wheels (a process that you, as a package author, control), and bundle them into the wheel so that end users don’t need them (using tools such as delocate, delvewheel and auditwheel, or more recently repairwheel).
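
As a sketch of how that repair step differs per platform (the tool names are the ones mentioned above; the exact flags are believed correct but worth checking against each tool’s docs):

```python
# Build the wheel-"repair" command for a given platform.
# auditwheel (Linux), delocate (macOS) and delvewheel (Windows) all copy
# external shared libraries into the wheel and fix up the link references,
# so end users don't need the libraries preinstalled.
import sys

def repair_command(wheel_path: str, out_dir: str,
                   platform: str = sys.platform) -> list[str]:
    if platform.startswith("linux"):
        return ["auditwheel", "repair", "-w", out_dir, wheel_path]
    if platform == "darwin":
        return ["delocate-wheel", "-w", out_dir, wheel_path]
    if platform == "win32":
        return ["delvewheel", "repair", "-w", out_dir, wheel_path]
    raise RuntimeError(f"no known wheel-repair tool for {platform}")

print(repair_command("dist/example-0.1-cp311-cp311-linux_x86_64.whl",
                     "wheelhouse", platform="linux"))
```

Tools like cibuildwheel automate this whole loop (build, repair, test) across platforms in CI.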

Thanks. I learned some things :slight_smile:

I mean: if we want to make SWI-Prolog available as a Python module and the system already has SWI-Prolog installed, we do not want to build SWI-Prolog. Instead, we want to build the Python module such that it uses the installed SWI-Prolog version. That is how the current (source-only) Python package works: it finds SWI-Prolog on PATH, runs it to get the relevant configuration information, and then builds the Python module such that it embeds the found SWI-Prolog instance.
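
A rough sketch of that detection step, assuming `swipl --dump-runtime-variables` (which prints shell-style `NAME="value";` lines describing the installation):

```python
import re
import shutil
import subprocess

def parse_runtime_variables(text: str):
    # Lines look like: PLBASE="/usr/lib/swi-prolog";
    for m in re.finditer(r'^(\w+)="([^"]*)";?\s*$', text, re.MULTILINE):
        yield m.group(1), m.group(2)

def find_swipl_config():
    """Locate swipl on PATH and ask it for its build configuration."""
    swipl = shutil.which("swipl")
    if swipl is None:
        return None  # fall back to building/bundling SWI-Prolog instead
    out = subprocess.run([swipl, "--dump-runtime-variables"],
                         capture_output=True, text=True, check=True).stdout
    return dict(parse_runtime_variables(out))
```

The returned variables (install prefix, library directory, link flags) can then be fed into the C-extension build.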

Thanks. That answers some questions. The Conda system seems more suitable, but I have the impression that pip is more widespread, and the primary aim is to make it totally trivial to get SWI-Prolog embedded into Python on at least Linux, Windows and MacOS. The Linux setup will probably also deal with *BSD and many other POSIX-like systems.

I’ll go back to the drawing board with this info.

Hi Jan,

It’s really awesome you’re still working on SWI-Prolog after 36 years. Nice one!

This is beyond my knowledge, but you’ve already found the documentation for writing C extensions for Python and embedding it, haven’t you? Extending and Embedding the Python Interpreter — Python 3.11.4 documentation. It’s Python FFI and ABI territory.

Best regards,

James.

I think you’re running into a problem that is shared by a lot of programming language tooling, not just Python: how to include dependencies that are often considered “system level”?

As far as I know, there are two approaches in Python:

  1. Tell users that it’s required in documentation, and assume it’s present at run time (maybe including a helpful error message if it’s not).
  2. Bundle a shared library in with the Python files in the distribution.
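
Approach 1 can be as simple as a load-time check that turns an obscure linker error into an actionable message. A hedged sketch (the library name and install hint in the commented usage are placeholders, not the actual binding’s API):

```python
import ctypes
import ctypes.util

def load_required_library(name: str, install_hint: str) -> ctypes.CDLL:
    """Load a system shared library, or fail with a helpful message."""
    path = ctypes.util.find_library(name)
    if path is None:
        raise RuntimeError(
            f"Could not find the '{name}' shared library. {install_hint}")
    return ctypes.CDLL(path)

# Hypothetical usage in a SWI-Prolog binding:
# libswipl = load_required_library(
#     "swipl",
#     "Install it with e.g. 'apt install swi-prolog' or from swi-prolog.org.")
```

Approach 2 sidesteps the check entirely by shipping the library inside the wheel, at the cost of larger downloads and a more involved release process.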

My experience with this sort of thing is mostly as a user and not a developer, so others can speak more to the best ways of doing those things, and when you would choose one or the other.

I for one would be very interested to see some good industrial-strength Prolog bindings in Python!

Yes. See GitHub - SWI-Prolog/packages-swipy: Python interface for SWI-Prolog, which is bundled with SWI-Prolog as a git submodule. It builds the interface that embeds Python into Prolog if it can find the Python library. It can be used as a stand-alone module that acts as a Python package.

The code is progressing fine. Packaging is the issue …

Thanks. I get that now. I was hoping pip could do more platform independent dependency management :frowning:


To be fair, bundling is a form of dependency management!

But realistically, this is a problem shared by pretty much every programming language. I think it’s a testament to the flexibility and usefulness of Python that people have such high expectations of its packaging and package distribution system. But ultimately those expectations need to meet with the reality that this is an unsolved problem everywhere, not just in Python. You will see the same in NodeJS, Ruby, Perl, etc.


… and for (SWI-)Prolog :frowning: One day I hope there will be a meta package manager, as in the end you want e.g. the unixODBC libraries and headers. Most systems have these; you only need to find the name of the package and the command to run the package manager … Conan is such a beast, but it replaces the system package manager, forcing you to rebuild everything. That is (IMO) overkill, and it seems to lack a good vision on keeping different packages compatible.

Anyway, to get a simple pip install swi-prolog, is it an option (and feasible) to have this use an installer such as the Windows .exe installer or MacOS .dmg installer, and, for popular Linux distros, to install the dependencies using apt/dnf/… and build from source? I know it can be hacked into setup.py; what I do not know is how that would fit with how things are expected to work in the Python world. If it makes sense at all, I’d be interested in best practices. Any good examples?

One suggestion here would be to publish two packages: one that is just a bindings layer, and another that actually contains a bundled SWI itself. The former can detect if the latter is available either in the current environment or system-wide, and throw a somewhat helpful error if neither is found. So users can pip install swi-prolog if they already have SWI on their system, or pip install swi-prolog[bundled] if they need the bundled version, which would pull in a separate package swi-prolog-bundled or similar. For some prior art, see how the Psycopg project does it and how they handle the LibPQ dependency: Installation - psycopg 3.2.0.dev1 documentation
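A sketch of how that bindings layer could resolve its backend (the package name `swi_prolog_bundled` and the extras syntax are hypothetical, modeled on the Psycopg approach):

```python
import importlib.util
import shutil

def locate_swi(bundled_package: str = "swi_prolog_bundled",
               binary: str = "swipl") -> str:
    """Prefer a bundled SWI-Prolog package; fall back to a system install."""
    # Is the bundled companion package installed in this environment?
    if importlib.util.find_spec(bundled_package) is not None:
        return f"bundled ({bundled_package})"
    # Otherwise, is SWI-Prolog available system-wide on PATH?
    path = shutil.which(binary)
    if path is not None:
        return f"system ({path})"
    raise ImportError(
        "No SWI-Prolog found. Either install it system-wide, or run "
        "'pip install swi-prolog[bundled]' to get the bundled build.")
```

The helpful `ImportError` is the key part: users who land in the no-backend case get told exactly which command to run next.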

Scientific libraries are a good place to look for inspiration because they so often depend on external libraries. Other examples that I’ve used personally include PyArrow (Apache Arrow), Shapely (GEOS), Fiona (GDAL) and PyProj (PROJ).

As for wishful thinking, you might be interested in Conda and Spack, which are oriented towards science and research, but do more or less strive to be “portable” package managers, allowing you to install almost an entire OS (as far down as a C compiler) in isolated per-project environments. Also the Pkgsrc package manager, itself being only “system-level”, is reasonably portable, and has a binary distribution funded by a third party. Finally, there’s NixOS, which is portable-ish and supports isolated dev environments. And Guix, but that’s Linux-only.


Thanks @gwerbin. Especially for the pointers.

As for the wishful thinking, all these seem to add a new, independent package manager next to the OS one, while the OS ones generally have large teams verifying the package dependencies. They do a great job, and especially for Linux you have a wide choice between conservative, aggressive, versioned or rolling releases.

I would like to see one that sits on top of the OS one as the OS one is typically doing a good job for 99% of the dependencies you need. It is just a nuisance to document or script this for all possible package managers, as the names and granularity of the packages vary while they are built from the same source. Probably you could assemble all that if, for each package in each package manager, you’d have the package content (file list) as well as the source and version.
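The kind of mapping such a meta package manager would maintain, sketched for the unixODBC example above (the package names are believed correct for each manager, but worth verifying per distro; the MacPorts name is an assumption):

```python
# One upstream dependency, many downstream names: a tiny per-manager map
# for the unixODBC development libraries and headers.
ODBC_DEV = {
    "apt":  ["unixodbc-dev"],    # Debian/Ubuntu (headers + libs in one package)
    "dnf":  ["unixODBC-devel"],  # Fedora/RHEL
    "brew": ["unixodbc"],        # Homebrew ships headers in the same formula
    "port": ["unixodbc"],        # MacPorts (assumed name)
}

def install_command(manager: str) -> list[str]:
    """Assemble the install command for a given package manager."""
    verbs = {"apt":  ["apt", "install", "-y"],
             "dnf":  ["dnf", "install", "-y"],
             "brew": ["brew", "install"],
             "port": ["port", "install"]}
    return verbs[manager] + ODBC_DEV[manager]

print(install_command("apt"))   # ['apt', 'install', '-y', 'unixodbc-dev']
```

Scaling this up is exactly the hard part: someone has to curate such a table for every dependency across every package manager, which is why cross-referencing file lists and upstream source versions would help automate it.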

At least in the case of Conda, the packages are binary and you don’t need to install a completely parallel system if you don’t want or need that much isolation, in fact most environments don’t do that. The binary package format and high level of flexibility is why it got so popular for data science and machine learning.

Some other prior art here is the Mac package manager Homebrew, which specifically strives to use as much as possible of what MacOS already provides. That’s in contrast to MacPorts, which is much more eager to install its own copies of dependencies that MacOS (with XCode at least) already ships, like Ruby and Git. There is also a Linux port, which I have used successfully a couple of times, but you need to be careful about setting PATH so as not to break your system by putting Brew-installed programs ahead of /usr/bin.