How to clone and compile C++ code into binaries within the package during `pip install`?

(I asked this question yesterday on Stack Overflow, "How to clone and compile C++ code into binaries within the package during `pip install`?", and an answer suggested that I ask it here instead.)

I need to make a Python package, openface, that will act as an API for some existing C++ research code, OpenFace, which I don’t own.

I have forked the existing code base into openface-backend and rewrote the installation script install.sh so that it does not require sudo. After cloning the repository and installing the system requirements, it compiles the code and downloads the assets into build/bin.
At that point, one can run ./bin/foo --bar /path/to/abc --baz /path/to/xyz and it all works.

Regarding the Python package, to start with it will be very basic.
For the executable ./bin/foo, I will have a corresponding function foo():

import subprocess
import shlex

BINARY_PATH = "/path/to/bin/foo"

def foo(bar: str, baz: str):
    args = f"{BINARY_PATH} --bar {bar} --baz {baz}"
    try:
        process = subprocess.run(
            args=shlex.split(args),
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        print(process.stdout)
    except subprocess.CalledProcessError as error:
        print(error.stdout)
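
As a side note on the wrapper above: building the command as one string and splitting it with shlex breaks as soon as a path contains a space. A minimal sketch of a variant that passes the arguments as a list instead; build_command is a hypothetical helper, the binary path is the same placeholder as above, and unlike the original this version lets CalledProcessError propagate rather than printing it.

```python
import subprocess


def build_command(binary: str, bar: str, baz: str) -> list[str]:
    """Build the argument list directly, so paths with spaces stay intact."""
    return [binary, "--bar", bar, "--baz", baz]


def foo(bar: str, baz: str, binary: str = "/path/to/bin/foo") -> str:
    """Run the executable and return its combined stdout/stderr as text."""
    result = subprocess.run(
        build_command(binary, bar, baz),
        check=True,  # raises subprocess.CalledProcessError on a non-zero exit
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout
```

Returning the output instead of printing it leaves it to the caller to decide what to do with it.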

When the package is installed with pip, e.g. pip install 'git+https://github.com/GuillaumeRochette/openface.git', I would like the openface-backend repository to be downloaded and compiled, and the resulting binaries to be moved somewhere inside the package.

To try and make it work, I have followed the instructions from another answer.

This is my setup.py:

from setuptools import setup
from setuptools.command.develop import develop
from setuptools.command.install import install

import subprocess
import shlex


def compile():
    args = (
        'bash -c "'
        "git clone https://github.com/GuillaumeRochette/openface-backend.git"
        " && "
        "cd openface-backend"
        " && "
        "bash install.sh"
        " && "
        "mv build/bin .."
        " && "
        "cd .."
        " && "
        "rm -rf openface-backend"
        '"'
    )
    try:
        process = subprocess.run(
            args=shlex.split(args),
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        print(process.stdout)
    except subprocess.CalledProcessError as e:
        print(e.stdout)


class CustomDevelop(develop):
    def run(self):
        develop.run(self)
        compile()


class CustomInstall(install):
    def run(self):
        install.run(self)
        compile()


if __name__ == "__main__":
    setup(
        cmdclass={
            "develop": CustomDevelop,
            "install": CustomInstall,
        },
    )

While the code is getting downloaded and compiled when doing python setup.py install within that repository, this is not happening when doing pip install 'git+https://github.com/GuillaumeRochette/openface.git'.
I don’t know if it is the pyproject.toml which is missing something, or if my setup.py needs some changes or if this is something totally different.

@GuillaumeRochette I recommended that you ask here, framing it as a request for advice on the best way to get started on building Python bindings for an existing C++ library. I assume that what you are trying to achieve has already been done for other C++ libraries, so maybe someone here can give you advice, or recommend some other Python projects you could draw inspiration from.

The approach you are taking right now seems quite far away from a viable solution and from good practices of Python packaging in general.

Unless we are talking about two different OpenFace projects, OpenFace already has a Python interface in a package named openface. If you have a package that uses the OpenFace C++ API to implement some higher-level functionality than what is available via the openface Python package, I would recommend naming it something else.

I agree that the way you are trying to implement this seems far from best practices.

The way I would implement this is to use meson-python with a Meson project for your package which uses OpenFace as a dependency from the system with a fallback to a Meson subproject.

I am aware that this is not a good solution, one could even call it horrible.

However, the code base from the backend is not mine, and it is not following any good coding practice either, e.g. it is constantly writing the output to disk into CSV files, rather than at the end of the program; the assets (i.e. model weights) have their locations hard coded w.r.t the locations of the binaries, etc…

I do not have the time to rewrite the whole thing from scratch; I am simply trying to mitigate the damage and wrap their standalone software within a Python API so that it is useful to the research community.

Yes, I am talking about this OpenFace and not that openface, but I was not able to include more than two links in the original post because I am a new user.

Thank you for the suggestion about meson-python, I will look into this now

These are not really the topics I am the most familiar with, so I hope I will not put you on false leads.


Regarding the approach you have taken now…

As far as I know, and as I already mentioned on StackOverflow, pip install does not trigger the setuptools install command, so it is not the command I would customize. I am not sure exactly which one would be best; I think I would start with the build_py command.
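
To make that concrete, here is a minimal sketch of what customizing build_py could look like. It assumes install.sh sits next to setup.py; BuildWithBackend and backend_build_command are hypothetical names. Treat it as a sketch under those assumptions rather than a tested recipe.

```python
import subprocess
from pathlib import Path

from setuptools.command.build_py import build_py


def backend_build_command(script: Path) -> list[str]:
    """Return the command that builds the backend (hypothetical helper)."""
    return ["bash", str(script)]


class BuildWithBackend(build_py):
    """Run the backend build script, then the regular build_py step."""

    def run(self):
        # Assumption: install.sh lives in the same directory as setup.py.
        script = Path(__file__).resolve().parent / "install.sh"
        subprocess.run(backend_build_command(script), check=True)
        super().run()


# In setup.py this would be registered with:
#     setup(cmdclass={"build_py": BuildWithBackend})
```

Getting the compiled binaries into the built package is a separate step that this sketch does not cover.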


Now, I believe it is pointless to have the Python package ensure that the C++ library is installed. When writing the packaging for your Python project, you should assume that the C++ library is already installed.
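
One way to read "assume the library is already installed": the Python wrapper looks the executable up at run time and fails with a clear message when it is missing. A minimal sketch, assuming the backend installs an executable that is reachable through PATH; the name foo is the placeholder from the question and find_backend_binary is a hypothetical helper.

```python
import shutil


def find_backend_binary(name: str = "foo") -> str:
    """Return the path of an already-installed backend executable."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"Could not find {name!r} on PATH; install the C++ backend first."
        )
    return path
```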

At least in the “PyPI world”. Maybe in the “conda world”, you can set some kind of dependency relationship between the C++ library and its Python bindings.

In any case, conda or not, if it were me, I would scrape all the “clone and compile C++ code” out of this.

As you’ve said, the current solution using setup.py commands, externally-invoked bash shell commands, manually cloning the repo, etc. is all a really, really horrible hack.

The problems

To start off, as @sinoroc implied on the SO question, the reason the setup.py commands don’t work is because the solution you found was unfortunately half a decade out of date and does not work with modern Python packaging. For more information on all that, see this de-facto canonical blog post on it:

Second, all that manual bash hackery is likely going to be very fragile: it requires the user to manually install CMake, a C compiler and various other dependencies, will not work cross-platform, and may not work when the package is built in an isolated environment, as is nowadays standard when you build a package with pip, build, etc. Plus, it’s going to mean more work for you to maintain.

Third, I don’t really understand why the build process clones the repo—isn’t it already part of whatever you would distribute that contains the indicated setup.py (source repo, sdist, etc)?

On top of that, your method of invoking OpenFace through a subprocess is, as you say, extremely rudimentary, costly and limited, and it’s going to be very tedious and unmaintainable for you to manually replicate the interfaces of the functions you need, and for your users to pass everything as strings.

So, TL;DR, it is going to be a lot of work to do it this way, for an end result that is not likely to do much of what you’re looking for.

The solutions

What would work much better is using a tool like pybind11 to generate the C++ → Python bindings automatically, and then building the package with Meson-Python or Scikit-Build; the latter might be preferable in your case given that it wraps CMake, which OpenFace is already using, and is also commonly used with pybind11.

Of course, you could set that up yourself, but fortunately others have already done so; a quick search revealed e.g. "[WIP] pybind11 wrapper for LandmarkDetector" by a-hurst (Pull Request #858, TadasBaltrusaitis/OpenFace on GitHub) and "Allowed to publish python bindings?" (Issue #898, TadasBaltrusaitis/OpenFace on GitHub), and several others. Instead of starting from scratch, you could start with their existing, mostly working solutions and then iterate on them as needed.

Important legal considerations

IANAL, but the project you linked is proprietary, owned by a large US private university and not free or open source, and the terms of its bespoke license explicitly prohibit distribution of derivative works (without paying several tens of thousands of dollars per copy, which also indicates they have plenty of motivation, money and lawyers to pursue legal claims against you). Depending on the specific details of how you intend to package this, it is likely copyright infringement in most legal jurisdictions to publish PyPI wheels with OpenFace linked in, and possibly also to publish a wheel or sdist without it that links to it at build time (at least US caselaw is presently unsettled on whether dynamic linking (including subprocess calls) constitutes a derivative work, and thus infringement, but again IANAL).

Also, while GitHub’s license gives you explicit permission to fork any public repo, it does not give you permission to modify it and distribute the changes as you are, which as mentioned their non-free license explicitly prohibits. Therefore, you forking, modifying and making public your changes (on GitHub, or through a PyPI download of your repo) is also likely copyright infringement.

Furthermore, you could also be in trouble over trademark, as they claim the trademark “OpenFace” and explicitly prohibit use of it without permission, as you’re doing on PyPI (and possibly GitHub, depending on legal interpretation). To note, the other (more popular, and free/open source) OpenFace project appears to have publicly used it first, but both are developed by CMU researchers and owned by CMU, so that doesn’t really help you here.

Therefore, if you have not already and want to continue with this, you should either reach out to the authors for explicit written permission for what you’re doing, or contact a licensed attorney for legal advice (which this is not).


Looking at how Matplotlib does this for freetype and qhull may be useful to you: