How do we get out of the business of driving C compilers?

I had a long response to this but I think you already cover most of it. =)

One thing I see maybe being conflated here: building extensions can be its own topic, solvable on its own regardless of the larger packaging-system questions.

How we compile and link something should normally be independent from how we package up the binary outputs of that.

I’m all in favor of setuptools or whatnot getting out of the business of invoking compilers. It should just generate an input to the relevant platform-specific native build system. The only real thing of value it can communicate to that system is where the Python headers and the static or dynamic runtime library live.
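For illustration, here's roughly the kind of information that hand-off amounts to. This is only a sketch using the standard-library sysconfig module; the exact set of variables a given native build system wants will vary:

```python
# Sketch: the Python-specific facts a packaging tool could pass to a native
# build system instead of driving the compiler itself. Keys are illustrative.
import sysconfig

python_build_info = {
    # Directory containing Python.h
    "include_dir": sysconfig.get_paths()["include"],
    # Where the runtime library lives (may be None on some platforms/builds)
    "library_dir": sysconfig.get_config_var("LIBDIR"),
    # Extension filename suffix, e.g. ".cpython-312-x86_64-linux-gnu.so"
    "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
}

# A tool could hand these to CMake/Meson/MSBuild as variables or a toolchain
# file, rather than invoking the compiler itself.
print(python_build_info)
```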

3 Likes

+1 to extensions being their own topic. Stealing from Cargo’s approach, the work can be reduced to letting the user declare “hey, I want something non-Python, please run this hook during build, and I’ll do what I need to do and tell you how to handle the results”. The Python packaging tool doesn’t need to handle the build itself.
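To make that concrete, a build hook under this model might look something like the sketch below. Everything here is hypothetical: the module name, the run() signature, and the return-value convention are invented purely to illustrate the shape of the idea.

```python
# build_hook.py (hypothetical), pointed to from the project's own [tool.*]
# table in pyproject.toml. The packaging tool only promises to run it during
# the build and to collect whatever artifacts it reports back.
import subprocess

def run(build_dir):
    # Do whatever non-Python work is needed; here we just defer to an
    # external build tool (assumed to be installed).
    subprocess.run(["make", "-C", "native", f"OUT={build_dir}"], check=True)
    # Tell the packaging tool how to handle the results: where each built
    # file should land in the package, and where to find it.
    return [("mypkg/_native.so", f"{build_dir}/_native.so")]
```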

1 Like

If by build tools you mean things such as CMake or Meson, then I’d argue you should punt as much as possible to them, including Python-specific tasks such as compiling bytecode etc.

So what you are left with is an elaborate frontend to orchestrate the generation of CMake or Meson configuration to build the package at hand.

And perhaps you don’t even need the frontend and this can all be done as Meson plugins written in Python (does Meson have plugins?).

(this is only for the build side, though)

2 Likes

Yes I do.

Yep, that’s what I was thinking.

OK, so what are we after here?:

  • Promote backends like scikit-build, mesonpep517, enscons, etc. over setuptools, to try to simplify the story for today?
  • Pursue Tzu-Ping’s idea of figuring out an API to execute the C build tool and get what’s needed out of it, abstracted just enough that we can lean on C build tools as much as possible (basically “run this, then copy these files here and give them these names”)?

Basically, are we looking for a pyproject.toml API such that Flit could call out to one of these C build tool front-ends and still package up the resulting wheel without having to care about C code in the first place? Or is there a different abstraction we care about? I’m also not sure how far down we want to go with this: a common solution for all Python code, or still allowing tools to do their own thing so Flit and others can be as opinionated as they want.

Aside: I found this issue and it seems that Flit was entertaining a similar idea inspired by Cargo. Maybe @takluyver has thoughts on this?


The API should likely be specified in parallel to PEP 517, so a PEP 517 backend can interface with it (to build wheels), but a frontend can also access needed metadata (e.g. for editable/develop installs) without proxying it through PEP 517.

Continuing the Cargo analogy, here’s how it handles the build script:

  • Locate the build script (defaults to build.rs, configurable via the build key in Cargo.toml). For Python this can be left to implementations (using the tool table in pyproject.toml) for now.
  • Compile it with build dependencies.
    • This is roughly analogous to the build environment set up by a PEP 517 frontend. We can use PEP 518’s build requirements to declare them. The backend doesn’t need to worry about this, and can invoke the build script directly.
  • Run to build.
    • Cargo does not make any effort to ensure the build tool (e.g. CMake) actually exists. I guess that’s the most reasonable thing to do, just as Setuptools makes no effort to ensure compilers exist.
    • It might be worthwhile to emit better error messages, like how Setuptools improved on the cryptic Unable to find vcvarsall.bat. That would mostly involve creating wrapper Python modules to call those tools (instead of straight subprocess calls). Still, this isn’t necessary for now either.
  • Interact with the build script.
    • This can be done either the Cargo way or in a PEP 517-like way (a sketch of the Cargo-style protocol follows this list).
    • Cargo way: one entry point, with results categorised via specially prefixed stdout lines.
    • PEP 517-like: specially named functions for each kind of information.
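For concreteness, here is a rough sketch of the Cargo-style option. The pybuild: prefix is invented for illustration (Cargo’s real prefix is cargo:), and the key=value layout is just one possible convention.

```python
# Sketch: run a build script and parse specially prefixed stdout lines,
# Cargo-style. The "pybuild:" prefix and the key=value convention are made up.
import subprocess
import sys

def run_build_script(script, build_dir):
    proc = subprocess.run(
        [sys.executable, script, build_dir],
        capture_output=True, text=True, check=True,
    )
    results = {}
    for line in proc.stdout.splitlines():
        if line.startswith("pybuild:"):
            key, _, value = line[len("pybuild:"):].partition("=")
            results.setdefault(key, []).append(value)
    # e.g. {"artifact": ["mypkg/_native.so"], "link-lib": ["z"]}
    return results
```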

A PEP 517-like interface might look like this (illustrative only, likely missing pieces):

  • get_paths_triggering_build(config_settings=None)

    Return a list of strings specifying items on the filesystem, relative to the project root. The frontend is expected to call build_ext if any of them has been modified since the last build.

  • build_ext(build_directory, config_settings=None)

    The hook is expected to write files into build_directory, and return a list of 2-tuples. Each 2-tuple represents a file to include in the package:

    • The first item is the path at which to include this file in the wheel (same as the first column in RECORD).
    • The second item indicates where to find that file, relative to build_directory.

    The frontend is expected to pass a consistent value of build_directory across calls to this function. The hook should expect that the directory may already contain previously built files, and may choose to reuse them if it determines they do not need to be rebuilt.
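To make the shape concrete, here is what a backend implementing those two hooks might look like. This is only a sketch; the CMake invocation, trigger paths and file names are invented for illustration.

```python
# Sketch of a hypothetical compile backend implementing the two hooks above,
# delegating the actual compilation to CMake.
import pathlib
import subprocess

def get_paths_triggering_build(config_settings=None):
    # If anything here is newer than the last build, the frontend should
    # call build_ext again.
    return ["CMakeLists.txt", "src/"]

def build_ext(build_directory, config_settings=None):
    build_dir = pathlib.Path(build_directory)
    subprocess.run(["cmake", "-S", ".", "-B", str(build_dir)], check=True)
    subprocess.run(["cmake", "--build", str(build_dir)], check=True)
    # Map "path inside the wheel" -> "path relative to build_directory".
    return [("mypkg/_native.so", "_native.so")]
```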

1 Like

Hmm… To be clear, you’re not suggesting that we add more functions to the PEP 517 API, right?

IMO we are where we need to be w.r.t. letting tools do all the stuff you’re describing, so I’m a bit wary of adding things to PEP 517.

1 Like

Definitely not; I proposed in the first paragraph that this be specified in parallel to PEP 517 exactly so we don’t need to touch it :slightly_smiling_face:


Edit: I realised “specify” is probably a bad choice of verb, since we don’t necessarily need a specification. Maybe I should use “design”.

Ah, right. I’ve been skimming to catch up on all the discussions here, so the clarification perhaps didn’t register as written. :upside_down_face:

But to be clear the sort of “PEP 517 like interface” that you talk of would be called in the same way as PEP 517 functions? That is, in a subprocess run in an isolated environment? I’m wary of adding more interfaces of this type, but unlike @pradyunsg my concern isn’t about the interface bloat so much as the increased number of subprocess calls. Subprocesses are costly (very much so on Windows, less so but I believe still non-trivial on Linux/Unix) and I’ve had experience of tools (non-Python) that use them without due thought, and they are painful to use. For a Python example, look at the runtimes of the pip test suite :frowning:

I am honestly not sure. I guess a build backend can invoke them in-process (since it’s already running inside a build environment managed by the frontend), but then it needs to deal with the possibility of a rogue build script modifying the backend’s behaviour when it shouldn’t. Maybe we can let backends decide for themselves.


Edit: And to be exactly clear, I only meant the functions are designed like PEP 517’s (with a specifically designed signature for each kind of request, instead of always spitting out everything and letting the caller choose what it wants).

1 Like

See the pyproject.toml I created for my mesonpep517 example. mesonpep517examples/pyproject.toml at master · FRidh/mesonpep517examples · GitHub
It looks a lot like flit’s, doesn’t it?

I think what we have now with PEP 517 seems good enough. In time we may choose to promote some metadata from tool:<build-system> to top-level, but that’s irrelevant for this discussion.

Maybe we need to build up more experience using these build-systems first.

Another aside: the only thing that has stopped me from making a proper MSVC wrapper library is the “requirement” that I also wrap everything else and support building a wheel (look up my pyfindvs project to see the closest I’d get).

If the API were defined to do only the build, and ideally in a way that makes simple builds simple and lets complex builds use external (cmake, scons, MSBuild, etc.) configuration, I’d happily invest in the Windows support.

1 Like

It is, and that’s part of the problem. How many tools really need to come up with a way to specify the version number of a project? How many times do we need to implement the code to pull files in appropriately for constructing a wheel? For me, the building of extension modules is a separate thing from packaging up files into a wheel (hence the whole purpose of this topic :smile:). IOW I want a way to have a wheel-building front-end and a compiling back-end (e.g. Flit as a front-end for how I specify metadata, meson as a back-end for any compiling I need to do).

So is the strawman proposal in How do we get out of the business of driving C compilers? - #6 by uranusjr what you’re after?

So I’m starting to view this as a wheel-building front-end (which we have an API for in PEP 517) with an optional compiling back-end (whose API doesn’t exist, but @uranusjr gave us a strawman for it). That way we start to minimize repetition in metadata specification, wheel-building, etc., and have a very clear separation of concerns between prepping the files for a wheel and producing the wheel.

IOW an example would be the call chain pip → flit → meson, and all we are talking about is an API to standardize how flit can call meson, so that people can more easily swap in the compiling back-end they want instead of having the metadata/wheel-building aspect also be tied to the compiling part. Pip wouldn’t even need to care about this new API unless we found a reason for it to care (e.g. maybe editable installs would play into it?). So to pip it’s just calling flit; it just so happens that flit is calling meson to compile extension modules that end up in the final wheel.
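As a sketch of what the flit side of that chain might do with the hypothetical back-end hooks from the strawman above (all names here are invented for illustration):

```python
# Sketch: a wheel-building frontend (the "flit" role) invoking a compile
# backend's hypothetical build_ext hook and copying the reported artifacts
# into the tree it will later pack into a wheel.
import pathlib
import shutil

def assemble_wheel_tree(compile_backend, wheel_tree, build_dir):
    wheel_tree = pathlib.Path(wheel_tree)
    build_dir = pathlib.Path(build_dir)
    # ... copy pure-Python files, write metadata, etc. ...
    if compile_backend is not None:
        for wheel_path, built_path in compile_backend.build_ext(str(build_dir)):
            dest = wheel_tree / wheel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(build_dir / built_path, dest)
    # ... then zip wheel_tree into a .whl and write RECORD as usual ...
```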

This would also potentially let people keep configuration for setuptools while letting setuptools slowly get out of the compiler game by guiding people to use a different compilation back-end.

The thing that comes to mind for me here is that we are talking about compiling C code, so the cost of spawning a process is going to be greatly overwhelmed by the cost of the compiler. But otherwise we could say it’s up to the tools to choose whether it’s done in a separate process or in-process. If I remember correctly, PEP 517 has that subprocess specification because we didn’t trust tools not to mess things up. But if we are willing to be a bit looser on that requirement in this instance, then people can just do what they think is best.

I will also say I hope people are not invoking their compilers as frequently, with no substantial change to their code, as the pip test suite does. :wink:

1 Like

This is something that is typically handled by the build system. E.g., with Meson you need to explicitly list your sources; globs are not allowed. That way it can use timestamps.

The build system will want to be told where the source is, a directory it can build files in, and a directory the files are installed to. That last directory would then be used by, say, the new flit to build the wheel.

Then of course there are the non-Python dependencies such as compilers and libs. As @uranusjr mentioned, we should not bother with those; let the build system declare what it needs and error as it usually does when they are not present.

For the core “compile stuff” API, yes. But helper APIs like @uranusjr’s get_paths_triggering_build really shouldn’t be run in a subprocess. Maybe it’s just calling them “PEP 517 style” hooks that gave me that impression - the fact that the hooks should run in an isolated subprocess (so that they can freely change global state) is important for PEP 517, because we were trying to protect against legacy backends making assumptions, but probably not so important for completely new APIs like this one.

So as long as the new spec is explicit about the fact that the hooks can be called in-process, and doesn’t over-emphasise the superficial similarity with PEP 517, it’s probably just a documentation matter.

Not quite. Really, I want to have all my own configuration in some file (presumably pyproject.toml) and just be triggered like a CMake invocation:

  • source root directory (contains pyproject.toml)
  • build root directory (temporary, might have been used before)
  • output root directory (looks like the installed layout)
  • rebuild=True/False flag
  • other named arguments that may/may not be respected (maybe “required_extra” and “optional_extra” dicts so tools can fail on the former?)

Then my own configuration will be read from pyproject.toml tables (one per target, all with my tool name included in the name), and since my tool would be Windows-only, you’d have to provide separate configuration for other platforms (I refuse to be limited to the capabilities of GNU/make, so feel free to define an input spec here and I’ll work around it :wink: ).

But an API that specifies the directories and whether or not an incremental build is permitted is all the triggering I need.
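So, as a sketch, the entry point being described might look something like this; the function name, argument names and flag semantics are all hypothetical, just mirroring the list above:

```python
# Hypothetical trigger API: directories in, installed-layout files out.
def build(source_dir, build_dir, output_dir, *, rebuild=False,
          required_extras=None, optional_extras=None):
    """Read tool-specific configuration from pyproject.toml in source_dir,
    build into build_dir (which may contain the results of a previous build
    unless rebuild=True forces a clean build), and lay the results out under
    output_dir in the installed layout for the frontend to pack into a wheel.
    """
    ...
```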

Though perhaps we should be allowing a series of tools here so that Cython can just be its own step and the compile backend only knows about the .c file? And if this is an arbitrary sequence, it’s probably better off being handled by the PEP 517 backend as its configuration, rather than trying to enable it for everyone from the front end.

The idea behind this function is to let the caller (e.g. the new Flit) decide whether to invoke the build system at all. How the build system handles the call is defined in its own configuration (e.g. meson.build), and is independent of this function. I guess this is not strictly needed, but IMO it can be useful for more ad-hoc, primitive “build systems” that are basically a bunch of subprocess calls to the compiler.

From what I can tell, the strawman proposal mostly meets the need, though. The source root directory is CWD (as in PEP 517), and the build root directory is given as the first parameter. Instead of an output directory, output artifacts are listed in the return value, and the caller would copy them to the correct location (for wheel building) as appropriate. Is it possible to always permit incremental builds (why not)?
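For what it’s worth, here is roughly how a caller might use get_paths_triggering_build to decide whether to invoke the back-end at all. A sketch only; last_build_time is assumed to come from wherever the front-end records its builds.

```python
# Sketch: decide whether to call build_ext again, based on the trigger paths
# the backend reported and the timestamp of the last successful build.
import pathlib

def needs_rebuild(project_root, trigger_paths, last_build_time):
    for rel in trigger_paths:
        path = pathlib.Path(project_root) / rel
        candidates = path.rglob("*") if path.is_dir() else [path]
        for p in candidates:
            if p.is_file() and p.stat().st_mtime > last_build_time:
                return True
    return False
```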

I want to push back on this a little. I’ve packaged a couple of extension modules that wrap third-party C and C++ libraries. One of the most valuable things setuptools does for me in that context is tell me the name of the compiler that should be used, and the right set of command-line switches to feed that compiler, when compiling the third-party library, so that it will actually work when linked with the extension module.

Now, of course, I use the third-party library’s own build tooling to build it; I don’t need setuptools to run the C compiler for me. But “the right compiler and command-line switches” is quite a bit more complicated than just “where the Python headers and the runtime library live”, and may require preserving much of distutils.ccompiler.CCompiler and its subclasses.
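For a sense of what’s involved, part of that information is available from sysconfig (the values CPython itself was built with). This sketch is POSIX-centric; the MSVC side that distutils.ccompiler handles on Windows is considerably more involved.

```python
# Sketch: the compiler and switches CPython was built with, via sysconfig.
# These config vars are generally present on POSIX builds and may be None
# elsewhere; they are only part of what distutils.ccompiler figures out.
import sysconfig

cc = sysconfig.get_config_var("CC")              # e.g. "gcc -pthread"
cflags = sysconfig.get_config_var("CFLAGS")      # optimisation/ABI flags
ldshared = sysconfig.get_config_var("LDSHARED")  # how to link a shared object

print(cc, cflags, ldshared, sep="\n")
```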

2 Likes