Clarification on a wheel's header data

Hello, I was reviewing the details of installing a wheel’s data files in PEP 427 (The Wheel Binary Package Format 1.0), and the destination of header files feels ambiguous. None of the designated data directories (purelib, platlib, headers, scripts, data) has an explicit target mentioned; the PEP only says “… the contents of these subdirectories are moved onto their destination paths.” The others, however, all follow from Python’s sysconfig. sysconfig does have a notion of an include path, which could be inferred as a possible target for headers, but it also has platinclude. Which one should be used if they are different? And should distributions be allowed to provide both platform dependent and independent headers?
It seems that distutils, setuptools, and pip each craft their own path for headers, even if otherwise pulling from sysconfig, which ends up being the same as the include path.
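
For reference, a minimal sketch of how the two sysconfig paths in question can be compared on the running interpreter. It only uses `sysconfig.get_path`; the printed values depend on the platform and install scheme, so no output is shown.

```python
import sysconfig

# Paths from the default install scheme of the running interpreter.
include = sysconfig.get_path("include")
platinclude = sysconfig.get_path("platinclude")

print("include:    ", include)
print("platinclude:", platinclude)
# On many installations these are the same directory, but nothing guarantees it.
print("same directory:", include == platinclude)
```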

Some possible thoughts for how to move forward:

  • clarify in PEP 427 that headers are assumed to be platform (in)dependent and should be co-located with other headers of the same type. It does not need to become explicit about that destination.
  • another way to reduce ambiguity would be to rename the headers directory to include, which would call attention to which type of header files are expected, based on the definition in sysconfig. Alternatively, a headers key could be added to sysconfig that is an alias for one of (plat)include.
    • this is a rather heavy-handed way to make this distinction, and it will possibly take years to disseminate across the ecosystem. OTOH this seems like the only place where Python differs between an include path and a headers path. And with distutils finally on its way out, it makes even less sense why the difference exists.
  • do wheel files need support for both include and platinclude style headers?
    • probably not, given that the 1.0 spec is coming up on 10 years old and nobody has asked for it. However, as soon as we say they must be treated as either one or the other, users may start coming forward asking for the opposite. Supporting both would also complete the relationship with sysconfig paths (excluding core locations).

The C language .h files are known as headers.

The available distutils paths or categories at the time wheel was made are the ones wheel should use. I understand sysconfig has added more recently, but if you tried adding them to a wheel, some installers would raise KeyError.

The spec makes no claim about the headers or any other installable path, it only gives examples, and I believe this is deliberate. The wheel spec simply lets you tell the installer that you want to put some files in a certain installable path. The installer may or may not support that installable path, and this depends on the mechanism it uses to install the files. Previously, that mechanism used to be distutils, but that has now changed to sysconfig. There are some discrepancies between the available keys in distutils and sysconfig (well, maybe actually not as of https://github.com/python/cpython/pull/24549), but as distutils is deprecated, the distutils-only paths, which is the case of headers, should also be considered deprecated.
In hindsight, maybe this could have been handled a bit better in the spec.

The spec does not define any install scheme path or path semantics.

As headers is an install scheme path only present in distutils, and not in sysconfig, this is what you should be doing already. I think some wheel installers are implying that the headers path is include, but the build backend should be naming that directory include, or perhaps platinclude if that is the case, in the first place.
My interpretation is that headers is legacy and effectively deprecated with the deprecation of distutils.

They don’t need to. It depends on the install scheme of the target interpreter. A custom vendor python may add new paths to the install schemes, and wheels may install to those paths.
In practice, we are talking about normal Python distributions, like CPython, PyPy, etc., which use include and platinclude, so yes.


TLDR: The wheel spec simply provides a mechanism to install files to the interpreter install scheme paths. It does not define which paths are supported or semantics.
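
To make that mechanism concrete, here is a rough sketch of how an installer might map a wheel archive member under `.data/<key>/` onto an install scheme path. The function name `destination_for` and the example paths are hypothetical; a real installer would pass something like `sysconfig.get_paths()` for the target interpreter instead of the hard-coded dictionary.

```python
import posixpath


def destination_for(member, data_dir, scheme_paths):
    """Return the install path for a wheel member located under `<data_dir>/<key>/...`."""
    prefix = data_dir + "/"
    if not member.startswith(prefix):
        raise ValueError(f"{member!r} is not inside {data_dir!r}")
    # The first path component after the .data directory names the scheme path.
    key, _, rest = member[len(prefix):].partition("/")
    base = scheme_paths[key]  # KeyError if the target scheme has no such path
    return posixpath.join(base, rest)


# Hypothetical scheme paths, hard-coded so the example is deterministic.
example_paths = {"purelib": "/env/lib/site-packages", "scripts": "/env/bin"}
print(destination_for("demo-1.0.data/scripts/demo", "demo-1.0.data", example_paths))
```

With this example scheme, a member under `demo-1.0.data/headers/` raises KeyError, which is the behaviour described earlier in the thread for keys the installer does not know.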

This is a bit of a mess, but I hope my reply has helped clarify things :sweat_smile:

According to the comment in sysconfig:

        # On POSIX-y platforms, Python will:
        # - Build from .h files in 'headers' (which is only added to the
        #   scheme when building CPython)
        # - Install .h files to 'include'

Based on that, and the fact that it only seems to be set for in-tree builds, I think there’s some other misinterpretation going on here.

To be clear, this comment is referring to CPython’s own installation, and nothing to do with wheels or packages. I don’t see anything in sysconfig or distutils.sysconfig that looks like a good option for installing header files (one example: none of the paths will install within a venv on Windows), so it’s more likely that this particular case needs a definition (and also a motivating user scenario).

sysconfig’s include paths are always under sys.base_prefix, on all platforms and for all schemes. The value does not change for virtual environments, so aliasing headers to include would not accomplish anything if you were looking for a place to put your headers. pip puts them in a folder with the distribution’s name under include if you’re brave (or foolish) enough to install Python packages globally, and in a non-standard include/site folder in virtual environments.
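
A rough sketch of the behaviour described above, for anyone who wants to check it locally. The helper names are made up, and the layout under `include/site` is an assumption modeled on the description in this post, not a copy of pip's code.

```python
import os
import sys
import sysconfig


def in_virtual_environment():
    # In a venv, sys.prefix points at the environment and sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


def sketch_headers_dir(dist_name):
    """Approximate where a pip-like installer would put a distribution's headers."""
    if in_virtual_environment():
        # Assumed layout: <venv>/include/site/pythonX.Y/<dist name>
        xy = f"python{sys.version_info.major}.{sys.version_info.minor}"
        return os.path.join(sys.prefix, "include", "site", xy, dist_name)
    # Global install: a per-distribution folder under sysconfig's include path.
    return os.path.join(sysconfig.get_path("include"), dist_name)


print("sys.base_prefix:", sys.base_prefix)
print("include:        ", sysconfig.get_path("include"))
print("headers (demo): ", sketch_headers_dir("demo"))
```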

I think you may be misunderstanding my position in this process: I am trying to create a wheel installer and want it to be able to install any .data/ child directories. I assume, based on the PEP that this may include a directory called headers that must go somewhere. I am not in a position to rename these directories.

Are you saying that headers is no longer a valid wheel data directory? In that case I think the PEP very much needs an update.

Further, are you saying that include and platinclude are already valid data directories? Again, I would say this requires an update to the PEP. In general I thought that including any directories not explicitly named in the package format would make the file not a wheel. Are stdlib and platstdlib valid targets for wheel data?

It’s definitely starting to!

I do not have the required background to understand why headers was an install scheme path in distutils, but it was.

I have run into this before, and I considered proposing the addition of install scheme paths for user package includes, ones that would depend on base/platbase and not installed_base/installed_platbase, but did not get around to it.
Currently, the way you put headers on the system is via include and/or platinclude, which, as you noted, is not very suitable.

Well, if the path is not supported in sysconfig.get_path, then you should either raise an error, or have a special handling for headers. The special handling could be falling back to distutils, which would be something like the following:

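# Legacy fallback: distutils is deprecated and scheduled for removal in Python 3.12.
import distutils.dist

# A throwaway Distribution is enough for the install command to compute its paths.
distribution = distutils.dist.Distribution({
    'name': 'package-name',
})
install_cmd = distribution.get_command_obj('install')
install_cmd.finalize_options()  # fills in the install_* attributes for the default scheme
headers_path = install_cmd.install_headers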
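
The other branch (raise an error, or special-case headers without touching distutils) could look roughly like the sketch below. `resolve_scheme_path` is a hypothetical helper, and placing headers in a per-distribution folder under include is an assumption for illustration, not something the spec mandates.

```python
import os
import sysconfig


def resolve_scheme_path(key, dist_name, paths=None):
    """Look up an install scheme path, with special handling for the legacy 'headers' key."""
    paths = sysconfig.get_paths() if paths is None else paths
    if key in paths:
        return paths[key]
    if key == "headers":
        # Assumption: treat 'headers' as a per-distribution folder under 'include'.
        return os.path.join(paths["include"], dist_name)
    raise KeyError(f"install scheme has no path named {key!r}")


print(resolve_scheme_path("headers", "package-name"))
```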

It’s legacy, not supported by sysconfig.

Yes. Anything in sysconfig.get_paths() is a valid directory, though there are some paths that you probably shouldn’t be installing to, even though it is supported.
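
For example, the paths of the running interpreter's default scheme can be listed like this. `WHEEL_KEYS` is just a local name for the five keys listed in PEP 427; the result depends on the interpreter, so no output is shown.

```python
import sysconfig

# Every path the default install scheme of this interpreter knows about.
paths = sysconfig.get_paths()
for name, location in paths.items():
    print(f"{name:12} {location}")

# The five data directories listed in PEP 427, for comparison.
WHEEL_KEYS = ["purelib", "platlib", "headers", "scripts", "data"]
missing = [key for key in WHEEL_KEYS if key not in paths]
print("wheel keys without a sysconfig path:", missing)
```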

In practice you can only use these five keys SCHEME_KEYS = ['platlib', 'purelib', 'headers', 'scripts', 'data'] and it would take additional work to use the new sysconfig keys (but how useful are they? …)

The opposite, it would take additional work to keep using the distutils keys as it is deprecated and will be removed in 3.12. The fact that pip is still using distutils is irrelevant, new implementations should be using sysconfig, and pip will move to sysconfig soon anyway.

This is not a question of distutils vs sysconfig functionality, distutils is deprecated and is essentially just technical debt at this point.

With that said, I will open a bug on bpo to discuss adding a separate path to sysconfig for site package headers. Using include/platinclude is bad but the only possible approach right now. I am not sure if this would be able to get into 3.10, but we’ll see.

It’s a great question whether that path works, whether it’s something that made more sense before venv, or if there’s a better place now.

I added that comment fairly recently to explain some of the magic we needed to add when making distutils.sysconfig use sysconfig. There’s no design except it should work like it did before 3.10.

As I understand it:

  • include is the (public) name used in sysconfig
  • headers is the term used in wheels, distutils and setuptools.
  • these mean the same thing, unless you’re building CPython itself from source. (Then, one is in the source and one is the installed location.)

I have opened a bug.

https://bugs.python.org/issue44445

Cross-posted from the bug, since I’m kind of redirecting the discussion back to packaging tools rather than core CPython:

distutils.sysconfig doesn’t expose the headers path either, it’s only there as a default value for the install command (Lib/distutils/command/install.py).

So it doesn’t seem unreasonable to provide a recommendation on where to put shared header files and let installers do their own calculation. If an installer wants to install into another environment, it can’t rely on sysconfig anyway, so we need the spec as well as any implementation. And if we don’t have the spec then people will have to reverse-engineer the implementation anyway, so we may as well start with the spec.

Now, whether we actually need or want packages dumping all their headers in one directory is a different question. At least on some platforms they’ll also need import libraries, and tools like Cython have different files yet again. Many existing projects keep these files inside their package and offer the path on request (e.g. pybind11), so perhaps we actually want to standardise pkg-config-like metadata instead?
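
The "offer the path on request" pattern usually boils down to a small helper shipped with the package. A minimal sketch, assuming a hypothetical package that keeps its headers in an `include` folder next to its `__init__.py`:

```python
from pathlib import Path


def get_include(package_file):
    """Return the directory of headers bundled next to the given package file."""
    # Assumption: headers live in an 'include' folder beside the package's __init__.py.
    return str(Path(package_file).parent / "include")


# Inside the package this would be called as get_include(__file__).
print(get_include("mypkg/__init__.py"))
```

A build script for a dependent extension would then add the returned directory to its compiler include paths.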

Yes, I was going to say this bug is a good step, but I still feel a change to the PEP is needed. Even if site-include and site-platinclude currently existed, my original question would still stand: where should headers (from wheel files, not CPython) be expected to go?

And is the list of possible data directories not closed? There was discussion elsewhere that although the PEP did not preclude multiple .data directories it was in the spirit of the PEP, and should be invalid. I was taking that same spirit to subdirectories of .data. Since the PEP bothered to list out possible subdirectories I took it as canonical and complete. But comments in this list seem to imply that the actual list is whatever sysconfig.get_path_names() outputs (for whatever python implementation is used).

The spirit of the wheel PEP is that .data/* should be eventually allowed but no one has implemented wheel with that in mind. It would be nice to figure out what installers should do with new paths they don’t know about (leave that part of the .data/ directory in site-packages and warn) but since no one has implemented wheel in that way, it would require further effort no matter what the PEP says.
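
A sketch of the "leave it in place and warn" idea, assuming an installer that only knows the five initial keys. The names `KNOWN_KEYS` and `split_data_dirs` are made up for this example.

```python
import warnings

# The data directories that current installers understand.
KNOWN_KEYS = {"purelib", "platlib", "headers", "scripts", "data"}


def split_data_dirs(subdirs, known=KNOWN_KEYS):
    """Separate .data subdirectories into installable ones and ones to leave in place."""
    installable, left_in_place = [], []
    for name in subdirs:
        if name in known:
            installable.append(name)
        else:
            # Unknown path: do not fail, keep the files under site-packages and warn.
            warnings.warn(f"unknown data directory {name!r}; leaving it in site-packages")
            left_in_place.append(name)
    return installable, left_in_place


installable, left_in_place = split_data_dirs(["scripts", "headers", "platinclude"])
print(installable, left_in_place)
```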

The initially supported paths are taken from distutils.command.install.

How is this different from any of the other keys? I don’t think such a spec exists. AFAIK, the installer only has two options: either it looks into the distutils/sysconfig of the target, or it hardcodes the paths, given that it knows the implementation it is targeting.
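
The first option, looking into the sysconfig of the target, can be sketched by asking the target interpreter directly. `target_paths` and `QUERY` are hypothetical names, and the example passes `sys.executable` only so that it runs anywhere; an installer would pass the path of the interpreter it is installing for.

```python
import json
import subprocess
import sys

# Small program executed by the *target* interpreter to report its own install paths.
QUERY = "import json, sysconfig; print(json.dumps(sysconfig.get_paths()))"


def target_paths(python_executable):
    """Return the sysconfig paths of another interpreter as a dict."""
    completed = subprocess.run(
        [python_executable, "-c", QUERY],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


paths = target_paths(sys.executable)
print(paths["purelib"])
```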

IMO, this should be no different than site-packages.

Will they? Can you give some examples? Currently, there is no mechanism for this.
Some packages need libraries at runtime, but they usually use dlopen or similar, or just require you to have the native dependencies present.

This is drifting a bit off-topic, but I was thinking we could have a separate directory for libraries, which would be on the library lookup path (on Linux this could be done by setting LD_LIBRARY_PATH). That would allow us to provide wheels for low-level dependencies, e.g. libusb, which Python modules could then use. It also plays well with distros, as we would simply skip that dependency since we already provide those libraries in /usr/lib.

That makes sense if we go in the “every package provides isolated data” direction, but I am not sure if it is worth it. I am struggling to come up with pros for that, so please let me know if you have relevant use-cases. If we go with the shared include and libs directories, we don’t need to add such complexity :stuck_out_tongue:

The Windows compiler toolset uses import libraries (in the form of .lib files) to provide the code for referencing dynamically loaded libraries (.dll files). You don’t reference the DLL during build, you reference the import library. Distributing the actual DLL is an exercise left to the reader (the import library does not need to be distributed with the build output that used it).

You can think of it as a statically linked library that does the dynamic resolution into a dynamically loaded library. Every single static reference to a DLL on Windows uses it, so I guess I’ll throw numpy under the bus as usual and say them. :slight_smile:

And yes, distribution of dynamically loaded modules other than Python extension modules is off-topic right now. However, if a standard for installing development references into Python environments is being designed, it has to take into account files that are required at build but are not C includes.

It’s already possible today, it’s fully backwards compatible, and packages are already using it. That means we can write a doc recommending it and we’re done.

Implementing a new approach that is not backwards compatible and requires existing tools to adapt is more work, and has to justify its worth.

(I know that you know that the shared site-packages directory is the source of many many issues, so I won’t get snarky at you on that one. All of those issues would arise again if we added new shared directories.)

The relevant use cases are any Python packages that have to include header files for other extension modules to use during their build. (Yes, that’s the same use case as this thread is about, which is why I brought it up here.)

So, to be extra clear: include and platinclude contain CPython’s headers, but not (necessarily) headers of third-party packages. (Put another way, these are the equivalent of stdlib/platstdlib and we don’t have an equivalent of purelib/platlib – at least until bpo-44445 is fixed.)

Is that correct? If so, I’ll go clarify the sysconfig documentation.

Correct.

At least on Windows, the norm would be to put third-party package headers in their own directories and reference more include paths when compiling. (The line gets blurred when there’s a distributor in the mix, because they might bundle more headers in there, but third-party extensions would not.)

So I’d drop the “necessarily” and just say that it’s where the CPython headers are. But if it’s also the normal place on other platforms for third-parties to put headers, then perhaps there should just be a platform-specific clarification in the docs.