Identifying & parsing binary extension filenames

For a couple of projects I’m working on, I need to be able to identify binary extension files and extract the names of the modules they provide. importlib.machinery.EXTENSION_SUFFIXES only gives me the suffixes for the current Python and the machine it’s running on, whereas I need to work with extension files for any Python. There doesn’t seem to be a library for this already, so I’ve had to investigate this on my own, and I’ve come to you to double-check my findings.
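For reference, here is a minimal sketch of what the standard library reports for the running interpreter only. The variable name suffixes is just for illustration, and the values mentioned in the comment are typical examples, not guaranteed output:

```python
import importlib.machinery

# Suffixes the *running* interpreter accepts for extension modules.
# The list differs per Python version, implementation, and platform;
# on a Linux CPython 3.8 it might look something like
# ['.cpython-38-x86_64-linux-gnu.so', '.abi3.so', '.so'] (illustrative only).
suffixes = importlib.machinery.EXTENSION_SUFFIXES

for suffix in suffixes:
    print(suffix)
```

Nothing in this list says anything about other interpreters or platforms, which is why it does not help with the "any Python" case.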

What I’ve determined so far:

  • Linux/manylinux binary extension modules have names of the form {module}.{implementation}-{abi}-{arch} (Can the gnu part vary, or is that attached to linux?). Example:

  • macOS binary extension modules have names of the form {module}.{implementation}-{abi}

  • Windows binary extension modules have names of the form {module}.{impl_abbrev}{abi}-{win}.pyd where win is win_amd64 or win32 (or other values?). Example: foo.cp38-win_amd64.pyd

  • Certain Linux & macOS extension modules (ones built for Python 2?) have names simply of the form {module}.so.

Is there anything important I’ve missed? Would r'(?:\.[-A-Za-z0-9_]+\.(?:pyd|so)|\.so)\Z' be an appropriate regex for matching any & all binary extension module file extensions?

Further research has turned up module names of the form (for both macOS and Linux) and foo.pyd. Is there any place that all of this is documented?
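To make the question concrete, here is a rough sketch that applies the regex from the post to a few filenames. The helper name split_extension_filename and the Linux sample filename are assumptions made up for illustration; only foo.cp38-win_amd64.pyd, foo.so-style names, and foo.pyd come from the findings above.

```python
import re

# The regex proposed in the post, unchanged.
EXT_SUFFIX_RGX = re.compile(r'(?:\.[-A-Za-z0-9_]+\.(?:pyd|so)|\.so)\Z')


def split_extension_filename(filename):
    """Return (module, suffix) if the regex matches, else None.

    Hypothetical helper: it only reflects what the regex accepts,
    not any official naming rule.
    """
    m = EXT_SUFFIX_RGX.search(filename)
    if m is None:
        return None
    return filename[:m.start()], m.group(0)


samples = [
    "foo.cp38-win_amd64.pyd",              # Windows example from the post
    "foo.cpython-38-x86_64-linux-gnu.so",  # assumed Linux-style name
    "foo.so",                              # untagged POSIX name
    "foo.pyd",                             # untagged Windows name
    "foo.bar.so",                          # dotted name, to probe ambiguity
]
results = {name: split_extension_filename(name) for name in samples}
for name, result in results.items():
    print(name, "->", result)
```

Two things fall out of this. The regex as written does not match a bare foo.pyd, so it would need a \.pyd alternative alongside \.so to cover that case. It also treats any run of [-A-Za-z0-9_] characters before .so or .pyd as a tag, so foo.bar.so is split into module foo and suffix .bar.so, whether or not bar is really a tag.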

I think these are all implementation defined and we never actually had a spec.

@pitrou, did we have a spec last time we played with these? Or just a debate somewhere on the issue tracker?

There’s PEP 3149 for the POSIX case, though the convention outlined there doesn’t match reality: it lacks the {platform} part, e.g. x86_64-linux-gnu. The Windows case does not have a PEP or mailing-list discussion AFAIK.
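One way to see how the actual suffix is put together on a given interpreter is to look at its build configuration. This is only a sketch for inspecting the running Python; it assumes the SOABI, MULTIARCH and EXT_SUFFIX config variables, any of which may be undefined (None) depending on platform and version:

```python
import sysconfig

# Build-time values for the running interpreter only.
# get_config_var() returns None for variables the platform does not define.
soabi = sysconfig.get_config_var("SOABI")            # ABI tag, possibly with platform appended
multiarch = sysconfig.get_config_var("MULTIARCH")    # platform triplet, where defined
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")  # full suffix used for built extensions

print("SOABI:     ", soabi)
print("MULTIARCH: ", multiarch)
print("EXT_SUFFIX:", ext_suffix)
```

Comparing SOABI with MULTIARCH on a Linux build shows where the {platform} part enters the filename, which is the piece PEP 3149 does not describe.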

The current state was done in (changesets 03a144bb6ac3d7631a3bdb895e2a1f2d021fb08b, d3899c1a962f4f06f52199d1e5e4b921843e587b and 3b8124884c3655b4cf2629d741b18c1a38181805).