C-API for initializing statically linked extension modules

At Meta, our Python “applications” (conceptually similar to a packaged venv) are statically linked binaries, with the main exe containing the Python interpreter, as well as native extensions statically linked into it.

To allow application code to import statically linked extensions (since there’s no .so per extension module), we install a StaticExtensionFinder, lookup the module init function address inside the main exe by module name (the mapping is “burned” into the exe during the build), and mess around with various private CPython stuff to get an initialized module.

Unsurprisingly, using “private CPython stuff” is fragile, and indeed it broke when I started integrating Python 3.12. I ended up fixing it for 3.12 by patching our CPython 3.12, adding a dedicated C-API for calling a module init function via a function pointer “with context” (used here).
The reason I couldn’t use _PyImport_SwapPackageContext directly (without patching CPython, even with Py_BUILD_CORE defined) is that it’s declared here with no PyAPI_FUNC, so the symbol is never exported.

All of this is a long-winded context for the main question(s) – is this use-case (a way to load statically linked extensions) considered “supported”, and if so, should there be public (stable or unstable) APIs to support it? Could PyImport_CallInitFuncWithContext that we added internally be that API?
Even if it’s not “fully supported and blessed”, would it make sense to slap a PyAPI_FUNC on _PyImport_SwapPackageContext to make it possible to do this without patching CPython? (like it was possible in 3.10, thanks to _Py_PackageContext being declared as PyAPI_DATA(const char *))

3 Likes

IMO, supporting this would be nice; the hard part is writing the tests.

If we expose new API for this, I’d rather avoid the context entirely.

Do you need the context at all? Per the source comment:
/* Package context is needed for single-phase init */
Indeed, it’s for calling the init function; for multi-phase init, that should only return PyModuleDef_Init(PyModuleDef), which doesn’t use the context.
Even when it’s used, its purpose is to adjust the module’s __name__.

(It’s been a long time since I looked at this, please verify what I’m saying!)

4 Likes

You’re right, it does seem related only to single-phase init (based on the commit that originally introduced it). I guess we added it because there are still a bunch of extensions that use single-phase init? (I see some internal test cases that use Pybind11 in relation to that change)

To see how much this is used, you could modify _PyImport_ResolveNameWithPackageContext to warn/error if it makes a change.

We don’t have 3.12 used widely enough to get any meaningful signal from this exercise (and that function didn’t exist in 3.10), but a different experiment (just drop the package context swapping) does cause many failures in our CI. sampling a few of them, many look like this (with modified names):

ModuleNotFoundError: No module named 'internal.cpp_extension.using_pybind11'; 'internal.cpp_extension' is not a package

and these seem to be exclusively about internal C++ Python extensions that use Pybind11.

Ah.
I guess it would be best to put PyUnstable_ on _PyImport_SwapPackageContext. It’ll need docs and a test to support it.

It would be cleaner to design a better API, but since we already have multi-phase init, the effort would be better spent porting pybind11 to that (and possibly removing any blockers on CPython’s side).

1 Like

Maybe @henryiii could comment on that?

We’ve been using PyImport_AppendInittab for our static modules. Is there a reason this wasn’t used at Meta? Is it just the wrong thing to do?

2 Likes

Disclaimer: I’m not the expert on the evolution of our internal approach.

I don’t see an obvious reason against it. It might be a bit limiting to do it before Py_Initialize() (e.g. if there are many static modules, it might add startup time to append all of their init functions, while many of them might end up never used at runtime), but I’ll probably need to try it out to be sure.

PyImport_ExtendInittab is probably better if you’re concerned about performance there. Our static build users are usually comparing against shared builds on shared filesystems that can’t keep up with oodles of MPI ranks all requesting module files at the same time; there may be easy perf wins hiding in that gap.

We originally were using PyImport_ExtendInittab, but moved to using a std::unordered_map to improve startup time and lookup speed. When using PyImport_ExtendInittab we found with perf testing that we were spending a lot of time doing string comparisons.

2 Likes

Perhaps the backing structure could be replaced with a PyDict (alike; the interpreter itself isn’t guaranteed available AFAIK) instead of (presumably) an array of tuples? That would help everyone too.

I think we’d prefer to optimise importing from inittab over adding a brand new API.

3 Likes

Sure; why would the API need to change if the backing data structure is improved?

I wasn’t disagreeing with you (I also wasn’t endorsing the use of a PyDict, though that may work out to be suitable if we can create it late enough during initialization, and can handle any name collision errors being deferred until then).

3 Likes

By the way, it would be nice to move “inittab” configuration into a “stateful” API like PyConfig, or currently discussed PEP 741: Python Configuration C API, rather than “stateless” function calls such as PyImport_ExtendInittab().

1 Like

Right, the selected data structure was chosen to easily append data, but not really to optimize read access. That’s another reason to separate the API to populate this list, and the API to access it.

We would gladly contribute some form of optimized importing from inittab.

Obviously our existing implementation isn’t relevant, since it relies on std::unordered_map. I don’t know if PyDict could be used that early in the interpreter initialization though - my guess would be “no”, but I defer to the experts!

Something we are able to do with the std::unordered_map + “static extension finder” approach is to statically generate the std::unordered_map at build-time, so there’s never a need to do thousands of calls to append to inittab at startup. By constructing the data structure at build-time, we effectively make the “append data” runtime cost zero, and there’s only lookup cost at import-time.

If we go ahead with an optimized data structure backing inittab, would there be a way to “pre-populate” it at build-time (e.g. by swapping a C file).

Edit: perhaps the “populate inittab at startup” is not that big of a deal considering PyImport_ExtendInittab exists (and can be done efficiently enough).

2 Likes

For code used early in Python initialization and late during Python finalization, the internal Py_hashtable C API can be used.

1 Like

AFAIK, it’s currently possible for the user to overwrite PyImport_Inittab – make it point somewhere else. And the user could also read from it.

The inittab is copied into _PyRuntimeState by Py_Initialize, and AFAIK that copy is inaccessible.

It looks like you could change the internal structure in _PyRuntimeState. If you do, PyImport_Inittab should, IMO, still be added to it at runtime. But I don’t see a problem with Py_Initialize including additional sources, like a baked-in map.

1 Like