OK, I’ve played around and I think I know my way around the design space here.
There are 3 things I’d like to improve around the inittab. My draft solutions below build on each other, but that’s just for narrative effect; we can solve them individually too.
This topic is addressed in #2.
Let me know if this looks worth pushing through.
1. Allowing slots (PEP 793 followup)
PEP 793 added a new way to specify modules: slots rather than inittab’s initialization function.
While it’s not impossible to wrap slots in a PyModuleDef and an init function, there’s an interpreter-switching dance that CPython can avoid if we can feed it slots directly.
We can use a struct with a tagged union, roughly like this:
typedef struct PyImport_BuiltinInfo {
    const char *name;
    uint16_t type;  // chooses the variant of the union below
    union {
        PyModuleDef_Slot *slots;
        PyObject *(*initfunc)(void);
    };
} PyImport_BuiltinInfo;
Then, we add PyConfig->extra_builtin_modules, which users can set to an array of these.
This covers use cases for PyImport_AppendInittab and PyImport_ExtendInittab, except:
- If the user needs to combine several tables, they need to use their own growable-array implementation, and pass the final result to CPython.
- This does not contain CPython’s builtin modules. Unlike with PyImport_Inittab, the defaults can’t be shadowed or removed.
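To make the "combine several tables" caveat concrete, here is a rough sketch of what an embedder might write. It is only an illustration: the tag constants, the NULL-name sentinel (borrowed from how PyImport_Inittab is terminated), the helper names and the stand-in type definitions are all my assumptions, not part of the proposal, and the stand-ins exist only so the snippet compiles without Python.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so the sketch compiles without Python.h. */
typedef struct PyObject PyObject;
typedef struct PyModuleDef_Slot { int slot; void *value; } PyModuleDef_Slot;

/* Hypothetical tag values for the `type` field. */
enum { BUILTIN_INFO_SLOTS = 1, BUILTIN_INFO_INITFUNC = 2 };

typedef struct PyImport_BuiltinInfo {
    const char *name;
    uint16_t type;  /* chooses the variant of the union below */
    union {
        PyModuleDef_Slot *slots;
        PyObject *(*initfunc)(void);
    };
} PyImport_BuiltinInfo;

/* Placeholder module definitions for two made-up modules. */
static PyObject *init_spam(void) { return NULL; }
static PyModuleDef_Slot eggs_slots[] = {{0, NULL}};

/* Two tables, each terminated by an entry with a NULL name (assumed). */
static const PyImport_BuiltinInfo table_a[] = {
    {.name = "spam", .type = BUILTIN_INFO_INITFUNC, .initfunc = init_spam},
    {.name = NULL},
};
static const PyImport_BuiltinInfo table_b[] = {
    {.name = "eggs", .type = BUILTIN_INFO_SLOTS, .slots = eggs_slots},
    {.name = NULL},
};

/* Count entries up to, but not including, the NULL-name sentinel. */
static size_t table_len(const PyImport_BuiltinInfo *table)
{
    size_t n = 0;
    while (table[n].name != NULL) {
        n++;
    }
    return n;
}

/* Concatenate two tables into a newly allocated, sentinel-terminated
 * table. Returns NULL on allocation failure; the caller frees it. */
static PyImport_BuiltinInfo *
concat_tables(const PyImport_BuiltinInfo *a, const PyImport_BuiltinInfo *b)
{
    size_t na = table_len(a), nb = table_len(b);
    PyImport_BuiltinInfo *out = calloc(na + nb + 1, sizeof(*out));
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, a, na * sizeof(*out));
    memcpy(out + na, b, nb * sizeof(*out));
    out[na + nb].name = NULL;  /* explicit sentinel */
    return out;
}
```

The combined table is what would be assigned to extra_builtin_modules; how long it must stay alive is one of the details still to be decided.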
AFAICS, this would be a surprisingly small change. The existing PyImport_Inittab is already copied to a private immutable space at runtime startup; at that point it can be converted to the new format and combined with extra_builtin_modules.
There’ll be a bit of challenge designing a PyInitConfig_Set* function for this, but I’d like to hold off on that until things get more concrete.
2. Custom lookup
Python uses a linear search to scan the array, which is fine if there are only a few entries. Should we optimize that for Meta/Google scale?
I don’t think so. Users are in a better place to tailor this to their needs.
Meta uses a C++ std::unordered_map. If you’re generating things statically (the intended use case), you might generate a switch-statement trie or something and let the compiler chew on it.
We don’t want to generalize that in CPython.
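For illustration, a generated switch-statement trie could look roughly like the sketch below. The three module names and the function name are made up, and a real generator would emit something much larger; the point is only that this kind of code is specific to each embedder's module set.

```c
#include <string.h>

/* Hypothetical generated lookup for the names "eggs", "spam" and "sped".
 * Returns the index of the module in the embedder's table, or -1. */
static int generated_lookup(const char *name)
{
    switch (name[0]) {
    case 'e':
        return strcmp(name + 1, "ggs") == 0 ? 0 : -1;
    case 's':
        if (name[1] != 'p') {
            return -1;
        }
        /* "spam" and "sped" share the prefix "sp"; branch on the
         * third character and compare the remaining tail. */
        switch (name[2]) {
        case 'a':
            return strcmp(name + 3, "m") == 0 ? 1 : -1;
        case 'e':
            return strcmp(name + 3, "d") == 0 ? 2 : -1;
        }
        return -1;
    }
    return -1;
}
```

Each branch only reads the next character after confirming the previous one was not the terminating NUL, so short or empty names are rejected without reading past the end of the string.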
My proposal is to add two function pointers (and one data pointer) to PyConfig, which embedders can set to their own implementation:
// Look up an entry & copy it to caller-allocated *result.
// Return 1 on success, 0 on missing, -1 with exception set on other error.
int lookup_builtin_info(const char *name, struct PyImport_BuiltinInfo *result, void *arg);
// Implement `sys.builtin_module_names`
// (as an arbitrary iterable: deduping, sorting & converting to tuple
// is left to caller)
PyObject *get_builtin_names(void *arg);
// arbitrary data passed to the callbacks
void *builtin_callback_arg;
We also add two functions with the same signatures, which provide default/existing behaviour (i.e. handle stdlib/core builtin modules, PyImport_Inittab, and extra_builtin_modules proposed above). The embedder is expected to call those for fallback/base behaviour.
This would cover all use cases of extra_builtin_modules, making extra_builtin_modules redundant, but it’s harder to use. I’d be fine with leaving extra_builtin_modules out if it doesn’t add enough value.
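As a sketch of how an embedder's callback might look under this proposal: the version below searches its own sorted table with bsearch and falls back to the default otherwise. The name default_lookup_builtin_info, the tag constants, the my_table struct and the stand-in types are assumptions for illustration; the stand-in default here only knows a single made-up entry so the snippet compiles and runs without Python.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so the sketch compiles without Python.h. */
typedef struct PyObject PyObject;
typedef struct PyModuleDef_Slot PyModuleDef_Slot;

enum { BUILTIN_INFO_SLOTS = 1, BUILTIN_INFO_INITFUNC = 2 };  /* hypothetical */

struct PyImport_BuiltinInfo {
    const char *name;
    uint16_t type;
    union {
        PyModuleDef_Slot *slots;
        PyObject *(*initfunc)(void);
    };
};

/* Placeholder init functions. */
static PyObject *init_eggs(void) { return NULL; }
static PyObject *init_spam(void) { return NULL; }
static PyObject *init_sys(void) { return NULL; }

/* Stand-in for the default lookup CPython would provide (hypothetical
 * name). The real one would cover core builtins, PyImport_Inittab and
 * extra_builtin_modules; this one only knows "sys". */
static int
default_lookup_builtin_info(const char *name,
                            struct PyImport_BuiltinInfo *result, void *arg)
{
    (void)arg;
    if (strcmp(name, "sys") == 0) {
        result->name = "sys";
        result->type = BUILTIN_INFO_INITFUNC;
        result->initfunc = init_sys;
        return 1;
    }
    return 0;
}

/* Embedder data passed through builtin_callback_arg: a table that is
 * sorted by name so it can be binary-searched. */
struct my_table {
    const struct PyImport_BuiltinInfo *entries;
    size_t count;
};

static const struct PyImport_BuiltinInfo my_entries[] = {
    {.name = "eggs", .type = BUILTIN_INFO_INITFUNC, .initfunc = init_eggs},
    {.name = "spam", .type = BUILTIN_INFO_INITFUNC, .initfunc = init_spam},
};

/* bsearch comparator: key is the name, elem is a table entry. */
static int compare_name(const void *key, const void *elem)
{
    const struct PyImport_BuiltinInfo *entry = elem;
    return strcmp((const char *)key, entry->name);
}

/* Returns 1 on success, 0 if missing; this sketch never returns -1. */
static int
my_lookup_builtin_info(const char *name,
                       struct PyImport_BuiltinInfo *result, void *arg)
{
    const struct my_table *table = arg;
    const struct PyImport_BuiltinInfo *hit = bsearch(
        name, table->entries, table->count,
        sizeof(struct PyImport_BuiltinInfo), compare_name);
    if (hit != NULL) {
        *result = *hit;  /* copy into the caller-allocated struct */
        return 1;
    }
    /* Not ours: defer to the default behaviour. */
    return default_lookup_builtin_info(name, result, NULL);
}
```

Checking the embedder's table first means its entries shadow the defaults; calling the default first would give "setdefault" behaviour instead. That choice stays with the embedder.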
Note that there is no API to add a module to the underlying collection. The embedder can provide that if they want, answering questions like:
- Should new entries override old ones, or should they act as “setdefault”?
- Should new entries be static (outlive the PyConfig), or does the implementation copy them?
- Can entries be added after the first lookup?
If we do add extra_builtin_modules, you can use it to add one-off modules that become part of the suggested fallback behaviour.
3. Do frozen modules too
Looking around, I noticed that frozen module lookup is quite similar to these.
The current code uses 4 arrays of modules (some optional, some internal), with ad-hoc logic to enable them. It already uses the “lookup” functions look_up_frozen and list_frozen_module_names as its abstraction, since Eric Snow’s cleanup back in 2021; the change would be moving the user-configurable part to replacing a function rather than editing an array.
And AFAIK, projects like Cython (and their users) would benefit from us making builtin modules more similar to frozen (or Python) ones.
So, the third idea is:
- allow struct _frozen as another variant in PyImport_BuiltinInfo union
- add a type argument to lookup_builtin_info/get_builtin_names, to allow selecting either builtin or frozen modules, and rename them appropriately. (BuiltinImporter and FrozenImporter are separate, but at the import.c level, the difference can be similar to the difference between slots-based and initfunc-based modules.)
I’m not proposing to (initially) allow specifying builtin modules using struct _frozen, or frozen ones using slots/initfunc. But it would make sense to me to combine the lookup configuration (i.e. only add two PyConfig functions, not four).
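A rough sketch of what the combined lookup could look like, to show the shape rather than a final design. All names here (PyImport_ModuleInfo, lookup_module_info, the tag and kind constants) are placeholders I made up, the union holds a pointer to struct _frozen as one possible choice, and struct _frozen itself is a simplified stand-in rather than CPython's real layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins so the sketch compiles without Python.h. */
typedef struct PyObject PyObject;
typedef struct PyModuleDef_Slot PyModuleDef_Slot;

/* Simplified stand-in; not the real field list of struct _frozen. */
struct _frozen {
    const char *name;
    const unsigned char *code;
    int size;
};

/* Hypothetical union tags and lookup selectors. */
enum { INFO_SLOTS = 1, INFO_INITFUNC = 2, INFO_FROZEN = 3 };
enum { MODULE_KIND_BUILTIN = 0, MODULE_KIND_FROZEN = 1 };

struct PyImport_ModuleInfo {
    const char *name;
    uint16_t type;  /* chooses the variant of the union below */
    union {
        PyModuleDef_Slot *slots;
        PyObject *(*initfunc)(void);
        const struct _frozen *frozen;
    };
};

/* Placeholder data for one frozen and one builtin module. */
static const unsigned char hello_code[] = {0};
static const struct _frozen frozen_hello = {"__hello__", hello_code, 1};
static PyObject *init_spam(void) { return NULL; }

/* One callback serving both importers; `kind` says which one is asking.
 * Returns 1 on success, 0 if missing; this sketch never returns -1. */
static int
lookup_module_info(int kind, const char *name,
                   struct PyImport_ModuleInfo *result, void *arg)
{
    (void)arg;
    if (kind == MODULE_KIND_FROZEN && strcmp(name, "__hello__") == 0) {
        result->name = frozen_hello.name;
        result->type = INFO_FROZEN;
        result->frozen = &frozen_hello;
        return 1;
    }
    if (kind == MODULE_KIND_BUILTIN && strcmp(name, "spam") == 0) {
        result->name = "spam";
        result->type = INFO_INITFUNC;
        result->initfunc = init_spam;
        return 1;
    }
    return 0;
}
```

In this sketch a frozen lookup only ever yields the frozen variant and a builtin lookup only yields slots/initfunc, matching the initial restriction described above.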