If we are lumping all config together then yes. However, as I’ve said, I think we should expose them as “state” rather than “config”.
“argv” is a concept for the main program. Having an interpreter-specific argv seems strange to me. That said, with subinterpreters you can easily modify sys.argv before running your code there. There’s no need for the config to be involved.
That seems okay for something like `sys.get_config()`, or for non-embedders using the C-API. However, for embedders I think it’s important to keep a firmer separation between the parts of the runtime.
I think I’ve got a good sense of my own thoughts now.
First of all, it seems like the PEP is trying to accomplish two main things:
- expose the initialization config to users after initialization
- do so in the limited API in a way that is backward compatible to 3.13+
Is that right?
Aside from that, I have concerns about encouraging the concept of a monolithic config vs. focusing on each part of the runtime individually.
Conceptually, the runtime can be divided into 5 distinct parts and initialization/config may be split along those lines:
- host process resources
- global runtime state
- interpreter state
- thread state
- granular components (e.g. multiple interpreters, threading, atexit)
(`Py_Main()` is deeply connected and, from a conceptual standpoint, acts as an additional part.)
Under the status quo, the structural separation between these parts is mostly okay (though the granular components are heavily mixed in with the other parts). However, config/initialization/finalization is currently all tangled together. That translates into a persistent maintenance cost.
The situation is largely a consequence of organic growth over the decades (see PEP 432) but has gotten better since around 2017. There’s still a lot to be done though.
If we look at the way things are currently, initialization looks something like this (at least when `Py_Initialize()` isn’t sufficient):

1. user populates a `PyConfig`
2. `Py_InitializeFromConfig()`
   - a tangle of the following, in no particular order:
     - initialize host process resources (partially from config)
     - host process resources → global runtime state
     - argv → config
     - env vars → config
     - config → global runtime state
     - create main interpreter
     - host process resources → main interpreter state
     - config → main interpreter state
     - create main thread state
     - register host process callbacks (e.g. signal handlers)
     - …
3. user starts using the C-API
   - config → behavior
   - state → behavior
   - env vars → behavior
My ideal equivalent would be something like this:

1. user initializes host process resources
2. user populates configs
   - set custom values on `PyConfig`
   - `PyConfig_Update()` (or something like that)
     - (some) env vars → config
     - host process resources → config
   - set custom values on `PyInterpreterConfig`
   - set custom values on …
3. initialize the runtime
   - `Py_InitializeFromConfig()`
     - `PyConfig` → global runtime state
4. create/init the main interpreter
   - `Py_NewInterpreterFromConfig()`
     - create new interpreter
       - `PyInterpreterConfig` → interpreter state
     - set up the main thread
       - `PyThreadState_New()` + `PyThreadState_Bind()`
5. register host process callbacks (e.g. signal handlers)
6. enable desired components (e.g. threading)
7. user starts using the C-API
   - state → behavior
`Py_Main()` would effectively be something like:

1. prepare host process resources
   - `PyMain_InitResourcesConfig()`
   - initialize host process resources
2. populate configs
   - `PyMain_InitConfig()` (or something like that)
     - `PyConfig_Update()`
       - (some) env vars → config
       - host process resources → config
     - initialize `PyInterpreterConfig`
     - initialize `PyMainConfig`
       - (remaining) env vars → config
       - argv → config

3-6. same as above

7. run code or REPL from `PyMainConfig`
That mental model is unrealistic if we don’t achieve and preserve a strong separation between the parts of the runtime. Hence, I have reservations about encouraging the concept of a single monolithic config.