PEP 741: Python Configuration C API (second version)

If we are lumping all config together, then yes. However, as I’ve said, I think we should expose them as “state” rather than “config”.

“argv” is a concept for the main program. Having an interpreter-specific argv seems strange to me. That said, with subinterpreters you can easily modify sys.argv before running your code there. There’s no need for the config to be involved.

That seems okay for something like sys.get_config() or for non-embedders using the C-API. However, for embedders I think it’s important to keep a firmer separation between the parts of the runtime.

I think I’ve got a good sense of my own thoughts now.

First of all, it seems like the PEP is trying to accomplish two main things:

  • expose the initialization config to users after initialization
  • do so in the limited API in a way that is backward compatible to 3.13+

Is that right?


Aside from that, I have concerns about encouraging the concept of a monolithic config vs. focusing on each part of the runtime individually.

Conceptually, the runtime can be divided into 5 distinct parts and initialization/config may be split along those lines:

  • host process resources
  • global runtime state
  • interpreter state
  • thread state
  • granular components (e.g. multiple interpreters, threading, atexit)

(Py_Main() is deeply connected and, from a conceptual standpoint, acts as an additional part.)

Under the status quo, the structural separation between these parts is mostly okay (though the granular components are heavily mixed in with the other parts). However, config/initialization/finalization is currently all tangled together. That translates into a consistent maintenance cost.

The situation is largely a consequence of organic growth over the decades (see PEP 432) but has gotten better since around 2017. There’s still a lot to be done though.

If we look at the way things are currently, initialization looks something like this (at least when Py_Initialize() isn’t sufficient):

  1. user populates a PyConfig
  2. Py_InitializeFromConfig() - a tangle of the following, in no particular order:
    • initialize host process resources (partially from config)
    • host process resources → global runtime state
    • argv → config
    • env vars → config
    • config → global runtime state
    • create main interpreter
    • host process resources → main interpreter state
    • config → main interpreter state
    • create main thread state
    • register host process callbacks (e.g. signal handlers)
  3. user starts using the C-API
    • config → behavior
    • state → behavior
    • env vars → behavior

My ideal equivalent would be something like this:

  1. user initializes host process resources
  2. user populates configs
    1. set custom values on PyConfig
    2. PyConfig_Update() (or something like that)
      • (some) env vars → config
      • host process resources → config
    3. set custom values on PyInterpreterConfig
  3. initialize the runtime
    1. Py_InitializeFromConfig()
      • PyConfig → global runtime state
  4. create/init the main interpreter
    1. Py_NewInterpreterFromConfig()
      • create new interpreter
      • PyInterpreterConfig → interpreter state
  5. set up the main thread
    1. PyThreadState_New()
    2. PyThreadState_Bind()
    3. register host process callbacks (e.g. signal handlers)
  6. enable desired components (e.g. threading)
  7. user starts using the C-API
    • state → behavior

Py_Main() would effectively be something like:

  1. prepare host process resources
    1. PyMain_InitResourcesConfig()
    2. initialize host process resources
  2. populate configs
    1. PyMain_InitConfig() (or something like that)
      1. PyConfig_Update()
        • (some) env vars → config
        • host process resources → config
      2. initialize PyInterpreterConfig
      3. initialize PyMainConfig
        • (remaining) env vars → config
        • argv → config
  3-6. same as above
  7. run code or REPL from PyMainConfig
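
As a rough illustration of why argv belongs to a main-program config rather than to the runtime or interpreter config, here is a toy Python model of the `PyMainConfig` idea. It is a sketch under assumptions: `MainConfig`, `init_main_config`, and `describe_run` are hypothetical names, only `-i`, `-c`, a script path, and `PYTHONINSPECT` are handled, and real option parsing is far more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MainConfig:
    """Stand-in for the hypothetical PyMainConfig: what the program should run."""
    command: Optional[str] = None  # from `-c CMD`
    script: Optional[str] = None   # from a positional script path
    inspect: bool = False          # from `-i` or PYTHONINSPECT
    script_argv: List[str] = field(default_factory=list)  # would become sys.argv


def init_main_config(argv, environ):
    """Populate MainConfig from the (remaining) env vars, then from argv."""
    config = MainConfig()

    # (remaining) env vars -> config
    if environ.get("PYTHONINSPECT"):
        config.inspect = True

    # argv -> config (argv[0] is the program name)
    args = list(argv[1:])
    while args:
        arg = args.pop(0)
        if arg == "-i":
            config.inspect = True
        elif arg == "-c":
            if not args:
                raise ValueError("-c requires an argument")
            config.command = args.pop(0)
            config.script_argv = ["-c"] + args
            break
        else:
            config.script = arg
            config.script_argv = [arg] + args
            break
    return config


def describe_run(config):
    """Final step: decide what to run from MainConfig alone."""
    if config.command is not None:
        return "command"
    if config.script is not None:
        return "script"
    return "repl"
```

For example, `init_main_config(["python", "-i", "-c", "print(1)", "x"], {})` yields a config whose `script_argv` is `["-c", "x"]` and for which `describe_run()` returns `"command"`. Nothing in the runtime or interpreter configs needs to know about argv for this to work.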

That mental model is unrealistic if we don’t achieve and preserve a strong separation between the parts of the runtime. Hence, I have reservations about encouraging the concept of a single monolithic config.
