If we are lumping all config together then yes. However, as I’ve said, I think we should expose them as “state” rather than “config”.
“argv” is a concept for the main program. Having an interpreter-specific argv seems strange to me. That said, with subinterpreters you can easily modify sys.argv before running your code there. There’s no need for the config to be involved.
That seems okay for something like `sys.get_config()`, or for non-embedders using the C-API. However, for embedders I think it’s important to keep a firmer separation between the parts of the runtime.
I think I’ve got a good sense of my own thoughts now.
First of all, it seems like the PEP is trying to accomplish two main things:
- expose the initialization config to users after initialization
- do so in the limited API in a way that is backward compatible to 3.13+
Is that right?
Aside from that, I have concerns about encouraging the concept of a monolithic config vs. focusing on each part of the runtime individually.
Conceptually, the runtime can be divided into 5 distinct parts and initialization/config may be split along those lines:
- host process resources
- global runtime state
- interpreter state
- thread state
- granular components (e.g. multiple interpreters, threading, atexit)
(`Py_Main()` is deeply connected and, from a conceptual standpoint, acts as an additional part.)
Under the status quo, the structural separation between these parts is mostly okay (though the granular components are heavily mixed in with the other parts). However, config/initialization/finalization is currently all tangled together. That translates into a persistent maintenance cost.
The situation is largely a consequence of organic growth over the decades (see PEP 432) but has gotten better since around 2017. There’s still a lot to be done though.
If we look at the way things are currently, initialization looks something like this (at least when `Py_Initialize()` isn’t sufficient):

1. user populates a `PyConfig`
2. `Py_InitializeFromConfig()`
   - a tangle of the following, in no particular order:
     - initialize host process resources (partially from config)
     - host process resources → global runtime state
     - argv → config
     - env vars → config
     - config → global runtime state
     - create main interpreter
     - host process resources → main interpreter state
     - config → main interpreter state
     - create main thread state
     - register host process callbacks (e.g. signal handlers)
     - …
3. user starts using the C-API
   - config → behavior
   - state → behavior
   - env vars → behavior
My ideal equivalent would be something like this:

1. user initializes host process resources
2. user populates configs
   - set custom values on `PyConfig`
   - `PyConfig_Update()` (or something like that)
     - (some) env vars → config
     - host process resources → config
   - set custom values on `PyInterpreterConfig`
   - set custom values on …
3. initialize the runtime
   - `Py_InitializeFromConfig()`
     - `PyConfig` → global runtime state
4. create/init the main interpreter
   - `Py_NewInterpreterFromConfig()`
     - create new interpreter
       - `PyInterpreterConfig` → interpreter state
     - set up the main thread
       - `PyThreadState_New()` + `PyThreadState_Bind()`
5. register host process callbacks (e.g. signal handlers)
6. enable desired components (e.g. threading)
7. user starts using the C-API
   - state → behavior
`Py_Main()` would effectively be something like:

1. prepare host process resources
   - `PyMain_InitResourcesConfig()`
   - initialize host process resources
2. populate configs
   - `PyMain_InitConfig()` (or something like that)
     - `PyConfig_Update()`
       - (some) env vars → config
       - host process resources → config
     - initialize `PyInterpreterConfig`
     - initialize `PyMainConfig`
       - (remaining) env vars → config
       - argv → config

3-6. same as above

7. run code or REPL from `PyMainConfig`
That mental model is unrealistic if we don’t achieve and preserve a strong separation between the parts of the runtime. Hence, I have reservations about encouraging the concept of a single monolithic config.