(FTR, I haven’t read through this thread yet. My comments here are based solely on the PEP itself.)
Thanks for working on this, Victor! I agree that there is room for improvement in runtime initialization (and finalization).
Overall, the proposed API seems consistent and effectively an iteration on PEP 587. However, I have some concerns, mostly centered on the motivation (users) for this new API and why this is the best approach.
Specifically:
- who are we trying to help with this API?
- how did they let us know their needs (and that the status quo wasn’t good enough)?
- how does the proposed API meet those needs?
- why does this need to be part of the limited API?
- what is the purpose of exposing config values after initialization has finished?
Overall, I think we need to be careful not to lock ourselves into an API before we’re sure it’s what we need.
My Perspective
FWIW, I have a slightly different perspective on runtime initialization/finalization.
init/fini/configs:
- at a fundamental level init (and fini) make sense as a set of granular phases
- PEP 432 demonstrates this idea, though I’d also like to see feature-related phases during interpreter init/fini
- each phase should have a distinct config that is used exclusively as input to the corresponding init function
- the config values should only be used to initialize runtime state, whether directly or indirectly
- there isn’t a strict 1-to-1 mapping between config fields and runtime state fields
- initialization should not modify the config (that’s the user’s job before calling the init func)
- once an init phase finishes, that config becomes irrelevant; the runtime never uses it
- current places where we are using the config after initialization, and even treating it like state, should be fixed (add corresponding state fields and set them during init)
- we would keep a read-only copy of the config around only for diagnostic purposes (as a snapshot of how the runtime was initialized)
- finalization should mostly be the reverse of initialization
users:
- embedders (ergo core devs) are the only users of the initialization API, and therefore of the config
- embedders don’t care about the stable ABI
per-interpreter config:
- the global runtime config should be distinct from the config used to initialize interpreters
- many config values in PyConfig are relevant for the global runtime but not for individual interpreters (we shouldn’t store a PyConfig on each PyInterpreterState)
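To make the config/state split above concrete, here is a minimal C sketch. Everything in it (runtime_config, interp_config, runtime_state, interp_state, runtime_init(), interp_init()) is hypothetical and far simpler than CPython's real _PyRuntimeState/PyInterpreterState. It only shows each phase taking its own const config, copying values into state, and keeping a read-only snapshot purely for diagnostics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified types; NOT CPython's real structs. */

/* Input to the global runtime init phase. */
typedef struct {
    bool use_environment;
    bool dev_mode;
} runtime_config;

/* Input to each interpreter's init phase: only per-interpreter settings. */
typedef struct {
    int int_max_str_digits;
    bool allow_threads;
} interp_config;

typedef struct {
    bool use_environment;            /* state derived from runtime_config */
    bool dev_mode;
    runtime_config config_snapshot;  /* read-only copy, for diagnostics only */
    bool initialized;
} runtime_state;

typedef struct {
    runtime_state *runtime;
    int int_max_str_digits;          /* state derived from interp_config */
    bool allow_threads;
    interp_config config_snapshot;   /* read-only copy, for diagnostics only */
} interp_state;

/* Reads the config (const) and writes state; never modifies the config. */
static void runtime_init(runtime_state *rt, const runtime_config *cfg) {
    rt->use_environment = cfg->use_environment;
    rt->dev_mode = cfg->dev_mode;
    rt->config_snapshot = *cfg;
    rt->initialized = true;
}

/* A later, separate phase with its own, distinct config. */
static void interp_init(interp_state *interp, runtime_state *rt,
                        const interp_config *cfg) {
    assert(rt->initialized);  /* the global runtime phase must come first */
    interp->runtime = rt;
    interp->int_max_str_digits = cfg->int_max_str_digits;
    interp->allow_threads = cfg->allow_threads;
    interp->config_snapshot = *cfg;
}
```

Once runtime_init() returns, changing the caller's runtime_config has no effect on the runtime, which only ever reads its own state.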
Some past discussions:
- The global config should not be stored on each interpreter · Issue #91120 · python/cpython · GitHub
- PyInterpreterState.config.int_max_str_digits Should Not Be Modified · Issue #98417 · python/cpython · GitHub
- (Mostly) Stop Special-casing the Main Interpreter · Issue #109857 · python/cpython · GitHub
Responses to the PEP
(Inline comments on the PEP text.)
Add a C API to the limited C API to configure the Python preinitialization and initialization, and to get the current configuration. It can be used with the stable ABI.
What are we trying to solve by adding this to the limited API and stable ABI? Embedders are the only ones who should care about the config, and it becomes irrelevant once they call Py_InitializeFromConfig().
Add sys.get_config(name) function to get the current value of a configuration option.
What does “current” mean here? It implies the config might have changed. The config should not have changed from what was used to initialize the runtime. Instead, initialization should have used the config to initialize the runtime state. Thus it makes sense to support querying the runtime/interpreter state, not the config.
Do note that currently we actually do use some of the interp.config fields directly during runtime operation (after initialization). However, that’s just a matter of no one being motivated to fix up those cases. (FWIW, this came up several years ago when someone started using interp.config as state rather than treating it as const. This happened because the relevant code was using the interp.config.... field directly, so it was easy to miss that modifying that field directly wasn’t correct.)
Allow setting custom configuration options, not used by Python but by third-party code. Options are referred to by their name as a string.
The config is what we use to initialize the runtime. Supporting additional custom config options implies that the config matters after the runtime is initialized. I think we should avoid sending that message.
If such custom “config” options are desirable then they should be their own feature, separate from the runtime config.
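As an illustration of custom options being their own feature, here is a minimal sketch of a separate store for embedder-defined options. The names (app_options, app_options_set(), app_options_get()) are hypothetical and do not correspond to any proposed CPython API; the point is only that nothing here touches the runtime's initialization config.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value store for embedder-defined options,
   kept entirely separate from the runtime's init config. */
#define MAX_OPTIONS 16

typedef struct {
    const char *names[MAX_OPTIONS];  /* names must outlive the store */
    long values[MAX_OPTIONS];
    size_t count;
} app_options;

/* Sets or overwrites an option. Returns 0 on success, -1 if the store is full. */
static int app_options_set(app_options *opts, const char *name, long value) {
    for (size_t i = 0; i < opts->count; i++) {
        if (strcmp(opts->names[i], name) == 0) {
            opts->values[i] = value;
            return 0;
        }
    }
    if (opts->count == MAX_OPTIONS) {
        return -1;
    }
    opts->names[opts->count] = name;
    opts->values[opts->count] = value;
    opts->count++;
    return 0;
}

/* Returns 1 and stores the value in *out if found, 0 otherwise. */
static int app_options_get(const app_options *opts, const char *name, long *out) {
    for (size_t i = 0; i < opts->count; i++) {
        if (strcmp(opts->names[i], name) == 0) {
            *out = opts->values[i];
            return 1;
        }
    }
    return 0;
}
```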
This PEP unifies also the configuration of the Python preinitialization and the Python initialization in a single API.
What are the deficiencies in the current API? Don’t we already have a pre-config?
PEP 587 has no API to get the current configuration, only to configure the Python initialization.
Why does anyone need access to the config used to initialize the runtime? I expect what they actually want is to know the current value being used.
For example, the global configuration variable Py_UnbufferedStdioFlag was deprecated in Python 3.12 and using PyConfig.buffered_stdio is recommended instead. It only works to configure Python, there is no public API to get PyConfig.buffered_stdio.
What is the corresponding value in the runtime state? We should be exposing that, rather than the config value.
Users of the limited C API are asking for a public API to get the current configuration.
What users? Why do they want it?
In the end, it was decided to not add a new PyConfig member to stable branches, but only add a new PyConfig.int_max_str_digits member to the development branch (which became Python 3.12). A dedicated private global variable (unrelated to PyConfig) is used in stable branches.
I would expect a dedicated field in _PyRuntimeState and/or PyInterpreterState, rather than a private global variable. The config field (in 3.12+) should only be relevant during initialization.
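A minimal sketch of that arrangement, using hypothetical simplified types rather than the real PyInterpreterState. The setter is only similar in spirit to sys.set_int_max_str_digits(); its validation rule (reject negative values) is made up for illustration and is not CPython's actual rule.

```c
#include <assert.h>

/* Hypothetical, simplified types; not CPython's real structs. */
typedef struct {
    int int_max_str_digits;
} interp_config;

typedef struct {
    interp_config config_snapshot;  /* written once during init, then only read for diagnostics */
    int int_max_str_digits;         /* the value the runtime actually uses */
} interp_state;

/* Initialization copies the configured value into a dedicated state field. */
static void interp_init(interp_state *interp, const interp_config *cfg) {
    interp->config_snapshot = *cfg;
    interp->int_max_str_digits = cfg->int_max_str_digits;
}

/* Runtime setter: updates state and leaves the config snapshot alone. */
static int set_int_max_str_digits(interp_state *interp, int value) {
    if (value < 0) {
        return -1;  /* hypothetical validation rule, for illustration only */
    }
    interp->int_max_str_digits = value;
    return 0;
}

/* "What is the current value?" is answered from state, not from the config. */
static int get_int_max_str_digits(const interp_state *interp) {
    return interp->int_max_str_digits;
}
```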
The Python preinitialization uses the PyPreConfig structure and the Python initialization uses the PyConfig structure. Both structures have four duplicated members: dev_mode, parse_argv, isolated and use_environment.
The redundancy is caused by the fact that the two structures are separated, whereas some PyConfig members are needed by the preinitialization.
That depends on what preinitialization (_Py_PreInitializeFromPyArgv(), Py_PreInitialize(), or _Py_PreInitializeFromConfig()) is meant to accomplish and how it relates to initialization.
The outcome of preinitialization should be either state stored somewhere or custom values set on PyConfig. IIRC, PEP 432 was clearer about the distinction between preinit and init.
The idea of preinitialization relates to the small set of functions that users may use when populating the PyConfig before initializing the runtime. Preinitialization sets up the bare minimum of state needed for that small set of functions. This would be clearer if those functions were only available before initialization and if they explicitly took a PyPreConfig * as an argument.
In effect, the steps in initialization are:
1. (optional) populate a PyPreConfig as desired
2. (optional) use that preconfig when calling functions that modify a PyConfig
3. populate the PyConfig
4. call Py_InitializeFromConfig()
(FWIW, I’d like to see step 4 split into a number of granular steps, e.g. pre-main interpreter, plus per-interpreter steps related to enabling features.)
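A hypothetical sketch of those four steps, where the helper used in step 2 takes the preconfig explicitly instead of depending on hidden preinit state. The types and functions (pre_config, config, runtime, config_set_program_name(), initialize_from_config()) are made up. The helper is only loosely modeled on PyConfig_SetBytesString(), which preinitializes Python if needed because decoding bytes depends on preinit settings such as UTF-8 mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical types playing the roles of PyPreConfig and PyConfig. */
typedef struct {
    bool utf8_mode;  /* affects how byte strings are decoded in step 2 */
} pre_config;

typedef struct {
    char program_name[64];
    bool buffered_stdio;
} config;

typedef struct {
    bool initialized;
    bool buffered_stdio;
} runtime;

/* Step 2: a helper that needs preinit settings receives them explicitly. */
static int config_set_program_name(config *cfg, const pre_config *pre,
                                   const char *bytes) {
    if (!pre->utf8_mode) {
        /* A real implementation would decode with the locale encoding;
           this sketch only supports UTF-8 mode. */
        return -1;
    }
    if (strlen(bytes) >= sizeof cfg->program_name) {
        return -1;  /* too long for the fixed-size buffer */
    }
    strcpy(cfg->program_name, bytes);
    return 0;
}

/* Step 4: initialization consumes the config; it is not consulted again. */
static int initialize_from_config(runtime *rt, const config *cfg) {
    rt->buffered_stdio = cfg->buffered_stdio;
    rt->initialized = true;
    return 0;
}
```

Usage follows the numbered steps: build a pre_config, pass it to helpers that fill in the config, set the remaining fields directly, then call initialize_from_config().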
Python API:
sys.get_config(name)
As I mentioned earlier, configured values should be translated to state in _PyRuntimeState or PyInterpreterState. Once Py_InitializeFromConfig() has returned, there should be no need to look at the config ever again.
I do agree it would be helpful to expose more of that state for introspection. However, most of it is domain-specific, and it would probably make sense to expose it either in a corresponding module or with a specific, dedicated sys getter. In fact, we already do this for much of the state derived from the original PyConfig, such as sys/importlib attributes related to the import system and things like sys.hash_info.
Perhaps it would be useful for users to have a single function like sys.get_config() that returns the runtime state value that corresponds to the given PyConfig field. However, there isn’t a 1-to-1 mapping from config fields to runtime state values in all cases, and that inconsistency might be confusing. It might also be useful to add a similar function that returns a dict populated with the full config (like _PyConfig_AsDict() does).
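A rough sketch of such a lookup, assuming a hypothetical runtime_state and get_state_for_config_name() (neither exists in CPython). buffered_stdio, use_hash_seed, and hash_seed are real PyConfig field names, used here to show the mismatch: when use_hash_seed is 0 the runtime picks a random hash secret, so the effective value is not what the config's hash_seed field holds, and use_hash_seed has no state counterpart of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical runtime state, already initialized from a config. */
typedef struct {
    int buffered_stdio;  /* copied 1-to-1 from the config field of the same name */
    long hash_secret;    /* derived from two config fields: use_hash_seed and hash_seed */
} runtime_state;

static long get_buffered_stdio(const runtime_state *st) { return st->buffered_stdio; }
static long get_hash_secret(const runtime_state *st) { return st->hash_secret; }

/* Maps a config field name to a getter for the state value it produced. */
static const struct {
    const char *name;
    long (*get)(const runtime_state *);
} lookup[] = {
    {"buffered_stdio", get_buffered_stdio},
    /* If use_hash_seed was 0, the config's hash_seed is not what the runtime
       uses (the secret is random), so this reports the effective state value. */
    {"hash_seed", get_hash_secret},
};

/* Returns 1 and stores the value in *out if the name is known, 0 otherwise.
   "use_hash_seed" has no state counterpart of its own, so it is absent. */
static int get_state_for_config_name(const runtime_state *st, const char *name,
                                     long *out) {
    for (size_t i = 0; i < sizeof lookup / sizeof lookup[0]; i++) {
        if (strcmp(lookup[i].name, name) == 0) {
            *out = lookup[i].get(st);
            return 1;
        }
    }
    return 0;
}
```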
In either case, though, the config should be no more than a diagnostic tool; its data should never be used in any code logic. The question to answer is: why would users ever need to look at config values or even know about PyConfig and its field names? They shouldn’t, and adding something like sys.get_config() would invite users to start factoring the config into their mental model of Python, rather than focusing on actual runtime state, which is where we do want them to focus.
The C API uses null-terminated UTF-8 encoded strings to refer to a configuration option.
+1
The PyInitConfig structure is implemented by combining the three structures of the PyConfig API:
My first impression is that this would be a step backward. Even if it isn’t, I think we need a lot more feedback from embedders before we settle on the right approach. Again, PEP 432 has a lot of good ideas in the right direction. Having a distinct config for each initialization phase makes sense.
PyInitConfig structure:
Opaque structure to configure the Python preinitialization and the Python initialization.
Making this opaque might make sense. That would certainly allow us to organize the contents however we like, without disrupting users in the future. However, how much does that matter? Would embedders even care? Perhaps we are making it opaque only for the sake of the limited API?
PyInitConfig_SetInt(config, name, value)
PyInitConfig_SetStr(config, name, value)
PyInitConfig_SetWStr(config, name, value)
PyInitConfig_SetStrList(config, name, length, items)
PyInitConfig_SetWStrList(config, name, length, items)
Basically, setting field values would become a runtime operation where our implementation of these functions would be responsible for checking the name and value type for correctness. Contrast that with the current situation, where using a non-opaque PyConfig means the compiler can enforce types and field names. (We still have to check field values for correctness when a field is restricted to a sub-range of the declared type.)
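To illustrate the difference, here is a hypothetical name-based setter, loosely in the style of the PEP's PyInitConfig_SetInt() but not its actual implementation (init_config and config_set_int() are made up, and the 0/1 range check is an assumption for illustration). A misspelled option name compiles fine and is rejected only at run time, whereas a misspelled struct field is a compile error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical config struct; in an opaque design, callers cannot see these fields. */
typedef struct {
    int isolated;
    int dev_mode;
    const char *program_name;
} init_config;

/* Name-based setter: the field name and its type are validated at run time
   by string comparison. Returns 0 on success, -1 on error. */
static int config_set_int(init_config *cfg, const char *name, int value) {
    int *field = NULL;
    if (strcmp(name, "isolated") == 0) {
        field = &cfg->isolated;
    } else if (strcmp(name, "dev_mode") == 0) {
        field = &cfg->dev_mode;
    } else {
        return -1;  /* unknown name, or a non-int option such as "program_name" */
    }
    if (value != 0 && value != 1) {
        return -1;  /* hypothetical sub-range check: these options are booleans */
    }
    *field = value;
    return 0;
}
```

With direct field access (cfg.isolated = 1;), a typo such as cfg.isolatd fails to compile; with the setter, config_set_int(&cfg, "isolatd", 1) compiles and only fails when it runs.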
It is possible to set custom configuration options, not used by Python but only by third-party code, by calling: PyInitConfig_SetInt(config, "allow_custom_options", 1). In this case, setting custom configuration options is accepted, rather than failing with an “unknown option” error. By default, setting custom configuration options is not allowed.
What’s the motivation for this? We would basically be lumping non-CPython custom options in with the options we need for initialization. The idea seems problematic, especially since users would use the same API to get/set custom options as for CPython-init options. What’s the advantage of making custom options a part of the config, rather than a separate API? How does this benefit people embedding CPython?