PEP 741: Python Configuration C API

eric.snow · February 8, 2024, 7:52pm

(FTR, I haven’t read through this thread yet. My comments here are based exclusively on having read through just the PEP.)

Thanks for working on this, Victor! I agree that there is room for improvement in runtime initialization (and finalization).

Overall, the proposed API seems consistent and effectively an iteration on PEP 587. However, I have some concerns, mostly centered on the motivation (users) for this new API and why this is the best approach.

Specifically:

who are we trying to help with this API?
how did they let us know their needs (and that the status quo wasn’t good enough)?
how does the proposed API meet those needs?
why does this need to be part of the limited API?
what is the purpose of exposing config values after initialization has finished?

Overall, I think we need to be careful to not lock in on an API when we aren’t sure it’s what we need.

My Perspective


FWIW, I have a slightly different perspective about runtime initialization/finalization.

init/fini/configs:

at a fundamental level init (and fini) make sense as a set of granular phases
PEP 432 demonstrates this idea, though I’d also like to see feature-related phases during interpreter init/fini
each phase should have a distinct config that is used exclusively as input to the corresponding init function
the config values should only be used to initialize runtime state, whether directly or indirectly
there isn’t a strict 1-to-1 mapping between config fields and runtime state fields
initialization should not modify the config (that’s the user’s job before calling the init func)
once an init phase finishes, that config becomes irrelevant; the runtime never uses it
current places where we are using the config after initialization, and even treating it like state, should be fixed (add corresponding state fields and set them during init)
we would keep a read-only copy of the config around only for diagnostic purposes (as a snapshot of how the runtime was initialized)
finalization should mostly be the reverse of initialization
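To make the list above concrete, here is a minimal C sketch of two init phases, each with its own config. Everything in it is hypothetical (struct core_config, struct main_config, struct runtime_state, init_core(), init_main(), fini_main(), fini_core(), DEFAULT_LIMIT); it is not CPython's actual layout. It shows each config passed as const input to exactly one init function, state that doesn't map 1-to-1 onto config fields, a read-only snapshot kept only for diagnostics, and finalization running in reverse.

```c
#include <stdbool.h>

#define DEFAULT_LIMIT 100  /* hypothetical default value */

/* Hypothetical per-phase configs: each is the sole input to one init phase. */
struct core_config {
    bool use_custom_limit;
    int custom_limit;
};

struct main_config {
    bool buffered_stdio;
};

/* Hypothetical runtime state: the only thing consulted after init. */
struct runtime_state {
    int phase;            /* 0 = none, 1 = core done, 2 = main done */
    int limit;            /* derived from two config fields */
    bool buffered_stdio;
    /* Read-only copies of the inputs, kept only for diagnostics. */
    struct core_config core_snapshot;
    struct main_config main_snapshot;
};

/* Phase 1. The config is const: init reads it and never modifies it. */
static int init_core(struct runtime_state *rt, const struct core_config *cfg)
{
    if (rt->phase != 0) {
        return -1;
    }
    /* Not 1-to-1: two config fields collapse into one state field. */
    rt->limit = cfg->use_custom_limit ? cfg->custom_limit : DEFAULT_LIMIT;
    rt->core_snapshot = *cfg;
    rt->phase = 1;
    return 0;
}

/* Phase 2 requires phase 1 to have finished. */
static int init_main(struct runtime_state *rt, const struct main_config *cfg)
{
    if (rt->phase != 1) {
        return -1;
    }
    rt->buffered_stdio = cfg->buffered_stdio;
    rt->main_snapshot = *cfg;
    rt->phase = 2;
    return 0;
}

/* Finalization undoes the phases in reverse order. */
static int fini_main(struct runtime_state *rt)
{
    if (rt->phase != 2) {
        return -1;
    }
    rt->buffered_stdio = false;
    rt->phase = 1;
    return 0;
}

static int fini_core(struct runtime_state *rt)
{
    if (rt->phase != 1) {
        return -1;
    }
    rt->limit = 0;
    rt->phase = 0;
    return 0;
}
```

An embedder would call init_core() then init_main(), and later fini_main() then fini_core(); in this sketch, calling them out of order returns -1.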

users:

embedders (ergo core devs) are the only users that use initialization API and thus the config
embedders don’t care about the stable ABI

per-interpreter config:

the global runtime config should be distinct from the config used to initialize interpreters
many config values in PyConfig are relevant for the global runtime but not for individual interpreters (we shouldn’t store a PyConfig on each PyInterpreterState)

Some past discussions:

Responses to the PEP


Add a C API to the limited C API to configure the Python preinitialization and initialization, and to get the current configuration. It can be used with the stable ABI.

What are we trying to solve by adding this to the limited API and stable ABI? Embedders are the only ones that should care about the config and it becomes irrelevant once they call Py_InitializeFromConfig().

Add sys.get_config(name) function to get the current value of a configuration option.

What does “current” mean here? It implies the config might have changed. The config should not have changed from what was used to initialize the runtime. Instead, initialization should have used the config to initialize the runtime state. Thus it makes sense to support querying the runtime/interpreter state, not the config.

Do note that currently we actually do use some of the interp.config fields directly during runtime operation (after initialization). However, that’s just a matter of no one being motivated to fix up those cases. (FWIW, this came up several years ago when someone started using interp.config as state rather than treating it as const. This happened because the relevant code was using the interp.config.... field directly, so it was easy to not know that modifying that field directly wasn’t correct.)

Allow setting custom configuration options, not used by Python but by third-party code. Options are referred to by their name as a string.

The config is what we use to initialize the runtime. Supporting additional custom config options implies that the config matters after the runtime is initialized. I think we should avoid sending that message.

If such custom “config” options are desirable then they should be their own feature, separate from the runtime config.

This PEP unifies also the configuration of the Python preinitialization and the Python initialization in a single API.

What are the deficiencies in the current API? Don’t we already have a pre-config?

PEP 587 has no API to get the current configuration, only to configure the Python initialization.

Why does anyone need access to the config used to initialize the runtime? I expect what they actually want is to know the current value being used.

For example, the global configuration variable Py_UnbufferedStdioFlag was deprecated in Python 3.12 and using PyConfig.buffered_stdio is recommended instead. It only works to configure Python, there is no public API to get PyConfig.buffered_stdio.

What is the corresponding value in the runtime state? We should be exposing that, rather than the config value.

Users of the limited C API are asking for a public API to get the current configuration.

What users? Why do they want it?

In the end, it was decided to not add a new PyConfig member to stable branches, but only add a new PyConfig.int_max_str_digits member to the development branch (which became Python 3.12). A dedicated private global variable (unrelated to PyConfig) is used in stable branches.

I would expect a dedicated field in _PyRuntimeState and/or PyInterpreterState, rather than a private global variable. The config field (in 3.12+) should only be relevant during initialization.
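A rough sketch of what I mean, using a hypothetical struct interp_state in place of PyInterpreterState: the interpreter copies the configured value into a dedicated state field at init, and the runtime getter and setter operate on that field only. The config survives solely as a snapshot. At the Python level CPython does expose sys.get_int_max_str_digits() and sys.set_int_max_str_digits(); the C names and the validation rule below are made up for illustration.

```c
/* Hypothetical per-interpreter config: input to interpreter init only. */
struct interp_config {
    int int_max_str_digits;
};

/* Hypothetical interpreter state with a dedicated field,
   rather than a process-wide private global. */
struct interp_state {
    int int_max_str_digits;         /* current value; may change at runtime */
    struct interp_config snapshot;  /* how this interpreter was initialized */
};

static void interp_init(struct interp_state *interp,
                        const struct interp_config *cfg)
{
    interp->int_max_str_digits = cfg->int_max_str_digits;
    interp->snapshot = *cfg;
}

/* Runtime getter and setter touch the state, never the config. */
static int get_int_max_str_digits(const struct interp_state *interp)
{
    return interp->int_max_str_digits;
}

static int set_int_max_str_digits(struct interp_state *interp, int value)
{
    if (value < 0) {
        return -1;  /* hypothetical validation rule */
    }
    interp->int_max_str_digits = value;
    return 0;
}
```

Because each interpreter owns its value, changing it in one interpreter leaves another untouched, which a single global shared by the whole process cannot do.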

The Python preinitialization uses the PyPreConfig structure and the Python initialization uses the PyConfig structure. Both structures have four duplicated members: dev_mode, parse_argv, isolated and use_environment.

The redundancy is caused by the fact that the two structures are separated, whereas some PyConfig members are needed by the preinitialization.

That depends on what preinitialization (_Py_PreInitializeFromPyArgv(), Py_PreInitialize(), or _Py_PreInitializeFromConfig()) is meant to accomplish and how it relates to initialization.

The outcome of preinitialization should be either state stored somewhere or custom values set on PyConfig. IIRC, PEP 432 was more clear about the distinction between preinit and init.

The idea of preinitialization relates to the small set of functions that users may use when populating the PyConfig before initializing the runtime. Preinitialization sets up the bare minimum of state needed for that small set of functions. This would be more clear if those functions were only available before initialization and if they explicitly took a PyPreConfig * as an argument.

In effect, the steps in initialization are:

1. (optional) populate a PyPreConfig as desired
2. (optional) use that preconfig when calling functions that modify a PyConfig
3. populate the PyConfig
4. call Py_InitializeFromConfig()
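In PEP 587 terms these steps roughly correspond to PyPreConfig_InitPythonConfig() plus Py_PreInitialize(), helpers such as PyConfig_SetBytesString() (whose decoding depends on the preinitialized state), PyConfig_InitPythonConfig() plus field assignments, and Py_InitializeFromConfig(). The sketch here is not that API. It is a hypothetical model of the alternative suggested earlier, where the step 2 helper takes the preconfig explicitly so the dependency is visible in its signature. The names (struct preconfig, struct config, struct state, config_set_bytes(), initialize_from_config()) and the ASCII-only rule standing in for locale decoding are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PyPreConfig, PyConfig and runtime state. */
struct preconfig { bool utf8_mode; };
struct config    { char program_name[64]; };
struct state     { char program_name[64]; bool ready; };

/* Step 2: the helper receives the preconfig explicitly.
   Hypothetical rule: without UTF-8 mode only ASCII bytes are accepted
   (a stand-in for locale-dependent decoding). */
static int config_set_bytes(const struct preconfig *pre,
                            char *dst, size_t size, const char *src)
{
    size_t len = strlen(src);
    if (len >= size) {
        return -1;
    }
    if (!pre->utf8_mode) {
        for (size_t i = 0; i < len; i++) {
            if ((unsigned char)src[i] > 0x7F) {
                return -1;  /* leave dst untouched on failure */
            }
        }
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Step 4: stand-in for Py_InitializeFromConfig(); it only reads the config. */
static int initialize_from_config(const struct config *cfg, struct state *st)
{
    if (cfg->program_name[0] == '\0') {
        return -1;  /* hypothetical required field */
    }
    memcpy(st->program_name, cfg->program_name, sizeof st->program_name);
    st->ready = true;
    return 0;
}
```

With this shape, a call site that forgets to set up the preconfig cannot compile, instead of silently relying on implicit preinitialization.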

(FWIW, I’d like to see step 4 split into a number of granular steps, e.g. pre-main interpreter, plus per-interpreter steps related to enabling features.)

Python API:

sys.get_config(name)

As I mentioned earlier, configured values should be translated to state in _PyRuntimeState or PyInterpreterState. Once Py_InitializeFromConfig() has returned, there should be no need to look at the config ever again.

I do agree it would be helpful to expose more of that state for introspection. However, most of it is domain-specific and it would probably make sense to expose it either in a corresponding module or with a specific, dedicated sys getter. In fact, we already do this for much of the state derived from the original PyConfig, such as sys/importlib attributes related to the import system and things like sys.hash_info.

Perhaps it would be useful for users to have a single function like sys.get_config() that returns the runtime state value that corresponds to the given PyConfig field. However, there isn’t a 1-to-1 mapping from config fields to runtime state values in all cases, and that inconsistency might be confusing. It might also be useful to add a similar function that returns a dict populated with the full config (like _PyConfig_AsDict() does).
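To illustrate the inconsistency, here is a hypothetical name-based lookup that returns the runtime state value for a config field name. The table, get_config_state(), and the choice of which field lacks a state counterpart are assumptions for the sketch. The point is that such a function needs a third outcome, beyond "found" and "unknown name", for a field like parse_argv that in this sketch has no single state value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical runtime state derived from the config at init. */
struct runtime_state {
    bool buffered_stdio;
    int int_max_str_digits;
};

typedef int (*state_getter)(const struct runtime_state *st);

static int get_buffered_stdio(const struct runtime_state *st)
{
    return st->buffered_stdio;
}

static int get_int_max_str_digits(const struct runtime_state *st)
{
    return st->int_max_str_digits;
}

/* Config field name -> state getter; NULL marks a field with no
   single state counterpart. */
static const struct {
    const char *name;
    state_getter get;
} config_to_state[] = {
    { "buffered_stdio", get_buffered_stdio },
    { "int_max_str_digits", get_int_max_str_digits },
    { "parse_argv", NULL },
};

enum lookup_result { LOOKUP_OK = 0, LOOKUP_UNKNOWN = -1, LOOKUP_NO_STATE = -2 };

static int get_config_state(const struct runtime_state *st,
                            const char *name, int *out)
{
    size_t n = sizeof config_to_state / sizeof config_to_state[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(config_to_state[i].name, name) == 0) {
            if (config_to_state[i].get == NULL) {
                return LOOKUP_NO_STATE;
            }
            *out = config_to_state[i].get(st);
            return LOOKUP_OK;
        }
    }
    return LOOKUP_UNKNOWN;
}
```

Callers then have to learn which names behave which way, which is exactly the kind of confusion a dedicated, domain-specific getter avoids.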

In either case, though, the config should be no more than a diagnostic tool; its data should never be used in any code logic. The question to answer is: why would users ever need to look at config values or even know about PyConfig and its field names? They shouldn’t, and adding something like sys.get_config() would invite users to start factoring in the config to their mental model of Python, rather than focusing on actual runtime state, where we do want them to focus.

The C API uses null-terminated UTF-8 encoded strings to refer to a configuration option.

+1

The PyInitConfig structure is implemented by combining the three structures of the PyConfig API:

My first impression is that this would be a step backward. Even if it isn’t, I think we need a lot more feedback from embedders before we settle on the right approach. Again, PEP 432 has a lot of good ideas in the right direction. Having a distinct config for each initialization phase makes sense.

PyInitConfig structure:
Opaque structure to configure the Python preinitialization and the Python initialization.

Making this opaque might make sense. That would certainly allow us to organize the contents however we like, without disrupting users in the future. However, how much does that matter? Would embedders even care? Perhaps we are making it opaque only for the sake of the limited API?

PyInitConfig_SetInt(config, name, value)

PyInitConfig_SetStr(config, name, value)

PyInitConfig_SetWStr(config, name, value)

PyInitConfig_SetStrList(config, name, length, items)

PyInitConfig_SetWStrList(config, name, length, items)

Basically, setting field values would become a runtime operation where our implementation of these functions would be responsible for checking the name and value type for correctness. Contrast that with the current situation, where using a non-opaque PyConfig means the compiler can enforce types and field names. (We still have to check field values for correctness when a field is restricted to a sub-range of the declared type.)
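The difference is easy to see in a small model. This sketch assumes a hypothetical table-driven setter, config_set_int(), standing in for something like PyInitConfig_SetInt(); it is not the PEP's implementation, and struct transparent_config is made up. Direct field assignment is checked by the compiler, while the name-based setter must look up the name, check the type, and check the range at run time, reporting each failure as an error code.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transparent config, PEP 587 style. */
struct transparent_config {
    int isolated;
    int int_max_str_digits;
    const char *program_name;
};

enum set_result {
    SET_OK = 0, SET_UNKNOWN_NAME = -1, SET_WRONG_TYPE = -2, SET_OUT_OF_RANGE = -3
};

enum field_type { FIELD_INT, FIELD_STR };

/* Metadata the name-based setter needs, since the compiler can't help. */
static const struct {
    const char *name;
    enum field_type type;
    size_t offset;
    int min_value;
    int max_value;
} fields[] = {
    { "isolated", FIELD_INT, offsetof(struct transparent_config, isolated), 0, 1 },
    { "int_max_str_digits", FIELD_INT,
      offsetof(struct transparent_config, int_max_str_digits), -1, INT_MAX },
    { "program_name", FIELD_STR,
      offsetof(struct transparent_config, program_name), 0, 0 },
};

/* Hypothetical name-based setter: every check happens at run time. */
static int config_set_int(struct transparent_config *cfg,
                          const char *name, int value)
{
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        if (strcmp(fields[i].name, name) != 0) {
            continue;
        }
        if (fields[i].type != FIELD_INT) {
            return SET_WRONG_TYPE;
        }
        if (value < fields[i].min_value || value > fields[i].max_value) {
            return SET_OUT_OF_RANGE;
        }
        *(int *)((char *)cfg + fields[i].offset) = value;
        return SET_OK;
    }
    return SET_UNKNOWN_NAME;
}
```

A misspelling like cfg.isolatd = 1 is a compile error, whereas config_set_int(&cfg, "isolatd", 1) compiles and only fails when it runs. The range check is needed in both styles.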

It is possible to set custom configuration options, not used by Python but only by third-party code, by calling: PyInitConfig_SetInt(config, "allow_custom_options", 1) . In this case, setting custom configuration options is accepted, rather than failing with an “unknown option” error. By default, setting custom configuration options is not allowed.

What’s the motivation for this? We would basically be lumping non-CPython custom options in with the options we need for initialization. The idea seems problematic, especially since users would use the same API for get/set custom options as for CPython-init options. What’s the advantage to making custom options a part of the config, rather than a separate API? How does this benefit people embedding CPython?
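A minimal model of the behavior the PEP text describes, assuming a hypothetical mixed_set_int() that shares one namespace between built-in and custom options (struct mixed_config and MAX_CUSTOM are also made up). Unknown names are rejected by default; once allow_custom_options is set, any unknown name is stored as a custom option. In this sketch, that also means a misspelled built-in option is silently accepted as a custom one instead of failing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CUSTOM 8

struct custom_option {
    char name[32];
    int value;
};

/* Hypothetical config mixing built-in and custom options. */
struct mixed_config {
    int isolated;                /* built-in option */
    bool allow_custom_options;   /* built-in switch, off by default */
    struct custom_option custom[MAX_CUSTOM];
    size_t n_custom;
};

enum { SET_OK = 0, SET_UNKNOWN = -1, SET_REJECTED = -2 };

static int mixed_set_int(struct mixed_config *cfg, const char *name, int value)
{
    if (strcmp(name, "isolated") == 0) {
        cfg->isolated = value;
        return SET_OK;
    }
    if (strcmp(name, "allow_custom_options") == 0) {
        cfg->allow_custom_options = (value != 0);
        return SET_OK;
    }
    if (!cfg->allow_custom_options) {
        return SET_UNKNOWN;  /* default: unknown option error */
    }
    /* Any other name becomes a custom option: update or append. */
    for (size_t i = 0; i < cfg->n_custom; i++) {
        if (strcmp(cfg->custom[i].name, name) == 0) {
            cfg->custom[i].value = value;
            return SET_OK;
        }
    }
    if (cfg->n_custom == MAX_CUSTOM || strlen(name) >= sizeof cfg->custom[0].name) {
        return SET_REJECTED;  /* no room, or name too long */
    }
    strcpy(cfg->custom[cfg->n_custom].name, name);
    cfg->custom[cfg->n_custom].value = value;
    cfg->n_custom++;
    return SET_OK;
}
```

A separate API for embedder-defined options would keep the "unknown option" error intact for CPython's own names.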