PEP 741: Python Configuration C API

PyO3 doesn’t explicitly pre-initialize at the moment, and it would be easier to keep it that way.

I see, so it sounds like the Python allocator dependency can be removed and the remaining need for preinitialization is for locale decoding for this configuration.

Is it possible that the locale decoding could be done lazily during Py_InitializeFromInitConfig? That way preinitialization might be just an implementation detail in the initialization step which users don’t ever need to care about. I suspect the answer is yes, it just makes the implementation harder :slight_smile:

I think it’d make this API easier to understand and remove the question of whether modifying the preconfiguration after preinitialization should be an error or ignored.

Thinking further, actually I would just prefer if PyImport_AppendInittab() and PyImport_ExtendInittab() continue to be legal to use before calling the new initialization API. Then PyO3 would not have to change our current API to populate the inittab.
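For context, a minimal sketch of the inittab pattern in question, using the current C API; the "embedded" module name and its empty module definition are placeholders:

```c
#include <Python.h>

/* Placeholder extension module; a real embedder (e.g. PyO3) would register
   its own init function here. */
static struct PyModuleDef embedded_def = {
    PyModuleDef_HEAD_INIT, "embedded", NULL, -1, NULL,
};

static PyObject *
PyInit_embedded(void)
{
    return PyModule_Create(&embedded_def);
}

int
main(void)
{
    /* Today this must be called before (pre)initialization; the request
       above is that it remain legal with the new initialization API. */
    if (PyImport_AppendInittab("embedded", PyInit_embedded) < 0) {
        return 1;
    }
    Py_Initialize();
    PyRun_SimpleString("import embedded");
    Py_Finalize();
    return 0;
}
```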

1 Like

To me, yes :+1:. I think it also helps separate it from PEP 587 a bit.

Some final comments related to the stable ABI, embedding, etc:

  • We struggle to link against libpython3 for embedding use cases because many Linux distributions appear not to ship it; @encukou is aware of this from this comment. So even if we limit ourselves to stable ABI symbols, we usually link against a version-specific shared object. (In principle users can override this, but then I think they also struggle with the lack of libpython3.)

  • A frequent recommendation I give users who want to embed Python into Rust apps is to use PyOxidizer. This vendors a statically-built Python and uses PEP 587 initialization plus (I think) some private symbols to get to the desired end state. It takes a lot of the packaging complexities away, which is great for most users.

    I’m sure it depends on implementation details and dark magic. cc @indygreg, you might want to check out and comment on this PEP in case it can make things easier for PyOxidizer.

1 Like

That depends on the option and on what our contract for a stable ABI is. Adding new options should be fine, but removing existing options is problematic, especially for options that are actually used in third-party code.

Maybe just not expose the “legacy” options when we’d like to get rid of them.

IMHO in a stable ABI we should try to avoid breaking existing code, and that would require us to keep options around with the same semantics (or at least be very conservative with changes). I’d also consider changing a stable ABI function to always return an error to be a stable ABI break, even if that doesn’t break code at the dynamic linking stage.

2 Likes

That would certainly be a good idea. Using a custom allocator for these functions seems a bit gratuitous to me.

2 Likes

True, but why remove them?
The setter functions don’t need to limit themselves to storing a value in a structure.

If someone calls PyInitConfig_SetInt("dev_mode", 1) on a future new CPython, but meanwhile we decided that dev mode is now unnecessary, the call can be a no-op. If we split dev_mode into 20 fine-grained options, the call can set all of them eventually.
Making it error out is the last resort.

1 Like

Isn’t that still a compatibility break (e.g. I set legacy_windows_stdio to true to get a particular behaviour, and removing that option would break my code)?

In the end that’s something that should be spelled out: what are the stability guarantees for the stable ABI? That’s a much bigger question than this proposal, of course, but IMHO we should think about it for this new API because the stability risks are primarily in the configuration data, not so much in the new functions.

Maybe add language that options can be removed using the normal deprecation method, and add an API function that can be used to query if an option is valid (or some other way to make it easy to differentiate “this option name is not valid” from “the new value is not valid”).
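A sketch of how such a query could look, purely to illustrate the distinction; PyInitConfig_HasOption() is an invented name, and the negative-on-error return convention for the setters is assumed from the PEP draft:

```c
/* Hypothetical: PyInitConfig and PyInitConfig_HasOption() do not exist in
   current headers; this only illustrates the suggestion above. */
static int
set_legacy_stdio(PyInitConfig *config)
{
    if (!PyInitConfig_HasOption(config, "legacy_windows_stdio")) {
        /* Option was removed or renamed in this Python version:
           fall back or warn instead of treating it as a hard error. */
        return 0;
    }
    if (PyInitConfig_SetInt(config, "legacy_windows_stdio", 1) < 0) {
        /* The name is valid, so this failure means the value itself
           (or something else) is genuinely wrong. */
        return -1;
    }
    return 0;
}
```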

2 Likes

Well, “guarantee” sounds too strong for a volunteer-led project, so:

The current compatibility … expectation … or target? … is purely ABI-related: you won’t get missing symbols, memory corruption due to layout changes or reordered arguments, etc.

Behaviour is left to the general backwards compatibility policy, and I think that is the right place. I do think the policy could be stricter, but I see no reason why C-API should have its own policy.

1 Like

This has nothing to do with the stable ABI per se, but with Python configuration options. The same option is also accessible using the PYTHONLEGACYWINDOWSSTDIO environment variable.

Whether we commit to preserving the meaning of configuration options across versions is an open question, but please open a separate discussion thread for it. This is off-topic for this discussion IMHO.

“Guarantee” is too strong a word, but we currently promise this in our documentation:

To enable this, Python provides a Stable ABI: a set of symbols that will remain compatible across Python 3.x versions.

The Stable ABI contains symbols exposed in the Limited API, but also other ones – for example, functions necessary to support older versions of the Limited API.

The PEP proposes to add the new API to the limited API, and that’s why I started this subthread. An implication of the PEP can be that we promise to maintain the semantics of configuration options exposed through this API. It is OK if we only want to promise to follow the normal deprecation process for the stable API, but either way this needs to be documented as such.

1 Like

I failed to find them. Do you recall which discussions/issues were asking for that?

Hi, I wrote a second version of the PEP. I created a new discussion to have comments under the updated PEP: PEP 741: Python Configuration C API (second version).

(FTR, I haven’t read through this thread yet. My comments here are based exclusively on having read through just the PEP.)

Thanks for working on this, Victor! I agree that there is room for improvement in runtime initialization (and finalization).

Overall, the proposed API seems consistent and effectively an iteration on PEP 587. However, I have some concerns, mostly centered on the motivation (users) for this new API and why this is the best approach.

Specifically:

  • who are we trying to help with this API?
  • how did they let us know their needs (and that the status quo wasn’t good enough)?
  • how does the proposed API meet those needs?
  • why does this need to be part of the limited API?
  • what is the purpose of exposing config values after initialization has finished?

Overall, I think we need to be careful to not lock in on an API when we aren’t sure it’s what we need.

My Perspective


FWIW, I have a slightly different perspective about runtime initialization/finalization.

init/fini/configs:

  • at a fundamental level init (and fini) make sense as a set of granular phases
  • PEP 432 demonstrates this idea, though I’d also like to see feature-related phases during interpreter init/fini
  • each phase should have a distinct config that is used exclusively as input to the corresponding init function
  • the config values should only be used to initialize runtime state, whether directly or indirectly
  • there isn’t a strict 1-to-1 mapping between config fields and runtime state fields
  • initialization should not modify the config (that’s the user’s job before calling the init func)
  • once an init phase finishes, that config becomes irrelevant; the runtime never uses it
  • current places where we are using the config after initialization, and even treating it like state, should be fixed (add corresponding state fields and set them during init)
  • we would keep a read-only copy of the config around only for diagnostic purposes (as a snapshot of how the runtime was initialized)
  • finalization should mostly be the reverse of initialization

users:

  • embedders (ergo core devs) are the only users that use the initialization API and thus the config
  • embedders don’t care about the stable ABI

per-interpreter config:

  • the global runtime config should be distinct from the config used to initialize interpreters
  • many config values in PyConfig are relevant for the global runtime but not for individual interpreters (we shouldn’t store a PyConfig on each PyInterpreterState)

Some past discussions:

Responses to the PEP


Add a C API to the limited C API to configure the Python preinitialization and initialization, and to get the current configuration. It can be used with the stable ABI.

What are we trying to solve by adding this to the limited API and stable ABI? Embedders are the only ones that should care about the config and it becomes irrelevant once they call Py_InitializeFromConfig().

Add sys.get_config(name) function to get the current value of a configuration option.

What does “current” mean here? It implies the config might have changed. The config should not have changed from what was used to initialize the runtime. Instead, initialization should have used the config to initialize the runtime state. Thus it makes sense to support querying the runtime/interpreter state, not the config.

Do note that currently we actually do use some of the interp.config fields directly during runtime operation (after initialization). However, that’s just a matter of no one being motivated to fix up those cases. (FWIW, this came up several years ago when someone started using interp.config as state rather than treating it as const. This happened because the relevant code was using the interp.config.... field directly, so it was easy to not know that modifying that field directly wasn’t correct.)

Allow setting custom configuration options, not used by Python but by third-party code. Options are referred to by their name as a string.

The config is what we use to initialize the runtime. Supporting additional custom config options implies that the config matters after the runtime is initialized. I think we should avoid sending that message.

If such custom “config” options are desirable then they should be their own feature, separate from the runtime config.

This PEP unifies also the configuration of the Python preinitialization and the Python initialization in a single API.

What are the deficiencies in the current API? Don’t we already have a pre-config?

PEP 587 has no API to get the current configuration, only to configure the Python initialization.

Why does anyone need access to the config used to initialize the runtime? I expect what they actually want is to know the current value being used.

For example, the global configuration variable Py_UnbufferedStdioFlag was deprecated in Python 3.12 and using PyConfig.buffered_stdio is recommended instead. It only works to configure Python, there is no public API to get PyConfig.buffered_stdio.

What is the corresponding value in the runtime state? We should be exposing that, rather than the config value.

Users of the limited C API are asking for a public API to get the current configuration.

What users? Why do they want it?

In the end, it was decided to not add a new PyConfig member to stable branches, but only add a new PyConfig.int_max_str_digits member to the development branch (which became Python 3.12). A dedicated private global variable (unrelated to PyConfig) is used in stable branches.

I would expect a dedicated field in _PyRuntimeState and/or PyInterpreterState, rather than a private global variable. The config field (in 3.12+) should only be relevant during initialization.

The Python preinitialization uses the PyPreConfig structure and the Python initialization uses the PyConfig structure. Both structures have four duplicated members: dev_mode, parse_argv, isolated and use_environment.

The redundancy is caused by the fact that the two structures are separated, whereas some PyConfig members are needed by the preinitialization.

That depends on what preinitialization (_Py_PreInitializeFromPyArgv(), Py_PreInitialize(), or _Py_PreInitializeFromConfig()) is meant to accomplish and how it relates to initialization.

The outcome of preinitialization should be either state stored somewhere or custom values set on PyConfig. IIRC, PEP 432 was more clear about the distinction between preinit and init.

The idea of preinitialization relates to the small set of functions that users may use when populating the PyConfig before initializing the runtime. Preinitialization sets up the bare minimum of state needed for that small set of functions. This would be clearer if those functions were only available before initialization and if they explicitly took a PyPreConfig * as an argument.

In effect, the steps in initialization are:

  1. (optional) populate a PyPreConfig as desired
  2. (optional) use that preconfig when calling functions that modify a PyConfig
  3. populate the PyConfig
  4. call Py_InitializeFromConfig()

(FWIW, I’d like to see step 4 split into a number of granular steps, e.g. pre-main interpreter, plus per-interpreter steps related to enabling features.)
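For reference, a minimal sketch of those four steps with the existing PEP 587 API; error handling is abbreviated, and the UTF-8 mode setting and program name are illustrative choices:

```c
#include <Python.h>

int
main(void)
{
    /* Step 1: populate a PyPreConfig (optional). */
    PyPreConfig preconfig;
    PyPreConfig_InitPythonConfig(&preconfig);
    preconfig.utf8_mode = 1;

    /* Step 2: preinitialize so the config helpers below behave as configured. */
    PyStatus status = Py_PreInitialize(&preconfig);
    if (PyStatus_Exception(status)) {
        Py_ExitStatusException(status);
    }

    /* Step 3: populate the PyConfig. */
    PyConfig config;
    PyConfig_InitPythonConfig(&config);
    status = PyConfig_SetString(&config, &config.program_name, L"myapp");
    if (PyStatus_Exception(status)) {
        PyConfig_Clear(&config);
        Py_ExitStatusException(status);
    }

    /* Step 4: initialize the runtime from the config. */
    status = Py_InitializeFromConfig(&config);
    PyConfig_Clear(&config);
    if (PyStatus_Exception(status)) {
        Py_ExitStatusException(status);
    }

    Py_Finalize();
    return 0;
}
```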

Python API:

  • sys.get_config(name)

As I mentioned earlier, configured values should be translated to state in _PyRuntimeState or PyInterpreterState. Once Py_InitializeFromConfig() has returned, there should be no need to look at the config ever again.

I do agree it would be helpful to expose more of that state for introspection. However, most of it is domain-specific and it would probably make sense to expose it either in a corresponding module or with a specific, dedicated sys getter. In fact, we already do this for much of the state derived from the original PyConfig, such as sys/importlib attributes related to the import system and things like sys.hash_info.

Perhaps it would be useful for users to have a single function like sys.get_config() that returns the runtime state value that corresponds to the given PyConfig field. However, there isn’t a 1-to-1 mapping from config fields to runtime state values in all cases, and that inconsistency might be confusing. It might also be useful to add a similar function that returns a dict populated with the full config (like _PyConfig_AsDict() does).

In either case, though, the config should be no more than a diagnostic tool; its data should never be used in any code logic. The question to answer is: why would users ever need to look at config values or even know about PyConfig and its field names? They shouldn’t, and adding something like sys.get_config() would invite users to start factoring in the config to their mental model of Python, rather than focusing on actual runtime state, where we do want them to focus.

The C API uses null-terminated UTF-8 encoded strings to refer to a configuration option.

+1

The PyInitConfig structure is implemented by combining the three structures of the PyConfig API:

My first impression is that this would be a step backward. Even if it isn’t, I think we need a lot more feedback from embedders before we settle on the right approach. Again, PEP 432 has a lot of good ideas in the right direction. Having a distinct config for each initialization phase makes sense.

PyInitConfig structure:
Opaque structure to configure the Python preinitialization and the Python initialization.

Making this opaque might make sense. That would certainly allow us to organize the contents however we like, without disrupting users in the future. However, how much does that matter? Would embedders even care? Perhaps we are making it opaque only for the sake of the limited API?

  • PyInitConfig_SetInt(config, name, value)
  • PyInitConfig_SetStr(config, name, value)
  • PyInitConfig_SetWStr(config, name, value)
  • PyInitConfig_SetStrList(config, name, length, items)
  • PyInitConfig_SetWStrList(config, name, length, items)

Basically, setting field values would become a runtime operation where our implementation of these functions would be responsible for checking the name and value type for correctness. Contrast that with the current situation, where using a non-opaque PyConfig means the compiler can enforce types and field names. (We still have to check field values for correctness when a field is restricted to a sub-range of the declared type.)
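A sketch of that contrast; PyInitConfig and its setters come from the PEP draft (they do not exist in current headers), and the negative-on-error return convention is assumed:

```c
#include <Python.h>

/* Hypothetical sketch, assuming `init_config` was created with the PEP's
   constructor and that the setters return a negative value on error. */
static int
configure(PyInitConfig *init_config)
{
    /* Current non-opaque PyConfig: field names and value types are
       checked by the compiler. */
    PyConfig config;
    PyConfig_InitPythonConfig(&config);
    config.dev_mode = 1;            /* compile-time checked */
    PyConfig_Clear(&config);

    /* Proposed opaque PyInitConfig: the same mistakes surface only at
       runtime, when the setter rejects the name or the value type. */
    if (PyInitConfig_SetInt(init_config, "dev_mod", 1) < 0) {
        return -1;  /* the misspelled "dev_mod" is only caught here */
    }
    return 0;
}
```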

It is possible to set custom configuration options, not used by Python but only by third-party code, by calling: PyInitConfig_SetInt(config, "allow_custom_options", 1). In this case, setting custom configuration options is accepted, rather than failing with an “unknown option” error. By default, setting custom configuration options is not allowed.

What’s the motivation for this? We would basically be lumping non-CPython custom options in with the options we need for initialization. The idea seems problematic, especially since users would use the same API for get/set custom options as for CPython-init options. What’s the advantage to making custom options a part of the config, rather than a separate API? How does this benefit people embedding CPython?

1 Like

I’m thinking along these lines too.

Yeah, we should aim for a clearer separation between the capabilities of the CPython runtime and the embedding machinery (including what happens in Modules/main.c).

1 Like

PEP discussion continues on the new round 2 thread per @vstinner’s comment; archiving this one to avoid others missing that and bifurcating the discussion. Thanks!