Add example increasing an initialization configuration option.
“Usage of the stable ABI” section: add more quotes.
Document that option side effects are not taken into account by PyInitConfig_Set*() functions.
What do you think of these changes?
With sys.get_config_names(), I’m not sure that it’s strictly needed to have a public PyConfig_Keys() function. One can call sys.get_config_names() in C, no? PyConfig_Get() is different: it’s designed to be efficient and I expect that it will be more commonly used, so having to import sys is not convenient.
I don’t pretend to fulfill everybody’s requests, since some of them are incompatible: for example, some ask to “remove PyInitConfig_GetInt()” whereas others want this function. I only tried to summarize the discussion.
I haven’t had a chance to get back to this discussion, partly because I haven’t felt like my thoughts about it are organized enough to be helpful. I still have a sense of unease though.
In the interest of giving myself a clearer picture, I took a couple days to do a rough analysis relative to the existing PyConfig and runtime init/fini: Python Config - Google Sheets. The spreadsheet isn’t especially thorough, but it has already helped me feel a little more confident about this stuff.
(FYI, I’ve also been putting together a rough, high-level greenfield API design for the runtime. That’s been helping me toward a mental place where I’ll feel like I can communicate clearly about PEP 741. I’ll probably reply properly then.)
Interesting. There are around 81 configuration options (PyPreConfig, PyConfig, PyInterpreterConfig). Should I include PyInterpreterConfig options in PyInitConfig? If yes, maybe some options should be read-only?
Each interpreter has its own sys.argv, and you can imagine running an interpreter just to execute a command passed in argv, no? But you wrote “Main” in the “conceptual” column.
PEP 741 proposes to put everything together and not bother if some config options no longer make sense (to be read by sys.get_config()) at runtime. It’s “just” a unified API to “configure Python options”.
Pretty much all my questions here are ones the PEP did not answer. If they were answered in discussion threads then please update the PEP with a summary of the information.
There are three groups of users here:
embedders using Py_InitializeFromConfig()
other embedders (including those using Py_Main())
everyone else
The first group of embedders should already have the config, so they shouldn’t need a copy of it later. That they might need it implies that the initialization API is flawed. I’d rather we fix those flaws than add a new API to expose the config.
The second group doesn’t already have the config, but arguably we should make it easier for them to be in the first group. If that isn’t an option then we need to make sure they have access to all the relevant runtime state.
(For both embedder groups, exposing the config API in the stable ABI certainly adds a wrinkle, but I’d argue against a single monolithic config API.)
The third group (other users) should be able to query the current state of the runtime. If there’s some part of that state they can’t access then we should fix that. This makes even more sense if we stop using the config after initialization, i.e. copy all needed values into the runtime state during init.
If there is still value to preserving and exposing the config used during init then to me it would make sense to have something like sys.get_config() which would return a dict with the config values. That could provide a fallback to users for values that don’t have explicit API.
(I would still argue that it would be more appropriate as something like sys.get_state() and avoid any relationship with “config”.)
Yeah, we should only modify state. The config should only represent how the runtime was initialized.
This is a strong argument for adding something like _PyRuntimeState.verbose and/or PyInterpreterState.verbose. It would be a mistake to allow users to modify the config to change some behavior.
That’s something we should fix. All such values should be copied into the runtime/interpreter state during init.
If you mean PyConfig_Get() would still be used to get the value from the runtime/interpreter state, then the function name should probably use a different word than “config”.
The only value would be to core devs debugging the runtime initialization code. However, it shouldn’t be necessary even for that. We could probably get rid of it.
I’m saying, why does each interpreter need to keep information about how the runtime was initialized? That’s unnecessary duplication.
Of course, the point is irrelevant if we don’t need to preserve any config at all.
That sounds like a challenging user interface. How will a user know which parts of the runtime state are exposed via this API and which are exposed through some other existing API? Would we expose all state through the new API?
I’m saying that the parts of the runtime state are currently all tangled together, whether in the config, initialization, or finalization. This is something we’ve been working on fixing, as you know. Having a single PyInitConfig API for the different parts feels like we’re giving up on working toward a cleaner distinction between them.
Again, though, the point is irrelevant if we stop using the config after initialization. Any new API would expose state instead.
I have additional things to say about this, but I’ll do it in a separate post.
Well, perhaps the code wanting to inspect the config is simply not the same one that set it?
Generally speaking, my experience is that APIs which allow setting a value but not reading it back later always end up frustratingly limited. Unless there’s a technical limitation that prevents reading back the value, please provide an API to read it. CPython is already storing the value somewhere, third-party code should not have to invent their own secondary storage to read it back.
Currently, many configuration options are not exposed at all in Python. Only some of them are exposed in the sys modules in different attributes such as sys.flags, sys.warnoptions and sys._xoptions. For example, subprocess._args_from_interpreter_flags() rebuilds the Python command line options from sys.flags, sys.warnoptions and sys._xoptions.
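As a rough illustration of how scattered these options are today, the sketch below gathers a few of them from the three sys locations named above. The helper name `describe_startup_options` and the choice of flags are made up for this example; `subprocess._args_from_interpreter_flags()` is a private CPython helper and may change without notice.

```python
import subprocess
import sys


def describe_startup_options():
    """Collect startup options from the three places sys exposes them."""
    flag_names = ("optimize", "verbose", "utf8_mode", "dev_mode")
    return {
        # sys.flags: a struct-like object with one attribute per option
        "flags": {name: getattr(sys.flags, name) for name in flag_names},
        # sys.warnoptions: the list of -W options
        "warnoptions": list(sys.warnoptions),
        # sys._xoptions: the -X options (CPython-specific, hence getattr)
        "xoptions": dict(getattr(sys, "_xoptions", {})),
    }


options = describe_startup_options()

# Private helper: rebuilds command line arguments from the same three sources.
rebuilt_args = subprocess._args_from_interpreter_flags()
```

Options that are not mirrored in one of these three places cannot be recovered this way, which is the gap a single sys.get_config() would close.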
I’m not a fan of the sys.flags API which exposes 18 configuration options as object attributes. I prefer to go through a function call, it gives more freedom to compute a value, trigger a warning, raise an exception, etc. By the way, it already exposes a PyPreConfig member (utf8_mode), not only PyConfig members.
Adding sys.get_config() exposes all configuration options, so we don’t have to decide which ones users should be able to access. Users can pick any value and make their own choices.
If we are lumping all config together then yes. However, as I’ve said, I think we should expose them as “state” rather than “config”.
“argv” is a concept for the main program. Having an interpreter-specific argv seems strange to me. That said, with subinterpreters you can easily modify sys.argv before running your code there. There’s no need for the config to be involved.
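A minimal sketch of that last point, without any config API: command line code usually reads sys.argv through a parser such as argparse, so swapping sys.argv before running it is enough. The `run_cli` helper and its `--name` option are hypothetical, and the example runs in the current interpreter rather than a subinterpreter to stay self-contained.

```python
import argparse
import sys


def run_cli(argv):
    """Run a tiny CLI with a temporary sys.argv, then restore the old one."""
    saved_argv = sys.argv
    sys.argv = list(argv)
    try:
        parser = argparse.ArgumentParser(prog=argv[0])
        parser.add_argument("--name", default="world")
        # parse_args() without arguments reads sys.argv[1:]
        args = parser.parse_args()
        return f"hello {args.name}"
    finally:
        sys.argv = saved_argv


greeting = run_cli(["greet", "--name", "PEP 741"])
```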
That seems okay for something like sys.get_config() or for non-embedders using the C-API. However, for embedders I think it’s important to keep a firmer separation between the parts of the runtime.
I think I’ve got a good sense of my own thoughts now.
First of all, it seems like the PEP is trying to accomplish two main things:
expose the initialization config to users after initialization
do so in the limited API in a way that is backward compatible to 3.13+
Is that right?
Aside from that, I have concerns about encouraging the concept of a monolithic config vs. focusing on each part of the runtime individually.
Conceptually, the runtime can be divided into 5 distinct parts and initialization/config may be split along those lines:
(Py_Main() is deeply connected and, from a conceptual standpoint, acts as an additional part.)
Under the status quo, the structural separation between these parts is mostly okay (though the granular components are heavily mixed in with the other parts). However, config/initialization/finalization is currently all tangled together. That translates into a consistent maintenance cost.
The situation is largely a consequence of organic growth over the decades (see PEP 432) but has gotten better since around 2017. There’s still a lot to be done though.
If we look at the way things are currently, initialization looks something like this (at least when Py_Initialize() isn’t sufficient):
user populates a PyConfig
Py_InitializeFromConfig() - a tangle of the following, in no particular order:
initialize host process resources (partially from config)
host process resources → global runtime state
argv → config
env vars → config
config → global runtime state
create main interpreter
host process resources → main interpreter state
config → main interpreter state
create main thread state
register host process callbacks (e.g. signal handlers)
…
user starts using the C-API
config → behavior
state → behavior
env vars → behavior
My ideal equivalent would be something like this:
user initializes host process resources
user populates configs
set custom values on PyConfig
PyConfig_Update() (or something like that)
(some) env vars → config
host process resources → config
set custom values on PyInterpreterConfig
initialize the runtime
Py_InitializeFromConfig()
PyConfig → global runtime state
create/init the main interpreter
Py_NewInterpreterFromConfig()
create new interpreter
PyInterpreterConfig → interpreter state
set up the main thread
PyThreadState_New()
PyThreadState_Bind()
register host process callbacks (e.g. signal handlers)
enable desired components (e.g. threading)
user starts using the C-API
state → behavior
Py_Main() would effectively be something like:
prepare host process resources
PyMain_InitResourcesConfig()
initialize host process resources
populate configs
PyMain_InitConfig() (or something like that)
PyConfig_Update()
(some) env vars → config
host process resources → config
initialize PyInterpreterConfig
initialize PyMainConfig
(remaining) env vars → config
argv → config
3-6. same as above
run code or REPL from PyMainConfig
That mental model is unrealistic if we don’t achieve and preserve a strong separation between the parts of the runtime. Hence, I have reservations about encouraging the concept of a single monolithic config.
It’s used by argument parsing libraries such as argparse, and command line interface programs such as pip. For me, it sounds useful to run CLI programs in parallel.
From the C API point of view, for me it’s convenient to reuse the same PyInitConfig API to set argv. Having to create a Python list to set sys.argv seems less convenient to me in C than calling PyInitConfig_SetStrList().
I’m not convinced that argv is a “main interpreter” concept.
When embedding Python, these “distinct parts” are not really relevant. What developers want is a way to pass a few options with a simple “unified” API (convenient to use in C). IMO, from the point of view of someone embedding Python, knowing what uses a configuration option and where it will be stored is not really relevant.
PEP 587 introduces the concept of “preinitialization”, two initialization phases. According to past comments, Steve and Petr want to get rid of that in PEP 741, to hide this “implementation detail”, and so want a single “initialization phase” (in PEP 741).
What I don’t understand is how the PEP 741 design goes against what you wrote. PEP 741 is implemented on top of the existing Python implementation, which has PyRuntimeState, PyInterpreterState, PyThreadState, multiple interpreters, etc.
You’re right that PEP 741 doesn’t allow implementing multiple “initialization phases” à la “PEP 432” (“Restructuring the CPython startup sequence”). It’s a design choice to make the API easy to use.
Do you have examples of other projects which have a concept similar to that? Outside Python itself (CPython internals), what are the benefits for end users embedding Python?
I feel that some people dislike PEP 741 since the API has too many functions (ex: you’re against PyConfig_Get()). It seems like your proposed API has more functions and more structures (ex: PyInterpreterConfig and PyMainConfig). It sounds even more complicated to use, no?
That’s not what he said. Eric said “main program”. If you want to run multiple main programs in a single process in separate interpreters, you can totally do it (however, most “main programs” assume they control the entire process state, and so will rely on things like the current working directory or their own locale settings, and so it’s often not safe to do this - for example, CPython assumes it owns all of that stuff, and so it’s often not safe to embed it).
Creating a new configuration for a subinterpreter isn’t even part of this PEP, so it’s a distraction here anyway. We should certainly use the initialization configuration to provide the initial argv, since that is global state that has to get into the runtime somehow.
That’s what they want, but what they get is all of those things. Very few embedders really just want the equivalent of python -c "<string embedded in the binary>", they’re going to have calls back into C, and multiple points where they call back into Python code. As a result, they need to track all of those resources.
But the point isn’t that they need to do it all explicitly. We ought to be able to create useful high-level APIs to allow embedders to do it easily (including ourselves! Since python.c is an embedder of CPython). But we need to structure the low-level APIs around this set of things or else we are just trying to hide the mess with more abstraction.
The only way we’ll ever be able to fully deallocate all the memory CPython allocates is to handle these five things responsibly, by allowing embedders to control their lifetimes and by ensuring we know where parts of CPython fit into each.
What I really want is a clear separation between the five groups of things Eric listed. Preinitialization and initialization are covering most of the first three, but all mixed up. If we can get three clear steps to (1) change the process state to be “right”, (2) initialize global runtime state, and (3) create an interpreter state, then I’ll be happier with three steps than two or one.
Moving to one step is really just a way to get your proposed API somewhat closer to that without making you do too much work.
It sure does have more functions and structures! And I hope we’ll be able to reduce this down (hopefully with good defaults so that few of those structures ever need to be touched directly). But “more” doesn’t always mean “more complicated”.
Conceptually what we provide with more steps is a clearer understanding of what’s actually going on. When the API is too high level, the user has no idea what’s actually happening, and that leads to more pain.
The initialization phases Eric is proposing (and which I fully support, though want to work on the details more) are basically:
set up process-wide settings, such as the current locale and TTY settings
set up Python runtime state, such as any locks or memory allocators
set up a Python interpreter, with settings such as importers and search paths
set up other services, such as atexit and signal handling
start a Python thread, associated with an interpreter and bound to an OS thread
It’s more steps, but it is easier to understand what each one does and what it may take to clean them all up. It gives embedders choices, such as choosing not to modify their FILE *stdin settings if they want them a certain way.
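The five steps listed above can be modelled as a toy in Python. This is purely a conceptual sketch: nothing here is a CPython API, the phase names are just labels, and `run_embedded` is a made-up helper. It only shows the two properties being argued for: each phase is an explicit step that an embedder may skip, and cleanup runs in reverse order.

```python
from contextlib import ExitStack, contextmanager

# Labels for the five phases, in initialization order (not real API names).
PHASES = (
    "process settings",
    "runtime state",
    "interpreter",
    "services",
    "thread",
)


@contextmanager
def phase(name, log):
    """Record when a phase is set up and when it is torn down."""
    log.append(f"init {name}")
    try:
        yield
    finally:
        log.append(f"fini {name}")


def run_embedded(work, skip=()):
    """Enter each phase in order, run the work, then unwind in reverse."""
    log = []
    with ExitStack() as stack:
        for name in PHASES:
            if name in skip:
                continue  # the embedder manages this part itself
            stack.enter_context(phase(name, log))
        log.append(work())
    return log


full_log = run_embedded(lambda: "run user code")
own_process_log = run_embedded(lambda: "run user code", skip=("process settings",))
```

In this model an embedder that wants to keep its own stdin or locale settings passes `skip=("process settings",)` and the remaining phases are unaffected.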
It also makes it more obvious how to create a second interpreter, or a second thread, and what the implications of those might be. Your proposed two functions don’t even begin to address this, but leave users holding some automatically created threads that they don’t know about and don’t know how to handle the interactions with.
Once that exists, then we can easily write the high level function that does many of these steps at once. However, we don’t think it’s responsible to lock in the design of that high level function right now. We want to design the lower level ones first and make it work well, and then simplify things. That’s the main point of our objection to this proposal - we believe the assumptions it is based on (the underlying design) are wrong, and so we don’t want to codify them in a long-term stable API when we know that they’re wrong.
The API should now be feature complete. I also tried to remove as many functions as possible to stick to the smallest API covering all use cases. Thanks to the reviews, the API is now more pleasant to use, especially by using UTF-8 by default (ex: in PyInitConfig_SetStr()), rather than the locale encoding.
I worked on an implementation of sys.get_config(), sys.get_config_names() and sys.set_config(). Such API has a few issues:
Some options are exposed in 2 to 3 different places in the sys module. It’s not really Pythonic to have different APIs for the same or similar things.
Such API is not really “Pythonic”: users would prefer a mapping-like API.
Exposing configuration options in Python means that users must now learn about them, understand their exact meaning, whereas this API is more designed for the C world. I would prefer to not document each option in the Python documentation (it’s fine in the C API documentation, it’s already there).
I chose to remove the Python API to limit the PEP to the C API.
If PEP 741 is accepted, I will consider implementing it for Python 3.8-3.12 in my experimental deadparrot project. Since the new code is “more or less” a wrapper around the existing PyConfig API, “it should” be doable.
The tricky part will be to handle the case when Py_InitializeFromConfig() does exit the process. Maybe it will just be a limitation of the “backport” and you should avoid calling it with --help or other options which exit Python at startup.
Obviously, some options added in new Python versions will not be available on old Python versions. For example, safe_path was added to Python 3.11. Trying to get/set it in Python 3.10 and older will fail with an error.
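The same version dependence is already visible from Python through sys.flags, which gained a safe_path attribute in 3.11. The sketch below mirrors the "fail with an error" behaviour described above; `read_flag` and `read_flag_or` are hypothetical helpers, not part of the PEP or of deadparrot.

```python
import sys


def read_flag(name):
    """Return sys.flags.<name>, or raise LookupError if this Python lacks it."""
    try:
        return getattr(sys.flags, name)
    except AttributeError:
        version = "{}.{}".format(*sys.version_info[:2])
        raise LookupError(
            f"option {name!r} is not available on Python {version}"
        ) from None


def read_flag_or(name, default):
    """Same lookup, but fall back to a default on older Python versions."""
    try:
        return read_flag(name)
    except LookupError:
        return default


# None on Python 3.10 and older, the real flag value on 3.11 and newer.
safe_path = read_flag_or("safe_path", None)
```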
A key problem with the current situation is the stability of the PyConfig structure. It has changed in every version of Python, in some cases just by reordering the fields. This is OK if you target a single version of Python and you work in C/C++. But if you want to be able to target multiple Python versions with a single executable and/or you work in a different programming language (so you need to translate the headers), this is just a nightmare. And with the removal of the old config API in 3.13, this is a problem that is impossible to avoid. I just wonder why you didn’t simply add new fields at the end of PyConfig and insert dummy fields in the place of removed ones to keep some level of version compatibility.
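To illustrate why a changing struct layout hurts bindings written against translated headers, here is a small ctypes sketch. The two layouts are invented for the example (they borrow a few PyConfig member names but do not match any real Python version); the point is only that inserting one field shifts the offsets of everything after it.

```python
import ctypes


class ConfigOld(ctypes.Structure):
    """Hypothetical layout that a binding was translated from."""
    _fields_ = [
        ("isolated", ctypes.c_int),
        ("use_environment", ctypes.c_int),
        ("dev_mode", ctypes.c_int),
    ]


class ConfigNew(ctypes.Structure):
    """Hypothetical newer layout: one field inserted in the middle."""
    _fields_ = [
        ("isolated", ctypes.c_int),
        ("safe_path", ctypes.c_int),
        ("use_environment", ctypes.c_int),
        ("dev_mode", ctypes.c_int),
    ]


# The runtime fills in the new layout...
new = ConfigNew(isolated=0, safe_path=1, use_environment=0, dev_mode=1)

# ...but a binding built against the old layout reinterprets the same bytes.
misread = ConfigOld.from_buffer_copy(bytes(new))

old_offset = ConfigOld.dev_mode.offset
new_offset = ConfigNew.dev_mode.offset
```

Here `misread.use_environment` actually holds the value of safe_path and `misread.dev_mode` holds use_environment. A name-based API avoids this class of error because the caller never depends on field offsets.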
I am very much in favour of PEP 741, and I think it would provide stability and continuity for embedders that program in languages other than C or C++. I just wish the configuration API was originally designed as in this PEP.
It was decided that if an application embeds Python, it sticks to a Python version anyway, and so there is no need to bother with the ABI compatibility.
(…)
Since PyConfig was added to Python 3.8, the limited C API and the stable ABI are getting more popular.
The Steering Council is having a tough time evaluating PEP 741 (Python Configuration C API). We’re not sure how to proceed given where we are (a big proposal that apparently failed to reach consensus) and what the SC’s mandate is (find consensus for proposals), and we think we need some help moving this along. In particular we think it’s important to understand why the discussions haven’t led to consensus around this proposal, and what that means for the proposal itself.
We’d like to hear more about the disagreements and concerns that other participants in the discussion have, in particular users who are trying to use the stable ABI in embedded Python situations, as well as C API WG members. We’re not sure in what form that would be most productive, though. It could be here on discuss.python.org, or in private email to steering-council@python.org, or in a video meeting (during our office hours or separately scheduled). We’ll leave that up to whoever wants to participate.
Here are some of the concerns we have, which are not necessarily questions we expect anyone to have answers for:
It’s a big proposal, even if the basic premise (provide a Limited API/Stable ABI version of the PyConfig APIs for embedded interpreter configuration) is simple. It’s adding a lot of functions, with different error handling than other parts of the C API, and some with different memory management (e.g. PyInitConfig_GetStrList and PyInitConfig_FreeStrList). It’s also trying to cater to a lot of use-cases, even if the basic use-case is simple (embedding using the Stable ABI). On top of that, this is all by necessity targeted to the set-in-concrete API and ABI, which makes mistakes very difficult to correct.
The SC isn’t really set up to evaluate the technical merits of a proposal like this one. It’s not really the SC’s job, and it’s not why individual SC members get elected. Instead, we have to rely on, and try to find, community consensus for proposals. This is true even when, like now, individual SC members have technical insight and opinions on the proposal’s subject matter.
The API design is fairly complex, given all the different types involved (Int, Str, StrLocale, WStr, StrList, WStrList) and all the different options that can be set. It’s not clear that the use-cases listed in the PEP need all the complexity the PEP proposes. At the same time, it’s not clear that the PEP is enough to satisfy all the use-cases, either.
To what extent is embedding with the Stable ABI a realistic target? Is this C API really the only thing that prevents embedders from using the Stable ABI? The PEP’s testimonials seem to be lauding the stable ABI itself for extension modules, which doesn’t really seem to apply to this C API: embedders can still use extension modules that use the Stable ABI, after all.
Could we not do with a simpler API (e.g. only allow utf-8 strings as values, convert them to whatever Python needs internally)? Do we need to expose all the settings right away? Do we need this API to cater to all user requests?
Is there a way we could test-drive this API, or alternatives to it, without adding it to the Limited API/Stable ABI straight away? For example, a separate C library that doesn’t use the stable ABI itself but exposes the new APIs? (Perhaps as a separate library per major Python version, or as one library with weak symbols and copious runtime version checks.)
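One form such runtime checks could take is probing the loaded libpython for a symbol before relying on it. The sketch below does this from Python with ctypes.pythonapi; `has_c_symbol` is a hypothetical helper, and a real shim library would do the equivalent in C (for example with weak symbols or dlsym).

```python
import ctypes


def has_c_symbol(name):
    """Return True if the running Python library exports the C symbol `name`."""
    # Attribute lookup on a ctypes library raises AttributeError for unknown
    # symbols, which hasattr() turns into False.
    return hasattr(ctypes.pythonapi, name)


# Exported by every CPython version.
always_there = has_c_symbol("Py_IsInitialized")

# Only exported by a Python version that ships the API proposed in the PEP.
has_new_config_api = has_c_symbol("PyInitConfig_SetStr")
```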
So, please, if you have opinions on this PEP you think the SC should be aware of, or ideas on how the SC should evaluate it, let us know. (For what it’s worth, we will continue to debate the PEP in the meantime.)
In defence of the proposed API, it is based directly on the current (non-Stable ABI) one, with simplifications as requested through the discussion. I don’t really think it can get much simpler, other than by pushing more details into function arguments rather than function names.
It could certainly be implemented now as non-Stable ABI functions for more validation, though personally I’d prefer to not be giving embedders a brand new API every other release.
This is the core point of my disagreement: I contend that the answers are “barely realistic” and “no”. (For background, I have previously worked on CPython embedded in OBS Studio and Minecraft Education Edition, as well as other projects on Windows and Linux that are not public, and have consulted on designs/implementations of others under NDA via $work.)
(Below is a long aside on embedding in practice, which does not directly address your questions, but is still relevant, if repetitive for those who have been following this discussion.)
The only scenario where the Stable ABI is helpful for embedders is where you did not actually embed a copy of CPython, but are trying to use one that was installed by someone else. In this case, the host application can’t rely on any libraries being available, or any overrides being absent. Essentially, the stability and security of your app is entirely abandoned, assuming you’ve even managed to use the limited API for the rest of your bindings.
(Of note, SWIG, which is commonly used by embedders, added support for the limited API at the same time as they added support for 3.12 - end of last year. So this might increase the number of embedders who could potentially use the limited API throughout.)
The three successful models for embedding that I’ve seen are:
vendor a copy of CPython
use your distro’s unvendored copy of CPython
use an installed Python via its python3 executable
In the first case, you know which version your app will be using, so the Stable ABI is irrelevant. In the second, the distro is taking responsibility for it, but you’ve also got a mechanism to request the exact version your app intends to use, so again, you know which version you want.
The third case has been more stable and reliable for projects that wanted to provide the user’s own Python back to them, for example, by hosting an interactive prompt within an app. Many of the issues that we (CPython) see from embedders are people struggling to fully imitate what the python3 binary does, and most would be resolved if they just ran the binary with their own script and then used runpy for user code. In the other direction, those who want the python3 binary to be more predictable for their app tend to find that it can only be achieved by bringing their own, or by specifically instructing users on which version and how to install it.
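A minimal sketch of that third model, assuming the host application is itself scripted in Python for brevity: start the Python executable with a small launcher of your own and hand the user's file to runpy. `sys.executable` stands in for the user's python3 binary, and `run_user_script` and `LAUNCHER` are illustrative names.

```python
import os
import subprocess
import sys
import tempfile

# The application's own launcher script; user code is run through runpy.
LAUNCHER = "import runpy, sys; runpy.run_path(sys.argv[1], run_name='__main__')"


def run_user_script(path, python=sys.executable):
    """Run a user script in a separate Python process and return its stdout."""
    completed = subprocess.run(
        [python, "-c", LAUNCHER, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Usage: write a throwaway user script and run it.
with tempfile.TemporaryDirectory() as tmp:
    user_script = os.path.join(tmp, "user_code.py")
    with open(user_script, "w", encoding="utf-8") as f:
        f.write("print('hello from user code')\n")
    output = run_user_script(user_script)
```

Because the interpreter is started by its own binary, the application does not have to reproduce any of the startup configuration itself.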
As for how to evaluate the PEP, I would suggest requesting examples of actual projects that have struggled to be distributed because they “had” to vendor CPython but didn’t really need to (that is, they would’ve worked reliably and predictably with any arbitrary version/configuration of libpython).
Proving the need for these APIs to be guaranteed stable for the lifetime of abi3, and that such a guarantee would enable applications to depend on CPython without having to vendor or pin CPython as a dependency, would justify the rationale. In my opinion, everything proposed by Victor is as good a design as we are likely to get, but the rationale is entirely unsupported.
A few thoughts mostly prompted by @thomas’s questions:
Is supporting multiple string input and output formats really necessary? If the config API only accepts and returns strings as UTF-8 encoded bytes, then the Locale-dependent APIs go away (including any need for eager pre-initialisation). Embedding apps can do their own thing for conversion of locale-encoded strings if they need that.
Are the WStr APIs really necessary? I presume the PEP includes them because wchar_t * is more convenient on Windows and char * is more convenient elsewhere (i.e. the same reason the current PyConfig API offers both “string” (wchar_t) and “bytes” (char) interfaces for updating config fields), but it doesn’t explicitly state that. If these APIs are retained, then the PEP should state why they’re included, and also be explicit that the wchar_t APIs assume UTF-16-LE as the encoding for a 2-byte wchar_t and UTF-32 for a 4-byte wchar_t.
Rather than supporting retrieval of config lists as C arrays which need to be deallocated later, perhaps it would be sufficient to just expose an indirect null-terminated indexing API? (e.g. int PyInitConfig_GetStrListEntry(PyInitConfig *config, const char *name, size_t index, const char **entry)). Alternatively, maybe read access to the affected config settings could be skipped for now and the questions revisited later given concrete use cases for accessing them prior to runtime initialisation?
a name-based API is appealing in terms of embedding CPython in applications using languages other than C/C++ and in terms of making it straightforward to add new private config settings without causing ABI problems in maintenance releases. Gaining these benefits only requires defining the new name-based API, they don’t require jumping straight to adding that API to the stable ABI.
several of the embedders’ comments in this thread (now quoted in the PEP) report that embedding CPython wasn’t the problem that caused them to give up on the vendoring model. Instead, their problem was with extension modules that weren’t using the stable ABI, which this PEP definitely won’t help with (since the config API doesn’t affect extension modules, only embedding apps).
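To make the wchar_t encoding point in the list above concrete, the sketch below picks the encoding from the platform's wchar_t size and compares it with what ctypes stores in a wchar_t buffer. It assumes native byte order for the 4-byte case, and `wchar_encoding` and `to_wchar_bytes` are illustrative helpers rather than anything proposed in the PEP.

```python
import ctypes
import sys


def wchar_encoding():
    """Pick the text encoding that matches this platform's wchar_t size."""
    size = ctypes.sizeof(ctypes.c_wchar)
    if size == 2:
        return "utf-16-le"
    if size == 4:
        # Assumption: UTF-32 in native byte order.
        return "utf-32-le" if sys.byteorder == "little" else "utf-32-be"
    raise RuntimeError(f"unexpected wchar_t size: {size}")


def to_wchar_bytes(text):
    """Encode text as the bytes a wchar_t string would hold (no terminator)."""
    return text.encode(wchar_encoding())


text = "h\u00e9llo"
buffer = ctypes.create_unicode_buffer(text)
n_bytes = len(text) * ctypes.sizeof(ctypes.c_wchar)
# Raw bytes of the wchar_t buffer, without the trailing null character.
raw = ctypes.string_at(ctypes.addressof(buffer), n_bytes)
```

A UTF-8-only API would leave this conversion to the embedding application, at the cost of one extra encode or decode step on platforms where wchar_t strings are the native form.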
My overall conclusion: the quibbles above aside, I quite like the general design of the API as a unified mechanism for accessing initialisation config settings using a consistent mechanism that is more easily kept forwards and backwards compatible than the struct-based config API. It also cleanly supports aliasing the config settings in the future if we come up with better names for them (e.g. perhaps introducing categorised namespaces that make it clearer which aspects of the process and runtime state they affect)
However, like @steve.dower, I don’t see a clear benefit in jumping straight to adding it to the stable ABI. The fact it will inherently remain ABI compatible in maintenance releases even if new config settings need to be added is enough of a gain to make the idea worth considering, and once it proves itself over the course of a release or two, then the question of adding it to the stable ABI could be considered.