PEP 741: Python Configuration C API (second version)

I prepared a draft updating PEP 741 to address most requested changes: PEP 741: Add sys.get_config_names() by vstinner · Pull Request #3686 · python/peps · GitHub

  • Add sys.get_config_names() function.
  • Add PyInitConfig_HasOption() function.
  • Remove Py_ExitWithInitConfig() function.
  • Add “Fully remove the preinitialization” section.
  • Mention when the caller must hold the GIL.
  • Add example increasing an initialization configuration option.
  • “Usage of the stable ABI” section: add more quotes.
  • Document that option side effects are not taken into account by the PyInitConfig_Set*() functions.

What do you think of these changes?

With sys.get_config_names(), I’m not sure that a public PyConfig_Keys() function is strictly needed. One can call sys.get_config_names() in C, no? PyConfig_Get() is different: it’s designed to be efficient and I expect that it will be more commonly used, so having to import sys is not convenient.

I don’t pretend to fulfill everybody’s requests, since some of them are incompatible: for example, some ask to “remove PyInitConfig_GetInt()” whereas others want this function :slight_smile: I only tried to summarize the discussion.

I haven’t had a chance to get back to this discussion, partly because I haven’t felt like my thoughts about it are organized enough to be helpful. I still have a sense of unease though.

In the interest of giving myself a clearer picture, I took a couple days to do a rough analysis relative to the existing PyConfig and runtime init/fini: Python Config - Google Sheets. The spreadsheet isn’t especially thorough, but it has already helped me feel a little more confident about this stuff.

(FYI, I’ve also been putting together a rough, high-level greenfield API design for the runtime. That’s been helping me toward a mental place where I’ll feel like I can communicate clearly about PEP 741. I’ll probably reply properly then.)

Interesting. There are around 81 configuration options (PyPreConfig, PyConfig, PyInterpreterConfig). Should I include PyInterpreterConfig options in PyInitConfig? If yes, maybe some options should be read-only?

Each interpreter has its own sys.argv, and you can imagine running an interpreter just to execute a command passed in argv, no? But you wrote “Main” in the “conceptual” column.

PEP 741 proposes putting everything together, without worrying that some config options no longer make sense at runtime (when read by sys.get_config()). It’s “just” a unified API to “configure Python options”.

Pretty much all my questions here are ones the PEP did not answer. If they were answered in discussion threads then please update the PEP with a summary of the information.

There are three groups of users here:

  • embedders using Py_InitializeFromConfig()
  • other embedders (including those using Py_Main())
  • everyone else

The first group of embedders should already have the config, so they shouldn’t need a copy of it later. That they might need it implies that the initialization API is flawed. I’d rather we fix those flaws than add a new API to expose the config.

The second group doesn’t already have the config, but arguably we should make it easier for them to be in the first group. If that isn’t an option then we need to make sure they have access to all the relevant runtime state.

(For both embedder groups, exposing the config API in the stable ABI certainly adds a wrinkle, but I’d argue against a single monolithic config API.)

The third group (other users) should be able to query the current state of the runtime. If there’s some part of that state they can’t access then we should fix that. This makes even more sense if we stop using the config after initialization, i.e. copy all needed values into the runtime state during init.

If there is still value to preserving and exposing the config used during init then to me it would make sense to have something like sys.get_config() which would return a dict with the config values. That could provide a fallback to users for values that don’t have explicit API.

(I would still argue that it would be more appropriate as something like sys.get_state() and avoid any relationship with “config”.)

Yeah, we should only modify state. The config should only represent how the runtime was initialized.

This is a strong argument for adding something like _PyRuntimeState.verbose and/or PyInterpreterState.verbose. It would be a mistake to allow users to modify the config to change some behavior.

That’s something we should fix. All such values should be copied into the runtime/interpreter state during init.
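
To make the config/state split concrete, here is a minimal Python sketch. InitConfig, RuntimeState and initialize() are hypothetical names for illustration, not CPython APIs: the config is a frozen record of how the runtime was initialized, and the values needed later are copied into mutable state during init.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class InitConfig:
    """Hypothetical read-only record of how the runtime was initialized."""
    verbose: int = 0
    utf8_mode: bool = True


@dataclass
class RuntimeState:
    """Hypothetical mutable state that drives behavior after init."""
    verbose: int


def initialize(config: InitConfig) -> RuntimeState:
    # Copy every value that is needed later into the state during init.
    return RuntimeState(verbose=config.verbose)


config = InitConfig(verbose=1)
state = initialize(config)

# Changing behavior means changing the state, not the config.
state.verbose = 2

try:
    config.verbose = 2  # the frozen dataclass rejects this assignment
except FrozenInstanceError:
    config_is_read_only = True
else:
    config_is_read_only = False

print(config.verbose, state.verbose, config_is_read_only)
```

With this shape, the config stays an accurate record of the initial request even after the state has been changed.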

If you mean PyConfig_Get() would still be used to get the value from the runtime/interpreter state, then the function name should probably use a different word than “config”.

The only value would be to core devs debugging the runtime initialization code. However, it shouldn’t be necessary even for that. We could probably get rid of it.

I’m saying, why does each interpreter need to keep information about how the runtime was initialized? That’s unnecessary duplication.

Of course, the point is irrelevant if we don’t need to preserve any config at all.

That sounds like a challenging user interface. How will a user know which parts of the runtime state are exposed via this API and which are exposed through some other existing API? Would we expose all state through the new API?

I’m saying that the parts of the runtime state are currently all tangled together, whether in the config, initialization, or finalization. This is something we’ve been working on fixing, as you know. Having a single PyInitConfig API for the different parts feels like we’re giving up on working toward a cleaner distinction between them.

Again, though, the point is irrelevant if we stop using the config after initialization. Any new API would expose state instead.

I have additional things to say about this, but I’ll do it in a separate post.

Well, perhaps the code wanting to inspect the config is simply not the same one that set it?

Generally speaking, my experience is that APIs which allow setting a value but not reading it back later always end up frustratingly limited. Unless there’s a technical limitation that prevents reading back the value, please provide an API to read it. CPython is already storing the value somewhere; third-party code should not have to invent its own secondary storage to read it back.

Currently, many configuration options are not exposed at all in Python. Only some of them are exposed in the sys module, in different attributes such as sys.flags, sys.warnoptions and sys._xoptions. For example, subprocess._args_from_interpreter_flags() rebuilds the Python command line options from sys.flags, sys.warnoptions and sys._xoptions.
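
As a rough illustration of how scattered this is, the sketch below gathers a few options from those three places. scattered_options() is a hypothetical helper written for this post; note that subprocess._args_from_interpreter_flags() and sys._xoptions are private, CPython-specific names.

```python
import subprocess
import sys


def scattered_options():
    """Gather a few startup options from the places sys exposes them today."""
    return {
        # Attributes of the sys.flags struct sequence.
        "verbose": sys.flags.verbose,
        "utf8_mode": sys.flags.utf8_mode,
        # -W options, as a list of strings.
        "warnoptions": list(sys.warnoptions),
        # -X options, as a dict; note the leading underscore.
        "xoptions": dict(sys._xoptions),
    }


options = scattered_options()

# Private helper mentioned above: rebuilds command line options
# from the same three sources.
rebuilt_args = subprocess._args_from_interpreter_flags()

print(options)
print(rebuilt_args)
```

Three different access styles (attribute, list, dict) are needed for what is conceptually one set of startup options.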

I’m not a fan of the sys.flags API, which exposes 18 configuration options as object attributes. I prefer to go through a function call: it gives more freedom to compute a value, trigger a warning, raise an exception, etc. By the way, it already exposes a PyPreConfig member (utf8_mode), not only PyConfig members.

Adding sys.get_config() exposes all configuration options, so we don’t have to decide whether users should access each one or not. Users can pick any value and make their own choices.

Note: If draft PEP 726 – Module __setattr__ and __delattr__ is accepted, it will be easier to execute code when sys attributes are read and set. The PEP has been waiting for a Steering Council decision for 3 months.

If we are lumping all config together then yes. However, as I’ve said, I think we should expose them as “state” rather than “config”.

“argv” is a concept for the main program. Having an interpreter-specific argv seems strange to me. That said, with subinterpreters you can easily modify sys.argv before running your code there. There’s no need for the config to be involved.

That seems okay for something like sys.get_config() or for non-embedders using the C-API. However, for embedders I think it’s important to keep a firmer separation between the parts of the runtime.

I think I’ve got a good sense of my own thoughts now.

First of all, it seems like the PEP is trying to accomplish two main things:

  • expose the initialization config to users after initialization
  • do so in the limited API in a way that is backward compatible to 3.13+

Is that right?


Aside from that, I have concerns about encouraging the concept of a monolithic config vs. focusing on each part of the runtime individually.

Conceptually, the runtime can be divided into 5 distinct parts and initialization/config may be split along those lines:

  • host process resources
  • global runtime state
  • interpreter state
  • thread state
  • granular components (e.g. multiple interpreters, threading, atexit)

(Py_Main() is deeply connected and, from a conceptual standpoint, acts as an additional part.)

Under the status quo, the structural separation between these parts is mostly okay (though the granular components are heavily mixed in with the other parts). However, config/initialization/finalization is currently all tangled together. That translates into a consistent maintenance cost.

The situation is largely a consequence of organic growth over the decades (see PEP 432) but has gotten better since around 2017. There’s still a lot to be done though.

If we look at the way things are currently, initialization looks something like this (at least when Py_Initialize() isn’t sufficient):

  1. user populates a PyConfig
  2. Py_InitializeFromConfig() - a tangle of the following, in no particular order:
    • initialize host process resources (partially from config)
    • host process resources → global runtime state
    • argv → config
    • env vars → config
    • config → global runtime state
    • create main interpreter
    • host process resources → main interpreter state
    • config → main interpreter state
    • create main thread state
    • register host process callbacks (e.g. signal handlers)
  3. user starts using the C-API
    • config → behavior
    • state → behavior
    • env vars → behavior

My ideal equivalent would be something like this:

  1. user initializes host process resources
  2. user populates configs
    1. set custom values on PyConfig
    2. PyConfig_Update() (or something like that)
      • (some) env vars → config
      • host process resources → config
    3. set custom values on PyInterpreterConfig
  3. initialize the runtime
    1. Py_InitializeFromConfig()
      • PyConfig → global runtime state
  4. create/init the main interpreter
    1. Py_NewInterpreterFromConfig()
      • create new interpreter
      • PyInterpreterConfig → interpreter state
  5. set up the main thread
    1. PyThreadState_New()
    2. PyThreadState_Bind()
    3. register host process callbacks (e.g. signal handlers)
  6. enable desired components (e.g. threading)
  7. user starts using the C-API
    • state → behavior
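
The ordering above can be modeled with a toy Python sketch. Every class, function and the DEMO_ALLOCATOR variable below is hypothetical and only mirrors the shape of the steps; none of it is a CPython API.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeConfig:
    """Hypothetical stand-in for the global (PyConfig-like) settings."""
    allocator: str = "default"


@dataclass
class InterpreterConfig:
    """Hypothetical stand-in for per-interpreter settings."""
    search_paths: list = field(default_factory=list)


@dataclass
class Interpreter:
    search_paths: list
    threads: list = field(default_factory=list)


@dataclass
class Runtime:
    allocator: str
    interpreters: list = field(default_factory=list)


def update_config(config, environ):
    # Step 2.2: (some) env vars -> config. DEMO_ALLOCATOR is a made-up name.
    if "DEMO_ALLOCATOR" in environ:
        config.allocator = environ["DEMO_ALLOCATOR"]
    return config


def init_runtime(config):
    # Step 3: config -> global runtime state; no interpreter exists yet.
    return Runtime(allocator=config.allocator)


def new_interpreter(runtime, config):
    # Step 4: interpreter config -> interpreter state (values are copied).
    interp = Interpreter(search_paths=list(config.search_paths))
    runtime.interpreters.append(interp)
    return interp


def new_thread(interp, name):
    # Step 5: a thread state belongs to exactly one interpreter.
    interp.threads.append(name)
    return name


runtime_config = update_config(RuntimeConfig(), {"DEMO_ALLOCATOR": "debug"})
interp_config = InterpreterConfig(search_paths=["/opt/app/lib"])

runtime = init_runtime(runtime_config)
main_interp = new_interpreter(runtime, interp_config)
new_thread(main_interp, "main")

# Step 7: behavior is read from state only; a late config change has no effect.
interp_config.search_paths.append("/tmp/late")
print(runtime.allocator, main_interp.search_paths, main_interp.threads)
```

Each function consumes one config and produces one piece of state, which is the separation the list is arguing for.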

Py_Main() would effectively be something like:

  1. prepare host process resources
    1. PyMain_InitResourcesConfig()
    2. initialize host process resources
  2. populate configs
    1. PyMain_InitConfig() (or something like that)
      1. PyConfig_Update()
        • (some) env vars → config
        • host process resources → config
      2. initialize PyInterpreterConfig
      3. initialize PyMainConfig
        • (remaining) env vars → config
        • argv → config
  3-6. same as above
  7. run code or REPL from PyMainConfig

That mental model is unrealistic if we don’t achieve and preserve a strong separation between the parts of the runtime. Hence, I have reservations about encouraging the concept of a single monolithic config.

It’s used by argument parsing libraries such as argparse, and command line interface programs such as pip. For me, it sounds useful to run CLI programs in parallel.
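
For reference, a small sketch of how argparse picks up argv. The main() function and the --name option are made up for illustration; the only library behavior relied on is that parse_args() falls back to sys.argv[1:] when no argument list is passed.

```python
import argparse
import sys


def main(argv=None):
    """Toy CLI entry point; `main` and `--name` are illustrative."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--name", default="world")
    # With argv=None, argparse falls back to sys.argv[1:].
    args = parser.parse_args(argv)
    return f"hello {args.name}"


# Explicit argument list: no global state involved.
explicit = main(["--name", "embedder"])

# Implicit: the result depends on whatever sys.argv holds at call time.
saved_argv = sys.argv
sys.argv = ["demo", "--name", "argv"]
try:
    implicit = main()
finally:
    sys.argv = saved_argv

print(explicit, implicit)
```

A CLI program that calls parse_args() without arguments therefore depends on sys.argv being set correctly in whichever interpreter runs it.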

From the C API point of view, for me it’s convenient to reuse the same PyInitConfig API to set argv. Having to create a Python list to set sys.argv seems less convenient to me in C than calling PyInitConfig_SetStrList().

I’m not convinced that argv is a “main interpreter” concept.

When embedding Python, these “distinct parts” are not really relevant. What developers want is a way to pass a few options with a simple “unified” API (convenient to use in C). IMO, from the point of view of someone embedding Python, knowing what uses a configuration option and where it will be stored is not really relevant.

PEP 587 introduces the concept of “preinitialization”, two initialization phases. According to past comments, Steve and Petr want to get rid of that in PEP 741, to hide this “implementation detail”, and so want a single “initialization phase” (in PEP 741).

What I don’t understand is how the PEP 741 design goes against what you wrote. PEP 741 is implemented on top of the existing Python implementation, which has PyRuntimeState, PyInterpreterState, PyThreadState, multiple interpreters, etc.


You’re right that PEP 741 doesn’t allow implementing multiple “initialization phases” à la PEP 432 (“Restructuring the CPython startup sequence”). It’s a design choice to make the API easy to use.

Do you have examples of other projects which have a concept similar to that? Outside Python itself (CPython internals), what are the benefits for end users embedding Python?

I feel that some people dislike PEP 741 since the API has too many functions (ex: you’re against PyConfig_Get()). It seems like your proposed API has more functions and more structures (ex: PyInterpreterConfig and PyMainConfig). It sounds even more complicated to use, no?

That’s not what he said. Eric said “main program”. If you want to run multiple main programs in a single process in separate interpreters, you can totally do it (however, most “main programs” assume they control the entire process state, and so will rely on things like the current working directory or their own locale settings, and so it’s often not safe to do this - for example, CPython assumes it owns all of that stuff, and so it’s often not safe to embed it :upside_down_face: ).

Creating a new configuration for a subinterpreter isn’t even part of this PEP, so it’s a distraction here anyway. We should certainly use the initialization configuration to provide the initial argv, since that is global state that has to get into the runtime somehow.

That’s what they want, but what they get is all of those things. Very few embedders really just want the equivalent of python -c "<string embedded in the binary>", they’re going to have calls back into C, and multiple points where they call back into Python code. As a result, they need to track all of those resources.

But the point isn’t that they need to do it all explicitly. We ought to be able to create useful high-level APIs to allow embedders to do it easily (including ourselves! Since python.c is an embedder of CPython). But we need to structure the low-level APIs around this set of things or else we are just trying to hide the mess with more abstraction.

The only way we’ll ever be able to fully deallocate all the memory CPython allocates is to handle these five things responsibly, by allowing embedders to control their lifetimes and by ensuring we know where parts of CPython fit into each.

What I really want is a clear separation between the five groups of things Eric listed. Preinitialization and initialization are covering most of the first three, but all mixed up. If we can get three clear steps to (1) change the process state to be “right”, (2) initialize global runtime state, and (3) create an interpreter state, then I’ll be happier with three steps than two or one.

Moving to one step is really just a way to get your proposed API somewhat closer to that without making you do too much work :wink:

It sure does have more functions and structures! And I hope we’ll be able to reduce this down (hopefully with good defaults so that few of those structures ever need to be touched directly). But “more” doesn’t always mean “more complicated”.

Conceptually what we provide with more steps is a clearer understanding of what’s actually going on. When the API is too high level, the user has no idea what’s actually happening, and that leads to more pain.

The initialization phases Eric is proposing (and which I fully support, though want to work on the details more) are basically:

  • set up process-wide settings, such as the current locale and TTY settings
  • set up Python runtime state, such as any locks or memory allocators
  • set up a Python interpreter, with settings such as importers and search paths
  • set up other services, such as atexit and signal handling
  • start a Python thread, associated with an interpreter and bound to an OS thread

It’s more steps, but it’s easier to understand what each one does and what it may take to clean them all up. It gives embedders choices, such as choosing not to modify their FILE *stdin settings if they want them a certain way.

It also makes it more obvious how to create a second interpreter, or a second thread, and what the implications of those might be. Your proposed two functions don’t even begin to address this, but leave users holding some automatically created threads that they don’t know about and don’t know how to handle the interactions with.

Once that exists, then we can easily write the high level function that does many of these steps at once. However, we don’t think it’s responsible to lock in the design of that high level function right now. We want to design the lower level ones first and make it work well, and then simplify things. That’s the main point of our objection to this proposal - we believe the assumptions it is based on (the underlying design) are wrong, and so we don’t want to codify them in a long-term stable API when we know that they’re wrong.

Hi,

I made a big update of PEP 741. Main changes:

  • Remove the Python API, like the sys.get_config() function.
  • Add PyConfig_Set() API to set a runtime configuration option. There are now “read-only” options.
  • Add PyInitConfig_HasOption() function.
  • Add a “Multi-phase init” section (written by @eric.snow) to Rejected Ideas.
  • Change the type of int options to bool (most options are just enable/disable flags).
  • Mention more explicitly when the caller must hold the GIL.
  • Remove Py_ExitWithInitConfig() function.

Read the PEP: PEP 741 – Python Configuration C API | peps.python.org

The API should now be feature complete. I also tried to remove as many functions as possible to stick to the smallest API covering all use cases. Thanks to the reviews, the API is now more pleasant to use, especially by using UTF-8 by default (ex: in PyInitConfig_SetStr()), rather than the locale encoding.


I worked on an implementation of sys.get_config(), sys.get_config_names() and sys.set_config(). Such an API has a few issues:

  • Some options are exposed in 2 to 3 different places in the sys module. It’s not really Pythonic to have different APIs for the same or similar things.
  • Such an API is not really “Pythonic”: users would prefer a mapping-like API.
  • Exposing configuration options in Python means that users must now learn about them, understand their exact meaning, whereas this API is more designed for the C world. I would prefer to not document each option in the Python documentation (it’s fine in the C API documentation, it’s already there).

I chose to remove the Python API to limit the PEP to the C API.

If PEP 741 is accepted, I’m considering implementing it for Python 3.8-3.12 in my experimental deadparrot project. Since the new code is “more or less” a wrapper around the existing PyConfig API, “it should” be doable.

The tricky part will be handling the case where Py_InitializeFromConfig() exits the process. Maybe that will just be a limitation of the “backport”, and you should avoid calling it with --help or other options which exit Python at startup.

Obviously, some options added in new Python versions will not be available on old Python versions. For example, safe_path was added to Python 3.11. Trying to get/set it in Python 3.10 and older will fail with an error.
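
The same version dependence is visible from Python, where sys.flags.safe_path only exists on 3.11 and newer. The helper below is a hypothetical sketch of the kind of guard a caller needs; it is not part of the PEP or of the backport.

```python
import sys


def get_safe_path_flag():
    """Return sys.flags.safe_path, or None where the flag does not exist."""
    if sys.version_info >= (3, 11):
        return sys.flags.safe_path
    # Python 3.10 and older: the option was not added yet.
    return None


safe_path = get_safe_path_flag()
print(sys.version_info[:2], safe_path)
```

Returning None for "unknown option" is just one possible convention; raising an error, as described above for the C API, is the stricter alternative.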

I added the PyInitConfig_GetExitcode() function for Py_InitializeFromInitConfig(), which can request that Python exit when parsing command line arguments.

I submitted PEP 741 to the Steering Council: PEP 741 -- Python Configuration C API · Issue #237 · python/steering-council · GitHub

Reposting this, because I’m still -1 on the PEP until one of the above happens.

I also posted on the SC issue that I don’t believe they should be considering it at this time.

A key problem with the current situation is the stability of the PyConfig structure. It has been changing in every version of Python, in some cases just by reordering the fields. This is OK if you target a single version of Python and you work in C/C++. But if you want to target multiple Python versions with a single executable and/or you work in a different programming language (so you need to translate the headers), this is just a nightmare. And with the removal of the old config API in 3.13, this problem is impossible to avoid. I just wonder why you didn’t add new fields at the end of PyConfig and insert dummy fields in place of removed ones to keep some level of version compatibility.
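
To show why reordering hurts, here is a small ctypes sketch. ConfigV1 and ConfigV2 are made-up two-field layouts, not the real PyConfig; the point is only that the same bytes are read differently once the field order changes.

```python
import ctypes


class ConfigV1(ctypes.Structure):
    # Made-up layout for "version N" of a config struct.
    _fields_ = [("isolated", ctypes.c_int), ("verbose", ctypes.c_int)]


class ConfigV2(ctypes.Structure):
    # Same fields in "version N+1", but reordered.
    _fields_ = [("verbose", ctypes.c_int), ("isolated", ctypes.c_int)]


# The host fills the struct using the layout it was built against.
written = ConfigV1(isolated=1, verbose=0)

# A runtime built against the other layout reads the same bytes differently.
misread = ConfigV2.from_buffer_copy(bytes(written))

print(ConfigV1.isolated.offset, ConfigV2.isolated.offset)
print(misread.isolated, misread.verbose)
```

Here the host asked for isolated=1, but the other layout sees isolated=0 and verbose=1. Anyone translating the headers to another language has to track such offsets for every Python version they support.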

I am very much in favour of PEP 741, and I think it would provide stability and continuity for embedders that program in languages other than C or C++. I just wish the configuration API was originally designed as in this PEP.

I tried to explain in PEP 741 why PEP 587 didn’t take the stable ABI into account: PEP 741 – Python Configuration C API | peps.python.org Extract:

It was decided that if an application embeds Python, it sticks to a Python version anyway, and so there is no need to bother with the ABI compatibility.
(…)
Since PyConfig was added to Python 3.8, the limited C API and the stable ABI are getting more popular.