PEP 741: Python Configuration C API

vstinner · January 19, 2024, 4:53pm

Read the PEP 741: Python Configuration C API.

Abstract

Add a C API to the limited C API to configure the Python preinitialization and initialization, and to get the current configuration. It can be used with the stable ABI.

Add sys.get_config(name) function to get the current value of a configuration option.

Allow setting custom configuration options, not used by Python but by third-party code. Options are referred to by their name as a string.

PEP 587 “Python Initialization Configuration” unified all the ways to configure the Python initialization. This PEP also unifies the configuration of the Python preinitialization and the Python initialization in a single API.

Out of scope: set config at runtime

There is no API to set a configuration option while Python is running, only to set the initialization configuration. Some people discussed it, but I didn’t see a clear willingness to have this feature.

Technically, it can be implemented. If we decide to add such an API, I would suggest making most options read-only, and adding callbacks on the others to validate values (reject invalid values) and execute code when an option is modified (e.g. update/invalidate caches).

For example, do you want to be able to change bytes_warning option at runtime? I’m not sure that the Python ecosystem is ready to see this option changing between two lines of code.
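To make the read-only/callback idea concrete, here is a minimal Python sketch of how such a store could behave. Everything in it (`RuntimeConfig`, `ReadOnlyOptionError`, `declare`, the callbacks) is a hypothetical name invented for illustration; nothing like it exists in CPython or is proposed by the PEP.

```python
from typing import Any, Callable, Dict, Optional


class ReadOnlyOptionError(Exception):
    """Raised when code tries to change an option declared read-only."""


class RuntimeConfig:
    """Hypothetical runtime option store; this is NOT a CPython API."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._validators: Dict[str, Callable[[Any], bool]] = {}
        self._on_change: Dict[str, Callable[[Any, Any], None]] = {}

    def declare(
        self,
        name: str,
        value: Any,
        validator: Optional[Callable[[Any], bool]] = None,
        on_change: Optional[Callable[[Any, Any], None]] = None,
    ) -> None:
        # Assumption of this sketch: an option without a validator is read-only.
        self._values[name] = value
        if validator is not None:
            self._validators[name] = validator
        if on_change is not None:
            self._on_change[name] = on_change

    def get(self, name: str) -> Any:
        return self._values[name]

    def set(self, name: str, value: Any) -> None:
        old = self._values[name]  # KeyError for unknown option names
        validator = self._validators.get(name)
        if validator is None:
            raise ReadOnlyOptionError(name)
        if not validator(value):
            raise ValueError(f"invalid value for {name!r}: {value!r}")
        self._values[name] = value
        callback = self._on_change.get(name)
        if callback is not None:
            callback(old, value)  # e.g. update or invalidate a cache


invalidations = []  # records (old, new) pairs seen by the change callback
config = RuntimeConfig()
config.declare("bytes_warning", 0)  # read-only in this sketch
config.declare(
    "verbose",
    0,
    validator=lambda v: isinstance(v, int) and v >= 0,
    on_change=lambda old, new: invalidations.append((old, new)),
)
config.set("verbose", 2)
```

Which options would be writable, and what the callbacks would do, is exactly the design question left open above.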

It may be interesting to have a Python API (like sys.set_config()) to set some options at Python startup.

pitrou · January 19, 2024, 6:24pm

I think this is a good idea and the API looks generally well thought out.

I would suggest a few changes:

Instead of PyInitConfig_SetStr taking a locale-encoded string, I would suggest PyInitConfig_SetUtf8 taking a UTF8 string (probably the preferred choice for most people?) and PyInitConfig_SetLocaleStr taking a locale-encoded string;
PyInitConfig_Exception would probably be less confusingly named PyInitConfig_HasError (we aren’t talking about proper Python exceptions, are we?);
PyInitConfig_GetError could probably return a const char* directly, rather than have a separate int return code that only duplicates the information.

Also, I do not understand what a config exit code is.

I’m not sure the allow_custom_options special config entry is really useful. Instead, you could arrange that values starting with some well-known prefix such as X- or vendor. are freely accepted.

Finally, it seems that PyConfig_Get and PyConfig_GetInt will only work if interpreter initialization was successful, since they set an exception on error? This means it’s not possible to inspect the current configuration while building it.

ronaldoussoren · January 19, 2024, 8:57pm

Are the configuration options themselves also a stable API?

The PEP allows setting custom options. IMHO it would be better to specify that all names without a colon are reserved for CPython configuration and will be ignored as custom options. This would avoid breakage when adding new options in future Python versions.

encukou · January 22, 2024, 2:57pm

Thank you for writing it down as a PEP!

For the rationale: Note that the PyO3 project can target the limited C API, but this is not (yet) the default.

The “Deprecated legacy API” section lists API that was deprecated in Python 3.8, and then says it was deprecated in 3.11, 3.12 and 3.13. Which is correct?

What is the rationale for custom configuration options?

I see PyPreConfig and PyConfig are merged into PyInitConfig. Does that make the previous two obsolete? Can we do anything to avoid having so many different structs?
Would it be good to add PyInitConfig_GetPyConfig and PyInitConfig_GetPreConfig so that version-specific code can cooperate with limited-API code?
Or would it make sense to add preconfig and status fields to PyConfig, and use that instead of PyInitConfig (i.e. make it opaque in the limited API)?

What does it mean to “Get current configuration”? I assume that for some keys it returns the value used for config, but for others it returns a value from elsewhere (e.g. in sys.argv). Is that right? And if so, is it more useful than always returning the value used to initialize Python? (IMO, having PyConfig_Get only query info stored in a runtime-wide PyInitConfig struct would make it quite a bit clearer.)

The PyConfig_Get API is a tiny part of a Mapping protocol. Why should we not expose e.g. the set of available keys? (IMO, a dump of the config would be quite useful in an error report, but you’d need iter for that.)
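As an illustration of what a fuller Mapping-style view with a dump could look like, here is a small Python sketch. `ConfigView` and `dump_config` are hypothetical names, and the snapshot is filled from three real `sys.flags` attributes only to have some data to show; the PEP does not define any of this.

```python
import sys
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class ConfigView(Mapping):
    """Hypothetical read-only view over a configuration snapshot."""

    def __init__(self, snapshot: Dict[str, Any]) -> None:
        self._snapshot = dict(snapshot)

    def __getitem__(self, name: str) -> Any:
        return self._snapshot[name]

    def __iter__(self) -> Iterator[str]:
        # Sorted keys give a stable order, which is handy in error reports.
        return iter(sorted(self._snapshot))

    def __len__(self) -> int:
        return len(self._snapshot)


def dump_config(view: Mapping) -> str:
    """Format every option as 'name = value', one per line."""
    return "\n".join(f"{name} = {view[name]!r}" for name in view)


# Illustrative data only: a few flags that sys.flags really exposes.
FLAG_NAMES = ("bytes_warning", "isolated", "verbose")
view = ConfigView({name: getattr(sys.flags, name) for name in FLAG_NAMES})
print(dump_config(view))
```

Deriving from `collections.abc.Mapping` provides `keys()`, `items()`, `in` and `get()` from the three methods defined here, which is the part of the protocol that a lone `PyConfig_Get` does not cover.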

vstinner · January 23, 2024, 3:11pm

I’m fine with having separate APIs for UTF-8 encoded and locale-encoded strings. In that case, I would prefer UTF-8 to be the default as PyInitConfig_SetStr, and PyInitConfig_SetLocaleStr for the locale-encoded string.

Implementing PEP 587 “PyConfig API” was a huge piece of work, and I left the code to handle a few actions in the “initialization” for backward compatibility. For example, if you run Python with python --help, Py_InitializeFromConfig() will return “an exception” with the exit code set to 0. I need to investigate how feasible it would be to move this code into Py_RunMain() to make the API easier to use.
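The exit code mentioned here can be observed from the outside. The following Python sketch runs the current interpreter with `--help` in a child process and records its return code; it only shows the externally visible effect, not how the status travels through `Py_InitializeFromConfig()` internally.

```python
import subprocess
import sys

# Run the same interpreter with --help; it prints usage text and exits
# without running any script.
result = subprocess.run(
    [sys.executable, "--help"],
    capture_output=True,
    text=True,
)
help_exit_code = result.returncode
print("exit code:", help_exit_code)
```

An embedder using the PEP 587 API sees the same situation as a status that requests an exit rather than as a real failure, which is why the two cases have to be told apart.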

The problem is to catch typos in configuration option names. If you set module_search_path instead of module_search_paths, it will be silently ignored. Maybe we can do the opposite and reserve a prefix for all Python options, such as py:? For example, set py:verbose? Any name not starting with py: would be a custom option.
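A rough Python model of that prefix rule is shown below. The `py:` prefix comes from the suggestion above; `KNOWN_PYTHON_OPTIONS` and `classify_option` are hypothetical names, and the option set is a tiny illustrative subset rather than the real list.

```python
# Illustrative subset only; the real list of options is much longer.
KNOWN_PYTHON_OPTIONS = frozenset(
    {"verbose", "module_search_paths", "bytes_warning"}
)
PYTHON_PREFIX = "py:"


def classify_option(name: str):
    """Return ("python", short_name) or ("custom", name).

    Names under the reserved prefix must match a known option, so a typo
    fails loudly instead of being stored as a custom option.
    """
    if name.startswith(PYTHON_PREFIX):
        short_name = name[len(PYTHON_PREFIX):]
        if short_name not in KNOWN_PYTHON_OPTIONS:
            raise ValueError(f"unknown Python option: {name!r}")
        return ("python", short_name)
    return ("custom", name)


print(classify_option("py:verbose"))
print(classify_option("myapp_theme"))
try:
    classify_option("py:module_search_path")  # typo: missing final "s"
except ValueError as exc:
    print("rejected:", exc)
```

The trade-off is that a typo in the prefix itself (for example `pi:verbose`) would still be accepted as a custom option.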

You can only call these functions after Python is initialized. Is it an issue?

If you configure the Python initialization, you have a PyInitConfig structure that you can use to set options (but you cannot get options from it).

vstinner · January 23, 2024, 3:24pm

Adding new options is fine, but I’m not sure if removing options would be acceptable? Should we support deprecated/removed options forever? Or can we just fail with an error?

I don’t recall examples of removed configuration options. My hope is that legacy_windows_fs_encoding and legacy_windows_stdio options will go away at some point, but I’m not sure about it.

I wasn’t sure about that, I tried to explain that the limited API is optional. I will try to rephrase that.

The latter. Apparently, my phrasing is confusing. I mean that since PEP 587 was implemented in Python 3.8, some APIs started to be deprecated.

The Backward Compatibility section says that PEP 587 PyConfig API is still supported, there is no plan to deprecate it. For me, it’s almost a different use case (different constraints).

PyInitConfig is an opaque structure, it’s different. I don’t see how we can reduce the number of structures without affecting the backward compatibility.

That would be incompatible with the limited C API, since PyConfig and PyPreConfig members are not part of the limited C API. I would prefer to avoid that.

PyConfig is part of PyInterpreterState, whereas PyPreConfig is part of PyRuntimeState. The PyStatus is only used to report failures to the Py_InitializeFromConfig() caller, it shouldn’t be stored at runtime. Also, changing these structures might affect the backward compatibility.

When PyConfig members are only used to initialize configuration options, PyConfig_Get() gets the runtime configuration option value, not the PyInterpreterState.config value which was used for initialization.

For example, sys.path is always different than PyConfig.module_search_paths, since the site module is executed after sys.path is initialized from PyConfig.module_search_paths. PyConfig_Get("module_search_paths") gets sys.path.
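The effect of the site module on sys.path can be approximated from Python itself. The sketch below starts two child interpreters, one with `-S` (which skips importing site) and one without, and compares their `sys.path`. It is only an approximation: the `-S` result stands in for the initial search paths and is not read from `PyConfig.module_search_paths` directly. `child_sys_path` is a helper name made up for this example.

```python
import json
import subprocess
import sys

SNIPPET = "import json, sys; print(json.dumps(sys.path))"


def child_sys_path(*flags):
    """Return sys.path as seen by a child interpreter started with flags."""
    completed = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        check=True,
    )
    # Use the last line in case a startup hook printed something earlier.
    return json.loads(completed.stdout.strip().splitlines()[-1])


without_site = child_sys_path("-S")  # site module not imported
with_site = child_sys_path()         # normal startup, site is imported
added_by_site = [entry for entry in with_site if entry not in without_site]

print("entries without site:", len(without_site))
print("entries with site:   ", len(with_site))
print("added by site:       ", added_by_site)
```

On a typical install the extra entries are site-packages directories, but the exact result depends on the environment (virtual environments, `.pth` files, environment variables).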

Some options are not copied outside PyConfig, such as PyConfig.tracemalloc. Some options are copied but cannot be modified at runtime.

I think that it’s more useful to use what the user gets/sees at runtime, rather than what was used to initialize Python. That’s why it’s called PyConfig_Get() and not PyInitConfig_Get().

I prefer to write the smallest API, unless there is a strong use case to justify adding more options.

pitrou · January 23, 2024, 3:32pm

I’m sure we can deprecate options like we do with any other API that doesn’t have a purpose anymore. We can then remove them after a few releases if we want to.

pitrou · January 23, 2024, 3:39pm

I imagine it might be nice to inspect existing configuration before trying to modify it. In general, it seems weird and error-prone to have a getter API that requires more preconditions than the comparable setter API.

steve.dower · January 24, 2024, 4:42pm

My initial concern with this proposal is that I don’t believe there’s a real benefit to having initialization in the limited API.

While theoretically an embedder could use the limited API, they really ought to know the version of Python that they are using, and very likely need to know it at compile time.

Note that the APIs being discussed are only useful for embedders. Every other time we’ve had this discussion this point gets missed, so let me elevate it up front.

Certainly on Windows, every scenario where you would embed CPython is best done with your own private copy of it, and not by searching the user’s machine for an install. If for whatever reason you must use the user’s existing install, you should launch Python with your own script and import an extension module (also known as running out-of-process). The security implications of loading arbitrary code (with network access!) into your own process are terrifying, and we should not encourage it. Also, generally people want to load an existing install to get the existing 3rd party modules, but chances are the search paths will be wrong due to the excessively complicated system we have for calculating it.

I want to see a rationale for why we need version-independent embedding before we commit to a version-independent initialization API, regardless of how the API looks. That rationale isn’t in the PEP yet.

vstinner · January 24, 2024, 5:00pm

Adding an API to the limited C API was requested by different users:

FR: Allow private runtime config to enable extending without breaking the `PyConfig` ABI - #9 by ronaldoussoren
FR: Allow private runtime config to enable extending without breaking the `PyConfig` ABI - #10 by gpshead
FR: Allow private runtime config to enable extending without breaking the `PyConfig` ABI - #11 by LostTime76
[C API] PEP 741: No limited C API to customize Python initialization (PyConfig, PEP 587) · Issue #107954 · python/cpython · GitHub

Sure, I will try to summarize the rationale for adding such an API to the limited C API.

steve.dower · January 24, 2024, 5:22pm

No, the first two requested struct stability within a major release, and the third wants an initialization API that doesn’t rely on C structures. The fourth points out that closing the issue about a limited API implies that there is now a limited API, which is not the case, but doesn’t provide any motivation or rationale for actually wanting it.

There are actual user requests on Discourse and GitHub if you want to find them. Probably the easiest way to find them is to look for my replies explaining why they will be happier if they embed their own copy of Python rather than doing what they think they want to do.

Summarising “don’t break structure between 3.x.y and 3.x.(y+1)” doesn’t justify the proposal to not change the API between 3.x and 3.(x+N).

steve.dower · January 25, 2024, 12:46am

Getting further into the PEP, I’d really like to get away from PreInitialize and Initialize methods.

With a lot of the cleanup that @eric.snow (and others) have done for subinterpreters, we should be getting very close to having a separation between “initialize runtime” and “initialize the first interpreter”. This I believe makes more sense, and is less built around how libc works.

We may still be a release away from actually having that separation, but we can see it coming, and I would much prefer not to formalise the old nomenclature forever in the meantime.

The work that is currently done in PreInitialize is to enable the program to correctly parse locale strings. I believe that should be done in the host application (i.e. python.c) rather than in the runtime (i.e. libpython.so). In practice, we may still provide a helper function from libpython to do the parsing for standard variables, but embedders should be able to simply omit the global locale changes we make, and simply skip the environment parsing we do. But essentially, I want PreInitialize to become redundant.

vstinner · January 28, 2024, 1:55pm

That’s why there are two configurations:

“Python” configuration is to write a program which is almost like “python” but with minor differences, so configure locales and C stdio the same way. It’s to write an “application”.
“Isolated” configuration leaves the current process unchanged and doesn’t touch locales, for example: it’s to embed Python in an existing application.

Obviously, people are doing things in between, that’s why there are configuration options.

encukou · January 29, 2024, 8:08am

That’s not my experience on Linux: I want the system Gimp, OBS or Tiled to default to the system Python – or, preferably, let me configure which Python to use.
With things like Flatpak, things are moving toward what you describe. But I don’t think we got to the point where Python should drop support for the “old way”. (And I’d still prefer being able to assemble self-contained sandboxes by linking rather than re-building, but I’m not quite in a position to steer toward that.)

Thinking about the GUI apps I mentioned: running scripts out-of-process gets quite complicated if the script wants to show a clickable button or live visualisation. It’s not impossible, but the necessary architecture is usually more expensive to maintain than, say, wrapping C++ objects.

I generally agree that we should do less of that. But how terrifying this is does depend on your security model. (IMO, it’s not worse than grabbing random stuff off PyPI.)

Yeah, but it was possible to get it working. And if somebody did, I can imagine them being upset when Python removes the API – especially without a full replacement.

steve.dower · January 29, 2024, 1:28pm

As a user, that’s your preference. From the POV of the developers trying to support that, it’s usually a nightmare that they come to regret. If we were better at compatibility between versions, maybe, but the rate that we add behaviour changes (not just API changes) means that they’ll spend way too much time trying to reproduce differences reported by users, and eventually they’ll prefer to have just had a fixed version for their integration.

I think we’re better off helping them integrate a single Python runtime more tightly into their app, and provide ways to then help users do what they want to do, such as installing additional packages or connecting into IDEs. This isn’t hypothetical either, I’ve actually worked with products who do this and have often tried both ways.

Out-of-proc is definitely more complicated up-front, and requires a bit more imagination when it comes to architecture. But it also quickly gets outweighed by the fact that your IPC machinery is going to be much easier to debug and maintain across multiple/arbitrary Python versions than an in-proc integration. Again, I’ve done all these approaches before, so I’m not hypothesising - I’ve seen the results of both approaches.

“Distributing an app that loads arbitrary DLLs” isn’t even the same people who would be grabbing random stuff off PyPI, so maybe it’s no worse from the POV of a developer, but it’s 100% different from the POV of a sysadmin deciding whether to allow your app into their network or not (they probably don’t allow PyPI either, which is fairly common).

An OBS that can load DLLs from a user’s download folder and implicitly grant them access to the stored login credentials for a social media account isn’t going to be popular with IT departments. We don’t want that to happen because we told them they should load whatever Python they can find.

You were lucky to get it working. I don’t think even I could’ve made it work on purpose back when I was deep inside getpath all the time. Far easier to just launch the existing install, print out sys.path and inject it into the version you’ve got (and hope they’re compatible, but you were already hoping that so you’re no worse off).
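The "launch the existing install and print out sys.path" approach could look roughly like this in Python. `probe_install` and `merged_path` are hypothetical helper names, the version check is a deliberately crude compatibility test, and the sketch probes `sys.executable` only so that it runs anywhere; a real host would pass the path of the user's install and then decide whether to apply the result.

```python
import json
import subprocess
import sys

PROBE = (
    "import json, sys; "
    "print(json.dumps({'version': list(sys.version_info[:2]), "
    "'path': sys.path}))"
)


def probe_install(python_exe):
    """Ask another interpreter, out of process, for its version and sys.path."""
    completed = subprocess.run(
        [python_exe, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    # Use the last line in case a startup hook printed something earlier.
    return json.loads(completed.stdout.strip().splitlines()[-1])


def merged_path(current, probed):
    """Return current plus the probed entries it lacks (no mutation)."""
    result = list(current)
    for entry in probed:
        if entry and entry not in result:
            result.append(entry)
    return result


info = probe_install(sys.executable)
# Crude assumption: only reuse paths from the same major.minor version.
compatible = tuple(info["version"]) == sys.version_info[:2]
candidate = merged_path(sys.path, info["path"]) if compatible else list(sys.path)
print("probed version:", info["version"], "compatible:", compatible)
```

The sketch computes a candidate list instead of assigning to `sys.path`, so the host keeps the final decision about what gets injected.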

steve.dower · January 29, 2024, 1:30pm

I know that’s why it was done like that, but I don’t think it should’ve been (and opposed it at the time, though my idea of how it should be is more fleshed out now).

libpython should only offer the “isolated” configuration. python.c should implement the “Python” configuration, and make that code easy for embedders to copy and adapt.

encukou · January 29, 2024, 2:49pm

Well, our experiences differ. IMO, /usr/lib64/libpython3.so is not quite an arbitrary DLL. No Downloads folder needed. And if an app can stick to stable ABI, there aren’t really that many behaviour changes to handle – especially if the Python scripting support is aimed at people familiar with Python. Of course, the situation changes as the integration matures, but IMO “bring your own Python” is a good first step to add Python scripting – and it’s also a good option to provide for advanced users.
Forcing app devs to become redistributors of CPython is very, very heavy.

Anyway: Your ideas aren’t bad, but IMO this should all be designed, communicated, and tried out before we remove the API to do it “the old way”.
That’s not the situation we’re in here. If we want to try this way, we should revert the removals first.

steve.dower · January 29, 2024, 2:58pm

I thought we were discussing adding stuff to the limited API? Yes absolutely, if anything has been removed, bring it back.

We ought to design properly before adding anything to the limited API, and I’d argue we should design properly before changing anything that isn’t a blatant bug in the current API. Embedders deserve some stability, and we should make the next change a significant improvement for them.

That’s fair, but you also said you want users to customise it. Selecting between system-installed versions is fine, IMHO, but that’s a far more limited scenario. And even then, I’d still contend that developers will have a happier time if they restrict it to a single version and avoid picking up all the various site-packages that will be there.

encukou · January 29, 2024, 3:13pm

+1
@vstinner Do you agree?

Except if they want to play with a newer Python version – perhaps to catch the new bugs.
Yes, loading random DLLs shouldn’t be the default, but it’s useful if you know what you’re doing. We should make it harder to exploit than, say, import ctypes, but not prevent it entirely.

steve.dower · January 29, 2024, 5:43pm

I mean, the developers of the app can do whatever they like. And if you’re looking for the kind of crashes that will happen when you swap out a CPython version without changing the app, I’m prepared to consider you a developer of the app.

The scenario is that the users are merely downloading, installing and using an app, and expect it to be reliable. OBS is a great example - the vast majority of users will never be “developers” of it, and probably most users of Python in it won’t be Python developers either, but will have copy-pasted something that they needed to work.