FR: Allow private runtime config to enable extending without breaking the `PyConfig` ABI

steve.dower · October 2, 2023, 8:32am

I much prefer the second API. Let programs parse their own config files and feed in name/value pairs explicitly (I would like to switch ._pth and probably pyvenv.cfg handling to use this API if possible).

However, I would like the known arguments to be defined as constants, perhaps in a header file that you have to explicitly import. That way initialization code can do #ifdef checks rather than version comparisons.

#define PYINIT_HASH_SEED "hash_seed"
...
#ifdef PYINIT_HASH_SEED
PyInitConfig_SetInt(config, PYINIT_HASH_SEED, 10);
#endif
etc.

(This doesn’t break the stable nature of the API, because code compiled in a later version will embed the constant, and the implementation on the earlier version will fail to recognise it. The same could be said for a regular int enum, but I’m totally okay with string keys here, provided we don’t need to keep an extra copy of them around.)
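A minimal sketch of the recognise-or-fail behaviour described above, in plain C so it compiles without Python headers. `struct init_config`, `init_config_set_int()` and `PYINIT_VERBOSE` are hypothetical stand-ins (only `PYINIT_HASH_SEED` comes from the post), and returning -1 for an unknown key is an assumed error convention, not the proposed API.

```c
#include <string.h>

/* Key constants in the style proposed above; PYINIT_HASH_SEED is from the
   post, PYINIT_VERBOSE is a hypothetical extra key. */
#define PYINIT_HASH_SEED "hash_seed"
#define PYINIT_VERBOSE "verbose"

/* Hypothetical config object standing in for an opaque PyInitConfig. */
struct init_config {
    long hash_seed;
    int verbose;
};

/* Hypothetical setter: maps a string key to a member. A key this build does
   not know (e.g. one embedded by code compiled against newer headers) is
   rejected instead of being silently ignored. */
static int init_config_set_int(struct init_config *config,
                               const char *key, long value)
{
    if (strcmp(key, PYINIT_HASH_SEED) == 0) {
        config->hash_seed = value;
        return 0;
    }
    if (strcmp(key, PYINIT_VERBOSE) == 0) {
        config->verbose = (int)value;
        return 0;
    }
    return -1;  /* unknown key: report an error to the caller */
}

/* Caller side: only use the key if the headers define it. */
static int configure(struct init_config *config)
{
#ifdef PYINIT_HASH_SEED
    if (init_config_set_int(config, PYINIT_HASH_SEED, 10) < 0) {
        return -1;
    }
#endif
    return 0;
}
```

Code built against newer headers that passes a key this build lacks simply gets -1 back, which the embedding application can report or ignore.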

encukou · October 2, 2023, 9:18am

FWIW, type & module slots have the same issue. We need a type-safe, forward-compatible way of setting and getting “properties” at the C level; whatever we come up with should feel consistent with the future API for any other slots.
And the concept you propose sounds great! Just please put it in an approved PEP before implementing it so we can polish the API.

We’ll eventually also need getters, such as:

PyInitConfig_GetInt(config, key, int *result)
PyInitConfig_GetStr(config, key, char **buffer, size_t *length) - fails if length is too short, always sets length to the size of the string
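One way to read the `PyInitConfig_GetStr()` length contract, sketched under assumptions: the function name and the hard-coded key and value are hypothetical, it takes a plain `char *` buffer rather than the `char **buffer` in the post, and the reported size is assumed to include the terminating NUL.

```c
#include <string.h>

/* Hypothetical stand-in for the PyInitConfig_GetStr() idea above.
   On entry *length is the capacity of buffer; on return it always holds
   the size the value needs, including the terminating NUL.
   Returns -1 if the key is unknown or the buffer is too short. */
static int init_config_get_str(const char *key, char *buffer, size_t *length)
{
    const char *value;
    if (strcmp(key, "program_name") == 0) {
        value = "python3";  /* hard-coded value for the sketch */
    }
    else {
        return -1;  /* unknown key */
    }
    size_t needed = strlen(value) + 1;
    size_t capacity = *length;
    *length = needed;  /* always report the required size */
    if (buffer == NULL || capacity < needed) {
        return -1;  /* too short: caller can retry with *length bytes */
    }
    memcpy(buffer, value, needed);
    return 0;
}
```

A caller can pass a NULL buffer first to learn the required size, allocate that many bytes, and call again.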

I don’t see an issue with adding a text-parsing API first, and a fast type-safe one later. There’s even a middle ground that might be useful:

PyInitConfig_SetFromConfigString(config, key, const char* value) – parse a NUL-terminated string and set the value based on the result. Usable for any type of key.
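A sketch of the `PyInitConfig_SetFromConfigString()` middle ground, where the key decides how the text is parsed. The struct, the two keys and the -1 error convention are assumptions for illustration; integers are parsed strictly with `strtol()`, rejecting trailing garbage and out-of-range values.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical config with one int member and one string member. */
struct text_config {
    int optimization_level;
    char program_name[64];
};

/* Hypothetical sketch: the key decides how the NUL-terminated text
   is parsed and which member receives it. */
static int set_from_config_string(struct text_config *config,
                                  const char *key, const char *value)
{
    if (strcmp(key, "optimization_level") == 0) {
        char *end;
        errno = 0;
        long n = strtol(value, &end, 10);
        if (end == value || *end != '\0' || errno == ERANGE
            || n < INT_MIN || n > INT_MAX) {
            return -1;  /* not a valid int */
        }
        config->optimization_level = (int)n;
        return 0;
    }
    if (strcmp(key, "program_name") == 0) {
        size_t len = strlen(value);
        if (len >= sizeof(config->program_name)) {
            return -1;  /* too long for this fixed-size sketch */
        }
        memcpy(config->program_name, value, len + 1);
        return 0;
    }
    return -1;  /* unknown key */
}
```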

vstinner · October 2, 2023, 9:33am

The difference between an API to configure Python initialization and an API to access PyTypeObject is that we can make some trade-offs in terms of performance overhead for the config API: it’s ok to use strcmp(). For PyTypeObject, nope, it must be as fast as possible.

Are you proposing to get configuration before the initialization, or after Py_InitializeFromInitConfig()?

Before, is it really useful? If you want a specific value, just set it, no?

When I designed PEP 587, I proposed a way to: (1) read the config which included the “Path Configuration”, (2) modify the config including “adding a path to module_search_paths”, (3) “write” the config (initialize Python). Since Python 3.8, this part changed: the path configuration is no longer “read” before the actual Python initialization, it’s only computed “after” Python initialized.

After Python initialization, I am not sure if PyInitConfig would be a good fit to read the current “initialization” state.

There were long discussions about exposing PyConfig which went nowhere (no API was added).

Problem: private APIs to access PyConfig are gone (moved to the internal C API) in Python 3.13.

Maybe we need a new stable ABI / API to get “runtime” configuration, but it would have no “config” argument, since values would be read from “Python”:

PyConfig_GetInt(name: str)
PyConfig_GetStr(name: str)
etc.

For the most commonly accessed options, we can consider adding specialized APIs that avoid the name string, for the best performance. I’m thinking about Cython, which needs to access a few config options frequently.

Notice the difference between “init config” (before Python is “created”) and “config” (runtime config, while Python is running).

@encukou: Would PyConfig_GetInt() fit your use cases? Or do you think that PyInitConfig_GetInt() is needed?

vstinner · October 2, 2023, 9:38am

I gave up the “configuration file” API proposed by @gpshead when I started to think about how such an API would be used. What killed my idea was handling strings… It’s very annoying to have to format a string for a configuration file. I used the TOML format as a reference. TOML requires that a quote character (") is escaped as \". It means that you cannot simply pass a string. Instead, you have to allocate a buffer (len * 2 characters), parse the string and escape each quote. Ok, maybe we can provide an API for that, sure. But then there are lists of strings. Again, it’s not that easy to format a list of strings.

Formatting data as a configuration file is quite complicated, whereas in C we are used to manipulating basic types like int and char*. The second API with SetInt() and SetStr() is closer to these basic types.
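To make the cost concrete, here is a sketch of the escaping a caller would need just to embed one C string in a TOML-style file. `quote_toml_string()` is a hypothetical helper; it escapes only the quote and backslash characters, whereas a complete TOML encoder must also escape control characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: quote a C string as a TOML-style basic string.
   Only '"' and '\\' are escaped here. The caller frees the result. */
static char *quote_toml_string(const char *s)
{
    size_t len = strlen(s);
    /* Worst case: every character escaped (2 * len), plus 2 quotes and NUL. */
    char *out = malloc(2 * len + 3);
    if (out == NULL) {
        return NULL;
    }
    char *p = out;
    *p++ = '"';
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '"' || s[i] == '\\') {
            *p++ = '\\';  /* escape prefix */
        }
        *p++ = s[i];
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```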

steve.dower · October 2, 2023, 10:02am

I wouldn’t even be upset if we took lists of strings as null-separated, e.g. "ITEM 1\0ITEM2\0 ITEM 3\0".[1] It can be a bit of a pain to construct dynamically, but not as bad as having to write code to construct a dynamic list from static data.

[1] There’s still an implied \0 at the end, which means it is double-null-terminated and you can’t pass an empty element.
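A sketch of how a consumer would walk such a list, which also shows the footnote's limitation: the scan stops at the first empty item. `count_nul_separated()` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Count the items in a NUL-separated list such as
   "ITEM 1\0ITEM2\0 ITEM 3\0". The string literal adds one more implied
   NUL, so the list ends at the first empty item; this is why an empty
   element cannot be represented. */
static size_t count_nul_separated(const char *list)
{
    size_t count = 0;
    for (const char *item = list; *item != '\0'; item += strlen(item) + 1) {
        count++;
    }
    return count;
}
```

For example, in "a\0\0b\0" the empty second item is read as the terminator, so "b" is never seen.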

malemburg · October 2, 2023, 10:05am

FWIW and as a real use case: for eGenix PyRun, I need a way to access some of the Python config settings in Python after the interpreter has been initialized (essentially, all settings which can be set from the Python command line or via env vars), both for reading and writing.

Before 3.11, I could do this via accessing the global config variables from C. Since 3.11, this no longer works. I can either hack my own APIs and patch the interpreter, but I’d prefer to use standard Python C APIs for this or at least have ways of setting the parameters via the sys module.

In the current eGenix PyRun, I have to initialize the CPython runtime using Python, since the frozen main entry point does not use the command line parsing of the regular main() function.

encukou · October 2, 2023, 10:55am

It’s a bit ironic – for PyTypeObject we could technically wrap values in PyObject, but don’t want to, for performance reasons. For init config, it can be a bit slow, but we can’t use PyObject yet.

I’m thinking about the general API, for all kinds of slots. IMO, we should design those together to make them coherent.
For init config specifically, you’re right that readers aren’t that useful. (But I bet someone will ask for them anyway…)

Well, the implied argument is PyThreadState or similar. If we’re adding a big chunk of new API, we might want to start making that explicit.
(And of course we’ll want to allow for signalling failure, and address ownership of any “string” result, too.)

vstinner · October 2, 2023, 3:48pm

While I have some sympathy for such an API, IMO passing an array of strings as size_t length, char** items is more convenient for programming languages which don’t use NUL-terminated C strings natively. See my example: you just pass strings, with no build/free step. You don’t have to build a new string by concatenating other strings. Also, if you don’t pass the list length, how do you handle empty strings in the middle of the array? Are you at the end yet, or not?

In the past, Modules/getpath.c had a limitation: a path couldn’t contain the ; character, since Python stored an array of strings as a single string separated by ;, even if it quickly split the string at ; characters to create an array later. Problem: ; is a valid character in paths, and we got bug reports. getpath.py now uses arrays (it’s natural in Python), so there is no such issue anymore!
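For comparison, a sketch of the (size_t length, char **items) style: the caller hands over its existing array and the config copies it, so empty strings are just items. `struct str_list`, `str_list_set()` and `str_list_clear()` are hypothetical; the setter takes `const char *const *` and assumes the list starts empty.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owned list of strings inside a config object. */
struct str_list {
    size_t length;
    char **items;
};

/* Hypothetical setter: copies `length` strings from the caller's array.
   No concatenation or escaping is needed, and "" is an ordinary item.
   Assumes `list` is empty on entry (it does not free old items). */
static int str_list_set(struct str_list *list, size_t length,
                        const char *const *items)
{
    char **copy = calloc(length ? length : 1, sizeof(char *));
    if (copy == NULL) {
        return -1;
    }
    for (size_t i = 0; i < length; i++) {
        size_t n = strlen(items[i]) + 1;
        copy[i] = malloc(n);
        if (copy[i] == NULL) {
            for (size_t j = 0; j < i; j++) {
                free(copy[j]);  /* undo partial copy */
            }
            free(copy);
            return -1;
        }
        memcpy(copy[i], items[i], n);
    }
    list->length = length;
    list->items = copy;
    return 0;
}

/* Free every copied item and reset the list. */
static void str_list_clear(struct str_list *list)
{
    for (size_t i = 0; i < list->length; i++) {
        free(list->items[i]);
    }
    free(list->items);
    list->length = 0;
    list->items = NULL;
}
```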

vstinner · October 2, 2023, 3:50pm

For performance-critical structures like PyCodeObject, PyFrameObject and PyThreadState, so far we have added specialized getters/setters for specific members.

Maybe, depending on how frequently a member is used, we can provide a slot-based API (using an int or string key) or a specialized API for the most commonly accessed members. We can start with a single slot-based API, and then add specialized APIs depending on feedback. The two are not mutually exclusive.

vstinner · October 2, 2023, 3:57pm

I’m well aware of the issue; that’s why I created an issue to design a new API to replace the removed one. A limitation of the old API is that it didn’t give access to all 66 PyConfig members, only around 20. Also, it was hard to keep them consistent, since nothing prevented setting a global variable after Python initialization.

I also have concerns about consistency between PyConfig and sys attributes. That’s why I’m interested in PEP 726: Module __setattr__ and __delattr__ (see my post #22 in the PEP 726 discussion).

gpshead · October 3, 2023, 2:07am

Can you clarify what you mean by “and writing”? I get that you want to read the existing startup config. But writing to the init config after the interpreter has been initialized feels like unsupportable bad idea territory. Perhaps I’m not understanding what you meant.

encukou · October 3, 2023, 6:51am

Yup, that makes sense.
And if it’s not performance-critical, getattr/setattr works for these. (Except PyThreadState, which tends to be exposed via the sys module and similar places.)

Well, we need it for “configuration” structs: runtime config, class/module slots. These:

Have several members of the same type
Are likely to be extended in the future (even in point releases – see the beginning of this thread)
Are not Python objects, so you can’t use getattr/setattr
Can survive the performance penalty of a slot ID lookup

malemburg · October 3, 2023, 10:04am

After the interpreter has been initialized, I will need to set some of the runtime variables that used to be globals. Specifically, I’m currently setting these variables:

optimize level
verbose level
debug level
inspect flag
don’t write bytecode flag

(and I’d like to expand this list to all variables that can be set via the command line)

This has been working just fine up until 3.11, when Victor removed support for this. The global variables still exist in 3.11, but setting them after interpreter initialization no longer has any effect, so I will have to dig deeper and ideally would like to use proper C APIs for this.

Just to add more context: eGenix PyRun is a version of the CPython interpreter which freezes most of the stdlib into a single binary on Unix using the freeze tool. Because the frozen binary entry point does not run the usual command line parsing, I am emulating a lot of the startup logic in Python (which actually makes the whole thing much easier to understand and maintain, IMHO, much like importlib replaced loads of C code). I’m currently preparing a version to put up on GitHub for easier access. It’s been open source ever since I started the project, but was part of our internal repo.

vstinner · October 3, 2023, 2:09pm

Ok, I added a C API to get the runtime configuration:

int PyConfig_GetInt(const char *key, int64_t *value);
int PyConfig_GetStr(const char *key, PyObject **value);
int PyConfig_GetStrList(const char *key, PyObject **value);

Raise ValueError if the key doesn’t exist.
Raise TypeError if it’s the wrong type.
PyConfig_GetInt() raises OverflowError if the value doesn’t fit in int64_t: it cannot happen with the current implementation.
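A Python-free sketch of the lookup and error semantics listed above, with error codes standing in for the exceptions. The entries table, its values and `config_get_int()` are hypothetical; the lookup is a linear `strcmp()` scan, in line with the earlier point that the config API can afford string comparisons. The OverflowError case is omitted since every value here is already an `int64_t`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical error codes standing in for the exceptions listed above. */
enum config_error {
    CONFIG_OK = 0,
    CONFIG_UNKNOWN_KEY,  /* would be ValueError */
    CONFIG_WRONG_TYPE,   /* would be TypeError */
};

enum config_type { TYPE_INT, TYPE_STR };

/* Hypothetical read-only snapshot of a few runtime options. */
struct config_entry {
    const char *key;
    enum config_type type;
    int64_t int_value;
    const char *str_value;
};

static const struct config_entry entries[] = {
    {"verbose", TYPE_INT, 0, NULL},
    {"optimization_level", TYPE_INT, 1, NULL},
    {"filesystem_encoding", TYPE_STR, 0, "utf-8"},
};

/* Sketch of a PyConfig_GetInt()-style getter: linear strcmp() search,
   then a type check before the value is written. */
static enum config_error config_get_int(const char *key, int64_t *value)
{
    for (size_t i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
        if (strcmp(entries[i].key, key) != 0) {
            continue;
        }
        if (entries[i].type != TYPE_INT) {
            return CONFIG_WRONG_TYPE;
        }
        *value = entries[i].int_value;
        return CONFIG_OK;
    }
    return CONFIG_UNKNOWN_KEY;
}
```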

vstinner · October 3, 2023, 2:20pm

sys.flags is read-only:

>>> sys.flags.verbose = True
AttributeError: readonly attribute

I don’t know the rationale, but since it’s not possible to change these flags, why should it be possible at the C level?

Why try to set these options at runtime, when you can set them at Python initialization?

verbose level, debug level, don’t write bytecode flag: they should be set ASAP, since many modules are imported at startup, so why not set them during Python initialization?
optimize level: compile() and compileall have optimize level. Why do you want to override the default?
inspect flag: does it have any effect when set after Python startup?

I don’t think that touching PyConfig is the right place, since PyConfig is used to populate Python objects like sys.flags. If you modify PyConfig.inspect, sys.flags.inspect is not updated.

I understand that previously it was possible to override global configuration variables after Python startup. But, well, maybe it was a bad idea. The PyConfig change is an opportunity to revisit the design and decide how Python should be reconfigured at runtime, or not.

If we want to allow changing some configuration at runtime, I would prefer to have a better API to “write into PyConfig”. We should provide some consistency, for example between PyConfig and the sys module.

The current private _testinternalcapi.set_config() API calls _PyInterpreterState_SetConfig(), which tries to keep such consistency. But I dislike the granularity of this API: it rewrites all PyConfig members, and it breaks Python. The main issue is that PyConfig contains an updated version of the Path Configuration (ex: sys.path): set_config(get_config()) removes all sys.path changes done by the site module and any later changes. Again, you see, touching PyConfig is the wrong way to go.

In the past, removing PyInterpreterState.config was even discussed because of these inconsistencies: the idea was that PyConfig should only be used to create PyInterpreterState, and then thrown away.

vstinner · October 3, 2023, 2:33pm

I looked into the Python C source code to count how often each PyConfig member is used, counting lines of code. I haven’t checked yet whether these lines are part of “hot code” or not.

Even if PyConfig_GetInt() / PyConfig_GetStr() is slow, the result can be cached. I’m not sure performance is really a blocker issue. Also, the string key => PyConfig member lookup can be optimized with a hash table if we consider it a significant performance bottleneck. But I would prefer to avoid that.

PyConfig members accessed in more than 1 line of code:

verbose: 6
filesystem_errors: 6
run_filename: 5
filesystem_encoding: 5
stdio_encoding: 4
optimization_level: 4
inspect: 4
bytes_warning: 4
use_environment: 3
run_module: 3
run_command: 3
program_name: 3
prefix: 3
interactive: 3
_install_importlib: 3
tracemalloc: 2
stdio_errors: 2
site_import: 2
perf_profiling: 2
parser_debug: 2
parse_argv: 2
legacy_windows_stdio: 2
hash_seed: 2
executable: 2
check_multi_interp_extensions: 2

PyConfig members accessed in a single line of code:

xoptions: 1
use_main_obmalloc: 1
use_hash_seed: 1
skip_source_first_line: 1
show_ref_count: 1
safe_path: 1
quiet: 1
pathconfig_warnings: 1
isolated: 1
int_max_str_digits: 1
install_signal_handlers: 1
_init_main: 1
home: 1
gil: 1
faulthandler: 1
code_debug_ranges: 1
check_hash_pycs_mode: 1
buffered_stdio: 1
base_executable: 1
argv.length: 1
argv.items: 1
argv: 1
allow_threads: 1
allow_fork: 1
allow_exec: 1
allow_daemon_threads: 1
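On the point that the result of PyConfig_GetInt() / PyConfig_GetStr() can be cached, here is a sketch of a caller-side cache around a slow string-keyed getter. `slow_config_get_int()` and `get_verbose_cached()` are hypothetical. The cache assumes the option never changes after initialization, and the static cache is not thread-safe.

```c
#include <stdint.h>
#include <string.h>

/* Counts how often the slow path runs, so the caching is observable. */
static int lookup_calls = 0;

/* Hypothetical slow, string-keyed getter in the style of PyConfig_GetInt(). */
static int slow_config_get_int(const char *key, int64_t *value)
{
    lookup_calls++;
    if (strcmp(key, "verbose") == 0) {
        *value = 0;
        return 0;
    }
    return -1;  /* unknown key */
}

/* Caller-side cache: look the option up once, then reuse the result.
   Not thread-safe; assumes the option is fixed after initialization. */
static int get_verbose_cached(int64_t *value)
{
    static int cached = 0;
    static int64_t cached_value;
    if (!cached) {
        if (slow_config_get_int("verbose", &cached_value) < 0) {
            return -1;
        }
        cached = 1;
    }
    *value = cached_value;
    return 0;
}
```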

vstinner · October 3, 2023, 2:43pm

With my proposed PyInitConfig API, we can accept custom options, store them in a separate hash table, and later expose them as a dict.

For example, if we add sys.get_config() which returns PyConfig options, we can add these custom options to that dict. Users would then be free to use them for their own purposes.

It would be a private alternative to the existing PyConfig.xoptions storage, which is more public since the user can control it by passing -X options on the command line.

I would just require setting an opt-in option to announce that we are going to set “custom options”, so that by default, typos in PyConfig member names report an error. Example in C:

PyInitConfig_SetInt(config, "accept_custom_options", 1);
PyInitConfig_SetStr(config, "my_custom_key", "value");

And later retrieve it in Python:

my_custom_key = sys.get_config()['my_custom_key']  # str

Or it can be a dedicated API (overkill?):

my_custom_key = sys.get_custom_config()['my_custom_key']  # str

The type of these options depends on which PyInitConfig_SetXXX() function is used: int, string or list of strings. Since Python doesn’t know these options, it cannot check their types; it’s up to the consumer of these options to handle the type.
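A sketch of the opt-in rule: unknown string keys are rejected until `accept_custom_options` is set. The struct and setter names are hypothetical, custom values are kept in a small fixed array instead of the separate hash table mentioned above, duplicate keys are not detected, and values are borrowed rather than copied, so the caller must keep them alive.

```c
#include <string.h>

#define MAX_CUSTOM 8

/* Hypothetical init-config store: known options plus a small table of
   custom string options, accepted only after an explicit opt-in. */
struct init_config {
    int verbose;
    int accept_custom_options;
    size_t n_custom;
    const char *custom_keys[MAX_CUSTOM];
    const char *custom_values[MAX_CUSTOM];  /* borrowed, not copied */
};

static int config_set_int(struct init_config *c, const char *key, int value)
{
    if (strcmp(key, "verbose") == 0) {
        c->verbose = value;
        return 0;
    }
    if (strcmp(key, "accept_custom_options") == 0) {
        c->accept_custom_options = value;
        return 0;
    }
    return -1;  /* custom int options are left out of this sketch */
}

static int config_set_str(struct init_config *c, const char *key,
                          const char *value)
{
    /* No known string options in this sketch, so every key is custom. */
    if (!c->accept_custom_options) {
        return -1;  /* e.g. a typo in a PyConfig member name */
    }
    if (c->n_custom == MAX_CUSTOM) {
        return -1;  /* table full */
    }
    c->custom_keys[c->n_custom] = key;
    c->custom_values[c->n_custom] = value;
    c->n_custom++;
    return 0;
}
```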

@encukou: Yeah, I’m collecting use cases, and waiting for feedback on my proposed API, and then I will design a public API which should address all use cases at once:

Stable C API to set init config options and to set custom config options
Public C API to get a runtime config option
Python API to get runtime config options: sys.get_config() -> dict?

steve.dower · October 3, 2023, 3:25pm

Random side thought: wouldn’t it be great if we could reuse the HAMT implementation for this? It looks like it’s tied to PyObject*s, but maybe we’re close enough to creating a subinterpreter[1] for initialization that we could create the values in that?

[1] A basic enough one to not need any initialization settings, e.g. only frozen imports, only UTF-8, etc.

vstinner · October 3, 2023, 5:18pm

Usually, when I need a hash table and I really cannot use a Python dict, I use the private _Py_hashtable C API. It’s a simple hash table implementation, good enough for most uses; tracemalloc uses it internally.

malemburg · October 13, 2023, 6:04pm

I have explained this already numerous times: I have implemented the Python command line parsing and startup procedures in Python and thus need to set these from Python, after init.
