FR: Allow private runtime config to enable extending without breaking the `PyConfig` ABI

It kind of sounds like you would have benefited from PEP 432, with its multiple init phases.

Without that, it’s unclear how initialization is meant to operate relative to command-line args that necessarily are not parsed until after init finishes (since Python code cannot be executed until then). I suppose you could initialize twice. That would look something like:

Py_Initialize();
PyObject *parsed = process_cli_using_python_code();

PyConfig config;
PyConfig_InitPythonConfig(&config);
populate_config(&config, parsed);  // copy values out of `parsed` into C storage
Py_DECREF(parsed);                 // `parsed` must not outlive Py_Finalize()
Py_Finalize();

Py_InitializeFromConfig(&config);
PyConfig_Clear(&config);
...

Regardless, changing some settings after initialization has finished shouldn’t be much of a problem technically, though it would still violate some of the involved design. For other settings, changing them later might actually be a problem. The PEP 432 phases might have helped with some of these, but probably not all of them.

CC @ncoghlan


It’s already possible to create a basic Python interpreter which does nothing and does not parse command line arguments, run a custom command line parser in Python, write the output somewhere, destroy the interpreter, and then create a second interpreter configured from this output.

The first step can use PyConfig.parse_argv=0 to not parse the command line.


The runtime reconfiguration question seems like it has diverged quite a ways from @gpshead’s original topic of providing a more ABI-friendly alternative to adding new config settings as fields in a public C struct (which is a fair request, and I think @vstinner is on the right track to resolving it).

For runtime reconfiguration, I think it genuinely needs to be handled on a setting-by-setting basis. Victor made a valiant attempt to support full reconfiguration while working on the changes that were eventually written up as PEP 587, but those efforts were ultimately unsuccessful: you end up with unresolvable runtime data compatibility problems, especially where decoded text strings are concerned.

Hence PEP 587 omitted PEP 432’s ambition to allow arbitrary Python code execution during an intermediate initialisation phase: either that intermediate interpreter state ended up so limited as to be effectively useless, or else you risked situations where objects created during that intermediate state would be mishandled if they were left alive after initialisation finished.

That concern only affects some of the config settings, though. For others, changing them after startup is potentially reasonable, it’s just a question of whether a mechanism exists to allow it (and whether that mechanism is clearly documented and explicitly tested).

I really think we ought to start thinking about a lot of these parsers and settings as being handled by the python.exe entry point, and not inherently part of the runtime. That would likely make it more obvious that in MAL’s case, the parsing also has to happen before the runtime/interpreter can be fully initialized.

I was hoping to spend last week trying to move more of initialization into python.c, but got distracted by other things.

But I don’t see anything wrong with making more of these strictly initialization parameters, so that once the interpreter is created[1] they can’t be changed.


  1. Some are runtime parameters, though not many; most are interpreter parameters, and should be settable for each interpreter when it’s created. ↩︎


Yup, that looks like a good direction to take: Python would read the init config struct once in Py_InitializeFromConfig, and then it wouldn’t reference that data again, leaving the user free to deallocate it or tweak it for another init call.
That would bring clarity regarding which settings can be changed and which affect the running interpreter.

Of course we’ll need another API for changing the running interpreter…


I prefer to treat “init config” as read-only at runtime, since some options exist in multiple copies and I’m not sure that Python is always strict about keeping them consistent. For example, there are:

  • sys.flags.dont_write_bytecode (read-only)
  • sys.dont_write_bytecode (read/write)
  • PyConfig.write_bytecode (read-only)
  • Py_DontWriteBytecodeFlag (read/write, deprecated)

importlib uses sys.dont_write_bytecode in Lib/importlib/_bootstrap_external.py.

By the way, see also PEP 726 – Module __setattr__ and __delattr__: if this PEP is accepted, we will have more options to keep these copies consistent.

Also PyInterpreter_NewFromConfig or some similar API (I’m sure Eric added one already, but I’m not 100% sure on the name).

There are a lot of fields in the current init config that don’t need to be identical for every interpreter, particularly when creating them from native code. I think we can almost argue that most of them are interpreter-specific and not runtime specific, especially if we move “decode environment variables to UTF-8” out of the “runtime” and into the entry point (memory allocators are already going to be fixed).

So the runtime init is literally just process-wide state (signals, etc.) and the rest is in the interpreter init.

IIUC, you can only have one interpreter per thread (specifically, one OS thread can have one Python thread state which has one Python interpreter). So what we really need is to be able to safely free an interpreter and create a new one on the same thread. If you want two in parallel, you would use a separate thread, or we make sure that setting the thread state for the current thread is reliable (which I doubt is possible in the general case, as the native TID leaks through pretty easily, but otherwise it could work).


For the use case in the original post, it would be good to have API to distinguish error cases from PyInitConfig_Set* – at least:

  • the config key does not exist (this error should be ignored if you don’t know your Python has the new security option)
  • the value/type is invalid (this error should probably propagate to the user)

I think that the original feature request was different: @gpshead wanted to have the ability to set custom configuration keys which don’t exist in PyConfig. I wrote previously that my design would make it possible, but my implementation doesn’t support it.

You’re not supposed to set unknown config options. I suggest handling “unknown option” and “invalid value” the same way in the caller: log the error and exit. The code should be fixed or the value should be changed.

Well, the request is to set keys that are known in newer Python versions (possibly with a backport of a security fix that adds the key – so you can’t rely on the version to detect whether the key is present). For example, set max_str_digits to 0 to preserve the previous behaviour.
IMO, for that case it makes a lot of sense to ignore “unknown key” only.


I updated my PyInitConfig PR. It’s quite big, so I extracted the changes just to add PyConfig_Get() and PyConfig_GetInt(): PyConfig_Get() PR.

API:

// Get a configuration option as a Python object.
// Return a new reference on success.
// Set an exception and return NULL on error.
//
// The object type depends on the configuration option. It can be:               
// int, str, list[str] and dict[str, str].
PyAPI_FUNC(PyObject*) PyConfig_Get(const char *name);

// Get a configuration option as an integer.
// Return 0 and set '*value' on success.
// Set an exception and return -1 on error.
PyAPI_FUNC(int) PyConfig_GetInt(
    const char *name,
    int *value);

Example:

    int get_verbose(void)
    {
        int verbose;
        if (PyConfig_GetInt("verbose", &verbose) < 0) {
            // Silently ignore the error
            PyErr_Clear();
            return -1;
        }
        return verbose;
    }

This API gives access again to the configuration previously exposed through these global variables:

  • Py_DebugFlag
  • Py_VerboseFlag
  • Py_QuietFlag
  • Py_InteractiveFlag
  • Py_InspectFlag
  • Py_OptimizeFlag
  • Py_NoSiteFlag
  • Py_BytesWarningFlag
  • Py_FrozenFlag
  • Py_IgnoreEnvironmentFlag
  • Py_DontWriteBytecodeFlag
  • Py_NoUserSiteDirectory
  • Py_UnbufferedStdioFlag
  • Py_HashRandomizationFlag
  • Py_IsolatedFlag
  • Py_LegacyWindowsFSEncodingFlag
  • Py_LegacyWindowsStdioFlag
  • Py_FileSystemDefaultEncoding
  • Py_HasFileSystemDefaultEncoding
  • Py_FileSystemDefaultEncodeErrors
  • Py_UTF8Mode

Having to handle errors “just” to read a configuration option by its name can be annoying:

        if (PyConfig_GetInt("verbose", &verbose) < 0) {
            // Silently ignore the error
            PyErr_Clear();
            ...
        }

Later, if needed, we can consider providing even more specialized functions, such as int PyConfig_GetVerbose(). With a specialized API, there would be no need to handle the “unknown option” error case.

For now, I would prefer to only provide a minimalist API: only 2 functions, one returning a Python object, one returning a C int.

Yes, error handling is annoying, but we need to be able to return errors.

Given the variety of reasons these functions could fail, particularly those that access sys, I don’t think we get away from having to set exceptions.

Personally, I’d much rather have exceptions be set than add a new public API for every possible case.

Well, you can always add error-swallowing functions for convenience, e.g.:

// Get a configuration option as a PyObject.
// If configuration option `name` exists, return a new ref to its value.
// Otherwise, return a new ref to `default_value`.
// This never raises an exception.
PyAPI_FUNC(PyObject*) PyConfig_GetOrDefault(
    const char *name,
    PyObject *default_value);

// Get a configuration option as an integer.
// If configuration option `name` exists and converts successfully to a C int,
// return the int value.
// Otherwise, return `default_value`.
// This never raises an exception.
PyAPI_FUNC(int) PyConfig_GetIntOrDefault(
    const char *name,
    int default_value);

Well, that’s basically what my example in the doc does.

In the API that we provide, I would prefer to let the caller decide how to handle the exception. The exception can be “the config option doesn’t exist” or something else. Ignoring “any error” doesn’t sound like a future-proof API to me.

It’s not bad, though. Either “any error” will always be fatal, in which case they all bubble out, or “any error” will be non-fatal in which case it’ll be cleared.

The only way to improve is to specify which errors will be raised under which circumstances, and that is not very future-proof (it’s hard to specify them without preventing us from adding new errors later on).

Distinguishing between “sys.verbose has not been set” and “sys.verbose was set to a non-int” isn’t really useful for a program. Both are pretty blatant error conditions. You could argue that a typo (“vrebose”) will be hidden, and that’s true, but only until the user tests their code (and I think we’re allowed to assume that users test their code - we don’t have to take all that responsibility on ourselves via the API design).

I would certainly suggest that a non-int value for an int config option emit a warning… except that Python warnings can be turned into runtime errors, which necessitates error checking in the caller.

Same here.

We should try not to add APIs anymore which clear errors – this can too easily clear errors which were set (and not tested for) in completely different parts of the code and then lead to data corruption.

About the non-int issue, PEP 726 – Module __setattr__ and __delattr__ is waiting for a Steering Council decision. It might be a solution for introducing errors in the sys module.

We don’t need a solution - there’s already a need to do error handling everywhere you might get a PyObject*, including when it’s going to be converted to a native type. All that raising on assignment will do is move when the user sees the error. It’s not going to affect the API design of getting a value.