Problem statement
We need the ability to extend the configuration of the Python runtime within patch releases, where we cannot change public structures and thus break a release’s ABI. We don’t do this often, but security fixes can require adding configuration settings. A past example of this is the hash randomization feature. (A new, still embargoed, need for this is in the works.)
Python 3.8 added our suite of PyConfig based APIs via PEP 587 (Python Initialization Configuration). This cleaned up a lot of things, good! But it has a downside: it resulted in a public C struct full of configuration options (including a few fields awkwardly called “private”). This is struct PyConfig, currently defined in cpython/initconfig.h on the main branch of python/cpython on GitHub.
We’re free to alter struct PyConfig between minor releases so long as we don’t remove fields; it is not a cross-version stable ABI as far as I can tell. But when we need to add more configuration in a security patch release, we’re back to resorting to ad-hoc out of band configuration mechanisms because struct PyConfig must not be changed within a release.
Proposal
We could add an additional “extended config” concept. This should explicitly NOT be in the form of a public struct. I suggest it take the form of a string containing newline-separated key/value pairs in a trivial format, likely simply "key=value\n". A pointer to this extended text based config would be added to struct PyConfig and parsed during Py_InitializeFromConfig to fill values in wherever they belong. Alongside it would be a pointer to an opaque private struct, defined in Include/internal/, that we’d be free to change even in patch releases.
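To make the "key=value\n" idea concrete, here is a minimal sketch of how such a text could be split into pairs. Nothing here is existing CPython API: parse_text_config, kv_callback, and count_pairs are hypothetical names, and details such as skipping empty lines and rejecting lines without an '=' are assumptions rather than part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical callback type: receives one key/value pair.
   The key and value are slices of the input, NOT NUL-terminated. */
typedef int (*kv_callback)(const wchar_t *key, size_t key_len,
                           const wchar_t *value, size_t value_len,
                           void *arg);

/* Walk a "key=value\n" text and call `cb` once per non-empty line.
   Returns 0 on success, -1 for a line with no '=' or an empty key,
   or the first non-zero value returned by the callback. */
static int
parse_text_config(const wchar_t *text, kv_callback cb, void *arg)
{
    assert(text != NULL && cb != NULL);
    const wchar_t *line = text;
    while (*line != L'\0') {
        const wchar_t *end = wcschr(line, L'\n');
        if (end == NULL) {
            end = line + wcslen(line);  /* last line, no trailing newline */
        }
        if (end != line) {              /* assumption: empty lines are skipped */
            const wchar_t *eq = wmemchr(line, L'=', (size_t)(end - line));
            if (eq == NULL || eq == line) {
                return -1;              /* malformed line */
            }
            int rc = cb(line, (size_t)(eq - line),
                        eq + 1, (size_t)(end - eq - 1), arg);
            if (rc != 0) {
                return rc;
            }
        }
        line = (*end == L'\n') ? end + 1 : end;
    }
    return 0;
}

/* Example callback: counts the pairs it is handed. */
static int
count_pairs(const wchar_t *key, size_t key_len,
            const wchar_t *value, size_t value_len, void *arg)
{
    (void)key; (void)key_len; (void)value; (void)value_len;
    *(int *)arg += 1;
    return 0;
}
```

Splitting on the first '=' keeps the format trivial while still allowing '=' inside values; whether values should be allowed to contain it at all is one of the details a PEP would need to pin down.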
Questions
- Do we allow existing struct PyConfig field names to be set via this?
  - Using their struct field name, or using their -X flag name for those that have one?
  - I propose text based settings always override fields during Py_InitializeFromConfig. The ultimate goal of this is that people could use an entirely text based config instead of C struct fields. (Maybe we’d even want to encourage that very long term, if deprecating a lot of the struct ever becomes desirable?)
- For our users’ sake, we should probably flag unknown field names as a config error. BUT we need the concept of intentionally non-error-causing value settings, so that code can be written that works across Python versions without a huge pile of minor or patch release version check ifdef hell.
  - Should we allow a special "unknownok:" string prefix on the key name to allow setting things that may not be an existing known key / feature in the currently running Python release?
From a user’s point of view, this could look something like:
{
    PyConfig config;
    PyConfig_InitPythonConfig(&config);
    PyStatus status = PyConfig_SetString(
        &config, &config.text_based,
        L"check_hash_pycs_mode=always\n"       // could've been set in struct
        L"unknownok:avoid_medusas_gaze=yes\n"  // new security patch feature
    );
    if (PyStatus_Exception(status)) {
        goto fail;
    }
    // text_based would be parsed and applied here.
    status = Py_InitializeFromConfig(&config);
    ...
}
The unprefixed form "avoid_medusas_gaze=no\n" could also have been used, if the author could guarantee that a recent enough CPython is available.
A key= where key isn’t known would be an error. An unknownok:key= where key isn’t known would be ignored. (A note could be emitted to stderr in verbose mode.)
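The known/unknown rule above can be sketched as a small classification step. This is only an illustration under stated assumptions: classify_key, key_action, and the known_keys table are hypothetical names, and the table contents stand in for whatever a given CPython build would actually recognize.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* What the (hypothetical) parser should do with one key. */
typedef enum { KEY_APPLY, KEY_IGNORE, KEY_ERROR } key_action;

/* Assumption: the keys this particular build understands. */
static const wchar_t *const known_keys[] = {
    L"check_hash_pycs_mode",
    L"avoid_medusas_gaze",
};

static int
is_known_key(const wchar_t *key)
{
    size_t n = sizeof(known_keys) / sizeof(known_keys[0]);
    for (size_t i = 0; i < n; i++) {
        if (wcscmp(known_keys[i], key) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Strip an optional "unknownok:" prefix from a NUL-terminated key and
   decide how to treat it. *bare_key points at the key without the prefix. */
static key_action
classify_key(const wchar_t *key, const wchar_t **bare_key)
{
    static const wchar_t prefix[] = L"unknownok:";
    const size_t prefix_len = sizeof(prefix) / sizeof(prefix[0]) - 1;

    assert(key != NULL && bare_key != NULL);
    int tolerant = (wcsncmp(key, prefix, prefix_len) == 0);
    *bare_key = tolerant ? key + prefix_len : key;

    if (is_known_key(*bare_key)) {
        return KEY_APPLY;               /* known: the prefix changes nothing */
    }
    return tolerant ? KEY_IGNORE : KEY_ERROR;
}
```

With this shape, the prefix only changes the outcome for unknown keys, so a prefixed setting behaves identically to the unprefixed one on any release that knows the key.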
Internal changes to support this
// Include/cpython/initconfig.h
struct _Py_private_config;  // forward decl
typedef struct PyConfig {
    ...
    wchar_t *text_based;  // See https-link-to-docs.
    ...
    struct _Py_private_config *_private_config;
} PyConfig;

// Include/internal/initconfig.h
struct _Py_private_config {
    bool avoid_medusas_gaze;
    ...  // Existing PyConfig "_private" fields could move into here.
};
Plus, obviously, support for parsing, populating, and error checking, called from Py_InitializeFromConfig.
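As a rough illustration of the "populating" step, a table of key names and field offsets could drive the assignment into the private struct. Everything below is a sketch, not a proposed implementation: private_config_sketch stands in for the proposed struct _Py_private_config, set_bool_setting and bool_settings are hypothetical names, and accepting exactly "yes" and "no" as boolean spellings is an assumption taken from the example values used earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Stand-in for the proposed opaque private struct. */
struct private_config_sketch {
    bool avoid_medusas_gaze;
};

/* One row per boolean setting: key name plus the offset of its field. */
struct bool_setting {
    const wchar_t *key;
    size_t offset;
};

static const struct bool_setting bool_settings[] = {
    {L"avoid_medusas_gaze",
     offsetof(struct private_config_sketch, avoid_medusas_gaze)},
};

/* Store a yes/no value into the field named by `key`.
   Returns 0 on success, -1 for an unknown key or an unrecognized value. */
static int
set_bool_setting(struct private_config_sketch *cfg,
                 const wchar_t *key, const wchar_t *value)
{
    assert(cfg != NULL && key != NULL && value != NULL);

    bool parsed;
    if (wcscmp(value, L"yes") == 0) {
        parsed = true;
    }
    else if (wcscmp(value, L"no") == 0) {
        parsed = false;
    }
    else {
        return -1;                      /* error checking: bad value */
    }

    size_t n = sizeof(bool_settings) / sizeof(bool_settings[0]);
    for (size_t i = 0; i < n; i++) {
        if (wcscmp(bool_settings[i].key, key) == 0) {
            bool *field = (bool *)((char *)cfg + bool_settings[i].offset);
            *field = parsed;
            return 0;
        }
    }
    return -1;                          /* unknown key */
}
```

A table like this means a security patch release would only need to add a struct field and one table row, with no change to any public structure.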
Thoughts?
If done, this would ultimately become a PEP.