FR: Allow private runtime config to enable extending without breaking the `PyConfig` ABI

Problem statement

We need the ability to extend the configuration of the Python runtime within patch releases, where we cannot change public structures without breaking a release’s ABI. We don’t do this often, but security fixes can require adding configuration settings. A past example of this is the hash randomization feature. (A new, still embargoed, need for this is in the works.)

Python 3.8 added our suite of PyConfig-based APIs via PEP 587 (Python Initialization Configuration). This cleaned up a lot of things, good! But it has a downside: it resulted in a public C struct full of configuration options (including a few fields awkwardly called “private”). This is struct PyConfig, currently defined in Include/cpython/initconfig.h on the main branch of python/cpython.

We’re free to alter struct PyConfig between minor releases so long as we don’t remove fields; it is not a cross-version stable ABI as far as I can tell. But when we need to add more configuration in a security patch release, we’re back to resorting to ad-hoc, out-of-band configuration mechanisms, because struct PyConfig must not be changed within a release.

Proposal

We could add an additional “extended config” concept. This should explicitly NOT be in the form of a public struct. I suggest it take the form of a string containing newline-separated key-value pairs in a trivial format, likely simply "key=value\n". A pointer to this extended text-based config would be added to struct PyConfig and parsed during Py_InitializeFromConfig to fill values in wherever they belong. Alongside it, struct PyConfig would gain a pointer to an opaque private struct defined in Include/internal/ that we’d be free to change even in patch releases.

Questions

  • Do we allow existing struct PyConfig field names to be set via this?
    • using their struct field name or using their -X flag name for those that have one?
    • I propose that text-based settings always override struct fields during Py_InitializeFromConfig. The ultimate goal of this is that people could use an entirely text-based config instead of C struct fields. (Maybe we’d even want to encourage that very long term, if deprecating a lot of the struct ever becomes desirable?)

For our users’ sake, we should probably flag unknown field names as a config error. BUT we need the concept of intentionally non-error-causing settings, so that code can be written that works across Python versions without a huge pile of minor or patch release version-check ifdef hell.

  • Should we allow a special "unknownok:" string prefix on the key name, to allow setting things that may not be a known key / feature in the currently running Python release?

As a user, this could look something like this:

{
    PyConfig config;
    PyConfig_InitPythonConfig(&config);
    PyStatus status = PyConfig_SetString(
        &config, &config.text_based,
        L"check_hash_pycs_mode=always\n"  // could've been set in struct
        L"unknownok:avoid_medusas_gaze=yes\n"  // new security patch feature
    );
    if (PyStatus_Exception(status)) {
        goto fail;
    }
    status = Py_InitializeFromConfig(&config);  // text_based would be parsed and applied here.
    ...
}

The unprefixed form "avoid_medusas_gaze=no\n" could also have been used, if the author could guarantee that a recent enough CPython is available.

A key= where key isn’t known would be an error. An unknownok:key= where key isn’t known would be ignored. (A note could be emitted to stderr in verbose mode.)
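The parsing rules above can be sketched in plain C. This is only an illustration under stated assumptions: the names demo_private_config, demo_apply_key and demo_parse_text_config are hypothetical, it uses char rather than wchar_t to stay short, and it knows a single boolean key. It is not how CPython does or would implement this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical private settings; not a real CPython structure. */
struct demo_private_config {
    bool avoid_medusas_gaze;
    int unknown_ignored;        /* number of ignored "unknownok:" keys */
};

#define UNKNOWNOK_PREFIX "unknownok:"

/* Apply one setting. Returns 0 on success, -1 if the key is unknown,
   -2 if the key is known but the value is invalid. */
static int
demo_apply_key(struct demo_private_config *cfg,
               const char *key, size_t key_len,
               const char *val, size_t val_len)
{
    static const char name[] = "avoid_medusas_gaze";
    if (key_len != sizeof(name) - 1 || memcmp(key, name, key_len) != 0) {
        return -1;
    }
    if (val_len == 3 && memcmp(val, "yes", 3) == 0) {
        cfg->avoid_medusas_gaze = true;
        return 0;
    }
    if (val_len == 2 && memcmp(val, "no", 2) == 0) {
        cfg->avoid_medusas_gaze = false;
        return 0;
    }
    return -2;
}

/* Parse "key=value\n" lines. Returns 0 on success, -1 on the first error.
   Unknown keys are errors unless prefixed with "unknownok:". */
int
demo_parse_text_config(struct demo_private_config *cfg, const char *text)
{
    assert(cfg != NULL && text != NULL);
    const size_t prefix_len = sizeof(UNKNOWNOK_PREFIX) - 1;
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t line_len = eol ? (size_t)(eol - text) : strlen(text);
        const char *next = eol ? eol + 1 : text + line_len;
        if (line_len > 0) {                 /* blank lines are skipped */
            const char *key = text;
            bool unknown_ok = false;
            if (line_len >= prefix_len &&
                memcmp(key, UNKNOWNOK_PREFIX, prefix_len) == 0) {
                unknown_ok = true;
                key += prefix_len;
            }
            const char *eq = memchr(key, '=', line_len - (size_t)(key - text));
            if (eq == NULL) {
                return -1;                  /* malformed line: no '=' */
            }
            const char *val = eq + 1;
            size_t val_len = (size_t)(text + line_len - val);
            int rc = demo_apply_key(cfg, key, (size_t)(eq - key), val, val_len);
            if (rc == -1 && unknown_ok) {
                cfg->unknown_ignored++;     /* tolerated unknown key */
            }
            else if (rc != 0) {
                return -1;                  /* unknown key or bad value */
            }
        }
        text = next;
    }
    return 0;
}
```

Note that in this sketch the prefix only forgives an unknown key. A known key with an invalid value is still an error, prefixed or not.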

Internal changes to support this

// Include/cpython/initconfig.h

struct _Py_private_config;  // forward decl

typedef struct PyConfig {
    ...
    wchar_t *text_based;  // See https-link-to-docs.
    ...
    struct _Py_private_config *_private_config;
} PyConfig;

// Include/internal/initconfig.h

struct _Py_private_config {
    bool avoid_medusas_gaze;
    ... // Existing PyConfig "_private" fields could move into here.
};

And, obviously, support for parsing, populating, and error checking, called from Py_InitializeFromConfig.

Thoughts?

If done, this would ultimately become a PEP.


If I understood correctly, this new mechanism would be for users that:

  • embed Python in their app
  • are pinned to a specific minor version of CPython
  • need to update the runtime config in response to some [security] fix in a micro release

The next “minor” release would have the configuration added to PyConfig, right? If so, then this would be a mechanism for one-off fixes for a very small set of users. (The number of affected users isn’t necessarily important.) Something feels off about this situation and using PyConfig to solve it.

Regardless, the temporary and specific nature of this need suggests other options that might be more straightforward:

  • old-school global variables (a la pre-587)
  • env vars

What about the private fields and data in our public struct? We shouldn’t do that.

You may be over-constraining your use scenario. PyConfig is indeed primarily, if not exclusively, for use when embedding Python (I didn’t look to see if subinterpreters use it; assuming not). But that doesn’t imply that an embedder is “pinned to a specific minor version of CPython”. An application embedding CPython can target building and working on numerous platforms, embedding whatever CPython they provide. While doing so, an application maintainer wants to limit #if PY_BLAH_VERSION... soup.

Alternatives when a PyConfig field doesn’t exist to be set

What did you mean by “env vars”? I’m guessing:

The PYTHON* environment variables and command line flag argv are all parsed in the same step by the PyConfig_* initialization APIs (which fill in the PyConfig structure, though those can do anything and could fill in private globals as well).

I assume you’re suggesting that embedding code needing to enable features call C setenv() before configuring the Python interpreter? Or perhaps PyConfig_SetArgv() to add a “command line” flag as the way to set configuration that isn’t in older releases’ PyConfig structures?

That’d be doable, though setenv() has global and child process state consequences.
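To make the setenv() concern concrete, here is a small POSIX-only sketch. The function name demo_child_sees_env and the variable name used with it are made up for illustration, and nothing here touches the Python API. It sets a variable in the current process and then asks a child shell to print it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Set `name` in this process, then check whether a child shell sees it.
   Returns 1 if the child saw `value`, 0 if it did not, -1 on failure.
   `name` is assumed to be a plain identifier (it is pasted into a shell
   command). */
int
demo_child_sees_env(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    if (setenv(name, value, 1) != 0) {
        return -1;
    }
    char cmd[256];
    int n = snprintf(cmd, sizeof(cmd), "printf '%%s' \"$%s\"", name);
    if (n < 0 || (size_t)n >= sizeof(cmd)) {
        return -1;
    }
    FILE *child = popen(cmd, "r");      /* child inherits our environment */
    if (child == NULL) {
        return -1;
    }
    char buf[256];
    size_t got = fread(buf, 1, sizeof(buf) - 1, child);
    buf[got] = '\0';
    if (pclose(child) == -1) {
        return -1;
    }
    return strcmp(buf, value) == 0;
}
```

The setting stays visible to the rest of the embedding process and to every process it spawns afterwards, unless the embedder remembers to unset it. That is the side effect a PyConfig field or a text-based setting would avoid.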

Core dev toil - backporting security fix pain

The need to add things in patch releases has come up multiple times over the years. Go back a decade and we have hash randomization, which added an environment variable and an associated flag (our norm). When a fix in our main branch looks different from the change needed to backport it to recent versions, that adds maintenance burden and a chance of getting something wrong while backporting. Specifically, I’m talking about needing to avoid modifying the PyConfig struct when backporting.

If we create a private config struct, the backport of this kind of thing becomes more routine once 3.12.x is in security fix mode.

Alternative to ease backporting

Instead of having the fix in main even use PyConfig, just have it do the private global variable dance, as that is what the backports will require. A follow-on feature PR can move that global into PyConfig.

Conflating two concepts?

I’m perhaps lumping two concepts together here.

  • Identifying a need for a non-public / private config struct.
  • A way to supply configuration that isn’t setting fields in a giant ever-growing public C struct.

With a private struct we can remove obsolete fields instead of leaving them in place unused for a decade lest we inflict #if PY_BLAH_VERSION... pain on our user code.

If we moved the public API entirely to a text-based config, we could deliberately and silently ignore old obsolete fields in a supplied config, instead of requiring them to be conditionally compiled out based on version in user code. Hopefully easing maintainers’ lives a bit.

Use of a text-based config could add a tiny startup time overhead. If so, I don’t expect CPython’s own launchers would use it internally, as we have access to all public and private fields directly and are by definition tied to our exact version.


This thread is a mix of me thinking out loud and trying to elucidate what feel like deficiencies I see in our API design while working on a bug. :slight_smile:


We really should stop allocating public structs on the stack. If the PyConfig API used heap allocation with helper functions, we would be able to extend the PyConfig struct and append new settings.

Instead of

PyConfig config;
PyConfig_InitPythonConfig(&config);

the API should be

PyConfig *config = PyConfig_New();
PyConfig_InitPythonConfig(config);
...
PyConfig_Free(config);

I agree with the need and I like the proposal. I also very much agree with Christian’s suggestion that it ought to be a heap allocated type.

Since we have a private _config_init field, we could add additional flags to that which would let us identify a heap-allocated (by us) structure vs a statically allocated one. That at least preserves compatibility for existing users, even if they don’t get the benefit of additional options.
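A rough sketch of that flag idea follows, using hypothetical stand-ins throughout: demo_config, its flag values, and the demo_config_* helpers are not CPython’s real layout or API. One wrinkle it surfaces: an init function cannot trust a flag found in uninitialized stack memory, so in this sketch the allocator does the initialization itself and is the only place that sets the heap flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag values stored in the private _config_init field. */
enum {
    DEMO_INIT_PYTHON = 1 << 0,
    DEMO_INIT_HEAP   = 1 << 8    /* set only by demo_config_new() */
};

/* Public layout that existing callers compiled against. */
typedef struct demo_config {
    int _config_init;
    int isolated;
} demo_config;

/* Private extended layout: the public struct followed by newer fields. */
struct demo_config_ext {
    demo_config base;
    bool avoid_medusas_gaze;
};

/* Existing style: caller-owned (for example stack) storage, old layout. */
void
demo_config_init_static(demo_config *c)
{
    assert(c != NULL);
    c->_config_init = DEMO_INIT_PYTHON;   /* heap flag deliberately clear */
    c->isolated = 0;
}

/* New style: we allocate, so we know the extended fields exist. */
demo_config *
demo_config_new(void)
{
    struct demo_config_ext *ext = calloc(1, sizeof(*ext));
    if (ext == NULL) {
        return NULL;
    }
    ext->base._config_init = DEMO_INIT_PYTHON | DEMO_INIT_HEAP;
    return &ext->base;
}

/* Returns 0 on success, -1 if this config has no room for the option. */
int
demo_config_set_medusa(demo_config *c, bool value)
{
    assert(c != NULL);
    if (!(c->_config_init & DEMO_INIT_HEAP)) {
        return -1;
    }
    ((struct demo_config_ext *)c)->avoid_medusas_gaze = value;
    return 0;
}

/* Reads the option; caller-allocated configs get the default (false). */
bool
demo_config_get_medusa(const demo_config *c)
{
    assert(c != NULL);
    if (!(c->_config_init & DEMO_INIT_HEAP)) {
        return false;
    }
    return ((const struct demo_config_ext *)c)->avoid_medusas_gaze;
}

void
demo_config_free(demo_config *c)
{
    if (c != NULL && (c->_config_init & DEMO_INIT_HEAP)) {
        free(c);
    }
}
```

Existing callers keep compiling and running unchanged; they simply cannot set the newer option, which matches the compatibility trade-off described above.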

There are also some values taken from environment variables at a late stage (i.e. in getpath.c) which ideally would have been part of the config struct (like PYTHONPATH already is). However, they can mostly be ignored if you can figure out which variables in the structure to set…

I would say feel free to factor in getpath.py to any changes here. It’s intended to be overridable more easily than the old implementation - perhaps even to the point where we allow embedders to pass their own script/bytecode - and it should be doing all the processing of config values (apart from those needed to get to a working bytecode interpreter). Now that we’re confident that it replicates the old getpath.c behaviour, and has tests, it can be modified to support whatever future we want for embedders :slight_smile:


In 2019, when the PyConfig API was designed, I proposed storing the structure size in the structure to make it future-proof in terms of ABI. But this idea got rejected; see the Python-Dev thread “PEP 587 (Python Initialization Configuration) updated to be future proof again”. In short, the only use case is embedding Python, and for this use case it’s OK to require rebuilding Python.
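For readers unfamiliar with the size-in-struct technique, a minimal sketch is below. It is a generic illustration, not the rejected PEP 587 design itself: demo_config, DEMO_CONFIG_V1_SIZE and demo_effective_medusa are hypothetical names. The caller records the size of the struct it was compiled against, and the library only reads fields that fit inside that size.

```c
#include <assert.h>
#include <stddef.h>

/* Current layout. All members are int-sized so that a newly appended
   field can never land in the tail padding of the older layout. */
typedef struct demo_config {
    unsigned int struct_size;     /* filled in by the caller */
    int isolated;
    int avoid_medusas_gaze;       /* appended in a later release */
} demo_config;

/* Size of the struct as older callers knew it (before the new field). */
#define DEMO_CONFIG_V1_SIZE \
    ((unsigned int)offsetof(demo_config, avoid_medusas_gaze))

/* True if the caller's struct is large enough to contain `field`. */
#define DEMO_HAS_FIELD(cfg, field) \
    ((cfg)->struct_size >= offsetof(demo_config, field) \
                           + sizeof((cfg)->field))

/* Returns the effective setting, falling back to the default (0) when
   the caller was built against the older, shorter layout. */
int
demo_effective_medusa(const demo_config *cfg)
{
    assert(cfg != NULL);
    if (DEMO_HAS_FIELD(cfg, avoid_medusas_gaze)) {
        return cfg->avoid_medusas_gaze;
    }
    return 0;
}
```

An old caller is modelled here by setting struct_size to DEMO_CONFIG_V1_SIZE, in which case the library never looks at the newer member, whatever bytes happen to follow.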

It’s an interesting API.

It might be useful to support newline characters (ex: first line \n second) for some strings.

Multiple PyConfig members are lists (ex: xoptions, warnoptions, argv, etc.). It would be nice to have a way to write a list without excluding a character. For example, the PYTHONPATH env var doesn’t allow specifying a path which contains : since it’s the character used as the path separator, whereas : is a legitimate character in a Unix path. I don’t recall how formats like YAML support that. Writing a parser is non-trivial. Maybe keep it simple and forget corner cases, I don’t know.
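One common way to avoid excluding a character is backslash escaping. The sketch below assumes a ':' separator with "\:" and "\\" escapes; both that syntax and the name demo_split_list are assumptions made for illustration, not anything proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split `value` in place on unescaped ':' characters. A backslash makes
   the next character literal, so "\:" is a ':' inside an item and "\\"
   is a single backslash. Stores up to `max_items` item pointers in
   `items`. Returns the item count, or -1 on a dangling backslash or
   when there are more than `max_items` items. */
int
demo_split_list(char *value, char **items, int max_items)
{
    assert(value != NULL && items != NULL && max_items > 0);
    int count = 0;
    char *read = value;
    char *write = value;          /* never runs ahead of `read` */
    items[count++] = write;
    while (*read != '\0') {
        if (*read == '\\') {
            if (read[1] == '\0') {
                return -1;        /* escape with nothing after it */
            }
            *write++ = read[1];   /* keep the escaped character */
            read += 2;
        }
        else if (*read == ':') {
            *write++ = '\0';      /* terminate the current item */
            read++;
            if (count == max_items) {
                return -1;
            }
            items[count++] = write;
        }
        else {
            *write++ = *read++;
        }
    }
    *write = '\0';
    return count;
}
```

The same approach could carry an escape for embedded newlines, at the cost of making the "trivial format" slightly less trivial.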

Configuration options have many possible origins:

  • (Now deprecated) Global Configuration Variables, such as Py_VerboseFlag
  • Command line arguments
  • Environment variables
  • Configuration files (pybuilddir.txt, pyvenv.cfg)
  • Value overridden by another member (ex: isolated=1 implies use_environment=0)
  • And a few other origins

What’s the priority of such new “text based” configuration? Does it have the highest priority?

For the specific case of fixing a security issue in Python stable versions, IMO it’s simpler and safer to add a private variable rather than modifying the public PyConfig structure.


I don’t agree. For tools like py2app/py2exe/pyinstaller it is pretty inconvenient to have to rebuild the launcher executable that’s used to start the packaged application when there’s a bug fix release of Python.
