C API: What should the leading underscore (`_Py`) mean?

The SC accepted PEP 689 (Unstable C API tier) with a catch:

We discussed having leading underscores for this API tier, and it was decided that a leading underscore was preferred.

This subtle point puts me in the unfortunate position of authoring a PEP I don’t agree with. That would be fine – I can compromise – but additionally I don’t fully understand what the leading underscore is supposed to mean, and what the desired end state is. It turns out that we don’t have a project-wide agreement on this.
Since this was only discussed privately and verbally, I’d like to rehash/clarify some of the arguments here. I’ll frame what I originally thought was the status quo as a proposal. If it sounds like the status quo to you… I don’t think you’re wrong.
This was also discussed on capi-sig, with limited engagement.


My proposal is:

Underscore-prefixed names in the C API are “hands-off” for third-party code. They may change at any time. They might be useful in today’s debugging session, but may disappear tomorrow. If you need their functionality, ask CPython to expose it.
We will not break underscored APIs unnecessarily, especially in patch releases, and especially for ones that are used or weren’t changed in a long time. But they can change in incompatible ways if necessary.
If something with an underscore is documented (or meant to be used externally), that is a bug to fix: the underscore should be removed.

If this is formalized, you can grep your codebase for '\b_Py' to ensure you’re playing by the rules. (Or your linter can do it. Or a CPython dev looking for useful features to expose.)
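
To make the grep idea concrete, here is a minimal sketch of the same check written as a C helper that a linter could embed. The function name `count_underscored_py_names` is hypothetical and not part of CPython; like grep, it does not understand comments or string literals, so it flags matches there too.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c can appear inside a C identifier. */
static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Count identifier-like tokens in `src` that begin with "_Py".
 * This mirrors the regex \b_Py: the prefix only counts at the start
 * of a token, so "my_PyThing" is not flagged.
 * Hypothetical helper for illustration only. */
size_t count_underscored_py_names(const char *src)
{
    assert(src != NULL);
    size_t count = 0;
    const char *p = src;
    while (*p != '\0') {
        if (is_ident_char((unsigned char)*p)) {
            /* p sits at the start of a token; consume the whole token. */
            const char *start = p;
            while (is_ident_char((unsigned char)*p)) {
                p++;
            }
            if ((size_t)(p - start) >= 3 && strncmp(start, "_Py", 3) == 0) {
                count++;
            }
        }
        else {
            p++;
        }
    }
    return count;
}
```

A caller would feed it the contents of each source file and report any file with a nonzero count.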

Internally, CPython devs should try to be friendly even to people/projects that “break the rules”. The rules should provide third-party code an opportunity to be more future-proof, not be used as an excuse to break things. Not much should actually change in CPython development.
One example (of many possible exceptions): If someone asks to expose underscore-prefixed API and we agree it’s a good idea, the old API should be treated as public/frozen in all the old versions where it appears, so it can be used with appropriate #ifdefs (or pythonapi-compat). We should also consider keeping the old name around as a courtesy (esp. for code that predates this convention): an alias has very low maintenance overhead. (Other things being equal, it’s good to keep code working even if it doesn’t play by the rules.)
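
As a rough sketch of the “appropriate #ifdefs” mentioned above: the snippet below fakes a library header so that it compiles on its own. `EXAMPLE_VERSION_HEX`, `_PyExample_Spam` and `PyExample_Spam` are hypothetical stand-ins; real extension code would include `Python.h` and test `PY_VERSION_HEX` instead.

```c
#include <assert.h>

/* ---- stand-in for the library header (all names hypothetical) ---- */
#ifndef EXAMPLE_VERSION_HEX
#  define EXAMPLE_VERSION_HEX 0x030B0000  /* pretend we build against "3.11" */
#endif

#if EXAMPLE_VERSION_HEX >= 0x030C0000
/* Newer releases expose the function under its public name. */
static int PyExample_Spam(int count) { assert(count >= 0); return count * 2; }
#else
/* Older releases only have the underscored name. */
static int _PyExample_Spam(int count) { assert(count >= 0); return count * 2; }
#endif

/* ---- third-party code ---- */
#if EXAMPLE_VERSION_HEX < 0x030C0000
/* On old versions, map the public name onto the underscored one, which is
 * treated as frozen there. */
#  define PyExample_Spam _PyExample_Spam
#endif

/* The rest of the extension uses only the public spelling. */
int use_spam(int count)
{
    return PyExample_Spam(count);
}
```

The point of the shim is that the underscored name appears in exactly one place, guarded by a version check, so a later grep for `_Py` finds only the deliberate use.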

Now, what are other proposals (or versions of status quo)?
A common one (as far as I know – correct me!) is that a leading underscore is a “warning”, meaning roughly that “all authors/reviewers should read the documentation”. But the documentation for underscored functions is often missing, both for functions you can use and ones you shouldn’t.
Consider questions like:

  • _PyCode_GetExtra is mentioned in PEP 523, but not documented on docs.python.org. Can I use it?
  • _PyImport_AcquireLock is mentioned in a StackOverflow answer, but not on docs.python.org. Can I use it?
  • As a core dev, I find that (say, hypothetically) _PyArg_UnpackStack is no longer necessary in CPython. Where do I look to see if I can remove it or change its signature?

This situation can’t be fixed easily: there are hundreds of underscored functions exposed in the public headers (and used in the wild), and no good way to prevent adding new ones (of either kind).
Some of them are exposed for technical reasons: e.g. _Py_NewRef needs to be exposed even though we’d like users to never use it (though this particular function is not too dangerous to use.) But there’s no good way for CPython to mark something in a public/unstable API header as private.

What are your thoughts?


Back to PEP 689 (Unstable C API tier): my issue is that if a simple underscore means “warning, read the docs”, then PEP 689 is overengineered. There’s no need for an elaborate tier concept and opt-in mechanism.


(I’m aware of recent suggestions to throw away the current API and replace it by a new one. That would, of course, make all issues with the current C API and its docs, including this one, moot. Please consider that proposal off-topic in this thread.)

(And rather obviously: I’m speaking as myself here, not on behalf of the SC.)

2 Likes

+1 for keeping the _Py and _PY prefixes reserved for private, internal APIs.

Idea: Have we considered making unstable APIs and private APIs opt-in at pre-processor time? This would force people to opt into unstable and private APIs explicitly and make it harder for users to use them accidentally.

// make names of unstable APIs visible
#define Py_UNSTABLE_API 1
// make names of private APIs visible
#define Py_PRIVATE_API 1

#include <Python.h>

pyport.h

// make private and unstable APIs visible to core
#if defined(Py_BUILD_CORE) && !defined(Py_UNSTABLE_API)
#  define Py_UNSTABLE_API
#endif
#if defined(Py_BUILD_CORE) && !defined(Py_PRIVATE_API)
#  define Py_PRIVATE_API
#endif

code.h

#if defined(Py_UNSTABLE_API)
PyAPI_FUNC(PyCodeObject *) PyCode_New(
        int, int, int, int, int, PyObject *, PyObject *,
        PyObject *, PyObject *, PyObject *, PyObject *,
        PyObject *, PyObject *, PyObject *, int, PyObject *,
        PyObject *);

PyAPI_FUNC(PyCodeObject *) PyCode_NewWithPosOnlyArgs(
        int, int, int, int, int, int, PyObject *, PyObject *,
        PyObject *, PyObject *, PyObject *, PyObject *,
        PyObject *, PyObject *, PyObject *, int, PyObject *,
        PyObject *);
        /* same as struct above */
#endif

Yes, see PEP 689 for unstable API. Py_BUILD_CORE already exists for private API.

2 Likes

Seems like the only incompatibility between the proposals is this bit:

If you change this to “if something with an underscore is meant for third-parties to use reliably across [major/minor/micro]-releases, that is a bug” then there’s no inconsistency left.

Our own rules shouldn’t conflate what users/3rd parties should do - they should focus on what we will do. We can then infer good practices for 3rd parties, or they can infer them themselves.

For example, if our rules are “in the absence of documentation[1] an API may be modified in any release. Py* APIs require documentation and _Py* APIs do not require any documentation,” then the caller has all the information necessary to decide whether one of our APIs is safe for their context.

If you want to start writing the rules from the POV of the consumer, then you also need to write a separate set of rules for us. But we’re easier to predict, so I’d just start with rules for us and let consumers figure themselves out :wink:


  1. Feel free to specify where such documentation should be, to avoid people treating PEPs as docs. ↩︎

2 Likes

You already have to include headers directly from Include/internal to get these. They aren’t referenced at all from the other headers, so you can’t really miss it.

The stable API is still (regrettably) opt-in, and I’d hope in a complete rewrite we’d make it stable/limited by default, and use a Py_UNSTABLE_API define or similar to opt-out.

3 Likes

Not necessarily. We leak several hundred private names (_Py*) in Python.h.

$ cpp -E -I. -I Include/ Include/Python.h | grep -o "_Py[[:alnum:]_]*" | sort -u | wc -l
426
$ cpp -E -I. -I Include/ Include/Python.h | grep -o "_Py[[:alnum:]_]*" | sort -u
_Py_add_one_to_index_C
_Py_add_one_to_index_F
_PyArg_BadArgument
_PyArg_CheckPositional
_PyArg_NoKeywords
_PyArg_NoKwnames
_PyArg_NoPositional
_PyArg_Parser
_PyArg_ParseStack
_PyArg_ParseStackAndKeywords
_PyArg_ParseTupleAndKeywordsFast
_PyArg_UnpackKeywords
_PyArg_UnpackKeywordsWithVararg
_PyArg_UnpackStack
_PyArg_VaParseTupleAndKeywordsFast
_Py_ascii_whitespace
_PyAsyncGenASend_Type
_PyAsyncGenAThrow_Type
_PyAsyncGenWrappedValue_Type
_Py_BreakPoint
...
2 Likes

I’m not sure what we’re arguing about here (I seem to be having a particularly dense day, sorry) but I think I would like unstable APIs not to use a leading underscore. The underscore means “internal”. The unstable API (as I envisioned it long ago) is not internal, it is specifically for external use, it just doesn’t make promises about long-term stability. The PEP makes it clear that unstable, public APIs are still stable within one “minor” release (e.g. if it’s one way in 3.11.0, it can’t be changed in 3.11.n, but it can be changed in 3.12.0).

I would like the unstable API to use an underscore, so let me explain why.

I need to poke at CPython internals quite a bit, because of how we at Google use Python and what CPython exposes. That’s my own responsibility, and we’ve worked to get rid of some of those uses (for example via PEP 587). I also see a lot of third-party code, and Google-internal code, written with the C API. I see a lot of it get copied around. There’s always code review happening, in our environments, but there’s only so much you can remember about what’s internal and what isn’t. I think it’s incredibly useful for anything that has caveats on use, from “this isn’t guaranteed to work in future releases but it’s probably fine” to “this is poking directly at internals and will probably need to be updated whenever Python is upgraded”, to be noticeable as you are reading the code, so that you can look up the documentation and double-check if these caveats are acceptable for the situation.

The unstable API is really no different from internals here, for these purposes. It is the case that the unstable API offers a broader support baseline, but there are still going to be questions for the code author and code reviewers to consider: how likely is it, actually, that the specific API function will change? How problematic is that? What is the alternative, can we use a stable API instead? If so, at what cost? If not, why not? etc.

The fact that you have to explicitly opt into the unstable API offers some protection, but it’s not fine-grained enough to say which functions you are opting into. It’s too easy to fall into the trap of enabling it for one thing, with careful consideration of its caveats, only to then also use another unstable API whose cost you didn’t consider.

This isn’t quite a hypothetical problem, either. For example, for our zipimport replacement we’ve had to compile some things with Py_BUILD_CORE enabled to get access to things we have to use to make importing extension modules work. I annotated what we were using it for, and then when we no longer needed that particular function, I would try to remove it – only to discover we were using other things that needed it, which I didn’t realise because I’d already added the Py_BUILD_CORE define at that point. (It’s not a problem in this case because we are intentionally poking at internals and also tightly control the Python versions we build with, but it shows why a mere define isn’t going to offer much protection here.)

I do think we should offer this unstable API, to distinguish it from absolutely-unsupported uses of CPython internals, but I think we need to make them stand out enough that they aren’t used without a conscious decision to accept the cost of them. If C had something better than leading underscores or naming conventions that would be awesome, but as it is the easy solution is to use leading underscores for unstable API functions.

3 Likes

Let’s solve that by saying:

If the unstable API changes in an incompatible way, it’ll break at compile time.

We could also promise that when we remove unstable API, we provide an alternative way of doing (e.g. new arguments for code object, completely new API tracing/profiling, etc.). Exceptions would need a normal deprecation cycle, like any API.

As a user, by opting in at the file level, you’re saying you’re OK with revisiting the file for future CPython versions. When you need to revisit it, you’ll be alerted.
If you add unrelated unstable API calls, it shouldn’t be a problem. Unstable API should not be dangerous to use, it should just have higher maintenance costs.

There is of course the issue of CPython devs (as all devs) sometimes not recognizing what changes are incompatible – but that’s not specific to unstable/private API.

Thomas’s argument would be more convincing to me if we didn’t have such a longstanding tradition that public APIs, no matter how dangerous (e.g. PyTuple_SetItem, or for that matter Py_DECREF :slight_smile:), don’t start with _. In fact IIRC when we did the “great renaming” back in the '90s we decided to use Py for public APIs and _Py for private ones. No consideration was given to danger (though at the time most APIs were probably effectively unstable :-).

I only know of one dangerous API that was given an _ because of the danger – sys._getframe(). I don’t think it sets enough of a precedent, and at the time we had not thought of a separate “unstable API” category, so we claimed it was internal to warn off casual use. (In fact I think it predates the stable ABI – it’s so old that even the Python 2 docs don’t have a “versionadded” tag for it.) And in fact it’s not dangerous – we just didn’t want to guarantee its existence on all platforms. In practice it has been very stable and safe.

I like Petr’s proposal to make unstable API changes break at compile time – since there aren’t any formally unstable APIs yet, we can easily make this a requirement for APIs to be included in the unstable API, and I think it’s not very onerous (it does mean that unstable fields will have to be renamed when their meaning changes).
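
To illustrate the “break at compile time” requirement, here is a small self-contained sketch. Everything in it (`ExampleConfig`, `Example_MakeWithMode`, the `mode` field and the earlier `flags` / `Example_Make` spellings) is hypothetical; it only shows the renaming pattern, not any actual CPython API.

```c
#include <assert.h>

/* Hypothetical unstable header as of release N+1.
 * In release N the struct had a field `flags` (a bitmask) and the
 * constructor was Example_Make(int flags). The meaning changed to a
 * two-valued mode, so both names were changed rather than silently
 * reinterpreting the old ones. */
typedef struct {
    int mode;   /* was `flags` in release N */
} ExampleConfig;

static ExampleConfig Example_MakeWithMode(int mode)   /* was Example_Make */
{
    assert(mode == 0 || mode == 1);   /* only two modes exist in this sketch */
    ExampleConfig cfg = { mode };
    return cfg;
}

/* Caller, already updated for release N+1.
 * The release-N spelling
 *     ExampleConfig cfg = Example_Make(3);
 *     return cfg.flags;
 * would be rejected against this header: the struct has no member
 * named `flags`, and Example_Make is no longer declared. */
int caller_mode(void)
{
    ExampleConfig cfg = Example_MakeWithMode(1);
    return cfg.mode;
}
```

The cost is exactly the one noted above: a field or function whose meaning changes needs a new name, so stale callers fail loudly instead of misbehaving at run time.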

I’m not sure we would need to use the deprecation cycle to remove an unstable API, when we don’t need one to change it. (I can sort of see that we shouldn’t just be removing the tracing functionality without offering an alternative, but I don’t think most unstable APIs are that critical.)

I’m not sure what that’s supposed to solve but it does not address my concerns at all – although this suggestion is frankly what I would have expected anyway, so it’s good to be explicit about it.

Let me try to come at my point of view another way. There are a lot of Python public/stable API functions. There are (I hope) going to be far, far fewer unstable API functions, and I expect they’ll serve very specific purposes. It is my expectation that most users (and most people reading that code) will not need to use unstable API functions. I don’t think it’s realistic to expect people to know that a function is part of the unstable API by checking the documentation each time. I don’t want people to accidentally use an unstable API function, but I also don’t want them to have to worry about whether something is an unstable API function.

I don’t think it’s fair to put the burden of figuring out whether a function they don’t remember (and the Python C API has a lot of slightly different functions that are tricky to remember) is unstable on everyone using only stable/public APIs. I do think it’s fair to put the burden of realising whether a particular _Py* function is unstable or private on anyone using unstable or private APIs – they are already buying into the extra burden that comes with unstable or private API usage.

Because there are so many more stable API functions than unstable ones (or even private ones that people might want to use), it’s easier to flag all unstable/private API uses (by means of the leading underscore) and then realise which are unstable and which are private, than it is to flag all Python API calls and then check which are unstable.

A third option could be to designate a unique prefix for the unstable API like PyUnstable_*, or PyU_*? Remove the underscore since it is public, but indicate clearly it’s unstable. When transitioning from public/private to unstable one of the two will have to rename to transition anyway, so it shouldn’t be a problem to rename and make aliases. It does mean they won’t be alphabetically grouped with their stable counterparts, but I don’t expect there to be too many unstable APIs.
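
A minimal sketch of what “rename and make aliases” might look like in a header. The names `PyUnstable_Example_GetExtra` and `_PyExample_GetExtra` are hypothetical, and the function body is a placeholder so the snippet compiles on its own.

```c
#include <assert.h>

/* New spelling: no leading underscore, but the prefix makes the
 * unstable tier visible at every call site. (Hypothetical name.) */
static int PyUnstable_Example_GetExtra(int index)
{
    assert(index >= 0);      /* slots are numbered from zero in this sketch */
    return index + 100;      /* placeholder result */
}

/* Courtesy alias so code written against the old underscored name
 * keeps compiling after the rename. */
#define _PyExample_GetExtra PyUnstable_Example_GetExtra

/* Existing caller, untouched. */
int old_caller(void) { return _PyExample_GetExtra(1); }

/* Caller migrated to the new name. */
int new_caller(void) { return PyUnstable_Example_GetExtra(1); }
```

Both callers resolve to the same function, so the alias costs one line in the header while the old name is phased out.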

2 Likes

I like Spencer’s idea of finding a compromise.

I’ve come to the conclusion that I really want the formal concept of “unstable API” as described by the PEP, but I don’t care enough about the underscore to lean hard either way. Maybe the SC can just hold a vote and move on? I promise I will support whatever they choose.

3 Likes

No need for that – the SC already voted. Just need to solve the practical issues now – but while solving them, I found some confusion and thought up a proposal to make.

I’ve been told that it looks like I’m throwing a little tantrum (and I honestly thank you for the feedback!). I think I found the cause for that: the SC’s requirement looks like an insignificant tweak, but I see it as an additional design constraint – and with it, I can’t find a solution that feels like a net win for API tier definition. Sadly, I can’t just decide I don’t care enough about the underscore.

Perhaps I am too nitpicky to be the expert on API tiers. I’d be happy to delegate or collaborate. But if I am to implement the PEP I do need a nitpick-level understanding of how things should be.

The practical issue is that to implement the PEP, I’ll need to change the docs and devguide definitions of the internal API tier. And if the underscore isn’t the “private” marker any more, what is? In other words:

how should users do that? I feel a duty to make this straightforward and obvious, but I always keep coming back to wanting to say “underscore=internal”. Here’s my thought process:

What could define a precise boundary between unstable & private?

  • Availability in the headers? Doesn’t work, unfortunately: If a private function needs to be static inline (or macro) for performance, it needs to be exposed. We can’t use #define/#include to opt in or out of the private tier. (It can work for other tiers though.)
  • Docs? Docs are actually a bad way to define API tiers.
    • In a volunteer-driven project, features are unfortunately often undocumented or underdocumented. They go out of date over time.
    • What even is docs? Do PEPs count? HOWTOs? Source comments? Do I trawl through the whole docs for a note, or is the first entry in search enough? (Not trying to troll here – trying to find a way to make clauses like unless documented otherwise useful for a precise tier boundary)
  • A definitive list (exported to the docs)? It works for Limited API, but it’s nowhere near perfect yet, and the checks for that list are quite a maintenance burden. Too much burden for the unstable tier, IMO.
  • Should users look in the header file? That would technically work, but it’s ugly.
  • (Should users check if the opt-in disables their function? I’ll stop listing bad brainstorming ideas now.)
  • Name is the only usable marker I can see.
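
The first bullet (private helpers that must stay visible in the headers) can be sketched as follows. `ExampleObject`, `_Example_NewRef` and `Example_NewRef` are hypothetical stand-ins, loosely modelled on the `_Py_NewRef` case mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* ---- hypothetical public header ---- */
typedef struct {
    int refcount;
} ExampleObject;

/* Private helper. It has to live in the public header because the inline
 * function below is expanded in the caller's translation unit; hiding it
 * behind an opt-in #ifdef would also break Example_NewRef. */
static inline ExampleObject *_Example_NewRef(ExampleObject *op)
{
    assert(op != NULL);
    op->refcount++;
    return op;
}

/* Public API, kept inline for performance. */
static inline ExampleObject *Example_NewRef(ExampleObject *op)
{
    return _Example_NewRef(op);
}

/* ---- third-party code ---- */
int demo_refcount_after_newref(void)
{
    ExampleObject obj = { 1 };
    Example_NewRef(&obj);   /* public call; the private helper runs underneath */
    return obj.refcount;
}
```

Nothing in the language stops third-party code from calling `_Example_NewRef` directly here, which is why only the name is left as a marker.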

Yes, PyUnstable_ would work. Thanks, Spencer!
I subjectively find it ugly, but I don’t see other good solutions given the constraint. (I assume I can convince the SC to allow that rather than _PyUnstable with underscore – see below.)


I’m trying to make things work with the constraint, even though I don’t agree with it. Worse, the reasoning behind the constraint still doesn’t make sense to me. It doesn’t have to (the SC is there to ensure different viewpoints are represented, and I sure don’t have experience maintaining a giant monorepo) – but it sure would help if it made sense. So I’ll try prying further, even though I’m probably annoying everyone at this point.

The point about few functions with specific purposes makes it unlikely that you’ll mix different “families” of unstable functions in a single file. To me, that’s what makes a .c-file-level opt-in enough. If you’re worried about introducing more unstable functions, your file might be too big.
I said this before, but: I see opting-in to unstable API as indication that you may need to revisit that file for future CPython versions. In that light, accidentally introducing other unstable functions still doesn’t seem like enough of a concern. They’re not dangerous, they just have ⅓ of the support time compared to public ones.

What is a big concern for me is that if we lose the “underscored=private” rule, users might accidentally introduce new private functions to their files, seeing that underscores are already used elsewhere (see the inability to opt-out of the private tier I mentioned above). That worries me a lot. I worry very much that people will read the opt-in #define as “now I can use underscored functions”. If they do, the PEP is a failure.

Um… that’s the bigger problem I’m trying to solve here! I’m trying to inch toward a world where that isn’t the case :)


And finally, I can’t resist adding some purely personal opinion.

I would love to have dead-simple “underscored=private” rule. You should be able to grep your 100GLOC codebase for \b_Py and pipe the result straight to Jira. You should be able to send a drive-by PR for removing _Py use, without any doubt about whether it’s fixing a bug.
We’re not in that perfect situation yet, but a big part of that is that we currently also use underscores for unstable API. The PEP was originally meant to strengthen the rule, not weaken it.

4 Likes