Yes, see PEP 689 for unstable API. `Py_BUILD_CORE` already exists for private API.
Seems like the only incompatibility between the proposals is this bit:
If you change this to “if something with an underscore is meant for third-parties to use reliably across [major/minor/micro]-releases, that is a bug” then there’s no inconsistency left.
Our own rules shouldn’t concern themselves with what users/3rd parties should do – they should focus on what we will do. We can then infer good practices for 3rd parties, or they can infer them themselves.
For example, if our rules are “in the absence of documentation[1] an API may be modified in any release. `Py*` APIs require documentation and `_Py*` APIs do not require any documentation,” then the caller has all the information necessary to decide whether one of our APIs is safe for their context.
If you want to start writing the rules from the POV of the consumer, then you also need to write a separate set of rules for us. But we’re easier to predict, so I’d just start with rules for us and let consumers figure things out for themselves.
[1]: Feel free to specify where such documentation should be, to avoid people treating PEPs as docs.
You already have to include headers directly from `Include/internal` to get these. They aren’t referenced at all from the other headers, so you can’t really miss it.
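For illustration, a minimal sketch of what that opt-in looks like in practice, assuming CPython’s source layout (`pycore_interp.h` is one real internal header; the exact build flags are up to the embedder):

```c
/* Opting in to the private/internal tier: internal headers #error out
 * unless Py_BUILD_CORE is defined, and they live under Include/internal,
 * which nothing in the public headers pulls in for you. */
#define Py_BUILD_CORE
#include "Python.h"
#include "internal/pycore_interp.h"  /* found via -IInclude on CPython's tree */

/* Code below may use _Py* internals, with no stability guarantees at all. */
```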
The stable API is still (regrettably) opt-in, and I’d hope in a complete rewrite we’d make it stable/limited by default, and use a `Py_UNSTABLE_API` define or similar to opt out.
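A sketch of how that default could look in a header; `Py_UNSTABLE_API` and the function names here are hypothetical, not an existing CPython define:

```c
/* Stable declarations are visible by default. */
PyAPI_FUNC(PyObject *) PyFoo_New(void);

/* Unstable declarations only appear when the user explicitly opts out
 * of the stable-by-default world. */
#ifdef Py_UNSTABLE_API
PyAPI_FUNC(PyObject *) PyFoo_NewWithInternals(void);
#endif
```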
Not necessarily. We leak several hundred private names (`_Py*`) in `Python.h`:
```
$ cpp -E -I. -I Include/ Include/Python.h | grep -o "_Py[[:alnum:]_]*" | sort -u | wc -l
426
$ cpp -E -I. -I Include/ Include/Python.h | grep -o "_Py[[:alnum:]_]*" | sort -u
_Py_add_one_to_index_C
_Py_add_one_to_index_F
_PyArg_BadArgument
_PyArg_CheckPositional
_PyArg_NoKeywords
_PyArg_NoKwnames
_PyArg_NoPositional
_PyArg_Parser
_PyArg_ParseStack
_PyArg_ParseStackAndKeywords
_PyArg_ParseTupleAndKeywordsFast
_PyArg_UnpackKeywords
_PyArg_UnpackKeywordsWithVararg
_PyArg_UnpackStack
_PyArg_VaParseTupleAndKeywordsFast
_Py_ascii_whitespace
_PyAsyncGenASend_Type
_PyAsyncGenAThrow_Type
_PyAsyncGenWrappedValue_Type
_Py_BreakPoint
...
```
I’m not sure what we’re arguing about here (I seem to be having a particularly dense day, sorry) but I think I would like unstable APIs not to use a leading underscore. The underscore means “internal”. The unstable API (as I envisioned it long ago) is not internal, it is specifically for external use, it just doesn’t make promises about long-term stability. The PEP makes it clear that unstable, public APIs are still stable within one “minor” release (e.g. if it’s one way in 3.11.0, it can’t be changed in 3.11.n, but it can be changed in 3.12.0).
I would like the unstable API to use an underscore, so let me explain why.
I have need to poke at CPython internals quite a bit, because of how we at Google use Python and what CPython exposes. That’s my own responsibility, and we’ve worked to get rid of some of those uses (for example via PEP 587). I also see a lot of third-party code, and Google-internal code, written with the C API. I see a lot of it get copied around. There’s always code review happening, in our environments, but there’s only so much you can remember about what’s internal and what isn’t. I think it’s incredibly useful for anything that has caveats on use, from “this isn’t guaranteed to work in future releases but it’s probably fine” to “this is poking directly at internals and will probably need to be updated whenever Python is upgraded”, to be noticeable as you are reading the code, so that you can look up the documentation and double-check if these caveats are acceptable for the situation.
The unstable API is really no different from internals here, for these purposes. It is the case that the unstable API offers a broader support baseline, but there are still going to be considerations for the code author and code reviewers: how likely is it, actually, that the specific API function will change? How problematic is that? What is the alternative, can we use a stable API instead? If so, at what cost? If not, why not? etc.
The fact that you have to explicitly opt into the stable API offers some protection, but it’s not fine-grained enough to say which functions you are opting into. It’s too easy to fall into the trap of enabling it for one thing, with careful consideration of its caveats, only to then also use another unstable API whose cost you didn’t consider.
This isn’t quite a hypothetical problem, either. For example, for our zipimport replacement we’ve had to compile some things with `Py_BUILD_CORE` enabled to get access to things we have to use to make importing extension modules work. I annotated what we were using it for, and then when we no longer needed that particular function, I would try to remove it – only to discover we were using other things that needed it, which I didn’t realise because I’d already added the `Py_BUILD_CORE` define at that point. (It’s not a problem in this case because we are intentionally poking at internals and also tightly control the Python versions we build with, but it shows why a mere define isn’t going to offer much protection here.)
I do think we should offer this unstable API, to distinguish it from absolutely-unsupported uses of CPython internals, but I think we need to make them stand out enough that they aren’t used without a conscious decision to accept the cost of them. If C had something better than leading underscores or naming conventions, that would be awesome, but as it is, the easy solution is to use leading underscores for unstable API functions.
Let’s solve that by saying:
If the unstable API changes in an incompatible way, it’ll break at compile time.
We could also promise that when we remove an unstable API, we provide an alternative way of doing the same thing (e.g. new arguments for code objects, a completely new tracing/profiling API, etc.). Exceptions would need a normal deprecation cycle, like any API.
As a user, by opting in at the file level, you’re saying you’re OK with revisiting the file for future CPython versions. When you need to revisit it, you’ll be alerted.
If you add unrelated unstable API calls, it shouldn’t be a problem. Unstable API should not be dangerous to use, it should just have higher maintenance costs.
There is of course the issue of CPython devs (as all devs) sometimes not recognizing what changes are incompatible – but that’s not specific to unstable/private API.
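A sketch of what “break at compile time” could mean in practice; the names below are hypothetical. The idea is that an incompatible change always comes with a rename, so stale callers fail to build instead of silently misbehaving:

```c
/* Hypothetical 3.x header: */
PyAPI_FUNC(int) _PyFoo_Eval(PyObject *code, int flags);

/* Hypothetical 3.(x+1) header: the flags argument changed meaning, so
 * the function was renamed.  Old callers now get a compile or link
 * error instead of passing flags with the wrong meaning. */
PyAPI_FUNC(int) _PyFoo_EvalEx(PyObject *code, int flags, int options);
```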
Thomas’s argument would be more convincing to me if we didn’t have such a longstanding tradition that public APIs, no matter how dangerous (e.g. `PyTuple_SetItem`, or for that matter `Py_DECREF`), don’t start with `_`. In fact IIRC when we did the “great renaming” back in the '90s we decided to use `Py` for public APIs and `_Py` for private ones. No consideration was given to danger (though at the time most APIs were probably effectively unstable :-).
I only know of one dangerous API that was given an `_` because of the danger – `sys._getframe()`. I don’t think it sets enough of a precedent, and at the time we had not thought of a separate “unstable API” category, so we claimed it was internal to warn off casual use. (In fact I think it predates the stable ABI – it’s so old that even the Python 2 docs don’t have a “versionadded” tag for it.) And in fact it’s not dangerous – we just didn’t want to guarantee its existence on all platforms. In practice it has been very stable and safe.
I like Petr’s proposal to make unstable API changes break at compile time – since there aren’t any formally unstable APIs yet, we can easily make this a requirement for APIs to be included in the unstable API, and I think it’s not very onerous (it does mean that unstable fields will have to be renamed when their meaning changes).
I’m not sure we would need to use the deprecation cycle to remove an unstable API, when we don’t need one to change it. (I can sort of see that we shouldn’t just be removing the tracing functionality without offering an alternative, but I don’t think most unstable APIs are that critical.)
I’m not sure what that’s supposed to solve but it does not address my concerns at all – although this suggestion is frankly what I would have expected anyway, so it’s good to be explicit about it.
Let me try to come at my point of view another way. There are a lot of Python public/stable API functions. There are (I hope) going to be far, far fewer unstable API functions, and I expect they’ll serve very specific purposes. It is my expectation that most users (and most people reading that code) will not need to use unstable API functions. I don’t think it’s realistic to expect people to know that a function is part of the unstable API by checking the documentation each time. I don’t want people to accidentally use an unstable API function, but I also don’t want them to have to worry about whether something is an unstable API function.
I don’t think it’s fair to put the burden of figuring out the status of a function they don’t remember (and the Python C API has a lot of slightly different functions that are tricky to remember) on everyone using stable/public APIs only. I do think it’s fair to put the burden of realising whether a particular `_Py*` function is unstable or private on anyone using unstable or private APIs – they are already buying into the extra burden that comes with unstable or private API usage.
Because there are so many more stable API functions than unstable ones (or even private ones that people might want to use), it’s easier to flag all unstable/private API uses (by means of the leading underscore) and then realise which are unstable and which are private, than it is to flag all Python API calls and then check which are unstable.
A third option could be to designate a unique prefix for the unstable API, like `PyUnstable_*` or `PyU_*`? Remove the underscore since it is public, but indicate clearly that it’s unstable. When transitioning from public/private to unstable, one of the two will have to be renamed anyway, so it shouldn’t be a problem to rename and make aliases. It does mean they won’t be alphabetically grouped with their stable counterparts, but I don’t expect there to be too many unstable APIs.
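A sketch of such a transition, with hypothetical names; the old underscored spelling could stay as an alias during a deprecation period:

```c
/* New, clearly marked unstable spelling (hypothetical): */
PyAPI_FUNC(PyObject *) PyUnstable_Code_New(int argcount);

/* Temporary alias so existing callers keep compiling; to be removed
 * after the usual deprecation cycle. */
#define _PyCode_New PyUnstable_Code_New
```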
I like Spencer’s idea of finding a compromise.
I’ve come to the conclusion that I really want the formal concept of “unstable API” as described by the PEP, but I don’t care enough about the underscore to lean hard either way. Maybe the SC can just hold a vote and move on? I promise I will support whatever they choose.
No need for that – the SC already voted. Just need to solve the practical issues now – but while solving them, I found some confusion and thought up a proposal to make.
I’ve been told that it looks like I’m throwing a little tantrum (and I honestly thank you for the feedback!). I think I found the cause for that: the SC’s requirement looks like an insignificant tweak, but I see it as an additional design constraint – and with it, I can’t find a solution that feels like a net win for API tier definition. Sadly, I can’t just decide I don’t care enough about the underscore.
Perhaps I am too nitpicky to be the expert on API tiers. I’d be happy to delegate or collaborate. But if I am to implement the PEP I do need a nitpick-level understanding of how things should be.
The practical issue is that to implement the PEP, I’ll need to change the docs and devguide definitions of the internal API tier. And if the underscore isn’t the “private” marker any more, what is? In other words: how should users tell what’s private? I feel a duty to make this straightforward and obvious, but I keep coming back to wanting to say “underscore=internal”. Here’s my thought process:
What could define a precise boundary between unstable & private?

- Availability in the headers? Doesn’t work, unfortunately: if a private function needs to be `static inline` (or a macro) for performance, it needs to be exposed (see the sketch after this list). We can’t use `#define`/`#include` to opt in or out of the private tier. (It can work for other tiers, though.)
- Docs? Docs are actually a bad way to define API tiers.
  - In a volunteer-driven project, features are unfortunately often undocumented or underdocumented. They go out of date over time.
  - What even is docs? Do PEPs count? HOWTOs? Source comments? Do I trawl through the whole docs for a note, or is the first entry in search enough? (Not trying to troll here – trying to find a way to make clauses like “unless documented otherwise” useful for a precise tier boundary.)
- A definitive list (exported to the docs)? It works for the Limited API, but it’s nowhere near perfect yet, and the checks for that list are quite a maintenance burden. Too much burden for the unstable tier, IMO.
- Should users look in the header file? That would technically work, but it’s ugly.
- (Should users check if the opt-in disables their function? I’ll stop listing bad brainstorming ideas now.)
- Name is the only usable marker I can see.
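The sketch promised above: made-up names, but the pattern is the one CPython’s headers actually use, and it shows why no opt-in define can fence off the private tier:

```c
/* The private helper must be declared (and exported) in the public
 * header, because the public inline function below calls it. */
PyAPI_FUNC(int) _Py_CheckFooSlow(PyObject *op);

/* The public API is a static inline function for performance, so its
 * whole body is visible to every embedder.  No #ifdef could hide
 * _Py_CheckFooSlow without also breaking Py_CheckFoo. */
static inline int Py_CheckFoo(PyObject *op)
{
    return _Py_CheckFooSlow(op);
}
```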
Yes, `PyUnstable_` would work. Thanks, Spencer! I subjectively find it ugly, but I don’t see other good solutions given the constraint. (I assume I can convince the SC to allow that rather than `_PyUnstable` with underscore – see below.)
I’m trying to make things work with the constraint, even though I don’t agree with it. Worse, the reasoning behind the constraint still doesn’t make sense to me. It doesn’t have to (the SC is there to ensure different viewpoints are represented, and I sure don’t have experience maintaining a giant monorepo) – but it sure would help if it made sense. So I’ll try prying further, even though I’m probably annoying everyone at this point.
The “few functions with specific purposes” point makes it unlikely that you’ll mix different “families” of unstable functions in a single file. To me, that’s what makes a `.c`-file-level opt-in enough. If you’re worried about introducing more unstable functions, your file might be too big.
I said this before, but: I see opting-in to unstable API as indication that you may need to revisit that file for future CPython versions. In that light, accidentally introducing other unstable functions still doesn’t seem like enough of a concern. They’re not dangerous, they just have ⅓ of the support time compared to public ones.
What is a big concern for me is that if we lose the “underscored=private” rule, users might accidentally introduce new private functions to their files, seeing that underscores are already used elsewhere (see the inability to opt out of the private tier I mentioned above). That worries me a lot. I worry very much that people will read the opt-in `#define` as “now I can use underscored functions”. If they do, the PEP is a failure.
Um… that’s the bigger problem I’m trying to solve here! I’m trying to inch toward a world where that isn’t the case :)
And finally, I can’t resist adding some purely personal opinion.
I would love to have a dead-simple “underscored=private” rule. You should be able to grep your 100GLOC codebase for `\b_Py` and pipe the result straight to Jira. You should be able to send a drive-by PR removing a `_Py` use, without any doubt about whether it’s fixing a bug.
We’re not in that perfect situation yet, but a big part of that is that we currently also use underscores for unstable API. The PEP was originally meant to strengthen the rule, not weaken it.
I’ve submitted an updated PEP 689, which switches to the `PyUnstable` prefix, to the SC for consideration.
Correct me if I’m wrong, but the issue here is “how can I tell if a function with a leading `_` is private or unstable?”
This shouldn’t be an issue, as there is no such thing as a “private API”; it is a contradiction.
We already enforce its non-existence through the `PyAPI_FUNC` macro, which controls which symbols are visible. At least that’s how it should work.
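For reference, a simplified sketch of what `PyAPI_FUNC` boils down to; the real definitions (in CPython’s `exports.h`/`pyport.h`) cover more cases, such as `dllimport` for consumers on Windows:

```c
/* Mark the symbol as exported from the Python DLL/shared library, so
 * extension modules can link against it. */
#if defined(_WIN32)
#  define PyAPI_FUNC(RTYPE) __declspec(dllexport) RTYPE
#else
#  define PyAPI_FUNC(RTYPE) __attribute__((visibility("default"))) RTYPE
#endif
```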
If a symbol with a leading `_` is visible, then it is part of the unstable API.
In the source code:

- `PyAPI_FUNC(int) PyFoo(void)`: public, stable API function
- `PyAPI_FUNC(int) _PyFoo(void)`: public, unstable API function
- `extern int _PyFoo(void)`: private function (not visible to dynamically linked libraries)
What am I missing here?
“Private API” is API for internal use by the interpreter and standard library. I don’t think it’s a contradiction.
Ideally, no names with leading underscores would be exposed. But we can’t get all the way there any time soon: we have underscored implementations of public macros or inline functions, where we always want people to use the public API but need to expose the private function too.
If it is for use by the interpreter and standard library only, then (regardless of whether it is considered API or not) it isn’t visible to third-party code.
Why do we need to expose private functions? Are there private functions marked with `PyAPI_FUNC`?
Significant parts of the standard library are implemented as separate modules (DLL/dylib/so), and so have the same system-level access capabilities as third-party modules. The difference is that we update them all at the same time as we update the core runtime, so they won’t be caught out by changes.
So private APIs used by these modules need to be exported, even though we don’t intend for people who ship separately from CPython to ever use them.
Yes, the standard lib is a problem, if it is to have privileged access and be dynamically loadable.
In DesignPrinciples.md in markshannon/New-C-API-for-Python on GitHub, I suggest that the standard library have no more privileges than third-party code. We can do that by moving functionality into the core, and only calling (unstable) API functions from the stdlib.
But that is a lot of preparatory work before we can add an unstable API.
In the shorter term we should explicitly mark exported private functions as private, either by renaming or with an annotation.
So, extending my earlier classification, we would have something like:

- `PyAPI_FUNC(int) PyFoo(void)`: public, stable API function
- `PyAPI_FUNC(int) _PyFoo(void)`: public, unstable API function
- `extern int _PyFoo(void)`: private function (not visible to dynamically linked libraries)
- `PyPRIVATE_FUNC(int) _PyFoo(void)`: private function (temporarily visible to dynamically linked libraries)
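`PyPRIVATE_FUNC` does not exist today; one hypothetical way to spell it would be to export the symbol exactly as `PyAPI_FUNC` does while warning anyone who uses it outside a core build:

```c
/* Hypothetical: exported like PyAPI_FUNC, but code that is not built
 * as part of CPython itself gets a compiler warning on use.
 * Py_DEPRECATED is a real CPython macro; the version is arbitrary. */
#ifdef Py_BUILD_CORE
#  define PyPRIVATE_FUNC(RTYPE) PyAPI_FUNC(RTYPE)
#else
#  define PyPRIVATE_FUNC(RTYPE) Py_DEPRECATED(3.12) PyAPI_FUNC(RTYPE)
#endif
```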
An example: the public macro `Py_DECREF` calls the private function `_Py_DecRef`. The function needs to be visible to the linker, but it’s not part of the API.
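Schematically (heavily simplified; the real `Py_DECREF` also has inline and debug variants, and only expands to a call like this under some configurations):

```c
/* The private helper must be exported for the linker's sake... */
PyAPI_FUNC(void) _Py_DecRef(PyObject *op);

/* ...because the public macro, expanded inside third-party code, ends
 * up calling it even though the helper itself is not part of the API. */
#define Py_DECREF(op) _Py_DecRef((PyObject *)(op))
```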
Also, how do you mark private macros? We still have those.