(pssst) Let's treat all API in public headers as public

In PEP 689, I wrote – foolishly, I now realize – that:

Any C API with a leading underscore is designated internal, meaning that it may change or disappear without any notice.

I failed to make the distinction between advice for users, and advice for core developers. I should have been much more explicit about that. This one was meant for users.

The devguide contains a (hidden and weak) hint:

Note that historically, underscores were used for APIs that are better served by the Unstable C API:

  • “provisional” APIs, included in a Python release to test real-world usage of new APIs;
  • APIs for very specialized uses like JIT compilers

That is, underscored API could be private, or it could just be from a time when the leading underscore wasn’t clearly defined as a “private” marker (which was, like, a year ago). It’s not easy to tell.

Unfortunately, it is quite easy to mass-remove underscored API, breaking all its users. Especially if you don’t wait for a review. It is much harder to research if each one should be removed, and harder still to revert a removal.
I argue that we should be careful and deliberate when cleaning stuff up, even if it’s more work. Partly because I care about users (some of whom will properly report breakage and argue+wait for a revert, but others will get fed up and leave). And partly because I spend a lot of my time fixing avoidable breakage, and I’m frankly fed up. (And a lot of this is not volunteer work, so I can’t easily just leave and say it’s not my problem.)
Mass changes to the API leave a disproportionate amount of work for other people.

I argue that for removing API, there is no rush. It would be nice to make the API cleaner, but it’s not a goal we need to reach ASAP.
It’s fine to leave an old untested function in, until someone finds the time to remove it properly.

So, let me channel my frustration into a radical draft guidance for core devs. How does this sound?


Treat any API in public headers as public.
There are exceptions, but they aren’t clear-cut. Consider them carefully. Some common exceptions include:

  • Underscored functions added for 3.12 or later (but be careful about these too – e.g. the underscore could be there just to match surrounding declarations)
  • A note in the docs marking it unsupported (but check the history – the note may have been added after the API itself, so some users will not have seen it)
  • A proper public function nearby, added at the same time or before, that’s a straightforward “frontend” to the private one
  • Internal or impl in the name
  • API in Include/internal/ or behind Py_BUILD_CORE (fwiw, we should tack an underscore onto that macro)
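
For reference, the internal headers already protect themselves with a guard along these lines (the pattern used by Include/internal/pycore_*.h), so extensions that don’t define Py_BUILD_CORE can’t even include them by accident:

    #ifndef Py_BUILD_CORE
    #  error "this header requires Py_BUILD_CORE define"
    #endif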

Search the docs and the Internet. If you find that the API is documented, or used in a public project/tutorial, treat the API as public (i.e. deprecate it, or leave it in).

When adding a private function to a public header, let’s put _internal or _impl in the name. Make it extra ugly so people know not to touch it.
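
A minimal sketch of what that could look like in a header – all names here are made up for illustration:

    /* Supported entry point: clean name, documented, tested. */
    PyAPI_FUNC(int) PyExample_Do(PyObject *obj);

    /* Implementation detail that has to live in a public header for
       technical reasons: the ugly suffix marks it as off-limits. */
    PyAPI_FUNC(int) _PyExample_Do_impl(PyObject *obj, int flags);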

The advice for users is still to avoid all underscored names, and report cases where it’s the only way to do something.
But, while it would be great for us if they all dropped what they’re doing and fixed their API usage now, let’s not force them to do that.

4 Likes

I suppose that this discussion comes from my meta issue C API: Remove private C API functions (move them to the internal C API) · Issue #106320 · python/cpython · GitHub, where in Python 3.13 I removed many private functions (around 173 of them). I understand that you consider that even though these functions are prefixed with an underscore, people use them, and removing them will impact many projects: you prefer to leave them unchanged, since their maintenance cost is minimal or effectively zero. I agree that these changes will impact (for now) an unknown number of C extensions, and that fixing either the C API or these C extensions will take a few months.

I see things differently. While these functions don’t need much maintenance, my concern is that they are actually used in the wild (since it’s technically possible, so, well, people use them for various reasons). If a private function is changed, the change will impact third-party code relying on the old behavior. So even if a function is “marked” as private (by the underscore prefix), it’s more stressful for core devs to modify it.

The other issue is that these functions are usually closer to Python internals than public C API functions. For example, they have no error checking and make multiple assumptions about how they must be called. Again, it’s a problem if CPython internals change: the fact that these functions exist prevents Python from evolving.
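
To make that trade-off concrete, here is the same flavor of contrast in a checked/unchecked pair from the existing API (not an underscored function, but it makes the same kind of assumptions) – a minimal sketch:

    #include <Python.h>

    static PyObject *
    first_item(PyObject *list)
    {
        /* The public function verifies the type and the index, and
           sets an exception on failure: */
        PyObject *item = PyList_GetItem(list, 0);  /* borrowed reference */
        if (item == NULL) {
            return NULL;  /* e.g. IndexError on an empty list */
        }
        /* The unchecked form, PyList_GET_ITEM(list, 0), skips all of
           those checks: it assumes the caller already proved that
           `list` is a non-empty list, and misuse is undefined behavior. */
        Py_INCREF(item);
        return item;
    }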

Moreover, it’s a big burden for other Python implementations like PyPy, since they actually have to implement private functions as soon as they are actually used by C extensions. Otherwise, PyPy cannot support these C extensions. I suppose that it’s easier for PyPy to support private functions than to help C extension maintainers get rid of them (by using the public C API).

Sometimes, when I see a private function, I don’t know its purpose, I don’t know how it’s used, I don’t know how it’s supposed to behave. It costs me the “Chesterton’s fence” maintenance burden: it takes me more time to think about such a private API, compared to when I meet a public API (well defined, documented, tested, with backward compatibility guarantees).

My goal in Python 3.13 is to continue the work that I started in Python 3.7 (I already removed a few private functions in each release): clarify the distinction between public and private APIs. In practice, I try to move as many private functions as possible to the internal C API and no longer export them, so a 3rd party C extension can no longer use them. I’m not against exporting internal functions if it makes sense. But it should be the exception, not the default.

If a private C function is commonly used in 3rd party code, it’s a sign that we should consider promoting it to a public function: document it, test it, provide backward compatibility guarantees. For example, in Python 3.11, I removed the private _PyFloat_Pack8() and _PyFloat_Unpack8() (not tested, not documented); we discovered that they were used by msgpack, so we decided to promote them to the public C API (add tests, write docs): C API: Pack and Unpack functions.
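
A minimal round-trip sketch of the promoted functions, assuming the signatures documented for 3.11:

    #include <Python.h>

    /* Serialize a C double to 8 bytes (IEEE 754) and read it back.
       PyFloat_Pack8() returns 0 on success and -1 (with an exception
       set) on error; PyFloat_Unpack8() returns -1.0 with an exception
       set on error. */
    static int
    roundtrip_double(double value, char buf[8])
    {
        if (PyFloat_Pack8(value, buf, 1) < 0) {  /* 1 = little-endian */
            return -1;
        }
        double result = PyFloat_Unpack8(buf, 1);
        if (result == -1.0 && PyErr_Occurred()) {
            return -1;
        }
        return 0;
    }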

The goal is to clarify the contract between CPython developers and users of the CPython C API: clarify the backward compatibility guarantees, clarify the scope of the C API, clarify what’s inside the C API or not (public or not).

The C API is big: 33,013 lines of C header files, 217 files, 1,376 exported functions (even more non-exported functions), 218 variables, 277 structures (type names): statistics including the internal C API (which does export functions!). See C API statistics. I would like to make the C API smaller, to ease its maintenance and to ease the implementation of the C API in other Python implementations.

6 Likes

I suggest making decisions on a case-by-case basis, for each removed private function.

If a C extension is affected by the removal of private functions, we should see how it’s used, and check if a public C API is available and good enough. If there is a good reason to use the private API, we should consider designing a good public API for it: add error checking, document it, write tests, think about its design.

If the number of affected C extensions is too big (ex: more than 50 extensions), we can add the function back to the public headers as a private function (revert its removal). It’s just about moving one line from a file to another; it’s not a big deal. I didn’t remove any function implementation: so far, I only moved their declarations from Include/cpython/ to Include/internal/. The idea is to give more time to design a better replacement, and to consider removing it again later (ex: in Python 3.14).

When a new C API is added, an implementation for Python 3.12 can be added to the pythoncapi-compat project. I would suggest that C extensions be updated to use the new public function, and use a compatibility layer to get it on old Python versions. Usually the new API is better: less error-prone and easier to use.

For example, recently I added PyWeakref_GetRef() to Python 3.13: it returns a strong reference, rather than a borrowed reference, to avoid race conditions. You can use pythoncapi-compat to get this function on Python 3.12 and older. So you can make your C extension safer even when running on old Python versions!
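
A minimal sketch of that pattern, assuming the documented 3.13 signature and the pythoncapi-compat header:

    /* pythoncapi_compat.h supplies PyWeakref_GetRef() on Python <= 3.12. */
    #include "pythoncapi_compat.h"

    static int
    use_referent(PyObject *weakref)
    {
        PyObject *obj;  /* receives a strong reference */
        int rc = PyWeakref_GetRef(weakref, &obj);
        if (rc < 0) {
            return -1;  /* error: not a weak reference */
        }
        if (rc == 0) {
            return 0;   /* the referent is already dead */
        }
        /* ... use obj safely: we own a strong reference, so it cannot
           disappear under us, unlike the borrowed result of the old
           PyWeakref_GetObject() ... */
        Py_DECREF(obj);
        return 1;
    }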

As a user of the C API, it is hard to tell whether a function is private or not based on when it was added, since its name doesn’t make it explicit. You have to dig into the documentation and check each function that you use.

In general, I’m trying to make the C API more regular, so it’s safer (less “error-prone”) to use it even without reading the doc (since I heard rumors of developers who don’t read the doc!):

  • Make reference counting more regular: add new functions returning strong references, rather than borrowed references.
  • Make it easier to identify what is the latest and safest API to use when there are many variants of it. For example, mark the old ones as deprecated (and remove them later).
  • Have well defined API: input and output types, clear variable scope, etc. For example, avoid macros which have “Pitfalls” (see PEP 670 for details).
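
On the last point, a small illustration of the kind of pitfall PEP 670 describes (a simplified example, not an actual CPython macro):

    /* The macro evaluates its argument twice, so ABS(i++) increments
       `i` twice: a classic pitfall. */
    #define ABS(x) ((x) < 0 ? -(x) : (x))

    /* The PEP 670 fix: a static inline function evaluates its argument
       exactly once, and gives the compiler a real type to check. */
    static inline int
    abs_int(int x)
    {
        return (x < 0) ? -x : x;
    }
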
3 Likes

So far, I removed 181 private functions in Python 3.13. A code search on PyPI top 5,000 projects found 480 matching lines in 34 projects.

I listed all removed functions, the projects using removed functions, and which removed functions are the most used, in comments on the issue: C API: Remove private C API functions (move them to the internal C API) · Issue #106320 · python/cpython · GitHub

If you want to help fix the projects or design public APIs for these removed functions, I suggest continuing the discussion in the issue.

Python 3.13 beta1 is scheduled for May 2024: we have a few months to decide how to handle these incompatible changes. As I wrote, reverting the changes causing the most trouble is also an acceptable choice. Well, obviously, if possible, I would prefer to address the issue: provide a better replacement API.

I very much concur with Petr’s view.

Instead of causing more and more churn, or even outright making it impossible for extension writers to implement their logic, we should use a more careful approach: get the extension authors into the discussion and have a group of core devs decide on this, possibly with the SC approving such changes in the form of C API change PEPs, rather than having a single core dev decide for the whole Python ecosystem.

As I have already mentioned in several other threads on MLs and on Discourse, you will need SC buy-in and approval for making such vast changes, @vstinner. This is better for both you and the community.

Perhaps we ought to create a WG to discuss such API changes, where we start with defining what we want as a goal and then check which of the current underscore APIs should be hidden and which should be made public (again). I’d be happy to join such a WG.

On the general topic of stripping down the Python C API, I’ll repeat my stance: I am very much in favor of a rich and complete C API. I implemented this for the Python Unicode C API and it’s sad to see it deteriorate and get crippled over the last couple of days. I have maintained that API for more than 10 years and it was never much of a burden.

I also don’t think that grepping the top 5000 packages on PyPI is a good indication of whether an API is useful or not – those 5000 packages are not representative of the Python ecosystem (e.g. you miss out on the data science world, which mostly uses conda as its repo, and needless to say, you don’t capture the vast amounts of corporate code bases out there). At best, such a review can provide some insight into possible breakage caused by changes.

But even then, removing a single API may very well break an entire package by taking away its core entry point into the Python interpreter, or make it unusable due to much too slow workarounds. Others may just need to switch to a better API. In the end, not all hits are equally serious.

The usefulness of a single API is not defined by how many people use it, but rather by how well it fits into the general API design. A rich API will result in a rich ecosystem – Python’s history is the perfect proof of this.

And even with a rich API, we can implement change. What we need for this is good and open communication with extension authors and buy-in from most parties. With such buy-in we can even make changes that cause major work on both the core dev and extension writer side.

At the moment, I neither see much progress in opening up such communication channels, nor do I see buy-in. It’s essentially the extension writers who need to follow whatever change core devs come up with.

In the past, using underscore APIs was a last resort for extension writers (with all the strings attached), but with the more recent set of changes, it is becoming impossible to use those, even if you want to for better performance (going through the Python C method API interface is slow for bulk operations) or have to, because there’s no other way to access the functionality (e.g. for low level tooling).

I hope we can use this topic to get the discussion going. Our past attempts at this have not been very successful.

PS: I’d love to notify the SC about this, but Discourse doesn’t let me mention the SC via the @-moniker. Not sure how to get their attention from within Discourse. Perhaps I’ll just send a good old email :slight_smile:

You should have a look at Issues · capi-workgroup/problems · GitHub

It seems like many people want to change the C API. But so far, there is no clear consensus on how to address C API issues.

2 Likes

Indeed, the capi-workgroup is a good place to discuss these issues.

Our intention was at first to collect there everyone’s view of what the problems are with the current C API, and produce a document we all agree on which enumerates them. Without that we can discuss solutions all day, but without agreement on the problems we are trying to solve, we don’t have any reliable way to evaluate different solutions.

Re Petr’s proposal here, should we add an issue there about the meaning of leading underscore being inconsistent or poorly defined?

On a personal note, earlier this week I gave a keynote talk at PyCon IL, and in the weeks leading up to that I felt that I needed to preserve my headspace in a good state, which included avoiding engagement with the C API discussions. I will now resume my work on this project. My next goal will be to create a draft of a document summarising the issues we collectively identified in the capi-workgroup repo, so we can work on it towards, and at, the Brno sprint.

2 Likes

Just to note, from the perspective of a scientific Python developer and maintainer (of Spyder, QtPy, Docrepr, etc), while this will certainly skew the top 5000 PyPI results away from scientific packages (which tend to make some of the heaviest use of the C API, and are some of the more sensitive to these changes), it isn’t as big a skew as one might think.

The great majority (almost all) of conda-forge packages are sourced from PyPI, not directly from the repo, and you basically need to have “standard” Python PyPA packaging set up already to create a standard CF Python recipe, so in practice it is extremely rare that a package (especially a commonly used one) is published to CF but not PyPI. And packages in general still see a lot of PyPI downloads, at least in the same order of magnitude as Conda ones, with some of the core libraries like Numpy having far more PyPI downloads than on all CF channels combined (or likely Anaconda base installs).

What is much more common than CF-only packages is that a smaller project will be found only on GitHub and not on either package index. To help account for this wider spectrum of code, I typically suggest complementing top 5000 checks with a code search. GitHub’s code search (even the new version) isn’t really that good for this sort of thing; for searching public code, I use grep.app.

There’s usually a baseline of vendored copies and other random stuff, but you can get a “cleaner” sample by using regex, filtering by path, or even (in this case, with C++) filtering by language, and it’s easier to quickly check the results – in this case, filtering to the utils and src directories produces a mostly clean sample, as does filtering for only C++. I usually combine that with doing a spot sample of (say) 50 results starting from near the middle and recording the nature of each usage to make an estimate for the whole.

3 Likes

For code written behind closed doors, I wrote the upgrade_pythoncapi.py script, which adds support for new Python versions without losing support for old ones. It’s still a manual action: you have to run this tool, which changes your C code. But at least you don’t have to manually audit your source code, or go through compiler errors one by one, to make your C extension compatible with the new Python.

My tool is incomplete and still requires a few manual changes. But I mean that there is a way to help C extension maintainers to ease their life by automating most of this boring work.

Another solution is to write Cython code, and then just re-run Cython from time to time to regenerate the C code with the Cython compiler :wink: Cython uses the fastest available API depending on the Python version.

3 Likes

Thanks for writing down the history and reasoning.

the fact that these functions exist prevents Python from evolving.

That’s a legitimate concern, sure, but I don’t think it needs to be solved by proactively removing all problematic API.
If this is “private” API, and we are allowed to remove it without a deprecation period, then IMO we should do that when it starts causing trouble.
Some of the functions you’ve removed are unlikely to cause trouble.

Moreover, it’s a big burden for other Python implementations like PyPy, since they actually have to implement private functions as soon as they are actually used by C extensions.

This is where pythoncapi-compat can help, by providing implementations that rely on public API.
Also, it’s an inconvenience for a limited number of well-maintained projects (PyPy, HPy), which have largely solved this already. We can help them by making sure we don’t add new questionable API, but removing what they already worked around doesn’t seem too useful.

Sometimes, when I see a private function, I don’t know its purpose, I don’t know how it’s used, I don’t know how it’s supposed to behave. It costs me the “Chesterton’s fence” maintenance burden: it takes me more time to think about such a private API, compared to when I meet a public API (well defined, documented, tested, with backward compatibility guarantees).

And so, you take the most drastic action available – removing the API entirely?
I don’t understand.

My goal [is to] clarify the distinction between public and private APIs
I would like to make the C API smaller

That is a good goal, but I don’t think you need to remove API to get there.
The underscore is already a clear marker. So is Py_DEPRECATED. We can combine them. Another idea that was floated around was to add a macro to disable everything that’s discouraged in 2023. But, again, you’re taking the most drastic option available.
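
To make that concrete, a hypothetical sketch of what the softer options could look like in a header – Py_DEPRECATED() is real, but the function names and the opt-out macro are made up:

    /* Keep the old underscored function, but warn at compile time: */
    Py_DEPRECATED(3.13) PyAPI_FUNC(PyObject *) _PyExample_Old(PyObject *);

    /* Or let projects opt out of all discouraged API at build time: */
    #ifndef Py_NO_DISCOURAGED_API  /* hypothetical switch */
    PyAPI_FUNC(PyObject *) _PyExample_Older(PyObject *);
    #endif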

I suggest making decisions on a case-by-case basis

Yes, that would be great.
If I disagree with your decisions, how should I react?

When a new C API is added, an implementation for Python 3.12 can be added to the pythoncapi-compat project.

IMO, that’s a great use case for pythoncapi-compat. It lets you use the latest and greatest API even on older Python versions, if you want to.

However, I don’t think the existence of pythoncapi-compat should justify removing old API.
A C library is not an easy dependency to add (and keep up to date). And pythoncapi-compat also needs tests and docs – isn’t the maintenance burden similar to CPython?


Most old API works. It might be inefficient, or use an older naming convention, or be difficult to use correctly, or not be thread-safe, or have weird edge cases, or be a no-op, but if someone uses it despite the shortcomings, I don’t think CPython should force them to rewrite code just because we found a slightly better way of doing things.
Of course not all old API is like that. But most is, IMO.

I think Python breaks too much. The 3.11 update was painful, 3.12 is not much better, and 3.13 is shaping up to follow the trend. Each breakage we make is a reason for someone to discontinue a working library, or abandon Python altogether. Breakage is hurting the project.

Practicality should beat purity. It’s harder that way, but I think not breaking users unless necessary should be much, much higher on our list of priorities. Just because PEP-387 says an API can change without notice doesn’t mean it should be removed ASAP.

Can we find a way to mark old API as discouraged, but keep users’ code working as long as possible?
Can we limit the breakage to API that needs to change to support new optimizations and features?

It was recently made consistent and well-defined: don’t touch it!
The issue is that it seems unfair to users to (ab)use the new definition for API that was added (and used) before the strict definition was in place.

1 Like

Could you create an issue? There is a problem here, and we need to include it in the list.

1 Like

Sure, I filed #58.

Interesting; I wasn’t aware there already is an effort in this direction. Who are the capi-workgroup members, and how is this organized?

That seems like a good approach, but wouldn’t it then make sense to hold off on PRs such as the ones Victor has been pushing forward in recent weeks, until consensus is reached on where we want to take the C API?

Related to this: We need buy-in from extension writers for the C API changes as well. I don’t see many people participating in creating issues in the above repo - could be just me, but perhaps it’s not well-known enough yet.

I guess I didn’t make my point clear enough: grepping through top Python extensions on PyPI doesn’t give you an indication of whether an API is useful or not.

There may be niche extensions which are not often used, but heavily rely on certain APIs.

Likewise, an API has to be consistent to be useful, which means that even though certain parts don’t get a lot of use in the top 5000 PyPI extensions right now, they are needed to make the API complete and future-proof.

You are missing the fact that Cython code you write will still tap into the Python C API directly for many operations. It helps with abstracting away module, function and type interfaces, but if you need data level interfacing, which a lot of extensions do, you still have to use the Python C API for best performance.

And if you remove APIs from the Python lib, neither Cython nor your tool will be able to create code which runs against the next version of Python, unless you start maintaining your own vendored versions of those APIs in your extensions.

This is why I strongly believe that Petr’s and Irit’s approaches are better for our eco-system, than outright removing APIs.

We first have to get consensus on whether things should be moving, where the pain points are (not all Python C APIs create problems for new approaches such as low level object restructuring), and ideally get a decent buy-in from the folks who will have to deal with the fall-out… namely the many extension authors out there.

There are clearly different views on where the C API should go, whether it’s too big, too low level, exposing too many details, etc. It’s also not clear what we want from the C API and the understanding of how important the C API is for the Python eco-system also seems to diverge in several different ways, depending on who you ask.

Consensus will not be easy to reach, but with leadership from the SC and a good written perspective of where the C API should be heading, I think we can make progress without causing too much friction, and keep the community happy.

If we make the mistake of ignoring the community, folks will move on to other technology. You can already see this happening in the data science space where more and more tools are using Arrow for data storage and Rust for all the processing work. Python is only used as a high level glue language to fit things together.

This could be a valid direction for Python to take, but then we should focus on providing good tooling to make this as easy as possible for those extension writers (esp. those who provide bridges to and from other languages and technologies).

Back when I started using Python, my main attraction was the C API (I was a C programmer at the time). It was clear, elegant and rich. I’d very much like to keep that theme for the future, since C is not going to go away anytime soon and I’m sure it will remain an excellent low level integration language for many decades to come.

2 Likes

It’s an open repo, anyone can comment, add new issues, or edit the wikis.
Irit will write the PEP to summarize the issues, so she gets to moderate and decide what’s in scope.

1 Like

The working group was set up after conversations at the language summit this year – I blogged about those conversations here: Python Software Foundation News: The Python Language Summit 2023: Three Talks on the C API

As Petr says, Irit has been leading the organisation of the discussions following the language summit conversations.

So far, I helped to update Cython for C API incompatible changes: it’s not only about removed functions; sometimes the API changes for various reasons. Python 3.11 changed many things related to code and frame objects and the Python thread state.

What I like in code generators (compilers?) like Cython is that even if fixing Cython is hard, fixing it once is enough to fix all projects using Cython (like numpy). If people consume the C API directly, I have to fix every single C extension affected by incompatible changes. The C API documentation already advises… not to use it, but to use a higher-level API :slight_smile:

By the way, Cython has an experimental build option to only use the limited C API: don’t use any private API.
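
For C extensions doing the same by hand, opting in to the limited API is a one-line change before including Python.h:

    /* Restrict this translation unit to the limited C API / stable ABI.
       The value selects the oldest Python version whose ABI we target
       (here 0x030B0000 = 3.11). */
    #define Py_LIMITED_API 0x030B0000
    #include <Python.h>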

2 Likes

The C API working group is working on listing issues. On purpose, it was decided not to work on solutions for now. Well, sometimes the discussion slips towards actual solutions (which may help to better understand the problem).

I don’t think that we can prevent or disallow people from using other APIs or programming languages which better fit their needs. Rust is known to be way faster; it’s not only about the C API.

That’s a great usage of Python :slight_smile: It’s good to use the programming language that best fits each use case; there is no silver bullet that fits all of them. So far, I haven’t seen many website UIs (“frontends”) written in pure Python, for example: JavaScript still seems to be preferred :slight_smile: