Ideas for forward-compatible and fast extension libraries in Python 3.12

Objective-C/Cocoa has an NSFastEnumeration protocol that provides a middle ground between the current protocol and your proposal: the caller provides a buffer and an iteration state and calls that API in a loop to avoid having to copy the entire contents into a temporary buffer. Basically PyDict_Next, but with more than one item at a time.
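To make the shape of that protocol concrete, here is a rough Python sketch (not the actual Objective-C interface; `fast_enumerate` and the buffer size are made up for illustration). The caller-held iterator plays the role of the iteration state, and each batch models one refill of the caller's buffer:

```python
from itertools import islice

def fast_enumerate(obj, buf_size=16):
    """Yield batches of up to buf_size items; the iterator is the 'state'."""
    it = iter(obj)                          # iteration state, held by the caller
    while True:
        batch = list(islice(it, buf_size))  # fill the caller's buffer
        if not batch:
            return                          # state exhausted
        yield batch

# Only the final batch may be shorter than buf_size.
batches = list(fast_enumerate(range(5), buf_size=2))
# batches == [[0, 1], [2, 3], [4]]
```

The C version would take the buffer and state as out-parameters instead of yielding lists, but the control flow is the same.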

Out of curiosity, what happens there if the sequence changes while you’re iterating?

But the bigger question is why. What’s the use case for an external API that avoids full materialization of a generic sequence?
For the Limited API specifically, it doesn’t sound like a good fit.

To be honest I don’t know what happens when the sequence changes during iteration. The Cocoa protocol does not have a way to signal errors, and in Cocoa exceptions are generally not used other than to signal programming errors.

A use case for avoiding full materialisation is that materialising everything might use a lot of memory, e.g. when iterating over all elements of a sequence with a couple of million entries. In the grand scheme of things that’s still not a lot of memory, but using a smaller buffer might allow using a stack-based buffer and hence avoid dynamic allocations.


Windows also uses this pattern fairly frequently (example), mostly because many implementations might have long roundtrip times (e.g. there’s a “find files” one somewhere that might involve network calls), so it lets the caller decide how much latency they care about.

A “PyIter_GetNextFew” API would probably fit the bill, and should certainly be useful. If we wanted to specialise by taking a (single) ParseArgs-style format character to extract raw values as well, that could be nicely efficient for a lot of common cases.
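A sketch of how such a hypothetical `PyIter_GetNextFew` might behave, written as Python pseudocode (the C function would presumably fill a caller-provided array of `PyObject *` and return the count; the name and semantics here are just the proposal as I read it, not an existing API):

```python
from itertools import islice

def get_next_few(it, n):
    """Return up to n items from the iterator; fewer means it may be exhausted."""
    return list(islice(it, n))

it = iter([10, 20, 30])
assert get_next_few(it, 2) == [10, 20]
assert get_next_few(it, 2) == [30]   # short batch: end reached
assert get_next_few(it, 2) == []     # exhausted
```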


No repeated dynamic allocations. If size is not known in advance, it’s not a sequence. Maybe iterators need a PyIter_GetNextFew, but that’s quite different from handling lists and tuples.

The usage I have in mind here is:

  • use PySequence_Size()
  • allocate the buffer with that size
  • pass the allocation size to PySequence_IntoArray, as a check against modification or a Python subclass that overrides __len__ to not match the C data. (In both of those cases you get an exception.)
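A rough Python model of that length check (names are illustrative, not the proposed C API): the caller passes the size obtained from `len()`, and a mismatch at copy time, whether from concurrent mutation or a `__len__` that lies, raises instead of silently truncating:

```python
def into_array(seq, expected_len):
    """Model of the proposed check: fail loudly if the real length differs."""
    items = list(iter(seq))            # materialize via iteration
    if len(items) != expected_len:
        raise RuntimeError("sequence length changed or __len__ is inconsistent")
    return items

class LyingList(list):
    def __len__(self):
        return 999                     # does not match the underlying data

assert into_array([1, 2, 3], 3) == [1, 2, 3]

seq = LyingList([1, 2, 3])
try:
    into_array(seq, len(seq))          # len() says 999, iteration yields 3
except RuntimeError:
    pass                               # the mismatch is detected
```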

Haven’t we defined that as undefined behaviour? So basically “don’t do it” but also “you’ll find out when it happens”.

The “get many” style APIs usually return the number of actual items retrieved, so if the end of the sequence is reached early then you just don’t get a full result (like a short read), and if the sequence changes then you get weirdness dependent on how we’re internally iterating.

But we have to leave it undefined, because there’s no way to batch iterate without changing the behaviour of this situation. If we clarify the exact thing that’s undefined though, users can work with that.


Hm, it’s becoming clear that we’re talking about different things. IMO a PyIter_GetNextFew-style API isn’t a good fit for the Limited API or lists/argument tuples, but might be helpful for streams/iterators.
Could you start a new topic for it? (It’s possible to ask admins to split this one but that would be confusing in this case.)

I don’t see how it isn’t, but maybe you’re assuming that it would look different from your IntoArray suggestion, when both Ronald and I are saying that the existing patterns for this exist and are basically the same as that proposal.

The only difference is that they contain a bit of state about the start point (which could equally become a parameter, with some loss of generality), and don’t require you to convert the entire sequence in one go.


Why isn’t a PyIter_GetNextFew-style API suited for the limited API? It can have well-defined semantics that are suitable for other implementations as well.

The only clear advantage I see right now for PySequence_IntoArray is that it can be implemented as an atomic operation for builtin sequence types, while a PyIter_GetNextFew-style API could see updates to the sequence during iteration.

There are other differences as well of course, but for most of them there IMHO is no clear winner either way (“it depends rulez!”).


Out of curiosity, could you elaborate on that? I don’t immediately see why such an API isn’t a good fit.


There are lots of details to be decided, which means we can make bad decisions on them:

  • Who initializes/allocates the iteration state?
  • Who deinitializes/frees the iteration state? How?
  • Does the iteration state hold a strong reference, or can it outlive the sequence? (See PyBuffer for possible prior art)
  • What is the behavior with mutated sequences? (Crashes, exceptions or just weird results? “Undefined, may eat your socks” doesn’t cut it. Also, what will common types do?)
  • Should this even be sequence API, rather than iterator API?
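For context, here is *one* possible set of answers to those questions, modeled in Python (purely illustrative; `IterState` and its methods are invented for this sketch, not a proposal): the state is allocated by the API, holds a strong reference to the iterable so it cannot outlive it, and is released by an explicit close.

```python
from itertools import islice

class IterState:
    """Models one answer: API-allocated state, strong reference, explicit free."""
    def __init__(self, obj):
        self._it = iter(obj)    # strong reference keeps obj alive while iterating
    def next_few(self, n):
        return list(islice(self._it, n))
    def close(self):
        self._it = None         # releases the reference (the 'free' step)

st = IterState([1, 2, 3])
assert st.next_few(2) == [1, 2]
assert st.next_few(2) == [3]
st.close()
```

Each question above corresponds to a design decision this sketch happens to make one way; the C API could reasonably choose differently (e.g. caller-allocated state, as Py_buffer does).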

IMO, it should be tried in the non-limited API, where we can deprecate/remove things in a few years rather than maybe decades.
Even for the non-limited API, this sounds like premature optimization to me. I’d love to see a concrete use case or some evidence of persistent low-key demand from projects.
And IMO it does deserve its own Discourse topic, even if it does end up making IntoArray redundant. (I won’t get to seriously proposing IntoArray any time soon, anyway).

Why not expose an iterator from an object that has an internal buffer? A configurable buffer, if people need to tweak it?

What are these common cases?
This is where I started doubting that we’re still talking about API for generic sequences.

Iterate a sequence of strings and get them as const char *. Or iterate a sequence of numbers and get them as int or long long or Py_ssize_t. Or at its most generic, iterate a sequence of objects and return them INCREF’d.
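Those cases can be modeled as batch extraction with a per-item converter; the C version might take a ParseArgs-style format character where this sketch takes a Python callable (`next_few_as` is a made-up name for illustration):

```python
from itertools import islice

def next_few_as(it, n, convert):
    """Return up to n items, converted item-by-item (models format-char extraction)."""
    return [convert(x) for x in islice(it, n)]

nums = iter(["1", "2", "3"])
assert next_few_as(nums, 2, int) == [1, 2]                # like extracting C longs
words = iter([b"spam", b"eggs"])
assert next_few_as(words, 2, bytes.decode) == ["spam", "eggs"]  # like const char *
```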

I mean, that’s probably what they do, it just isn’t (always) configurable by users, so it doesn’t form part of the contract. There’s not even a guarantee that receiving fewer than the requested amount means it’s the end of the sequence, so if your internal buffer has 2 elements and the caller requests 3, you can give them 2 and start refilling the buffer without blocking.
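A small Python model of that contract (the class and its chunked refills are invented for this sketch): a short read does *not* imply the end of the stream, only an empty result does.

```python
class BufferedSource:
    """Producer whose internal buffer may hold fewer items than requested."""
    def __init__(self, chunks):
        self._chunks = list(chunks)   # each chunk models one buffer refill
    def get_some(self, want):
        if not self._chunks:
            return []                 # truly exhausted
        chunk = self._chunks.pop(0)
        return chunk[:want]           # may be shorter than `want` -- not the end!

src = BufferedSource([[1, 2], [3, 4, 5]])
assert src.get_some(3) == [1, 2]      # short read: buffer only had 2 items
assert src.get_some(3) == [3, 4, 5]   # the next call still yields more
assert src.get_some(3) == []          # empty result is the real end signal
```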

I’m not saying we can (or should) do this with CPython, just that the API is proven in a variety of situations and with plenty of experience making non-breaking changes to the implementation.

All of these are questions that have been answered by the existing iterator API, which is part of the limited API already. It’s the IntoArray proposal that needs to come up with new answers to each of these, and so is the higher risk.

But if you’re insistent that this split out to a new topic, I’m happy for an admin to do it. I propose the title “Ideas for forward-compatible and fast iteration in extension libraries in Python 3.12”, since that’s what we’re proposing :wink:


My understanding of the original request was that the issue (or fear) is that iterating over a tuple by issuing an API call for each element may be too slow for some use-cases. It seems to me that API for iteration may be useful, but it does not solve the original issue?

I do not see any other way than that the user does one API call and gets back something that is not opaque and directly allows going over all the elements (or a subset). Whatever it is, it will have to be part of the ABI in order to avoid further API calls. It may be something more “encapsulated” than an array of PyObject*, but such “encapsulation” would just be “syntax sugar” (i.e., macros or inline functions); under the hood, from the ABI perspective, it will not be opaque.

So I do not think it is worth trying to do anything more sophisticated than an array in the end. I am +1 for the suggestion of passing a pre-allocated buffer, because then the caller can iterate a potentially large sequence efficiently by using a small-ish stack-allocated array.

Regarding Extending opaque types: there is some discussion about this in HPy too: Support metaclasses in HPy. by fangerer · Pull Request #335 · hpyproject/hpy · GitHub. I think we’re going to face similar issues.

I’ve proposed PEP 697. Could you please check it out, and comment on its topic?


I have created issue Add vector call functions to the limited API · Issue #98586 · python/cpython · GitHub and PR gh-98586: expose more of the PEP-590 vector call API by wjakob · Pull Request #98587 · python/cpython · GitHub to address point #5 from the list in the first post (The ability to issue vector calls from binary extension modules when compiled for the limited API).


@encukou, @markshannon: windows-team was added as the sole reviewer for this PR (likely based on some automation), which does not seem like the best choice. I don’t think I can add extra reviewers myself. Could I ask you to add further reviewers? Thanks!

Link: gh-98586: expose more of the PEP-590 vector call API by wjakob · Pull Request #98587 · python/cpython · GitHub

More reviewers have been added.

Thanks. It was merged yesterday, so checkbox 5 can be crossed :tada:

  1. PyType_FromMetaclass is in 3.12
  2. Extending opaque types is in 3.12! (PEP 697)
  3. Receiving vectorcall is in 3.12
  4. sequences as arrays is still low on my TODO list. (To discuss it please open a new topic, the discussion is too buried here.)
  5. Calling vectorcall is in 3.12
  6. Interned strings – not needed for vectorcall. It would be good to come up with a good, performant public API for string constants (a const char* and lazily allocated PyObject that’s either per-interpreter or immortal), but it’s low on my TODO list.

Cool, I missed that PEP 697 was adopted in the meantime. We will start using it as soon as the 3.12 beta goes out. (Use PEP-697 interface to access stashed data in type objects by wjakob · Pull Request #211 · wjakob/nanobind · GitHub).