PEP 757 – C API to import-export Python integers

struct PyLongWriter
A Python int writer instance.
The instance must be destroyed by PyLongWriter_Finish()

Is it correct? It’s not obvious to me from the PEP’s text when PyLongWriter_Discard should and should not be called.

1 Like

Do you mean that this PyLong_Export will fail with small ints?

That doesn’t sound easier for the user (see the sketch after these lists):

Both digits_array and value:

  • call PyLong_Export
  • switch on the result: use the data, or fail, or continue:
  • call PyLong_AsNativeBytes; use the result or fail

With only digits_array:

  • try first optimizations with PyLong_AsLongAndOverflow
  • use the compact value, fail, or continue:
  • call PyLong_Export
  • switch on the result: use digits_array, or use compact value data, or fail, or continue:
  • call PyLong_AsNativeBytes; use the result or fail
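
For concreteness, the first flow might be rendered like this (a rough sketch against the draft struct; the consume_* helpers and the fixed buffer are placeholders, not part of any proposal):

PyLongExport exp;
if (PyLong_Export(obj, &exp) == 0) {
    if (exp.digits != NULL) {
        consume_digits(exp.digits, exp.ndigits, exp.negative);
        PyLong_FreeExport(&exp);
    }
    else {
        consume_value(exp.value);   /* compact case: fits in int64_t */
    }
}
else {
    /* "continue" case: clear the error and try the generic path;
       real code must also check the size returned below */
    PyErr_Clear();
    unsigned char buf[128];
    if (PyLong_AsNativeBytes(obj, buf, sizeof(buf), -1) < 0) {
        return -1;
    }
    consume_bytes(buf, sizeof(buf));
}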

Conceptually, IMO it’s better for PyLong_Export to give you the internal representation of an int, rather than only do that for “big enough” ints – which suggests PyLong_Export itself should be a fallback for some other strategy.


Please don’t use enum in public API. The size of an enum is rather hard to get correctly without a C compiler, and the API should be usable in non-C languages too. Use an int here with:

#define PyLong_ExportType_Error -1
#define PyLong_ExportType_DigitArray 0
3 Likes

That’s optional. E.g. current gmpy2 code just uses ob_digit.

But we don’t yet have a really different representation for small integers. As you can see,

static inline Py_ssize_t
PyUnstable_Long_CompactValue(const PyLongObject *op)
{
    Py_ssize_t sign;
    /* the low bits of lv_tag encode the sign (0 positive, 1 zero,
       2 negative), so this yields +1, 0 or -1 */
    sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
    /* a compact int has a single digit: the value is sign * digit */
    return sign * (Py_ssize_t)op->long_value.ob_digit[0];
}

So the proposed version with just one export format covers all integer ranges. Someday this might break for small integers; then we would have to introduce a new export format.

If that’s the case, gmpy2 should only need PyLong_Export – it should not need to use PyLong_AsLongAndOverflow at all. Otherwise, PyLong_Export fails its mission to give you the data in the fastest way we can (with long-term support).

I guess that’s a question for performance benchmarks. Is skipping a PyLong_FreeExport call worth the overhead of an extra branch? (Plus all the other differences, but this seems like the main one.)


This being a rather specialized API, I don’t think ergonomics are an overriding priority, compared to:

  • C standard/compiler option support (⇒ name the union)
  • the shape of the API suggesting how it’s intended to be extended in the future (⇒ separate “compact int” case). Note that this affects the user code (a switch with 2 existing cases is easier to extend if/when a new format comes up) and docs (with 2 cases they would IMO be clearer, even if they’re longer).
2 Likes

I would rather say that its mission is to provide maximally transparent access to an int’s internals, especially in the case of “big” values, where no public API exists yet. Currently it’s just an array of “digits”. I think we should create a new export format only when this simple picture breaks.

No, PyLong_AsLongAndOverflow provides some performance boost in the case when the value fits in a C long but doesn’t fit in one digit.

Probably it’s better to leave some micro-optimizations for small integers to users. E.g. one such was previously rejected in gmpy2: Update int <-> mpz conversion by skirpichev · Pull Request #484 · aleaxit/gmpy · GitHub

I think it’s documented, no need to guess.

Hi,

I didn’t reply recently since I got overwhelmed by the discussion. I expected that the C API Working Group would take a decision, either accepting or rejecting the PEP, rather than restarting the discussion (almost from scratch). The current PEP 757 design is the result of many months of work and tries to address most known issues. It’s not perfect, but it’s the best that Sergey and I could propose.

Sergey: I don’t understand why you agreed to add PyLongExport.value and are now moving backwards and arguing against it. There are reasons why we added this member:

  • We want to treat compact integers differently, and having an int64_t is a nice trick to handle them. The PyLong_Export() caller doesn’t have to attempt different APIs to check if it’s a “small” or “compact” integer; it’s part of the API.
  • Maybe tomorrow, compact integers will not have an internal digits array and we want to avoid allocating memory in PyLong_Export() for best performance.

About preparing the structure for a potential future different export format: I really dislike this; I don’t want to waste memory until it’s needed (adding a reserved member). If tomorrow the PyLongExport structure no longer fits, we should deprecate the PyLong_Export() function and add a new one. It’s very common in the Python C API to deprecate functions and replace one function with another. Trying to make PyLong_Export() “too future proof” would make it really hard to use in practice.

It has already been said: the PyLong_Export() format relies on the “native layout”, which is flexible enough to support any digits array. It sounds very unlikely that a new “bigint” implementation wouldn’t fit its design.

I also dislike changing the PyLong_Export() return value to return a kind which can be a new kind in the future. It sounds very inconvenient to have to support future hypothetical formats without knowing them in advance. It was proposed to kill the process in this case: I dislike that :frowning: I don’t think that falling back on PyLong_AsNativeBytes() makes sense here. If you already support PyLong_AsNativeBytes(), just call it instead of PyLong_Export().

Last but not least, @encukou wants to remove the “always succeeds if it’s an integer” sentence. If tomorrow Python internals evolve, maybe PyLong_Export() will have to allocate memory and so can fail. That’s why it’s paired with PyLong_FreeExport(). I’m open to removing this sentence; I created a PR for that, but @skirpichev doesn’t seem to embrace it yet (at least, he hasn’t approved it yet).

In short, I spent enough time on PEP 757; its API is not perfect, but I don’t want to rewrite it anymore. If you have questions about why it’s designed this way, just ask. If you consider that the API is not good enough for your criteria, maybe we should just give up on the PEP 757 abstraction and leave the situation as it is: gmpy2 and SAGE use CPython internals and don’t support PyPy, and Python-FLINT uses an inefficient hex export (and so supports PyPy). These projects are working well with the current code, so maybe it’s not worth it to provide a “better” API.

I propose to only make a single change to the current PEP 757: remove the following sentence.

This function always succeeds if obj is a Python int object or a subclass.

1 Like

No, I think we don’t want to have to introduce new CPython APIs each time an implementation detail changes. We would like third-party libraries to be able to get good performance while not changing their code every two years to accommodate internal CPython changes.

Also, ideally this API would end up in the stable ABI, which requires that it’s not too coupled to internal implementation details.

1 Like

In general, I agree with you.
But what do you think about my question on PyLongWriter_Discard?

I was thinking that this trick allows us to have both (1) the fixed PyLongExport structure and (2) a PyLong_Export() API that doesn’t fail.

But that simple picture was objected to. It seems that to address @encukou’s criticism and save (2), PyLong_Export() must return some code for the export type we offer (for the given integer value). With a new CPython release, new export types might be introduced.

So, with this, there is no real need to include anything other than the simple “array of digits” view. Currently, it’s an optimization that has some benefits and some drawbacks (which will actually depend on the bigint library used). Let’s offer just a simple view to our end users and let them do micro-optimizations for their workload.

@encukou wants to remove the “always succeeds if it’s an integer” sentence.

An export API that fails doesn’t make sense.

But I thought that this objection from @encukou was gone with the addition of the export format type. Is this not true?

See the PR referenced above.

1 Like

Like the Unicode export, this isn’t an export API, it’s an efficient access to internals API. If we can’t provide efficient access, we should fail so that you can go use an API that will succeed.

We have hundreds[1] of ways to reliably export from a Python int, but none with anything approaching guaranteed performance characteristics. This API can guarantee that the time taken does not scale with the value, but the tradeoff is that if we can’t guarantee that, we have to fail rather than silently doing it the slow way.

If you just want a successful export, use AsNativeBytes, but it might be slow for some values.

If you want a fast export, use Export, but it might fail for some values (at some point in the future).

(FWIW, I agree with expanding the return value, provided we require callers actually use the value now. So the switch () case Py_Long_Specific_Macro example is fine, but a >= 0 check is not. That way we can keep the ABI stable, change the format, and not break anyone as long as we don’t expand the total size of the union.)
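
To make that concrete, the required pattern might look like this (a sketch only; the PyLong_Export_* result codes and use_* helpers are hypothetical, invented for illustration):

PyLongExport exp;
switch (PyLong_Export(obj, &exp)) {
case PyLong_Export_DigitArray:       /* hypothetical kind */
    use_digits(exp.digits, exp.ndigits, exp.negative);
    PyLong_FreeExport(&exp);
    break;
case PyLong_Export_CompactValue:     /* hypothetical kind */
    use_value(exp.value);
    break;
case PyLong_Export_Error:            /* hypothetical, == -1 */
    return NULL;                     /* exception set */
default:
    /* unknown future kind: fall back to a generic path such as
       PyLong_AsNativeBytes(), rather than assuming a layout */
    break;
}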


  1. Okay, maybe like 15. ↩︎

2 Likes

I lost track of this for some time during the discussions above but I’ve just read through the PEP and PR again.

The API as shown looks good to me.

2 Likes

We made this change: PyLong_Export() can now fail :slight_smile:

3 Likes

The PEP was accepted by the C API Working Group.

The PR gh-121339 is now ready for your final review :slight_smile:

5 Likes

I’m reading the PEP and treating it a bit like the ultimate API documentation that I expect to come out of an implementation (I’m not looking at the PR for now). I think my main issue is that the export API feels overly complex, and if it is going to be, we need to explicitly capture why in the PEP. Beyond that, I’m highlighting things that could be described better, but the APIs feel fine.


Export API

There are two cases:

This feels like a footgun adding needless complexity to an API intended for use primarily with other bignum libraries. It means that all users must now use a conditional to know which data to use, and some may think they’re clever by avoiding a call to FreeExport; each of these adds conditionals and branches to something that could be done without them. Why are there two cases? I’d simplify this to get rid of .value in favor of always using .digits and always require FreeExport to be called afterwards.

If this was somehow aiming to be efficient, it is too complicated. Keep the simpler API, and if you want efficiency for the small case, calling any of the PyLong_As*AndOverflow APIs first and using the result if (!overflow) is a much simpler conditional before calling a digits export API, as sketched below. That way you don’t force people to handle multiple formats, and it does not smell like an API with our internal implementation detail of the moment showing. The gmpy2 example would be simpler this way: only two mpz_ calls instead of three.
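
Something like this, roughly (a sketch of the suggested pattern; PyLongExport and PyLong_GetNativeLayout() follow the PEP draft, z is an initialized GMP mpz_t, and error handling is abbreviated):

int overflow;
long small = PyLong_AsLongAndOverflow(obj, &overflow);
if (small == -1 && PyErr_Occurred()) {
    return -1;                 /* not an int (or other error) */
}
if (!overflow) {
    mpz_set_si(z, small);      /* fast fixed-width path */
}
else {
    PyLongExport exp;
    if (PyLong_Export(obj, &exp) < 0) {
        return -1;
    }
    const PyLongLayout *layout = PyLong_GetNativeLayout();
    size_t nails = layout->digit_size * 8 - layout->bits_per_digit;
    mpz_import(z, exp.ndigits, layout->digits_order, layout->digit_size,
               layout->endianness, nails, exp.digits);
    if (exp.negative) {
        mpz_neg(z, z);
    }
    PyLong_FreeExport(&exp);
}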

If there are compelling reasons why the Export API should be this complex, please document why and include a Rejected Ideas section of making it simpler as described.


Import API

PyLongWriter_Create

Document the intent of the function and why it works the way it appears to. I had to read this section multiple times to understand what it meant to be a “writer” and that it wasn’t just a simple API taking an array of digits as input in one call, but instead an API allocating space for the caller to later fill in the digits themselves. This needs documentation as to why it works this way, with an example provided.

Do we do any validation of the digit values filled in? I.e.: what happens on Finish if values larger than 2**bits_per_digit are present in the digits array? Document that. That being undefined behavior is okay so long as we state it. (Implementation-wise, I might leave an assertion loop in place on !NDEBUG builds myself; see the sketch below.)
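
For instance (a sketch of such a debug-only check inside PyLongWriter_Finish(); digit and PyLong_SHIFT are CPython’s existing internal names, and the writer fields are loosely named):

#ifndef NDEBUG
for (Py_ssize_t i = 0; i < writer->ndigits; i++) {
    /* every digit must fit in bits_per_digit bits */
    assert(writer->digits[i] < ((digit)1 << PyLong_SHIFT));
}
#endif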

ndigits is the number of digits in the digits array. It must be greater than or equal to 0.

What does it mean to have 0 digits? That sounds like not-a-number. Document exactly what 0 digits means and when 0 digits would be used. Is *digits written to when ndigits == 0? Is this just a way to say 0 via this API without filling in a digit?

The caller can either initialize the array of digits digits and then call PyLongWriter_Finish() to get a Python int, or call PyLongWriter_Discard() to destroy the writer instance. Digits must be in the range [0; (1 << sys.int_info.bits_per_digit) - 1]. Unused digits must be set to 0.

Don’t refer to a value only obtainable from Python. Presumably you meant to reference bits_per_digit from the added PyLong_GetNativeLayout struct here.

PyLongWriter_Finish & PyLongWriter_Discard

Explicitly document that these calls are mutually exclusive: the PyLongWriter is invalid after either call.
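
A minimal usage sketch (names per the PEP draft; assumes the current 30-bit, 4-byte digits reported by PyLong_GetNativeLayout()):

void *mem;
PyLongWriter *writer = PyLongWriter_Create(0 /* non-negative */, 2, &mem);
if (writer == NULL) {
    return NULL;
}
uint32_t *digits = (uint32_t *)mem;
digits[0] = 123;     /* least significant digit */
digits[1] = 456;     /* value == 123 + 456 * 2**30 */
/* on success, Finish consumes the writer; on an error path we would
   call PyLongWriter_Discard(writer) instead, never both */
return PyLongWriter_Finish(writer);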


In general I agree that official Python C APIs to better play with other bignum implementations have been a long time coming.

I don’t necessarily like that we expose internal Python bigint implementation details, but this keeps the code maintenance burden on our side easy. If, for example, we were to abandon our 15/30-bit digits approach to bignums in favor of a classic pure base-2 bit array, these APIs would still be usable, as the bits_per_digit concept can also represent that.

The only thing we’d be unable to do is stop using a binary-multiple base. But that is extremely unlikely to ever be desired as hardware always works in binary-multiple bases and there are no signs of that changing. So I consider it a non-issue.
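
For example, the layout struct could describe such a representation without any API change (values invented for illustration; field names per the PEP draft):

static const PyLongLayout full_word_layout = {
    .bits_per_digit = 64,
    .digit_size = 8,
    .digits_order = -1,   /* least significant digit first */
    .endianness = -1,     /* little-endian digits */
};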

3 Likes

This was suggested before in this thread, here, and implemented on the PEP side in PEP 757: edits, based on C-API WG feedback by skirpichev · Pull Request #4026 · python/peps · GitHub, in this commit: PEP 757: edits, based on C-API WG feedback by skirpichev · Pull Request #4026 · python/peps · GitHub (it was reverted later).

You’re missing the other operand needed for a comparison. Take a look at the implementation and compare it, e.g., with PyLong_AsNativeBytes/FromNativeBytes. Actually, the PEP has such a comparison in the “Rejected Ideas” section.

The main drawback is that we leave this optimization up to the user.

In the future we might lose the simple “array of digits” picture for CPython integers. Then, to provide this view, we would need a temporary memory allocation. Thus (1) the export might fail, and (2) we could introduce speed regressions for users of this API. Perhaps point (1) is not a problem anymore, as it’s documented that the API might fail for ints (that unfortunate “feature” exists essentially to make it possible to deprecate the API). But (2) still is. The pro of the current API is that it can provide an efficient export for ints, forever! API users don’t have to follow CPython development: the value and digits views cover all possible future changes to int internals.

In principle, I’m open to changing the reading API as described above (with the return value of PyLong_Export being the “kind” of export, etc.). On the pro side, we can then state (as a CPython implementation detail) that the API will not fail on ints (but might return a new kind of export in a new major release). But I guess that @vstinner will be -1 on this.

The rest of the review should be addressed in PEP 757: address SC review by skirpichev · Pull Request #4144 · python/peps · GitHub

I don’t think we do this. Modulo your example with potential non-binary arithmetic in hardware; in principle, we could generalize the API to permit this case as well. But note that while the MIX wasn’t assumed to have binary arithmetic, the MMIX has.

Another restriction (mentioned earlier) is using the sign-magnitude representation for “big enough” integers. But that’s a common denominator for all arbitrary-precision arithmetic libraries (two’s complement being used for integer types with fixed precision).

I’m not sure this is really a strong drawback? I’m assuming the likely number of users of this C API can be counted on approximately one hand. It feels like it is for a small set of numeric libraries so that they can stop poking at our internals.

I do like that the API has the option to return an error when it cannot be done efficiently in the future.

Regarding (2)… there are a couple of ways around that memory allocation problem when we’re storing a small integer in a native form internally without a digit array allocated: either (a) error out in that case (the caller should’ve done a PyLong_As…) or (b) include some extra space as a private field in the struct PyLongExport such that space never needs to be allocated for small ints when there is no backing digits array. *digits just becomes a pointer to that extra space, where you’ve spent a little time storing the int64_t in digit form.
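
Option (b) might look roughly like this (purely illustrative; _small_buf is a made-up private field, not part of the PEP):

typedef struct PyLongExport {
    int64_t value;
    uint8_t negative;
    Py_ssize_t ndigits;
    const void *digits;
    /* hypothetical scratch space: PyLong_Export() could point .digits
       here after decomposing .value into digit form, so small ints
       never need a heap allocation */
    uint32_t _small_buf[3];   /* ceil(64 / 30) digits of 4 bytes each */
} PyLongExport;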

Either way the pattern would be the same: document that people should do their PyLong_As*Overflow() call first, followed by PyLong_Export() if necessary, followed by FreeExport, or, if it errored, I guess AsNativeBytes.

Did you measure a gmpy2 variant using PyLong_AsLong*Overflow followed by PyLong_Export for comparison with the current gmp example?

If these were discussed earlier, great, I just want to see it all captured within the PEP.


Anyways if y’all do want to keep the more complex struct having users run that conditional logic themselves. At least make the reasoning why more clear up front in the PEP.

The number of conditionals in user code using this API is going to be, give or take 1, close enough to equivalent regardless of the exact design.


naming thoughts (meaning we’re at the end of my bike shedding?)

struct PyLongLayout - change .endianness to be .digit_endianness for clarity.

struct PyLongExport and PyLong_Export() are both pronounced the same if trying to have a conversation. I don’t object because I don’t have actually better ideas. I’m just imagining a “Who’s on first” skit.

1 Like

I don’t think so.

But let’s see what Victor thinks about this change. I also worry that this essentially restarts the C API WG discussion again :frowning: (Having some form of this API in 3.14 would be great.)

The con is that current gmpy2 (and Sage) code assumes (and that’s natural) that no errors happen during conversion from an int. Handling errors complicates the patch a lot.

BTW, can we change things documented as “CPython implementation detail” in a new release? I mean, does this require the usual deprecation cycle, etc.?

Obviously, the numbers will be the same (if the current use of PyLong_AsLongAndOverflow is removed from PyLong_Export).

Yes, I dislike that too. Maybe PyLongExportView for the struct name?

There are two motivations for .value:

  • Make the API future-proof. If tomorrow Python changes its int implementation to no longer use an array of digits for small integers, PyLong_Export() would need to create such an array on demand, which would be inefficient and could fail (memory allocation failure, even if it’s very small).
  • It’s the fastest way to export small integers. We ran benchmarks on other existing PyLong functions and they were slower.

I’m not sure that PyPy can expose its internal digits array (since objects can be moved in memory in PyPy, because of its moving GC), so .value may reduce the PyLong_Export() cost on PyPy (no need to create an array for small values).

The behavior is undefined. You may get a crash or dragons may appear :slight_smile:

ndigits=0 is accepted to be convenient: it creates the number zero. I’m not sure if it’s a good idea in the long run; maybe we should just reject ndigits=0, since it’s unclear what digits contains in this case.

2 Likes

The steering council discussed and has decided to accept PEP 757 per the C API Working Group’s recommendation. Thanks for drafting it with explanations as to why it is needed and providing usage examples.

(and thanks for bearing with my own personal questions about the PEP and documentation requests which you turned into follow-up PRs!)

6 Likes

Excellent news. Thanks everyone and especially Victor for working on this!

5 Likes