The user gets an exception on GraalPy et al., rather than a buffer of bits in an unexpected format or a read of uninitialized memory.
(Yes, they’re doing it wrong, or taking shortcuts to get an MVP out; version 0 makes such code more debuggable when it makes it to production.)
This structure should reduce the performance overhead on small integers.
Compared to PyLong_AsInt64(), it avoids the need to raise an exception if the number doesn’t fit into int64_t. Avoiding the exception can also reduce the performance overhead for “large” integers.
It also avoids the need to allocate an array on PyPy for small integers.
If tomorrow CPython changes its small integer implementation to avoid ob_digit, we also avoid allocating an array in this case.
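As a rough illustration of the two-branch scheme described above, here is a Python sketch (the function names and the 30-bit digit size are illustrative, not the PEP’s actual C API): small values travel as a single machine-word value, and only larger ones fall back to a digit array.

```python
# Hypothetical model of the PEP 757 two-branch export: ints that fit
# in int64_t are carried as a plain value; only larger ints are
# decomposed into a digit array (30-bit digits assumed here).

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
BITS_PER_DIGIT = 30

def export_int(n):
    """Return ('value', v) for small ints, ('digits', sign, digits) otherwise."""
    if INT64_MIN <= n <= INT64_MAX:
        return ('value', n)              # fast path: no array needed
    sign = -1 if n < 0 else 1
    m = abs(n)
    digits = []
    while m:
        digits.append(m & ((1 << BITS_PER_DIGIT) - 1))
        m >>= BITS_PER_DIGIT
    return ('digits', sign, digits)      # least significant digit first

def import_int(exported):
    """Inverse of export_int, reassembling the original int."""
    if exported[0] == 'value':
        return exported[1]
    _, sign, digits = exported
    m = 0
    for d in reversed(digits):
        m = (m << BITS_PER_DIGIT) | d
    return sign * m
```

The C API would expose the same choice through the export struct; the point is that the digit array, and any allocation it implies, is only touched on the slow path.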
As I said on the PEP thread for import/export strings, my main desire is that I’d like consistency between import/export APIs. I’m not sure exactly how that works out here, but even if we end up with two distinct patterns, whatever is next added should be able to follow a pattern rather than restarting discussions from scratch.
I think we should probably remove PyLong_GetNativeLayout and just have it return the layout struct with the export function. That way we can at least return something sensible for compact ints (8-bit digits, etc.), and can satisfy Oscar’s example of using the new export function as a fallback, rather than trying it first and using conversion functions as the fallback.
Import is a bit trickier. Again, I think the essential point is to allow a round-trip, not to replace all other conversions (PyLong_FromNativeBytes exists for interoperability), so I’d be happy enough to require the layout be passed but reject (at runtime) layouts that we don’t directly support. If “export” wouldn’t have returned it, then “import” doesn’t have to support it.
Again, these functions are the super fast path, but we want the API to be stable enough for the limited API. PyLong_[As/From]NativeBytes exist for “always succeeds” needs - these new functions are “fast if we can, else fail fast”.
Actually python-flint does support PyPy. There are conda packages for some versions and wheels for latest prerelease. We previously did not upload wheels just because I had trouble building them with setuptools but that was one of many problems that seemed to get magically solved by switching to meson-python.
Note that python-flint does not currently access PyLong internals and uses hex strings instead although it was proposed to change that. If it was changed then there would be a #ifdef at least for PyPy. Ideally it would be like #if CPYTHON except that I don’t know what the preprocessor check for CPython is supposed to be.
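For concreteness, the hex-string route mentioned above can be modeled in a few lines of Python (a sketch of the idea, not python-flint’s actual code): the magnitude goes through a hex string and the sign is carried separately.

```python
# Portable (if slower) alternative to touching PyLong internals:
# serialize the magnitude as a hex string, carry the sign separately.
n = -(10**40 + 12345)
s = format(abs(n), 'x')                       # magnitude as hex
restored = -int(s, 16) if n < 0 else int(s, 16)
assert restored == n
```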
Yet another good reason to avoid this new structure and the PyLong_GetNativeLayout() function. All parameters could be queried at runtime from PyLong_GetInfo() (which should be extended with an endian field to fit your needs).
Good documentation should help people avoid common pitfalls on this road (using semi-private macros like PyLong_SHIFT, hardcoding these parameters, etc). Actually, the mentioned assumption is correct: I can’t imagine that CPython will someday switch to using different layouts for “big” integers (non machine-sized). But the layout for big integers can change (e.g. CPython might one day start using GMP), which is why these parameters should be queried at runtime.
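At the Python level, PyLong_GetInfo() is already exposed as sys.int_info, so the digit parameters can be queried at runtime today (the endian field discussed above does not exist yet):

```python
import sys

# Runtime query of the int digit layout instead of hardcoding
# PyLong_SHIFT: typically 30-bit digits (4-byte storage) on 64-bit
# CPython builds, 15-bit digits (2-byte storage) on some 32-bit builds.
info = sys.int_info
print(info.bits_per_digit, info.sizeof_digit)
```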
+1
But why not drop use_value then? Let’s just check whether digits is set to NULL; if so, use value.
JFR, GMP has no function to import (or export) from int64_t, so mpz_import() should be used. I would expect this to be slightly slower than mpz_set_si().
IIUC, the export function in CPython will then not raise exceptions for ints (and subtypes).
Whether it is PyLong_GetNativeLayout() or PyLong_GetInfo() does not change anything regarding whether people will ignore it or not.
If the API already leads them the right way, it’s even better. From our experience, people tend to write code, run it, see that it works, and call it a day. Very few people bother with reading the docs and checking whether their code relies on some undefined behavior. Some people run tests on a debug build.
I think this is such a specialized API that: 1) we can expect more diligence from the few potential users, 2) finding and fixing misuses wouldn’t be such a big problem. It’s worth considering some ways to make it harder for users to ignore PyLong_GetNativeLayout() / PyLong_GetInfo() / whatever should be queried, but that is not a major problem in this case.
There are two ways to export/import data from Python objects:
1. Expose the internal structures and expect C extensions to correctly extract the data, not to hold references to it, or otherwise mess things up.
2. Expect C extensions to provide buffers that CPython then exports the data to, or imports the data from.
PEP 757 chooses approach 1. Why? The PEP doesn’t say.
Using approach 2, the API would be smaller and the risk of messing up internal data structures lower. It should be easier for extension authors as well.
Don’t pass PyLongLayout by reference; pass it by value.
I think endian is unnecessary. Digits are not composed of bytes, but of larger machine integers, so they don’t really have endianness.
The “Optimize small integers” section.
Differentiate between exporting and importing small ints. Presumably there is no need for a new API for importing small ints.
For exporting, a list of functions is recommended. Just recommend PyUnstable_Long_IsCompact() and PyUnstable_Long_CompactValue(). Everything else will be slower. Maybe we should add them to the stable API?
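As a rough Python-level approximation of the recommended check (not the C API itself): CPython considers an int “compact” when its magnitude fits in a single internal digit, which can be modeled with sys.int_info.

```python
import sys

def is_compact(n):
    # Mirrors the idea behind PyUnstable_Long_IsCompact(): the
    # magnitude fits in one internal digit. This is only a model;
    # the real C macro inspects the object's internal size field.
    return abs(n) < (1 << sys.int_info.bits_per_digit)

print(is_compact(123), is_compact(1 << 62))
```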
The “Benchmarks” section
What is the example code being compared to?
Both versions need to be listed. Maybe not in the PEP, but in a supporting document.
Approach 1 seems to be simpler on the CPython side. The actual work of converting between different layouts is done by the math library, like GMP (other popular bigint libraries have analogs of mpz_import/export).
Sergey and I updated PEP 757 to optimize export for small integers: we added an int64_t value member. We also changed names in the API, since an export is no longer always a digits array; it can now be “a value” as well.
I would prefer to keep the PyLong_GetNativeLayout() function to have a convenient C API to get the layout. PyLong_GetInfo() returns a namedtuple whose values are Python objects, not C types. Moreover, GraalPython uses a layout different from CPython’s, and PyPy may also want to use a different layout.
There is exactly one copy for both approaches.
Whether you are copying from the PyLongObject to the buffer, or from the buffer to the PyLongObject, there is one copy.
Although PyLong_AsNativeBytes takes approach 2, it requires two copies to copy the data into a buffer. One from the PyLongObject to the bytes object, and then from the bytes object to the buffer.
I don’t know what you mean by “too complex for general-purpose language”.
Approach 1 may be simpler on the CPython side, but it is more complex for clients.
By not doing the implementation ourselves, we force at least three external libraries to do the work independently.
The PEP and this discussion assume that all big ints, both PyLongObject and external libraries, will use a sign-magnitude representation. What about two’s-complement implementations?
How would that be supported?
The proposed API prevents CPython from switching to a two’s-complement implementation, should it ever make sense to do so.
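To make the representational gap concrete, here is a small Python comparison: int.to_bytes(signed=True) yields a two’s-complement buffer, while abs() plus a sign flag models a sign-magnitude export. The raw buffers differ for negative numbers, so the two formats are not interchangeable.

```python
# Two's complement vs sign-magnitude for the same negative number.
n = -1234
twos = n.to_bytes(2, 'little', signed=True)   # two's complement
magnitude = abs(n).to_bytes(2, 'little')      # sign-magnitude body
sign = n < 0                                  # ...plus a separate sign

assert int.from_bytes(twos, 'little', signed=True) == n
assert (-1 if sign else 1) * int.from_bytes(magnitude, 'little') == n
assert twos != magnitude   # raw buffers are not interchangeable
```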
Probably your question would be better redirected to @steve.dower; his answer was referenced above:
Well, all three libraries use the existing implementation of mpz_import/export() from GMP. Zero extra work; we just have to provide suitable input for them, instead of the private internals used now. The mpdecimal library, for example, also has similar import/export functions, as does libtommath.
Are there examples of such bigint libraries in the wild? That sounds rather exotic to me.
Thank you for this point, but right now I’m unsure how strong it is. Are there other constraints that the proposed API poses on PyLongObject’s future, in your view? Let’s take the export interface first (the PyLong_Export/FreeExport functions), as was suggested.
Probably we can work around this by using temporary objects and doing arithmetic. Unfortunately, that means PyLong_Export() might raise exceptions for ints, which we would like to avoid. (And the conversion will be slower.) But in principle, this can be supported.
A more important consideration is how a library that does not use GMP should do this. The API has been designed for maximum flexibility on the CPython/implementation side under the presumption that the caller can use mpz_import. Unless the caller can handle all of the formats that mpz_import can handle we don’t really get the benefit of that flexibility though.
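For a caller without GMP, handling an arbitrary exported layout means reassembling the digits by hand. A generic Python sketch, parameterized by digit size and digit order (the parameter names are illustrative, not part of the PEP):

```python
def assemble(digits, bits_per_digit, lsb_first=True):
    """Recombine exported digits into a single non-negative int."""
    seq = reversed(digits) if lsb_first else digits
    value = 0
    for d in seq:
        value = (value << bits_per_digit) | d
    return value

# 30-bit digits, least significant first (CPython-like):
assert assemble([1, 2], 30) == (2 << 30) | 1
# 15-bit digits, most significant first:
assert assemble([2, 1], 15, lsb_first=False) == (2 << 15) | 1
```

This is the kind of per-layout glue that every non-GMP consumer would have to write, which is exactly the flexibility cost being pointed out.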
I don’t think it’s possible to design an import-export API that is the most efficient for every multiprecision arithmetic library. What we can do is fit into what’s currently available.
Your quote is truncated; the Python decimal module, which is based not on GMP but on mpdecimal, will also benefit from the PEP 757 API:
If tomorrow new libraries appear with different design and different API, we can start thinking about yet another new API. But for now, I suggest to focus on what’s currently available.