PEP 757 – C API to import-export Python integers

Read the PEP: PEP 757 – C API to import-export Python integers | peps.python.org

The previous discussions have more than 150 messages, which makes them difficult to read and navigate. I wrote PEP 757 to summarize the discussion; it should be easier to make a decision on a PEP.

For this second C API Working Group PEP, I also used: PEP-Delegate: C API Working Group.

Abstract

Add a new C API to import and export Python integers, int objects: especially PyLongWriter_Create() and PyLong_Export() functions.

Open Question

  • Should we add digits_order and endian members to sys.int_info and remove PyLong_GetNativeLayout()? The PyLong_GetNativeLayout() function returns a C structure which is more convenient to use in C than sys.int_info which uses Python objects.
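
For context, part of this information is already available from Python today. This snippet only inspects the existing sys.int_info; it does not assume the proposed digits_order and endian members exist:

```python
import sys

# Already available today: digit size in bytes and bits per digit.
info = sys.int_info
print(info.bits_per_digit)  # typically 30 (15 on some builds)
print(info.sizeof_digit)    # typically 4 (2 with 15-bit digits)
```

The open question above is whether the remaining layout parameters (digit order, endianness) should be added here as well, instead of the C-level PyLong_GetNativeLayout().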
4 Likes

Something I don’t understand: the PEP claims that it doesn’t “expose implementation details”, but the PyLong_DigitArray struct has a const void *digits that points into the PyLong object’s internal representation, right?

The internals are exposed in one sense, since the pointer is exposed. However, the API allows the internal representation to change without breaking code that calls it, so it does not limit internal changes in the way that simply exposing the pointer would.

Firstly, a future implementation could allocate that array on demand rather than exposing a direct pointer. Even if a future internal representation were incompatible with the possibilities encompassed by the parameters in PyLongLayout, there would certainly be a need for functions that convert the representation into one of those formats (this would be needed internally in CPython if nothing else).

Secondly, PyLongLayout allows a broad range of formats for the data pointed to by digits, so if downstream code uses PyLong_GetNativeLayout() in the intended way, it would be possible for CPython to use pretty much any reasonable representation of integers greater than word size. The only way I could imagine future Python/CPython implementations being fundamentally different from the digits-array representation would be if small integers somehow had a different representation.

One thing I’m not sure about is how to interpret the benchmarks listed in the PEP. What exactly is being compared? They don’t seem to show any speedup, so is the point that this API is not slower than accessing the internals directly?

I think that the API in the PEP looks fine. Many alternative APIs have been discussed that also looked fine to me. I don’t think that it is worthwhile to further debate the exact names of these functions.

The real question here is just whether the API is worth having. It is a niche use case: libraries that have their own alternate representations of integers. So far such libraries have used the internals, apart from python-flint, which uses hex strings; but the plan there would be to use the internals if these APIs are not added.

Is it so bad that a few libraries use the internals? If yes, then we need this API. If no, then the situation will simply be that a few libraries access the internals.

1 Like

One thing I will say is that it is awkward that PyLong_GetNativeLayout() can only provide the information at runtime rather than at build time. I’m not sure what flexibility that provides on the CPython side, but it is awkward on the library side: we won’t want to call this function every time, so it requires managing some global state. If the layout exposed by the API is fixed at CPython compile time, then it would be better if it could also be fixed at compile time for the library.

Except that it’s expensive to do so, and doubly expensive when it comes to managing the on-demand allocated array’s lifetime. Since we’re designing a new API for performance reasons, this sounds like a bad idea.

It does, but the proposed API doesn’t allow for different PyLong objects to have a different layout, which also limits possibilities.

Which would be quite a reasonable evolution. For example, storing the sign separately from the absolute value is not really optimal for one-digit integers.

Another concern is that accessing internals makes those libraries CPython-specific.

However, the PEP doesn’t seem concerned with how easy it is for third-party Python implementations to implement the APIs. @vstinner Should that be added?

Well, at least it allows for these APIs to be eventually part of the stable ABI.

Agreed. Note though that the expected usage for python-flint at least would be like:

int overflow;
long long x;
x = PyLong_AsLongLongAndOverflow(obj, &overflow);
if (overflow) {
    /* Use PEP 757 API */
}

This is because python-flint represents integers smaller than 2^62 inline, like tagged pointers, so we would want to bypass this API and the mpz structure altogether when possible. The fast path for small integers would not use this API and would hopefully get us very directly from CPython’s internal small integer representation to FLINT’s representation with only a few bitwise operations.

1 Like

Correct. But this exposes only the fact that the digits are kept in some contiguous array. The size of a digit, and the number of bits actually used within each digit, are not exposed.

I think not.

It doesn’t make sense to have different layouts for big integers. The current API targets this use case; small integers should be handled by other APIs (see the examples from gmpy2).

If CPython someday adopts a different layout for small integers, the proposed API allows emulating the big-integer layout for them (at the cost of some temporary memory allocations, etc.).

In fact, I think the PyLongLayout structure and PyLong_GetNativeLayout() are just a minor convenience and could be removed. The PyLong_GetInfo() function already gives access to digit_size and bits_per_digit. The rest (i.e. endian and digit_order) is a common denominator for all bigint libraries.

They compare import/export of integers in the current gmpy2 master vs. my PR, which uses the proposed API. It adds some overhead compared to direct access to the private API. Yet in a few cases the new code is faster, but that is due to using other APIs (e.g. PyLong_AsLongAndOverflow).

1 Like

The benchmarks compared the current gmpy2 implementation, which accesses PyLongObject members directly, against the PEP 757 abstraction. They measure the cost of the abstraction.

One of my motivations for this PEP is to remove the _PyLong_New() function in the long term, a function which creates an int object in an incomplete/undefined state. I prefer the PyLongWriter_Create() API, which only creates a Python int object once all data is filled in and the digits are normalized.

These libraries had to be updated for Python 3.9, 3.12, and now 3.13 (before I restored _PyLong_New()). I would like to provide a public, stable abstraction for them.

I have no information on that question. Do you?

1 Like

Neither do I. Perhaps we should ask them?

I did update this section; my proposal is actually more like: “Currently, all the information required for int import/export is already available via PyLong_GetInfo() or sys.int_info. The native endianness of digits and the current order of digits (least significant digit first) are a common denominator of all libraries for arbitrary-precision integer arithmetic. So, shouldn’t we just remove both PyLongLayout and PyLong_GetNativeLayout() (which is actually just a minor convenience) from the API?”

The initial discussion shows that this might clear up some questions. For example, the claim that the proposed API “doesn’t allow for different PyLong objects to have a different layout”. In fact, the “native layout” is assumed to be valid internally only for big enough integers (and the parameters of this layout can be queried at runtime with PyLong_GetInfo()). CPython’s small integers could use a different layout internally; the API is able to emulate a single-layout view for the caller in this case, at the cost of memory allocation for temporary buffers, etc. Users will probably choose different C API functions for this range.

So, this API poses no constraints on future optimizations for small integers, or on using a different layout for big integers (say, if CPython someday uses GMP for that case).

I discussed this with PyPy developers. First of all, the PyPy C API has no PyLongObject.ob_digit member, so gmpy2, SAGE and Python-FLINT don’t support PyPy currently.

PyPy int objects can be moved in memory (moving GC; Python int objects cannot be pinned in memory), so a memory copy is needed to export an array of digits. IMO that’s not an issue; it’s OK to copy memory in the PyPy case.

Currently, int.to_bytes() and int.from_bytes() can be used to export/import Python int objects.
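
For illustration, here is the existing portable round trip through those methods (my own example, not part of the proposed API):

```python
n = 2**100 + 12345

# Export: pick a buffer size large enough; signed=True handles negatives.
raw = n.to_bytes((n.bit_length() + 8) // 8, "little", signed=True)

# Import: the inverse operation round-trips exactly.
assert int.from_bytes(raw, "little", signed=True) == n
```

This works on any implementation, which is part of the context for asking how much faster a dedicated digit-array API needs to be.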

Let me summarize the API to check I understand it correctly:

  • There is a function that gives me the “native” layout for integers – this is given by the Python runtime, and extensions should query and follow this format for API calls. The representation is fixed to be an array of binary digits; what can be configured is the digit size, endianness, and order of the digits.
  • There are functions to fetch the digits from a Python integer and to create a Python integer from digits.
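
To make that model concrete, here is a pure-Python sketch of the export/import pair under the default CPython layout assumptions (least significant digit first, bits_per_digit taken from sys.int_info). The helper names to_digits/from_digits are mine, not part of the proposed C API:

```python
import sys

BITS = sys.int_info.bits_per_digit  # 30 on default CPython builds

def to_digits(n):
    """Decompose n into (negative, digits), least significant digit first."""
    negative, n = n < 0, abs(n)
    digits = []
    while n:
        digits.append(n & ((1 << BITS) - 1))
        n >>= BITS
    return negative, digits

def from_digits(negative, digits):
    """Rebuild the integer from its sign and digit array."""
    n = 0
    for d in reversed(digits):
        n = (n << BITS) | d
    return -n if negative else n

neg, digits = to_digits(-(2**100 + 7))
assert from_digits(neg, digits) == -(2**100 + 7)
```

The C API would hand out (or accept) a pointer to such a digit array directly instead of building a Python list, but the sign/magnitude split and digit order are the same.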

In the JVM-based GraalPy, we use Java’s BigInteger for “big” integers, which has a compatible internal representation: 4 bytes per digit, big-endian, most significant byte in the zeroth element (docs). So conceptually, at a high level, this API is a good fit.

One worry, stemming from our experience with people relying on CPython implementation details: people will assume CPython’s PyLongLayout without even querying it and will just use the other APIs. From the point of view of avoiding API misuse, it would have been better to pass the layout explicitly; but I also agree that supporting non-native layouts is functionality the API and Python runtime should not provide. It would be great if, at least in debug builds, CPython behaved differently, e.g. changing the order, so that people testing on debug builds would get some indication early. Another idea: check whether PyLong_GetNativeLayout was ever queried before any of the other API calls are made.

A technicality that probably just needs more explicit documentation: what is the lifetime of the memory pointed to by the pointer returned by PyLong_GetNativeLayout? Is it supposed to be static memory? Can it change between subinterpreters? Is it OK to cache it in a global variable? When embedding Python, is it OK to access it after the interpreter has shut down?

2 Likes

It’s supposed to be static memory. It must not change between subinterpreters. It’s OK to cache it. It’s not OK to call the function before Python initialization or after Python finalization, as with any C API function (unless the docs explicitly say that it’s OK).

Should the PEP mention that? If yes, can you suggest a sentence to explain that?

A previous draft had the user pass in a PyLongLayout to better ensure the format matches. That would be too expensive to check, so we switched to a single global layout. But in doing that, it’s now easy to skip checking whether the user “understands” the format.

I suggest bringing that check back, in a minimal form – a single “version” integer:

typedef struct PyLongLayout {
    uint32_t version;

    // Bits per digit
    uint8_t bits_per_digit;

    // Digit size in bytes
    uint8_t digit_size;
} PyLongLayout;
// we drop the endianness fields; we can bring them back in the future.

const PyLongLayout* PyLong_GetNativeLayout(uint32_t version);

And these functions would take a version:

int PyLong_AsDigitArray(
    PyObject *obj,
    PyLong_DigitArray *array,
    uint32_t version);
PyLongWriter* PyLongWriter_Create(
    uint32_t version,
    int negative,
    Py_ssize_t ndigits,
    void **digits);

The versions would be:

  • version 0, single hardcoded value: bits_per_digit is 30, and digit_size is 4.
  • version 1, generalized: bits_per_digit and digit_size can have any value.
  • future versions reserved for extensions – for example, PyLong_GetNativeLayout could return a bigger struct PyLongLayout_v2 with something like endianness, or two different formats and a cut-off point.

By passing 0, the user says they can only handle 30-bit digits. They don’t need any check beforehand, and they also don’t need a comprehensive test matrix. As in all cases, they should be prepared to fall back to PyLong_AsNativeBytes. (This option allows users to take shortcuts and still use the API correctly.)
By passing 1, the user says they’ve checked the PyLong_GetNativeLayout(1) result, and are prepared for Java’s ints, non-default CPython builds with 15-bit digits, etc.
By passing 2 or more, the user says they checked the PyLong_GetNativeLayout(ver) result, and it is compatible. (Note that the version in that result can still be 0 or 1!)

30-bit-digit CPython would allow any version; GraalPy would allow only 1+.


I’d rather make it part of runtime state, i.e. the pointer might not stay valid after finalization and reinitialization.


The proposed API is efficient for large integers. Compared to accessing Python internals directly, it can have a significant performance overhead for small integers.

I’d add:
The cut-off point for “small integers” is not exposed by this API, and it might change in future versions.
We suggest that extension authors either benchmark with the target versions of Python implementations and choose suitable values themselves, or use version-specific API like PyUnstable_Long_IsCompact.

From our experience, this is not realistic. Most people using 0 will not take care of the fallback if it doesn’t trigger on the current CPython. What is the purpose of 0 – a default usable without fishing out or even knowing about PyLongLayout?

What if the API still takes PyLongLayout*, but:

  1. we make the check fast
  2. the only way to construct a layout would be by calling PyLong_GetNativeLayout

This leaves room for possible new layouts. The layout should have some version field in any case, but for the time being the check can be even simpler: compare against the static memory address that is used internally – this would prevent users from creating a PyLongLayout manually, because it would be impossible to create one that passes the check. The error message would tell the user to use PyLong_GetNativeLayout.

For the possible future when there are more versions: the version field should not start at zero or one, but at some magic constant, so that without looking at the docs or CPython code you have no chance of properly initializing it, and once you start looking at these things you hopefully realize what should be done instead. Then the check can still be a comparison against the static memory address(es) for a small finite number of versions, plus a check of the version field.

my 2c: it would make sense to me to keep it.

4 Likes

This seems overly complicated.

This API is mostly for expert use, and especially for a few select third-party libraries for which fast bigint access is important. It seems reasonable to tell them that not checking the layout is a bug in their code.

Or we generalize the API, for example:

typedef enum PyLongExport_Kind {
  PyLongExport_NativeInt = 1,
  PyLongExport_DigitArray = 2
} PyLongExport_Kind;

typedef struct PyLongExport {
  int32_t kind;  // a PyLongExport_Kind value

  union {
    // Which union member is valid depends on `kind`
    int64_t native_int;
    struct {
      // 1 if the number is negative, 0 otherwise.
      int negative;
      // Number of digits in the 'digits' array.
      Py_ssize_t ndigits;
      // Read-only array of unsigned digits.
      const void *digits;
    } digits_array;
  } data;

  // Member used internally, must not be used for any other purpose.
  Py_uintptr_t _reserved;

} PyLongExport;

PyAPI_FUNC(int) PyLong_Export(
    PyObject *obj,
    PyLongExport *export);
PyAPI_FUNC(void) PyLongExport_Free(
    PyLongExport *export);
2 Likes

Did I understand correctly that BigInteger uses big-endian byte order internally, regardless of the native byte order?

No, the public constructor:

public BigInteger(int signum, byte[] magnitude)

Translates the sign-magnitude representation of a BigInteger into a BigInteger. The sign is represented as an integer signum value: -1 for negative, 0 for zero, or 1 for positive. The magnitude is a byte array in big-endian byte-order: the most significant byte is in the zeroth element. A zero-length magnitude array is permissible, and will result in a BigInteger value of 0, whether signum is -1, 0 or 1.

On my system and with my JDK version this also happens to be the internal representation AFAICS.
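
For what it’s worth, the sign-magnitude, big-endian form described by that constructor can be produced from a Python int with to_bytes(); the helper name below is mine, for illustration only:

```python
def to_signum_magnitude(n):
    """Mimic Java's BigInteger sign-magnitude form: signum (-1, 0, or 1)
    plus a big-endian magnitude byte array (empty for zero)."""
    signum = (n > 0) - (n < 0)
    mag = abs(n).to_bytes((abs(n).bit_length() + 7) // 8, "big")
    return signum, mag

assert to_signum_magnitude(0) == (0, b"")
assert to_signum_magnitude(-(2**16)) == (-1, b"\x01\x00\x00")
```

This matches the quoted Javadoc: most significant byte in the zeroth element, and a zero-length magnitude permitted for zero.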

1 Like