PEP 757 – C API to import-export Python integers

pitrou · September 19, 2024, 2:12pm

No, but the fact that the caller may use an undetermined format is a good reason to export the integer in its CPython-native format, since it’s cheaper, and let the caller do the required conversion (if any).

steve.dower · September 19, 2024, 3:36pm

There’s no bytes object involved - it copies directly into the buffer (this is a new API in 3.13, so perhaps you’re thinking of a different one?). It does, however, convert (non-compact) values from the 15/30-bit digits into 8-bit digits, which adds to the time complexity.

The proposal to simply expose the internal buffer does not require a copy until the caller copies it. Which they will have to do, admittedly, and they may have to with PyLong_AsNativeBytes as well if they are transferring into a different bigint library. But those are outside of our control - we can have a “zero copy, but you’ll probably do a copy yourself” API and still call it zero copy.

“The work” is to convert arbitrary digit sized integers into other arbitrary digit sized integers. The libraries already do this, it’s one of their primary functions. All we’re doing is not creating a fourth.
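To make "the work" concrete, here is a minimal sketch of one such conversion: repacking 30-bit digits (least significant digit first) into 8-bit little-endian bytes. The helper name `repack_digits30_to_bytes` is hypothetical and is not part of CPython or of any bigint library. The sketch assumes no padding bits beyond the 30-bit digit width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython API.
   Repack 30-bit digits (least significant first) into little-endian bytes.
   Always returns the number of bytes required; the output buffer is only
   written when it is large enough. */
size_t
repack_digits30_to_bytes(const uint32_t *digits, size_t ndigits,
                         uint8_t *out, size_t out_size)
{
    size_t needed = (ndigits * 30 + 7) / 8;
    if (needed > out_size) {
        return needed;              /* caller can retry with a bigger buffer */
    }
    uint64_t acc = 0;               /* bit accumulator */
    unsigned acc_bits = 0;          /* number of valid bits in acc */
    size_t pos = 0;
    for (size_t i = 0; i < ndigits; i++) {
        assert(digits[i] < (UINT32_C(1) << 30));
        acc |= (uint64_t)digits[i] << acc_bits;
        acc_bits += 30;
        while (acc_bits >= 8) {     /* flush whole bytes */
            out[pos++] = (uint8_t)(acc & 0xFF);
            acc >>= 8;
            acc_bits -= 8;
        }
    }
    if (acc_bits > 0) {             /* flush the final partial byte */
        out[pos++] = (uint8_t)(acc & 0xFF);
    }
    return pos;
}
```

Every bigint library already ships a generalized version of this loop (GMP's `mpz_import` is one), which is the argument for exporting the native digits and letting the consumer run its own conversion.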

This is a very good point, and it does actually prevent us from returning the address of the compact value and saying that it uses 8-bit digits. Do we add a sign-mag/two’s-comp flag/format enum as well?

pitrou · September 19, 2024, 7:55pm

Would we ever want to use a 2s complement representation for integers larger than 64 bits?

steve.dower · September 19, 2024, 8:00pm

On the basis that we’ve made incorrect predictions in the past, I prefer not to try and predict the future any more, but to plan for all (reasonable) outcomes.

It’s unlikely we’d use any other system, but I wouldn’t want to rule out the possibility forever, given it’s relatively easy to allow for it now (it may be a flag that will never be set in CPython today - bear in mind, we’re talking limited API here, so we can’t really apply YAGNI as aggressively as for internal/normal API).

But it does immediately affect the balance between which values will succeed and which will fail. x = -(2**60); PyLong_Export(x, ...) would have to fail right now, because we don’t have an array of ones-complement digits to return. If we could flag it as twos-complement, then the call could succeed more often. Everyone needs a fallback path regardless, so I’m not massively concerned, but it’s one bit that allows roughly 2**62 [1] values to now succeed, so perhaps worth it?

[1] I forget exactly where the cutoff is for compact values. It’s definitely not using the full ssize_t range.
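As a rough illustration of the representation gap: a negative machine integer is stored in two's complement, while a digit array with a separate sign flag is sign-magnitude. The sketch below splits an `int64_t` into a sign flag plus 30-bit magnitude digits. The function name and the fixed 30-bit digit width are assumptions made for the example; this is not how CPython implements anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BITS 30
#define DIGIT_MASK ((UINT64_C(1) << DIGIT_BITS) - 1)

/* Hypothetical helper: split a two's-complement int64_t into a sign flag
   and magnitude digits (least significant first). Returns the number of
   digits written; zero is represented by no digits at all. */
size_t
int64_to_sign_magnitude(int64_t value, uint32_t out[3], int *negative)
{
    assert(out != NULL && negative != NULL);
    *negative = value < 0;
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t mag = *negative ? (uint64_t)0 - (uint64_t)value
                             : (uint64_t)value;
    size_t n = 0;
    while (mag != 0) {
        out[n++] = (uint32_t)(mag & DIGIT_MASK);
        mag >>= DIGIT_BITS;
    }
    return n;
}
```

For `-(2**60)` this yields three digits `{0, 0, 1}` and a set sign flag, whereas the same value as a raw 64-bit word is `0xF000000000000000`. A format flag would let an exporter hand out either form without converting.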

pitrou · September 19, 2024, 8:57pm

Are we talking about the same PyLong_Export that is able to return a int64_t value? Why would it fail?

vstinner · September 19, 2024, 8:58pm

PyLong_Export() exports numbers in the [-2**63; 2**63-1] range (int64_t range) as an int64_t (PyLongExport.value). All compact values fit into this range.

Compact values are in the range [-2**30+1; 2**30-1], numbers up to 1 digit in base 2**30:

$ ./python
>>> import _testcapi

# Compact: 1 digit
>>> _testcapi.call_long_compact_api(-2**30+1)
(1, -1073741823)
>>> _testcapi.call_long_compact_api(2**30-1)
(1, 1073741823)

# Not compact: 2 digits
>>> _testcapi.call_long_compact_api(-2**30)
(0, -1)
>>> _testcapi.call_long_compact_api(2**30)
(0, -1)

Note: PyLongExport.value is a recent addition to the API.
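The boundaries in the session above can be modelled with a simple range check. This is only a sketch of the observed behaviour on a build with 30-bit digits; `is_compact_model` is a hypothetical name, and a build with 15-bit digits would have a smaller range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the compact range observed above, assuming 30-bit
   digits: a value is compact when its magnitude fits in a single digit. */
int
is_compact_model(int64_t v)
{
    const int64_t base = INT64_C(1) << 30;
    return v > -base && v < base;
}
```

Under this model every compact value trivially fits in an int64_t, which matches the statement that the int64_t path covers all compact values and more.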

skirpichev · September 20, 2024, 4:36am

Ah, from the implementation it looks more like Java’s BigInteger uses a different digits_order, not a different endianness:

github.com/openjdk/jdk/blob/57b8251241e2044d5039ce162bf4637a9b2e5466/src/java.base/share/classes/java/math/BigInteger.java#L147-L155

           * The magnitude of this BigInteger, in <i>big-endian</i> order: the
           * zeroth element of this array is the most-significant int of the
           * magnitude.  The magnitude must be "minimal" in that the most-significant
           * int ({@code mag[0]}) must be non-zero.  This is necessary to
           * ensure that there is exactly one representation for each BigInteger
           * value.  Note that this implies that the BigInteger zero has a
           * zero-length mag array.
           */
          final int[] mag;

That looks unusual, but it is less shocking to me than using a non-native endianness internally for “digits”. We can extend sys.int_info with this new field to handle your case.
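To illustrate the distinction: digit order says which array element is most significant, while endianness says how the bytes inside one element are laid out. Converting between a most-significant-first array (as in the BigInteger comment above) and a least-significant-first array is just a reversal of whole words; no byte swapping is involved. The helper name `reverse_words` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the order of whole words in place.
   Each word keeps its native byte layout, so this changes the digit
   order only, not the endianness. */
void
reverse_words(uint32_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    for (size_t i = 0, j = n; i + 1 < j; i++) {
        j--;
        uint32_t tmp = words[i];
        words[i] = words[j];
        words[j] = tmp;
    }
}
```

GMP's `mpz_import` exposes the same two knobs separately: its `order` argument (1 for most significant word first, -1 for least significant first) and its `endian` argument (1 big, -1 little, 0 native).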

steve.dower · September 20, 2024, 9:48am

Oh I missed that addition. It wasn’t around in our previous discussions, it’s new in this thread.

vstinner · October 7, 2024, 1:03pm

It seems like the discussion has settled down and the PEP doesn’t seem controversial anymore, so I submitted PEP 757 to the C API Working Group. Thanks everybody who was involved in this discussion.

encukou · October 7, 2024, 2:43pm

for PyLong_Export:

This function always succeeds if obj is a Python int object or a subclass.

This means we can never deprecate this function. We might want to do that if, for example, we introduce an internal format that doesn’t fit in the struct PyLongExport parameters.
I suggest removing the line.

encukou · October 9, 2024, 8:31am

I wonder if we could use the return value for the “kind” (union tag). This way, users can switch on a single number to distinguish between the compact/digit-array/error cases, and we can be a bit more future-proof for essentially free.
That is, PyLong_Export would return:

-1: error; an exception was set.
1: Data is in int64_t native_int. Calling PyLong_FreeExport is not necessary but it is not an error.
2: Data is in the digits_array struct. Call PyLong_FreeExport when done.
an unknown positive value: some future format was used. Callers can choose to:
- (be future-compatible:) call PyLong_FreeExport and fall back to e.g. PyLong_AsNativeBytes; or
- (need to adapt if future CPython changes:) fail. Call PyLong_FreeExport if it wasn’t a fatal error.

with:

typedef struct PyLongExport {
  Py_uintptr_t _reserved;

  union {
    int64_t native_int;
    struct {
      Py_ssize_t ndigits;
      const void *digits;
      uint8_t negative;
    } digits_array;
  } data;
} PyLongExport;

(Future formats might theoretically need a bigger struct, but it’s not likely to happen IMO. We can add a PyLongExport_v2 if it does come up.)
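A caller-side sketch of this proposal, written against a mock so that it compiles without Python.h. `MockLongExport`, the `MOCK_*` tags and `consume_export` are hypothetical stand-ins for the proposed declarations. The sketch assumes 30-bit digits stored least significant first, and only reconstructs values of at most two digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the proposed layout; not the real CPython declaration. */
typedef struct {
    uintptr_t _reserved;
    union {
        int64_t native_int;
        struct {
            ptrdiff_t ndigits;
            const void *digits;
            uint8_t negative;
        } digits_array;
    } data;
} MockLongExport;

/* Hypothetical tag values mirroring the proposed return codes. */
enum { MOCK_ERROR = -1, MOCK_NATIVE_INT = 1, MOCK_DIGITS_ARRAY = 2 };

/* Switch on the tag. Returns 0 and stores the value in *out on success.
   Returns -1 on error, on an unknown (future) format, or when the digit
   array is too long for this sketch; the caller would then fall back. */
int
consume_export(int kind, const MockLongExport *e, int64_t *out)
{
    assert(e != NULL && out != NULL);
    switch (kind) {
    case MOCK_NATIVE_INT:
        *out = e->data.native_int;
        return 0;
    case MOCK_DIGITS_ARRAY: {
        const uint32_t *d = e->data.digits_array.digits;
        ptrdiff_t n = e->data.digits_array.ndigits;
        if (n < 0 || n > 2) {
            return -1;              /* magnitude may exceed 60 bits */
        }
        uint64_t mag = 0;
        for (ptrdiff_t i = n - 1; i >= 0; i--) {
            mag = (mag << 30) | d[i];
        }
        *out = e->data.digits_array.negative ? -(int64_t)mag : (int64_t)mag;
        return 0;
    }
    default:
        return -1;                  /* error or some future format */
    }
}
```

The point of the design is visible in the `default` branch: a consumer compiled today has a well-defined reaction to a tag it has never seen.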

Why?
There are compiler-support issues with anonymous unions, but I don’t know reasons against unions themselves.

pitrou · October 9, 2024, 1:29pm

That sounds like a good idea to me.

Petr Viktorin:

typedef struct PyLongExport {
  Py_uintptr_t _reserved;

  union {
    int64_t native_int;
    struct {
      Py_ssize_t ndigits;
      const void *digits;
      uint8_t negative;
    } digits_array;
  } data;
} PyLongExport;

Nit: it’s a bit weird for the _reserved field to be at the start of the struct.

skirpichev · October 10, 2024, 3:30pm

In principle, we can, using the Py_DEPRECATED() macro after a deprecation cycle…

Perhaps the problem is that PyLong_Export() is now trying to solve two different tasks:

Export small integers. Nobody cares - it’s a solved problem, we have a lot of existing functions.
Fast export for big integers. This API is missing in the current CPython.

It’s essential that the reading API shouldn’t fail (just as conversion works now, using private functions). The only reason to fail might be a new kind of export in a new release.

The PyLong_Export() API has the PyLongLayout structure to keep us free to change the internal representation of big integers in any release. Which bigint library uses something other than this? Here I don’t see any reason to fail at all.

For small integers, an opportunity to fail maybe makes more sense. Though optimizations for values that don’t fit in int64_t look rather hypothetical.

@encukou’s proposal looks like overkill to me. But since it lets us keep the “export doesn’t fail” constraint, I buy it and will be happy to adjust the PEP and the implementation. However, I would appreciate hearing some other opinions from C-API WG members first.

PS:

Why not move it to digits_array? Currently it’s reserved for this member of the union. IIUC, the whole structure size will be the same.

pitrou · October 10, 2024, 5:39pm

Usually you don’t know up front whether an integer is small or not. So it’s useful for PyLong_Export to handle that case too.

encukou · October 11, 2024, 8:56am

Exactly. We only have unstable API for that (PyUnstable_Long_IsCompact).
IMO, PEP 757 export should both give you the internal representation, and tell you what representation it is. I think it makes a lot of sense to combine that in one API call – I don’t see a use case where you’d need just one of those.

It’s also used to distinguish between digits_array and native_int cases (i.e. if it’s NULL, PyLong_FreeExport won’t decref the integer).

skirpichev · October 11, 2024, 9:18am

(I assumed we offer an API (like PyUnstable_Long_IsCompact) to check that.)

No. The digits field is used for that instead. The _reserved field holds the reference when digits != NULL.

encukou · October 11, 2024, 9:33am

I assumed we’re still talking about my proposal, where digits is inside the union.

skirpichev · October 11, 2024, 9:54am

That too. In that case we dispatch to the right member of the union by the return value of PyLong_Export(). The _reserved field only makes sense together with digits.

pitrou · October 11, 2024, 10:00am

No, the _reserved field is for future additions to the top-level struct and/or future union alternatives. It’s not for additions to the digits_array sub-struct.

skirpichev · October 12, 2024, 10:02am

With this we can introduce new formats in a new release. So why not simply start with digits_array only?

The PyLong_Export would return:

-1: error, only possible for non-ints.
0: success, data is in the digits_array struct. Call PyLong_FreeExport when done.
a positive integer: some future export format. Callers must call PyLong_FreeExport and then either (1) abort or raise an exception, or (2) fall back to PyLong_AsNativeBytes. The latter makes sense only if that call will not fail.

with

typedef enum {
    PyLongExport_Error = -1,
    PyLongExport_DigitArray = 0,
} PyLongExport_Kind;
typedef struct PyLongExport {
    Py_uintptr_t _reserved;
    union {
        struct {
            Py_ssize_t ndigits;
            const void *digits;
            uint8_t negative;
        } digits_array;
    } data;  /* I really, really would like to avoid this :( */
} PyLongExport;

gmpy2 code will look closer to the current state, i.e.:

static void
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
    /* here obj is a PyLongObject */
    static PyLongExport long_export;
    /* it is up to the user to first try optimizations
       with PyLong_AsLongAndOverflow() */
    PyLongExport_Kind kind = PyLong_Export(obj, &long_export);
    switch (kind) {
    case PyLongExport_DigitArray:
        /* with the struct above, the fields live in data.digits_array */
        mpz_import(z, long_export.data.digits_array.ndigits,
                   int_digits_order, int_digit_size, int_endianness,
                   int_nails, long_export.data.digits_array.digits);
        if (long_export.data.digits_array.negative) {
            mpz_neg(z, z);
        }
        PyLong_FreeExport(&long_export);
        break;
    default:
        PyLong_FreeExport(&long_export);
        abort(); /* (1) a new release offers a new export format */
        /* or (2) fall back to PyLong_AsNativeBytes() */
    }
}

Since the default case will handle small integers, we can probably bound the requirements for the temporary buffer here. Then the call to PyLong_AsNativeBytes will not fail, and option (2) makes sense.

Edit: implemented in python/peps Pull Request #4026, “PEP 757: edits, based on C-API WG feedback”, by skirpichev.