PyLongObject: just use GMP?

CPython uses its own implementation of big-integer arithmetic. It’s mature and stable, but the algorithms are usually not the best known in the field. It seems we sometimes already pay a high price for this; see e.g. the recent story with the integer/string conversion length limitation.

Of course, this situation might change. CPython could use better algorithms (implemented in C or even in pure Python). But one could argue that it’s not the right project for this. Why not depend instead on some external library that does the same thing and has a more appropriate community?

GNU GMP has been proposed for this several times in the past (see PR#46139 and PR#66121). Maybe it’s time to try one last time? IIUIC, the major non-technical concern about using GMP was the license.

Licensing issues are beyond my comprehension, so I would like to see a yes/no answer here: does it really stop us? This seems to be a non-issue for many projects with liberal licenses (e.g. Julia) and even for proprietary software (such as the Wolfram Engine or Maple).

The major technical problem with using GNU GMP is its “memory management”: when GMP encounters a memory allocation failure, it just aborts the program. But my homework shows that this issue can be solved by using GMP’s low-level mpn_*() functions together with custom memory allocation (using setjmp/longjmp to recover from allocation failures). The ZZ library demonstrates this approach, providing an alternative interface to GNU GMP with a Libtommath-like API.

On top of the ZZ library, the python-gmp extension was implemented. I think it’s developed enough to play with: the extension has been exercised by the CI tests of some mathematical packages, such as mpmath and diofant (a SymPy fork). In principle, the same trick could allow using GMP for integer arithmetic in the CPython core.

Wait! But we already have the gmpy2 package. Or python-flint. Ah, and Sage integers. (Did I forget something from the zoo of Python interfaces to GMP?) All of them suffer from the memory-management problem mentioned above, but that’s probably not a big deal for an external extension. Why not simply point people who care about big integers to these packages?

I believe there are reasons: using an external integer type gives a severe speed loss for operands of small size (~ machine integers), and some optimization techniques aren’t available for extension types. But practical integer-heavy applications usually use operands of mixed sizes. Taking the mpmath CI tests as a poor man’s benchmark, we see that using GMP gives a noticeable speedup (on CPython 3.14): 6m50s without vs 4m42s with. On the other hand, the latter result is close to PyPy3.11 (4m59s) without GMP (and much worse on PyPy with GMP). These results are explained by statistics on operand sizes in mpmath’s tests: ~95% of operands are integers of less than 10 bits, ~99% less than 53 bits. I would say this is a typical situation for applications like computer algebra systems or arbitrary-precision arithmetic packages: lots of small integers, yet we may still see a net win from using GMP due to better timings for the small set of “big enough” integers.

With GMP we would get the best performance for operands of all sizes, less maintenance burden for core developers, and some simplification in the Python ecosystem (projects like SymPy, mpmath or Sage could unconditionally use built-in integers). Of course, GMP is not the only option (here is an incomplete list of arbitrary-precision arithmetic software; Libtommath was already mentioned). Still, it looks to be the most popular, developed and optimized one; its performance is hard to beat.

1 Like

Yes. The LGPL may be compatible at the integration level, but we can’t move sources between the projects (i.e. we can’t copy anything into the CPython sources), and many contributors, distributors and users follow stricter rules for anything *GPL than for PSF-approved licenses.


As well, machine-sized integers are the main ones we have to worry about - there’s a reason we use singletons for the smallest couple hundred integers and not more. We’d also still love to get to tagged pointers, and adding an abstraction layer that we can’t modify is going to make that harder.

The principle here is that people who deal with huge numbers are probably thinking a lot about numbers and so are better positioned to find a better type (including Decimal). People who don’t think about numbers at all are largely dealing with small numbers, and so should have a good default experience.

Python’s large integers work, but if they’re too slow, then you’ll only know that if you care enough to measure, which means you care enough to do something about it. Those who don’t want to care shouldn’t have to look for an alternative library to improve the performance of numbers 0-1000.

8 Likes

This question has been posed before. The reality is that Python integers in normal programs are rarely actually large in practice (I’ll pick “less than 100 bits” as an arbitrary delineation).

People actually needing large/huge integers in a performance critical manner already reach for relevant third party PyPI packages.

GMP or similar external bigint math libraries would likely be more of a maintenance burden, because each is a huge third-party dependency with maintenance and security concerns of its own. Using them could also complicate PyLongObject and optimization work that seeks to avoid most of our integer internals entirely for smaller values.

The proof is in the pudding though, try it: replace PyLong’s implementation with an external bigint library, get the test suite to pass, and run it through pyperformance benchmarks. I’d be surprised if anything other than code microbenchmarking bigint performance saw an increase in performance.

6 Likes

Oh, bad news. But I think there is no need to copy GMP sources. We don’t copy OpenSSL sources, right?

  1. I don’t think that a dependency on GMP prevents optimizations for small integers. Currently we essentially have different code paths for big and small integers. An external bignum library (e.g. GMP) could handle the first case.
  2. The point is that this comes with a tradeoff: better asymptotic performance with an external integer type costs a huge speed loss for small integers, which are common even in math-heavy applications. I don’t think this can be mitigated; e.g. something like bytecode specialization isn’t available for external integer types.

Why? E.g. in long_mul() we would replace k_mul() and friends with a call to the library function. This way we get rid of all the code that handles many-“digit” integers.

This is a lot of work :wink: — that’s why the thread was started first. The plan was to do this with GMP, but it seems that option is ruled out by its license. Or not?

1 Like

We copy enough that the license matters, yeah. And core devs refer to the OpenSSL sources and then write code in CPython, which is at risk of being seen as copying (I don’t have legal references on hand, but lawyers I’ve worked with in the past have warned that if you can prove that someone referred to copyleft code and then wrote something similar you might prove that they are in breach of the license).

Sure, but when you want to do a big project, the work (and proof) deserves to be done first.

Who knows, maybe your fork with the alternate implementation will become more popular than the upstream without? If it doesn’t exist, you’ll never know :wink:

4 Likes

Well, I trust your judgement that using GMP is a no-go. (At least, I haven’t seen other opinions here.) But I don’t see how using the public API of a library could be seen as copying. The point is to use the library as a black box that does clever things with numbers.

Fair enough, as long as the work won’t then be rejected on non-technical grounds (i.e. license terms).

Sounds like a good evil plan! :slight_smile:

1 Like

I don’t think the situations can be readily compared.

OpenSSL is an optional dependency for a couple extension modules. Your proposal of using GMP would put it in a critical place of the core interpreter (the implementation of one of the most fundamental built-in types). As @steve.dower and @gpshead mentioned, the performance of small integers [1] is extremely important and implies tight cooperation between the integer implementation and other parts of the interpreter. Chances are that aspects of GMP would “leak” into other parts of the code and it’s not obvious what the licensing consequences would be.


  1. Not wider than a native machine word ↩︎

3 Likes

Licensing under any version of the GPL is always “a problem”. It’s no coincidence that the CPython core doesn’t ship with anything of the sort, not even under the purportedly “friendlier to business” LGPL. That’s partly our decision, but also to spare users from having to pay their own lawyers to guess whether they can get away with using the CPython core without attracting legal hassles from the FSF.

But I’d love to free the CPython core from its own “giant int” algorithms.

Alas, a compromise would be even more complicated: keep our code as-is, but defer to GMP if the ints are “really big” and the user installed GMP themselves (akin to how mpmath uses gmpy2 if it’s available).

I’d say that would at least spare us the effort of trying to speed our own code for this stuff - except that CPython has attracted very few devs with an appetite for such work. Last time that “got serious” I recall is when some of us (like @mdickinson and @rhettinger) put a relatively obscene amount of effort into speeding math.comb, and GMP still ran circles around us in many cases :rofl:

1 Like


Hi,

In 2008, I abandoned my research on using GMP in Python because of performance, not because of the license. The core of my change was the usage of mpz_t in PyLongObject:

 struct _longobject {
-	PyObject_VAR_HEAD
-	digit ob_digit[1];
+	PyObject_HEAD
+	mpz_t number;
 };

Creating a PyLongObject requires a memory allocation on the heap, but mpz_init() (used to initialize the mpz_t number) makes a second memory allocation.

I spent more time optimizing the code and adding caches, but in the end it was still slower than the reference (not using GMP).

What solution do you propose to avoid the GMP overhead for small integers? Most Python use cases deal with small integers: up to 64 bits on 64-bit platforms. Users dealing with large integers can already opt-in for a third party module such as gmpy2.

6 Likes

Right, that’s the difference. But I also have another non-optional LGPL dependency on my system:

$ ldd `which python3.14`|grep libc
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f344140b000)

Well, nowadays there are e.g. BSD-licensed libc implementations for Linux. Was such an option always available?

I don’t see how. Small integers will probably require separate treatment, as they get now in CPython (or as in FLINT). My library uses GMP only to handle large integers, and only via the primitive mpn_* API.

Well, if the situation with glibc is acceptable, all we need is an alternative with the same API (for the GMP bindings) and a more liberal license.

Yes, I know. Such issues were also relevant for the other projects mentioned. But note that things have changed, and we now essentially have a separate implementation of arithmetic for small integers.

I don’t use the high-level mpz_* API at all (well, except for a few functions right now); that level is re-implemented with different memory management, independent of GMP. Partially because of the more complex memory management of the mpz_* functions: the mpn_* functions allocate memory only for temporary scratch space, in a LIFO fashion; input and output are assumed to be allocated by the caller. Another reason is the freedom to interpret (and handle) the same structure differently (for small integers).

One possibility is to keep the same structure as now, and the same special code paths for small integers. For the rest: call mpn_* functions, which work on pre-allocated arrays of “digits” (“limbs” in GMP terminology). The difference (wrt the current state) is that GMP’s digits are 64-bit (in most cases) and have no unused bits (“nails” in GMP):

typedef struct _PyLongValue {
    uintptr_t lv_tag; /* Number of digits, sign and flags */
    digit ob_digit[1];  // the difference: digit will be alias of mp_limb_t :-)
} _PyLongValue;

That’s the point: they pay the price of severe speed degradation for small integers. And that can’t be fixed in extensions like gmpy2.

1 Like

I’m not a lawyer, but it can reasonably be argued that CPython isn’t specifically using the glibc, it’s using the standard C and POSIX APIs. The glibc is just an implementation thereof, but you can use musl instead (even on Linux). So, while a standard C library is not optional, the glibc is.

On the contrary, the API implemented by GMP is GMP’s own API, not a standard with several implementations available (AFAIK).

I see. That’s a possibility indeed, I hadn’t thought about that.

However, that would still make GMP a required dependency, unless CPython also keeps its current bigint algorithms as a fallback. So it would definitely increase maintenance complexity.

2 Likes

CPython already special-cases “huge ints” in some cases:

  • Multiplication switches to homegrown Karatsuba multiplication: O(b^{1.58}), where b is the number of bits. Much faster than quadratic, but much slower than GMP’s hierarchy of much more advanced methods.
  • str<->int conversions.
  • Division.

The last two are also limited by multiplication speed, and are coded in Python (Lib/_pylong.py). Interpreter overhead barely matters at all - the asymptotics overwhelmingly dominate. “Almost all” the time is spent inside multiplication.

Time to convert to/from GMP ints would also be minor in these contexts. That’s O(b), with a small constant factor. Both use “a large” power-of-2 base internally (30 bits for CPython, 64 for GMP). Conversion is just a matter of fast word-level bitwise operations.

GMP’s format is inherently more efficient (over twice as many bits consumed per HW instruction), but it’s impractical for CPython - portable C doesn’t offer efficient ways to deal with carries. GMP writes hand-crafted assembler that can exploit HW flags not exposed in C, and “high order bits of product” HW instructions.

Python uses a “limb size” of 30 bits so that a product of 2 digits plus predictable worst-case carries from an addition or two always fits in 64 bits. On 32-bit boxes, Python uses a limb size of 15 bits instead, for similar reasons.

Note that I’m happy to use gmpy2 explicitly instead in my own code, so there isn’t an itch here I personally want to scratch. Another option in the core is to use decimal with very large precision. Stefan Krah did implement very advanced multiplication (O(b \log{b})) for decimals.

Hilariously, though :wink: , converting int<->Decimal remains quadratic time.

2 Likes

Oh, I didn’t realize that an API is copyrighted and no one can use a function declared like

mp_limb_t mpn_neg (mp_limb_t *rp, const mp_limb_t *sp, mp_size_t n)

IMO, it’s hard to invent different primitives for arithmetic over natural numbers :wink:

Well, such things don’t stop Maplesoft from using GMP in a commercial product.

Or it will use a different library as a fallback. The ZZ library has an API close to Libtommath’s; it could be made compatible. Or it could just include a CPython-like implementation as a no-GMP fallback.

In principle, you could try the portable variant of the code (--disable-assembly). I’m not so sure it would be beaten by the CPython code for non-machine-sized integers. Unfortunately, it’s not easy to check: the implementation of CPython integers is tightly coupled to the core and doesn’t exist as a standalone C library.

Note also that Libtommath is portable and uses a digit layout similar to GMP’s (either 32-bit or 64-bit).

1 Like

It’s significantly more complicated here, but ultimately it does come down to your last point (which is what the Google v. Oracle case pivoted on). The realistic risk we’d face in CPython is that a wrapper which looks like that API but dispatches to alternate implementations may be considered to have been copied from the GPL code, and so it (and anything distributed with it) then also has to be GPL. Whether that would actually happen or actually be enforced is debatable, but the risk of such a debate requiring expensive lawyers is reason enough for us to just avoid it entirely.

Ironically, the GPL was actually designed to encourage commercial products to use such licensed code, as long as the commercial entities who modify it make those modifications available to the community (read the license carefully :wink: ).

There have been attempts in court to redefine a “required dependency” as a transformative integration, which would cause the copyleft rules to apply (I don’t think any have been successful, but they have been expensive nonetheless), and so keeping it as an optional dependency is clearly safer.


Ultimately, these are all just risk management, which means if the benefit is there, we can negotiate the risks. So if I were you, I’d treat the feedback as “demonstrate the benefits so we can discuss the risks in context” and “we’re not going to preapprove something when the risks are known and the benefits are hypothetical”.

Which means if you’re really passionate about seeing this happen, then do it first and show off the benchmarks and implementation. We have way more people who can evaluate those than who can reasonably hypothesise about a potential future.

4 Likes

Understood. Though this looks rather strange to me. Like copyleft on the arithmetic of integers.

IIUIC, it’s the GPL that requires opening the whole product if it works with GPLed code (even via a dynamic or static library). What you described above holds for the LGPL, not the GPL. And that’s why major closed-source computer algebra systems (well, at least Mathematica and Maple) can use GMP.

“Show me the code” (c). Ok, got it, will try.

Unfortunately, it seems there is no way to show the benefits without a full-blown integration. (As I said before, an external extension can’t beat the performance of CPython’s ints for small integers.)

1 Like

I think LGPL means that people embedding Python in closed source programs would need to dynamically (rather than statically) link to GMP. I’m not sure if that’d represent a significant performance cost or not. But essentially downstream users would need to be able to substitute a modified version of GMP.

2 Likes

Yes, probably by using the library from the system, e.g. from a Linux distribution. Or by including the dynamic library in the program’s distribution. Here’s the WolframEngine tree:

$ ls -d1 /usr/local/W*/W*/14.3/Sys*/Lib*/Linux*/{*gmp*,*flint*}
/usr/local/Wolfram/WolframEngine/14.3/SystemFiles/Libraries/Linux-x86-64/libflint.so.17
/usr/local/Wolfram/WolframEngine/14.3/SystemFiles/Libraries/Linux-x86-64/libgmp.so.10

(Though, I’m not sure that static linking is “forbidden” in some sense.)

GMP manual says:

What’s the reason to do this?

Because they have the right to under the LGPL. So they don’t need a reason - they just can.

2 Likes

I’m only answering to the technical portion of this thread, I have no opinions on desirability/licensing aspects.

yes, I think this is what Racket does. They use the low-level APIs and handle the memory allocation themselves. I don’t think setjmp/longjmp is required, since mpn_* simply requires that the memory for the results is handed in as arguments. That means some logic of the implementation (“how many digits do I need for the result of this addition”) needs to live in the glue code. This approach should make it possible to keep the memory for the limbs inlined into the PyObject struct. The Racket code is here:

2 Likes

Unfortunately, it seems to be the case.

The mpn_* API assumes that output arguments have enough space and that this is handled by the caller, but GMP makes additional temporary memory allocations in some functions (using the same memory API as for everything else; see the GMP manual, 13 Custom Allocation). Not in all of them: the Racket approach does work for addition, for example, but mpn_mul() allocates (grep the TMP_* macros).

We need to track this somehow to prevent memory leaks (and segfaults: by default GMP will call abort() on memory failures). setjmp/longjmp is an option; I would appreciate better alternatives.
We need to track this somehow to prevent memory leaks (and segfaults, per default GMP will call abort() on memory failures): setjmp/longjmp is an option, I will appreciate better variants.