PEP 791: imath --- module for integer-specific mathematics functions

PeterL · May 20, 2025, 6:40am

Sympy already has a vast selection of functions in sympy.ntheory
What’s the benefit in bringing the suggested and desired functions to core?
Many recent arguments have been around the maintenance burden of core libraries (PEP 594 - Removing dead batteries)

pitrou · May 20, 2025, 7:12am

Ok, here is a suggestion:

The math module contains mathematical functions that can be called on integral and real numbers.

..note:: Dedicated modules exist for specific purposes, such as statistics for statistical functions, and cmath for functions on complex numbers.

skirpichev · May 20, 2025, 8:25am

This is true. But SymPy is a library for symbolic mathematics, rather than arbitrary precision arithmetic. This, together with some number-theoretical functions come with backends like the gmpy2. As a fallback — Python builtin int’s and stdlib’s functions (and Fraction class) could be used.

So, SymPy will rather benefit if the stdlib will have more such functions. In past, SymPy had own implementations for isqrt(), lcm() and even gcd(). Looks like reinventing the wheel.

Same for the mpmath: it has a small private mpmath.libmp.libintmath module mostly wrapping the gmpy2 functions and provide missing (on the CPython side) batteries, if the gmpy2 isn’t used.

I don’t think there are problems wrt maintenance with current content of math-related modules. Of course, maintenance cost might be an argument against new functions. But it’s so regardless from the module split.

Well, sometimes on complex numbers too (prod/sumprod almost useless in such cases, but they can consume such input). But sometimes they can’t be called on real numbers:

>>> math.factorial(3.14)
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    math.factorial(3.14)
    ~~~~~~~~~~~~~~^^^^^
TypeError: 'float' object cannot be interpreted as an integer

pitrou · May 20, 2025, 8:39am

Perhaps, but so what? We are not building a thesaurus, we are writing a documentation to describe a module’s contents. It just has to be reasonably accurate and informative.

skirpichev · May 20, 2025, 9:19am

If emphasis means “some functions can be called with on real numbers, some can’t and some could be called on complex too” — I would agree that such description is accurate. Though, I doubt it’s more informative than “The math module contains some mathematical functions.”

oscarbenjamin · May 20, 2025, 10:22am

SymPy is pure Python and either implements these functions in Python or chooses between implementations from stdlib, gmpy2, etc. Various things like gcd are just very important functions that need to have a good fast (in C) implementation and are usable much more widely than just as part of SymPy. As @skirpichev says SymPy now uses math.gcd in place of its previous Python implementation and would likely do so with any other functions that are added in the imath module.

The dead batteries are things that are superseded or outdated formats etc. Functions like gcd are not going to become superseded or outdated and after being written have a low maintenance burden.

steve.dower · May 20, 2025, 10:41am

So it’s a proposal to refactor, with no other benefits? That makes for a weak PEP. Surely you can find some usability benefits from separating functions by module name rather than by function name?

oscarbenjamin · May 20, 2025, 10:43am

This is perhaps further off topic but maybe gives some sense for how I would imagine an imath library being and why it should be separate from the float math module.

I think that the stdlib should provide contexts for e.g. integer arithmetic to control the maximum size of integers. You should be able to do e.g.:

from imath import context
ZZ = context(bits=1000)
ZZ.pow(10**6, 10**6) # OverflowError

The sys.set_int_max_str_digits function is a poor implementation of a global context. Its existence demonstrates why having contexts that can control limits for integer size are needed.

I also think that imath should provide all the other arithmetic functions for integers such as add, multiply etc. The Decimal module contexts provide these and most other mathematical libraries provide these functions although not with consistent names/interfaces (numpy.add, mpmath.fadd, decimal.Context.add, gmpy2.add, …). There are various different reasons that these functions are needed rather than just operator.add but the ones that apply to int are that it should have the right coercion properties and as a context method it could limit the bit size. Specifically you want a function like:

def add(a: SupportsIndex, b: SupportsIndex) -> int:
    return index(a) + index(b)

This is a function that always coerces towards int whereas a + b or operator.add will always coerce away from int. This kind of coercion is what is needed to be able to mix compatible types from different places while still working in a well defined domain. The context .add method could do the same but checking the bit size and raising OverflowError if needed.

NeilGirdhar · May 20, 2025, 3:07pm

Please don’t do this. This makes int into a monolithic class. Ideally it should have as few methods as possible.

guido · May 20, 2025, 3:09pm

No, for two reasons:

A global flag for this would be terrible (you could break a library that needs the flag set to a different value).
Possibly a contextvars context would be less horrible, but cmath was introduced decades before that existed. And even this would have the same problem if you invoked a library with different needs – you can’t expect code that was written for real numbers to work with complex numbers (e.g. < doesn’t work).

But yes, this is horribly OT.

It’s not a maintenance burden for the core devs. It’s annoying churn to make everyone using these switch to imath or math.ntheory.

Honestly I’m also not keen on the submodule idea, it seems a rather huge change of what import math means.

Several folks beside me have suggested a 3rd party package on PyPI. What’s wrong with that, @skirpichev?

oscarbenjamin · May 20, 2025, 4:53pm

There already are third party packages on PyPI that have vastly more than the few integer functions that are being discussed here (@skirpichev maintains some of these). This PEP is specifically about the organisation of the functions that are in the stdlib.

The implicit issue here is the expectation that there could be more integer functions in the stdlib than just the few that are currently there in the math module. If an imath module had been created 10 years ago when gcd was added then I think that by now it would already contain more functions than the ones we are talking about right now.

guido · May 20, 2025, 5:15pm

Note that you slightly misquoted me: Immediately after the quoted sentence (in the same paragraph) I asked “What’s wrong with that, @skirpichev?”

Now you aren’t Sergey, but my question still stands, unanswered by your reply.

My position is that new functions are better off in one of those PyPI packages (or a new one) than in the stdlib. So I see the current proposal as just a rearrangement of existing functions, and it appears that the extra work required to split it up (both the CPython code churn to split up the math module and the work that users will have to do eventually due to the deprecation of the aliases in math) is just work without any user benefit.

Nobody has explained to my why we should add to the collection of number theoretic functions already in the stdlib. We can’t remove them (and I would be against deprecating them, since they are being used), but I think that new functions shouldn’t be added here but to PyPI.

pf_moore · May 20, 2025, 6:52pm

Even though I’m often the first person to say “why can’t this be a module on PyPI?” I do think there’s a significant difference between being in the stdlib and being on PyPI. We’ve made huge advances in terms of making it easy to use PyPI modules in your code, but I still feel there’s value in the “batteries included” philosophy - the step in complexity from “stdlib only” to using that first PyPI module isn’t trivial even now.

I’m not advocating for any particular integer / number theoretic functions to be added to the stdlib right now, but I don’t think we can reasonably say that it will never happen.

As for the proposal to give the current functions their own module, to an extent yes, it is just about reorganisation - but good organisation does have value. Not always enough to justify the disruption that the change requires, but sometimes it can be. In this case, I personally think it’s worth it - but that’s because I believe we probably will keep adding integer functions (based on the evidence of past additions), so there’s value in doing the reorganisation now, rather than waiting until the problem gets worse.

storchaka · May 20, 2025, 8:05pm

Because they are used in the stdlib and in small user code.

For example, gcd() is used in the fractions module. Originally it was in fractions, but it was not were users expected to find it. One of reasons of adding math.gcd() was that it is more appropriate place.

There is other function, _divide_and_round(). There are several implementations of it under different names in the stdlib. All of them are private and well hidden, so they perhaps not widely used in the user code. Users most likely reinvent it or use incorrect code like round(a/b). A standard fast function would benefit both the stdlib and the users.

When I said “small user code”, I meant code that fits in a single file, one-time code, or even code used in the REPL. I suspect that comb(), perm() and lcm() are mostly used in such code. I can’t count how many times I’ve written a dumb implementation of isprime as any(1 for k in range(2, n) if n%k==0). It would be nice to have something faster and easier to type in the stdlib.

Adding functions to PyPI does not help in using them in the stdlib and in user code that does not use PyPI. It does not help in discoverability and accesibility.

vstinner · May 20, 2025, 8:06pm

Maybe PEP 791 should not soft deprecate math functions and leave them unchanged (keep aliases in the math module). IMO the most important part of PEP 791 is to create a new home for new integer functions, rather than moving existing ones.

Even if PEP 791 itself doesn’t propose adding any new function (at least for now). Adding ceil_div() seems to be non-controversial for example. Maybe PEP 791 should add it?

guido · May 20, 2025, 9:38pm

Of all the examples I only understand gcd. So I will leave the rest of this conversation to others. Have fun!

tim.one · May 21, 2025, 2:32am

The functions in question aren’t “shiny Internet objects du jour” - they’ve been defined in nearly all cases for centuries, and not the slightest reason to expect any will ever be subject to the whims of fashion. or rendered obsolete by “better” technology.

For an “obscure” example S2(n, k), - Stirling numbers of the second kind - give the number of ways a set with n elements can be partitioned into k non-empty subsets. That’s something fundamental to know in advance for all kinds of load-balancing and bin-packing kinds of problems. Indeed, I trotted those out on StackOverflow within the last week, as part of answering an electrician’s question about how to tackle a problem involving balancing loads across power distribution centers.

I wish I could have used plain old Python to answer his question, and did - but pasted in my own implementation of the function. That’s harder for the questioner to trust than something that comes with the language.

They’re not a “computer person”, and don’t want to be. Suggesting they install SciPy or Sympy or mpmath too - let alone some relatively unknown package from PyPI - would have been rejected. They just want to solve their problem, and many “plain users” are leery of installing software.

In any case, no, there’s no chance that centuries-old foundational functions will become “dead batteries” next year - or in the next century.

tim.one · May 21, 2025, 2:54am

For the same reasons we added the ones already there: they come up often in discussions and StackOverflow questions ,etc. Perhaps not in discussions you pay attention to, but certainly in ones I participate in (and the others who’ve made substantial contributions to math).

They’re some combination of frequently requested and frequently poorly implemented by non-experts (who write their own rather than search the web for gigantic packages (SciPy, SymPy, mpmath) that contain far more than they’re after. Or pester people like me to write the functions for them.

The current request that re-awakened the push for an appropriate module is for “ceiling division”. math.ceil(i / j) is simple and clear, but returns gibberish results if the ints are “too big”. (i + j - 1) // j is obscure and can give a wrong answer depending on signs. -(i // -j) is correct in all cases but far from obvious (and is usually “better” than the mathematically equivalent -(-i // j) for a reason so obscure I bet you haven’t guessed it yet ).

You don’t have those kinds of discussions? Some of us have them eternally, year after year after year. It’s easy to end them by giving users a standard imath.ceildiv(i, j) function that’s correct in all cases. not to mention faster.

Or we could stuff it in math, and let users sort out how and why math.ceil isn’t what they’re after.

guido · May 21, 2025, 3:40am

Okay, okay! You don’t have to convince me, just the SC.

Raymond has a standard answer for this for iter tools: a chapter with examples of how to do oft-requested functions that you can do by combining existing functions.

skirpichev · May 21, 2025, 3:56am

Maybe not, in fact;) Besides exception behavior (propagate special values vs raise a ValueError) — there are other things, that might be customized with some context notion for the math module. For example, rounding modes. This is something that supported by underlying libm (per C standard), but not exposed yet in our math.h wrappers.

Of course, such interface (i.e. context) has much less sense if the math module is a mix of functions from different application domains… Though, a different context notion might be helpful, as @oscarbenjamin noted, for integer-specific functions (instead of current globals in the sys module).

But nobody forces to do such switch! Old code will not be broken.

IMO, most arguments (if not all) come from this “backward incompatibility”. Can we abstract ourselves from this and discuss “little benefits”, that come with new layout? Say, you have two proposals: 1) new module math with it’s present shape, with current docs and 2) same set of functions, but coming as two distinct modules. Which you will prefer, if we start from scratch in above way?

I think anyone, which prefer 1) should propose some practical suggestions for fixing current docs. (So far I’ve see one such attempt here and it was not successful, IMO.) It’s not just misleading note about C libm’s in the preamble. We essentially have here two camps of functions: for two different application domains, with two different calling conventions, with two different behavior (exact and inexact).

Sorry for a late answer, it seems @guido left discussion. Anyway, I’ll try to answer.

Maybe I don’t understand well the purpose of the stdlib or new rules. Is this a new requirement to start first with a PyPI package and if it got popularity (how it should be measured?) — it’s content could be added as a part of the stdlib?

Well, in some sense — such packages were mentioned. Both the SymPy (or Diofant) and the mpmath have own pure-python implementations for most of proposed functions (SymPy can’t use mpmath, because it’s integer library functions are private). And they can be found in other projects as well. Indeed, why not reinvent the wheel, it’s sooo easy?! (And error-prone.) Here e.g. the GH search for the gcdex().

I think that every number-theoretic function, that is “simple enough” to be efficiently implemented and is widely used across different projects (including the CPython itself: it’s not in above search query, but the gcdex() is there too!) — could be a good candidate for the stdlib.

BTW, new C standards also come with new functions in the libm. The math module could be extended in this way too.

Edit:

What’s a difference? Aliases kept. We don’t announce any deprecations yet.

Soft deprecation is just a kindly suggestion: please don’t use these functions from their old home in new projects.