Split out integer functions from the math module

tim.one · May 10, 2025, 11:55pm

Far as I know, those have never been suggested, let alone asked for.

There are relatively very few “math modules” in the top-level module hierarchy. So “flat is better than nested” tends to rule. Until “flat” gets so very broad that “namespaces are one honking great idea” steps in to introduce some clearly beneficial separation of spaces.

>>> import math
>>> len(dir(math))
67

The flat math module is already quite “broad” Splitting the int functions out would help combat that bloat. But doesn’t matter to that whether the new module is top-level or sub. I prefer “top-”.

It’s not a slippery slope - splitting out the int functions has been a topic for years already, and there are no other candidates in sight.

BrenBarn · May 11, 2025, 12:59am

I don’t find any of these convincing.

These are doc bugs and can be fixed regardless of the module organization. In my view the “simplicity” of having a note at the top that says every function returns a float is of minimal convenience. It would be better to just remove it and make every function’s documentation self-contained, and then there would be no lies.

Sort of tangential, but I think these kinds of doc-rot problems can indicate process failures at other levels — for instance, a lack of rigor in ensuring that no behavior changes get merged without accurate docs. Every time someone tried to add one of these integer functions to math, the change should have been blocked until the docs were made correct.

I don’t see this as a big deal. As you noted in a later post, there are only 67 functions in math right now, which wouldn’t increase to an unmanageable number even if we added more integer functions. Also, what you get as completions after you type “c” (or whatever prefix) is likely to have as much to do with coincidences of spelling as with the return type. When I’m looking for cosine I probably don’t want to be bothered with copysign either, but that’s neither here nor there.

We can end that argument by just saying that’s not what it’s for (even if at some point in the past that’s what it was intended to be).

Personally I’m against framing the scope of library modules in terms of implementation details like whether the functions are provided by a lower-level C library. They should be framed in terms of user affordances, e.g., you use math if you want to. . . do math. If we want to split them into submodules, that can be done according to domain-relevant distinctions — different conceptual subdivisions of math, not different implementations. It might make sense, for instance, to split combinatorics-type functions into a submodule. (I like the idea of submodules, incidentally.)

bwoodsend · May 11, 2025, 1:41am

If we really did start splitting modules up based on return type then I would consider it misleading to keep calling math math instead of something that indicates floats. ^[1]

Yes, you could argue the same for cmath but I don’t believe many people really expect complex numbers to not be special ↩︎

tim.one · May 11, 2025, 3:04am

Nobody has suggested that. We’re splitting modules by application domain. It so happens in the case under discussion that the math module functions associated with “number theory” (hence @skirpichev’s favored ntheory name) are essentially those that return ints, while those not associated with number theory return floats.

More generally, a “number theory function” is a mapping from the positive integers to complex numbers, although all those Python implements so far return ints. An example of one that returns floats is the von Mangoldt function.

If Python did implement that, it would also be a poor fit for math, but right at home in a module devoted to number-theoretic functions.

BTW, in effect, we are splitting by type, but by the type of the domain rather than by the type(s) of results. The “number theoretic functions” are all and only those that raise TypeError if you pass a float.

malemburg · May 11, 2025, 12:01pm

Most of those can be fixed by patching the documentation.

The original idea to wrap libm was a good motivation, but things have move on since those days. This shouldn’t be interpreted as a limiting factor, IMO.

Right and a good motivation at that.

OTOH, the few integer functions in the math module are most likely used by far more people and at the same time a lot less confusing than imaginary numbers.

So if we instead go by the rule: “Put useful everyday math into the math module and separate out more specialized math into additional modules” I think we’d have a documentation compromise which would define better expectations.

This would also cover the reason why we have separate modules for rationals (fractions) and decimals.

BTW: I do sometimes wonder why complex() is a builtin and not defined in the cmath module, or, better, a complex module (together with the rest of the cmath APIs).

vstinner · May 11, 2025, 12:35pm

I’m volunteer to sponsor such PEP.

skirpichev · May 11, 2025, 1:08pm

Nobody did this for years, despite existing issues (e.g. math.prod can return integers (contradicts doc) · Issue #90984 · python/cpython · GitHub). I’m not sure it’s trivial.

Another example is handling errors. Currently we just raise ValueError’s on poles or domain errors (e.g. gamma(0) or log(-1)). Yet C standard specify also returned values for such special cases. Sometimes it’s more helpful for applications, e.g. gamma(+/-0.0) returns one-sided limits at 0, so some libraries per-default not raise exceptions.

Maybe we should too with some global flag enabled (correct values are computed internally anyway). But all this does make sense only for libm’s wrappers (i.e. for most functions in math and cmath modules, but not for number-theoretic ones).

IMO “not being part of the C standard” - a good reason to count a function as being too advanced for the stdlib.

For number-theoretic functions - it’s a bad criteria. Yet another reason to put them separately. One goal for the new PEP — define or at least roughly describe such criteria.

I use statistics module a lot:) (Well, maybe not everyday, but a close to.) Should we move mean() and stdev() to the math module? Hmm. Not sure I understand what “everyday math” means…

Rosuav · May 11, 2025, 1:15pm

Well, given how AI chatbots are being stuffed into every single service on the planet, I would say that matrix multiplication counts as “everyday math” now…

tim.one · May 11, 2025, 5:15pm

Sure! There is no argument anyone can make here that ends with “QED” . These are largely matters of perspective and taste.

Of course. Yet they haven’t been, for years. It’s not because the core devs are careless about docs.

I take it instead as evidence that the module partitioning doesn’t fit the way human brains actually work. You can legislate that “then they should change the way their brains work!”, but I think it better to change the module.

Repetition is always tedious, for those writing and those reading. The catch-all “they all return floats” used to be true, and so “self-evidently” so that nobody’s eyeballs even noticed that it became increasingly belied by exceptions. That’s human nature.

Yet that was never done. That’s fact. Again, it’s not because core devs are inherently careless.

Neither do I. I characterized it as an ongoing annoyance. Little annoyances add up over time.

67 is “a lot” for any module.

While math initially just wrapped the platform C’s libm, libm was itself misnamed and Python’s math carried on that tradition. It’s not really “about math”, it’s about working with HW floats. In a purely OO language, all the original math functions would have been methods of float objects (and of complex …), and the functions later added by Python would be methods of int objects (all the functions in question raise TypeError if you try to pass a float to them).

That’s an instance of “domain-relevant distinctions” that Python’s lack of OO purity obscures. I’m not at all a fan of OO purity either, but it’s a viewpoint that makes it quite clear why those who (like me) believe the number-theoretic functions are a “poor fit” to libm. Although they didn’t have the vocabulary to say so at the time, libm’s creators were specifying float methods, and there are quite enough of those to warrant having their own module (which libm historically was - and “should” go back to being).

tim.one · May 11, 2025, 5:46pm

But I don’t know how to define “useful everyday math”. For example, despite that I do substantial work in combinatorics at times, I’ve never had a use for math.perm().

Yet decimals are the only kind of float some apps want to use. I gave a different “OO-based” view earlies: different modules for different kinds of domains. Concetually, all the original libm functions were methods of float, and all the number-theoretic functions we added later were methods of int. fractions is for working with fractions.Fractions, and decimal for working with decimal.Decimals.

Not so much about “math” but honoring the reality that we’re working in a computer programming language, and different types necessarily implement different approximations to “pure math”. While none of that “comes naturally” to newbies, those are realities they’ll eventually have to make peace with, one “mysterious algorithm failure” at a time.

Just historical accident, I think. While Guido didn’t want to impose it on anyone, he did view complex as a “built-in type” from near the start. No other numeric types. Rationals were relegated to an example Rat,py under the old Demo/ tree. Experience with them in the ABC language soured him against rationals for practical work - they’re CPU- and RAM- hungry.

storchaka · May 11, 2025, 8:55pm

I restored my PR: gh-81313: Add the imath module by serhiy-storchaka · Pull Request #133909 · python/cpython · GitHub

Since closing the original PR, lcm() was added, gcd() supports more than 2 arguments, factorial() rejects floats, factorial(), perm(), and comb() has been significantly optimized. The implementation of the last three functions took up a lot of space space in mathmodule.c. It would be worth moving them to a separate file just for maintenability.

Wombat · May 11, 2025, 9:16pm

And the reason for this is that no actual user seems to have a problem with the status quo.

The situation isn’t much different than with my pocket calculator that has float functions like sin and log and integer functions like n!, nPr, and nCr. No one is arguing that those two categories shouldn’t be on the same calculator.

AFAICT the core devs here are making a purity argument and drawing a distinction that end users don’t actually make or need. People think of all this as “math stuff” and reach for the math module.

Except when explicitly noted otherwise, all return values are floats.

Such sweeping statements should have been removed long ago. The builtin module has a variety of signatures and it hasn’t been a problem at all. So, just stop making untrue sweeping statements about type and the “problem” goes away.

Splitting functions into related groups also aids discoverability.

I would argue the opposite. Spltting modules forces the user to guess where you put the functions. Factorial is integral and Gamma is real. Ceil and Floor are a mix of both. Prod and Sumprod work equally well with ints, floats, fractions, and complex.

A famous Python poet once said that “flat is better than nested”. Perhaps this is what he had it mind?

storchaka · May 12, 2025, 7:30am

There was even earlier discussion:

It proposed to move factorial() (which at thet time acceptet floats) and gcd() and add new functions: as_integer_ration() (supports arbitrary rationals), binom() (later added as math.comb()), sqrt() (later added as math.isqrt()), isprime() and primes().

Issue #81313 was opened after addition of isqrt(), comb() and perm() in the math module). It more than doubled the number of non-floating-point functions in math, so I proposed to move them in a new module.

Later added math.lcm() and int.bit_count() would also find a better place for themselves in a special module.

Nodd · May 12, 2025, 12:56pm

To be somewhat pedantic:

>>> import math
>>> len([v for v in dir(math) if not v.startswith("__")])
61

And for comparison:

>>> import numpy
>>> len([v for v in dir(numpy) if not v.startswith("_")])
499

Wombat · May 12, 2025, 3:59pm

The show stopper is what to do with functions like math.prod that work with all numeric types or math.trunc that work with an object that defines __trunc__.

These functions are generic and defy classification as math, imath, or cmath.

tim.one · May 12, 2025, 4:35pm

Leave them in math. The functions proposed for imath (under whatever name) work only on integer arguments, and in a purely OO language would be methods of the int class. Littering the math namespace with them makes about as much conceptual sense as moving limit_denominator() from being a method of fractions.Fraction to being a function in math that only accepts Fraction arguments. Or similarly using math as a catch-all dumping ground for the int.bit_length() and int.bit_count() methods.

Sure, to a rank newbie all that too is “just math”, but the more someone learns the more one appreciates the subtler benefits of appropriate classifications. Leaving math for “floats and assorted generics that aren’t used enough to rate being built-in (like sum() is)” is fine by me.

skirpichev · May 12, 2025, 4:52pm

BTW, there is only one other: sumprod().

Yeah, they “work” in sense they are not reject complex numbers. Maybe they should (as e.g. hypot): these functions implement special handling only for floats and small integers. On another hand, it’s not difficult to extend the prod for complex inputs. (The sumprod() is much more tricky in this sense.) But currently these generics are useful only for floats/ints.

These special cases are documented. But for the rest — argument handling happens as for most functions in the module (fallback to PyFloat_AsDouble(arg) and libm’s call).

tim.one · May 12, 2025, 6:20pm

prod() and sumprod() work for any numeric types, including complex. They “merely” treat ints and floats specially, using faster C-level arithmetic instead of PyNumber_Multiply() and PyNumber_Add() when they can, and emulating extended precision for floats.

For other finite-precision numeric types (complex, Decimal, user-defined) they don’t enjoy extended precision. For Fraction, they’re exact.

Wombat · May 12, 2025, 6:26pm

If I understand the proposal correctly, all of the existing names would also be left in the math module so as to not “break the world”. So the “littering” has already occurred and cannot be “unlittered”.

Once users learn that math.lcm still works, do you think they will switch to imath.lcm out of a need for purity?

Also, third-party module maintainers have a strong disincentive to switch because their code needs to work across python versions. No one really wants years of, try: from imath import lcm, gcd; except ModuleNotFoundError: from math import lcm, gcd. That’s just gross. It isn’t necessary. AFAICT, the only problem being solved is a vague feeling that what we have now is a bit “messy”. It’s a “purity” problem, not a real problem.

tim.one · May 12, 2025, 6:58pm

That’s right.

Depends on the user. Some will, some won’t. You can guess that “most won’t”, but I’m not sure: my guess is that most users of math.lcm (are you one?) are more sensitive to what you dismiss as “purity” than random users. Note that the core devs visibly in favor of this aren’t “random” core devs either - they’re the ones who actually work on implementing numeric functions.

That’s fine! Nobody is asking them to change anything.

But also an ever-growing one. When we add, e.g., imath.ceildiv(), if won’t also be added to math. All uses that will ever exist will be gotten from imath from its start.

There are dozens of new functions we may eventually add (ceildiv is the current candidate). Part of the reason for the relatively slow adoption of integer-domain functions is the years-long opposition to adding such things to the math module - the people who work on such things always knew they “didn’t really belong” there, but there was no other place to put them.

That’s an historical mistake that can’t be repaired now, but we can prevent it from happening over & over again in the future.

BTW, “why weren’t they made methods of int instead?” would be a good question nobody has asked .

They were, for .bit_length() and .bit_count(). But most of these things have a long history of being used with some kind of functional notation in the literature and prior art, and “practicality beats purity” won. Indeed, Python isn’t a rigidly OO langauge, and that .bit_length() and .bit_count() are methods of int is a bit jarring to my sensibilities.