PEP 791: imath --- module for integer-specific mathematics functions

I would like to vote for the option to just update the documentation so that it no longer claims that math only returns floats (as mentioned in the introduction).

2 Likes

As for documentation, the issue is not how to document what each function accepts and returns. The issue is that the module contains functions from different domains. It is like mixing itertools and functools in one module. Adding more documentation will make the issue worse, because users will need to read/skip more. And it will not fix the issue with discoverability (to know in which module to find a function, and that it can be found at all, you need to look at all the functions in the module), nor with tab-completion.

9 Likes

I want to highlight that math is often one of the (if not the) first module that is introduced to learners. (To pick just one example from my own experience, e.g., in the Software Carpentry course.)

Therefore …

… I think this point is particularly important for the math module. Mixing functions from multiple domains and putting mental load on the reader to figure out which functions are relevant in which context is making things much more difficult for learners, who already are under a high mental load.

7 Likes

This will need to (well, ideally) happen sooner or later as the module grows. And the separation here is unambiguous and tested in other places. This is exactly what I did for myself: I factored out all integer / number theory math functions into a separate module.

So +1.
And +1 for zmath.

4 Likes

Presumably a math.newname module wouldn’t break anything?

2 Likes

Making the new module a submodule of math opens a separate can of worms. We will be forced to add math.complex and math.float for cmath and those functions from math that work only with float. And maybe math.statistics for statistics and math.generic for ceil(), floor(), round(), prod() and sum(), which work with many numeric types.

This increases the cost of the change too much, and I do not think that we are ready for this.

3 Likes

Ok, here is a new poll:

  • imath
  • intmath
  • ntheory
  • zmath
  • integermath
  • imaths
  • dmath
  • math.ntheory
  • math.integer
  • math.int
  • math.discrete

I should mention some naming conflicts I’ve found so far:

  • imath: see current PEP text
  • ntheory: ntheory · PyPI (GH repository is not available)
  • zmath: ZMATH · PyPI (GH repository also deleted)
  • dmath: dmath · PyPI (latest release in 2006, project was on Google Code)

Sorry, I don’t buy that. We’d just be adding a new level of namespace. There’s no good reason to rearrange the whole world to make it look as if we had nested namespaces from day #0. Nobody complains about cmath or statistics … being where they are, and keeping it that way forever will at worst provoke idle curiosity in newcomers, easily answered with “historical accident, not worth changing - you’ll quickly learn to live with it”.

It’s the principled and pragmatic separation of namespaces that’s most important, much more so than the hierarchy in which they’re arranged.

3 Likes

Seriously, fine by me if @skirpichev purges my name entirely from the PEP. In fact, I suggest he do so. Out of sight, out of mind …

I’m sure it will be very repetitive, and the module already has a lot of functions. And don’t forget docstrings.

All this is the price of mixing functions from non-overlapping application domains.

There is the Specification section.

That was precisely the goal of the “open issue” about new functions.

The obvious criterion (a necessary condition): match the Specification, i.e. functions must be exact and consume integer input.

Sufficient conditions are trickier. Probably, we want to include functions that are a) widely known in the literature, b) common in applications, c) simple enough to be implemented efficiently. Are there more items that must be true? If not, I could probably move that part to the specification, together with examples.

I’ll address this (and your following remarks) in the update: PEP 791: address PEP review comments by skirpichev · Pull Request #4430 · python/peps · GitHub

No, I don’t think we should rename cmath just for foolish consistency in this case. math.generic could perhaps be considered, but current math functions like prod/sumprod aren’t actually generic. For example, prod already works worse with complex inputs than the corresponding reduce() call. So far, this stuff fits in the math module.

I see the submodule option as just a naming variant.

Edit:

@root-11, that question was already answered.

The first half of the second paragraph of motivation is inaccurate and either needs updating or can be removed:

For example, the math module documentation says: “Except when explicitly noted otherwise, all return values are floats.” This is no longer true: None of the functions listed in the Number-theoretic functions subsection of the documentation return a float, but the documentation doesn’t say so.

I was going to update the docs but the number-theoretic functions all explicitly or implicitly say they return an integer:

| Function | Docs | Return |
|---|---|---|
| comb | “Return the number of ways …” | integer |
| factorial | “Return factorial …” | a factorial is the product of integers => integer |
| gcd | “Return the greatest common divisor of the specified integer arguments. If … then the returned value is the largest positive integer … If … then the returned value is 0 … without arguments returns 0.” | all integers |
| isqrt | “Return the integer …” | integer |
| lcm | “Return the least common multiple of the specified integer arguments. If … the returned value is the smallest positive integer … If … then the returned value is 0 … without arguments returns 1.” | all integers |
| perm | “Return the number of ways …” | integer |

Are any of these unclear? Do any of them need to explicitly say they return an integer? If so, let’s update the docs.

And is there an accurate example that can be added to the motivation instead?

4 Likes

That’s not Sergey’s fault :smile:. I added that paragraph to the PEP based on this comment in an effort to put everything that was in the thread into the PEP. I think Tim was saying that the preamble doesn’t make the exceptions obvious or clear enough.

Whether this PEP gets accepted or not, I think simplifying and clarifying the documentation makes sense. So it’s great that you’re taking the initiative on that!

2 Likes

What is inaccurate here? It reflects the present state of the docs. Is the return value of isqrt() of an integral type, or does it just have an integer value (while being a float)?

Currently we leave this deduction to readers, though it might be wrong:

>>> import mpmath
>>> mpmath.factorial(100)
mpf('9.3326215443944151e+157')

You can simplify this table a lot: for all entries in the Return column you can place “integer”. But this looks odd.

Perhaps the description of return values (and special input processing, which is missing in your version) belongs in the “number-theoretic functions” subsection, not in the table. And there are docstrings. And other functions in the module.

Sorry, all text is entirely my responsibility :slight_smile:

I doubt this has a chance.

I think this repetition would be a good thing. When I need to figure out what math.foobar does, I usually don’t read through the entire document. I don’t read the intro, I know what math is [1]. I just read the section about the foobar function. If you type math.foobar into the docs.python.org search, you get sent to the foobar section as well. If you type help(math.foobar), or if your IDE shows docstrings, you also don’t see the module docstring, only the function docstring. So yes, if there are important details, such as the return type, they should be repeated instead of assuming they appear in some page intro or module docstring.

Speaking of which, the docs.python.org search appears to show cmath before math, which can be surprising. Some users may just automatically click the first result and get the wrong function. I guess that if we introduce [a-l][a-z]*math (dmath, imath, intmath, integermath), then the most general result from math may end up in third place in that search, if there are also complex and integer variants with the same name…


  1. This is probably why the “defined by the C standard” nonsense stayed in the docs for so long: nobody read or cared about that intro. ↩︎

8 Likes

The PEP says none of them return a float, and that the docs fail to state the returns. I disagree; the docs explicitly or implicitly say they return an integer. Are you suggesting they should explicitly state they return an int type?

Are you suggesting that, because the third-party mpmath.factorial can return a float, we should explicitly document that the stdlib’s math.factorial returns an integer/int? Can do.

Sure, this table is just for this discussion, not a suggestion for the docs :slight_smile:

Simplifying? Perhaps not. Clarifying? Hopefully!

2 Likes

Implicitly — maybe. But as shown above, some implications can be wrong :wink:

It’s not just about a random third-party package. People can have experience from other languages as well. Here is Octave (read MATLAB):

>> factorial(10)
ans = 3628800
>> factorial(100)
ans = 9.3326e+157

(Ah, and my pocket calculator in its Android version thinks it’s true as well. Though, the CITIZEN SRP-145 shows an overflow error in this case.)

It’s not just about the return type. math.factorial computes the result exactly, but people may instead expect something like math.gamma(n+1):

>>> import math, mpmath
>>> math.gamma(21).is_integer()
True
>>> math.gamma(21) == math.factorial(20)
True
>>> mpmath.factorial(20) == math.factorial(20)
True
>>> # 25! no longer fits in the 53 bits of default precision:
>>> mpmath.factorial(25) == math.factorial(25)
False
3 Likes

Conversely, I’ve been around long enough that I learned “math is where C-like floating point stuff goes” and internalised that. I’m routinely surprised at all the useful discrete maths functions that are now in there.

I agree with this. I wouldn’t expect math.factorial(100) to be exact. That’s specifically because it’s in the math library. And the documentation doesn’t contradict my expectation - the only mention of the type of the return value is the general statement “Except when explicitly noted otherwise, all return values are floats.” As math.factorial(100) overflows the range of integers that can be represented exactly by a float, I assume the result is only accurate to the precision a float value can represent.
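For reference, a quick check of what the stdlib actually does today, as a minimal sketch (the variable names are only illustrative):

```python
import math

exact = math.factorial(100)   # arbitrary-precision int, every digit is correct
rounded = float(exact)        # nearest double, roughly 17 significant digits

print(type(exact).__name__)   # int
print(len(str(exact)))        # 158 decimal digits
print(int(rounded) == exact)  # False: a float cannot hold all of them
```

So the result is exact, even though nothing about the module name suggests it.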

Yes, documentation can be changed. But it’s a lot harder to change people’s expectations.

Moving the discrete maths functions to a new module that explicitly documents that it’s for functions taking and returning int types, and always producing exact results, is a great way to reset people’s expectations. If anything, that is the key motivation for this change in my opinion.

12 Likes

Re documentation:

Why don’t we just type hint the functions in the docs?
e.g.
math.factorial(n)
def factorial(x: SupportsIndex, /) -> int:

3 Likes

The docs should make clear the domain of the functions which is connected to the type but not necessarily the same e.g. math.comb and gmpy2.comb are both integer functions but have different domains.

Maybe to some it seems implicitly obvious that factorial would be a function on the nonnegative integers that returns integers. There are good reasons why most programming languages don’t define a factorial function like that, though: such a function overflows a 64-bit integer very quickly. Hence SciPy’s factorial function defaults to returning 64-bit floats even for small integer inputs.
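To put a number on “very quickly”, here is a small sketch that uses Python’s own exact integers to find where a signed 64-bit type would give up (INT64_MAX is just an illustrative constant, not something from the math module):

```python
import math

INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer

# Find the smallest n whose factorial no longer fits in 64 bits.
n = 0
while math.factorial(n) <= INT64_MAX:
    n += 1

print(n)  # 21: 20! still fits, 21! does not
```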

There are different models for handling different types and domains in different libraries both in Python and elsewhere. In some languages multiple dispatch is used meaning that you have e.g. one sqrt function but it is overloaded to dispatch to different implementations for different types. In the stdlib though there are two functions math.sqrt and cmath.sqrt. The convention is that the module from which the function comes determines the domain and inputs will be coerced to the function’s type. Third party packages in Python that have things like this use a hybrid model where they have their own sqrt function that coerces to their own set of types but then dispatch over their own types to determine the domain:

>>> import numpy as np
>>> np.sqrt(-1)
RuntimeWarning: invalid value encountered in sqrt
  np.sqrt(-1)
np.float64(nan)
>>> np.sqrt(-1+1j)
np.complex128(0.45508986056222733+1.09868411346781j)

If you follow that convention then it would be reasonable to have a factorial function that behaves differently depending on whether the input is integer vs real vs complex. It would even be reasonable for factorial to promote integer types to float but then otherwise dispatch over real vs complex in the same way that np.sqrt does.

I actually think that the stdlib model is a good one: you have a function that is associated with one type, coerces its input to that type, and has a domain associated with that type. The signatures should be like:

from typing import SupportsComplex, SupportsFloat

# math
def sqrt(x: SupportsFloat) -> float: ...
# cmath
def sqrt(x: SupportsComplex) -> complex: ...

Then you use either the math module or the cmath module to determine the domain of your calculation. What messes this up is that there are integer functions in the math module: there is no reason why someone should assume that factorial is an exact function of arbitrary-precision integers, and it would also be useful (perhaps more useful!) to have a factorial function for floats.
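A minimal sketch of that convention with the two stdlib sqrt functions (the variable names are only for illustration):

```python
import cmath
import math

real_root = math.sqrt(4)       # the int 4 is coerced to float, giving 2.0
complex_root = cmath.sqrt(-1)  # same name, complex domain, giving 1j

try:
    math.sqrt(-1)              # -1 is outside the real domain of math.sqrt
    real_domain_error = False
except ValueError:
    real_domain_error = True

print(real_root, complex_root, real_domain_error)
```

The module you import from picks the domain; the argument type does not.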

We can’t have a factorial function for floats though unless math.factorial breaks the model of all other named mathematical functions in the stdlib by dispatching on types. Dispatching on types would be confusing though because usually you are allowed to pass e.g. 3 instead of 3.0 because an int is coerced to a float when needed. This presumed coercion of int to float is even (unfortunately) hard-coded into the static type system by PEP 484.

3 Likes

The PEP spec says it’s for functions that “accept integers and objects that implement the __index__() method, which is used to convert the object to an integer number”.

Should it also specify they return int types?
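For illustration, this is how the existing integer functions in math already treat `__index__` objects. It is only a sketch: the Meters class is hypothetical and not part of the PEP, and it only exercises math.isqrt and math.gcd.

```python
import math
import operator


class Meters:
    """Hypothetical integer-like wrapper, used only for this example."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        # Called by operator.index() and by the integer functions in math.
        return self._value


n = Meters(17)
root = math.isqrt(n)                # floor of the square root of 17
divisor = math.gcd(Meters(12), 18)

print(operator.index(n))            # 17
print(root, type(root).__name__)    # 4 int
print(divisor)                      # 6
```

Both calls accept the wrapper but return a plain int, not a Meters.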