Make float.__(i/r)floordiv__ return an int

What I describe in the PEP does indeed allow implementations to return ints larger than the largest float from an integer division. But I’m not saying it’s required now. I think it opens the door to that feature, and it’s an interesting one, but if you prefer I can move it out of the recommendation section and into an “open-ended” section about the possible future evolutions this enables.

I don’t think what you say is enough to close the door on generating bigger-than-float ints rather than raising an OverflowError, but at the same time I’m not convinced it definitely should be on the list of things to do and I also think it’s only tangentially related to this question of the return type.
But I certainly see what you mean: I tried implementing the algorithm I drafted near the end, and I simply can’t check the results because everything overflows all the time!

So, I think I’ll revert the recommendation back to “it should raise OverflowError whenever it used to”.

One other thing to keep in mind: to submit a draft PEP, besides a sponsor and at least rough consensus here that a PEP is appropriate (even if not agreement on the specifics of the proposal), you’ll want a reference implementation, usually in the form of a PR to the cpython repo or your fork of it, at least as a prototype. Sometimes PEPs are initially published to the PEPs repo without one when there is a good reason, or if the author commits to providing one prior to acceptance and we have a basis to trust that. In general, though, and especially in this case, IMO it would be best to have at least a rough working prototype prior to merging (and ideally, submitting) a formal draft PEP.


@CAM-Gerlach Ok. I don’t know if I have the skills to do that. For now, at least.

@franklinvp I’d rather have the HedgeDoc file be just mine for now. Nothing to worry about, it was my mistake with the privacy settings. And thanks for the thing you added, it made sense and I rephrased things a bit.

I hope this doesn’t come across as self-centered. If I could open it to people who already spoke here without letting unknowns deface it, I probably would, but I don’t think I can.

and now you get inf:

In [18]: a
Out[18]: 100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

In [19]: b
Out[19]: 1e-200

In [20]: (a / b) * b + (a % b)
Out[20]: inf

Is that any better? floats have limited precision; that’s a limitation that this doesn’t make any worse. And it would actually be quite difficult for this not to be inf or an error – I don’t think anyone is proposing making a whole new type of math so we can do unlimited precision computation with non-integers. (OK, I suppose you could coerce to high precision Decimal, but again, no one is proposing that)

I think the really compact way to describe this proposal (@Gouvernathor, correct me if I’m wrong) is pretty simple:

The original PEP 238 said that floor division would mean:

a//b == floor(a/b)

as simple as that – and when it was written (py2) that meant it would return a float if a or b was a float – in fact, it would always return a float if you were strict about it. But it did mean:

type(a//b) == type(a/b)

Then PEP 3141 created a more consistent Type Hierarchy for Numbers, and floor(a) now returns an int.

This proposal is that a//b should return an int, so that we are back to the original idea:

a//b == floor(a/b)

and also:

type(a//b) == type(floor(a/b))

Is that so hard?
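To make the mismatch concrete, here is a small check against current Python 3 behavior. The values 7.5 and 2.0 are arbitrary, picked only for illustration:

```python
import math

a, b = 7.5, 2.0

floor_div = a // b             # float floor division
floored = math.floor(a / b)    # floor of the true division

print(floor_div, type(floor_div).__name__)   # 3.0 float
print(floored, type(floored).__name__)       # 3 int

# Same value here, but not the same type.
print(floor_div == floored)                  # True
```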

So far:

  • I don’t think anyone recalls why this wasn’t done with PEP 3141
  • Some folks think it would have been better not to have made the change to floor() and ceil()
    But it’s hard to argue that the inconsistency is good.

Backward compatibility is a good thing, and that may prevent this idea from being adopted, but that’s different than it being a bad idea.

Yes, of course it is. Valid operation between values of a type that yields a result that is a value of a type.

float('inf') is not any better or worse than any other value of float. It is a number, like all others.


So, coming back to that, as amended.

The issue that I have with that is two-fold.

One, it is not a value “like all others”, since it’s specifically unsupported by the floor, ceil, int and round functions. Just like nan and the opposite infinity. Therefore we can envision these values being specifically unsupported by an operation, just as 0.0 is specifically unsupported by true division. It makes even more sense in the case of nan and infinities than in the case of 0, because of the precedent these four functions create.
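The point about the four functions can be checked directly. The loop below is only a quick illustration and records which exception each call raises for each special value:

```python
import math

special_values = [float("inf"), float("-inf"), float("nan")]
converters = [math.floor, math.ceil, int, round]

results = {}
for value in special_values:
    for convert in converters:
        try:
            convert(value)
            outcome = "ok"
        except (OverflowError, ValueError) as exc:
            # Infinities raise OverflowError, nan raises ValueError.
            outcome = type(exc).__name__
        results[(repr(value), convert.__name__)] = outcome
        print(f"{convert.__name__}({value!r}) -> {outcome}")
```

None of the twelve combinations returns a value.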

Two, the floor division, as returning a float, has something special compared to all other math operations on number types (int, float and complex): it cannot return all values of its type. For every other operation, there is a way to obtain every single float value (let’s say normalized values to be prudent, but still), by adding or subtracting 0, or multiplying or dividing by 1, or multiplying its half by 2…
For floor division, that’s not possible. That’s what places it apart from the others in my view, and what makes it impossible to consider as a “Valid operation between values of a type that yields a result that is a value of [that] type”.

Well, surjectivity is a strange condition to not consider a function an internal operation.

I don’t think it is strange, especially when all returned number values (i.e excluding nan and infinities) can be represented using the int type. Floor div is not similar to the other math operations, it was even originally defined as a composition of true div with a call to a function (floor).
Even modulo, which is the other non-surjective operation (I forgot that one), doesn’t have that property of having all its values be valid ones for another type, nor does it have that property of being a composition of an operation with a function. Floor div is special.

Also, could you define what you mean precisely by “internal” ?

Internal (n-ary) operation: $f: X^n \to X$, like multiplication of integers. As opposed to external: $f: X^n \to Y$, or $f: X^n \times Y \times X^m \to X$ with $Y \neq X$, like the scalar (or dot) product, or the multiplication by a scalar.


Thanks for the info.
Well, since math.floor is unary and external when applied to float, and since float-float true division is internal, it makes sense for their composition, which is what floor division is, to be external:
$f : X \times X \to X$ (true division)
$g : X \to Y$ (floor)
$g \circ f : X \times X \to Y$ (floor division)

So this has petered out – but as it happens, I was coding away today and found myself having to call int(a // b ) over and over again, and purposely using math.ceil() rather than numpy.ceil() (which still returns a float).
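A stripped-down version of the kind of code in question might look like this. The function name to_cell and the numbers are made up for illustration, not taken from the actual graphics code:

```python
import math


def to_cell(x: float, y: float, cell_size: float) -> tuple[int, int]:
    """Map a floating-point position to integer grid indices."""
    # The int() calls are needed only because float // float returns a float.
    return int(x // cell_size), int(y // cell_size)


print(to_cell(37.5, 12.25, 8.0))   # (4, 1)

# math.ceil already returns an int, so no extra conversion is needed here.
print(math.ceil(37.5 / 8.0))       # 5
```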

I’m still dumbfounded by the resistance to this idea – why are you using floor division if you don’t want an integer?!?

It may be too late to change, sure, but that’s not the same as arguing that this is somehow the best design.

[*] If you are curious, I’m working on some graphics code, where I need integer pixel coords in the end.


It depends on what you want to do with the result. If it is a quantity that can be counted (like pixel coordinates), you want an integer. But if it is something that has to be used further in floating point computations, it makes no sense to first convert it to an int and then back to a float. So, there are people like me, who are purposely using numpy.floor in place of math.floor because they do not want the result to be an int.


In pure python, there is no need to “convert back” – you can use an int in pretty much any place you can use a float – that’s the whole point of this thread, and the point of PEP 3141.

Numpy arrays are statically typed, so the trade-offs are different there. Though in most places, the casting rules would work fine there too.

But this thread is not about ceil and floor – that decision was made long ago, it’s about making “floor division” consistent with what ceil and floor already do.

who are purposely using numpy.floor in place of math.floor because they do not want the result to be an int.

In this case, are you using numpy already? If not, how does getting an int result cause practical problems? And how is using numpy.floor easier or better than calling float(math.floor()) ?

And calling numpy.floor is substantially slower for a regular python scalar float value.

In [5]: %timeit float(math.floor(1234.123))
156 ns ± 9.78 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [6]: %timeit np.floor(1234.123)
1 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

That is despite the float(math.floor(...)) version involving an extra name lookup and function call.

Anyway, unless a Core Dev is interested in sponsoring a PEP on this, it’s not going anywhere.

If you’re flooring, why do you need the following calculations to be “floating point computations” in the first place?
If you need to multiply afterwards and want that to be a float multiplication to save time, then I’m not sure why the default implementation of an operator should be optimized for uses in the middle of a succession of calculations - specifically float calculations, because if // returns an integer, (a//b)*c would still be optimized, and also more accurate, if c is an int.

I think I’ll try a C implementation, if a PR can make this go forward. Can someone point me to the area of code where such builtin dunders are defined in CPython ?
Never mind, I found it.

I posted an issue on GitHub. I hesitate to open a pull request: a base implementation could be as simple as that, but I expect it will require at least a big news and changelog entry and some test changes, not to mention the possible transition steps I explored in the PEP draft (emitting a warning, managing the previously-accepted values…).
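For anyone who wants to experiment without touching the C code, the proposed behavior can be approximated in pure Python. This is only a sketch under stated assumptions: IntFloorFloat is a hypothetical name, not part of the PEP draft or of CPython, and it simply converts the existing float result with int().

```python
class IntFloorFloat(float):
    """Hypothetical float subclass whose floor division returns an int."""

    def __floordiv__(self, other):
        result = float.__floordiv__(self, other)
        if result is NotImplemented:
            return NotImplemented
        # int() raises OverflowError for infinities and ValueError for nan.
        return int(result)

    def __rfloordiv__(self, other):
        result = float.__rfloordiv__(self, other)
        if result is NotImplemented:
            return NotImplemented
        return int(result)


left = IntFloorFloat(7.5) // 2       # uses __floordiv__
right = 7.5 // IntFloorFloat(2.0)    # uses __rfloordiv__

print(left, type(left).__name__)     # 3 int
print(right, type(right).__name__)   # 3 int
```

Division by zero still raises ZeroDivisionError, since that comes from the underlying float operation.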

In the PEP draft it says

However, the set of int values is a superset of the integer float values - those for which f.is_integer() is True.
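As a quick illustration of that sentence (with arbitrarily chosen sample values), every float for which is_integer() is True converts to an int of equal value, and converts back to the same float:

```python
samples = [3.0, -17.0, 2.0 ** 70, 1e22, 0.5]

for f in samples:
    if f.is_integer():
        n = int(f)
        # The int has the same value, and converting back gives the same float.
        assert n == f and float(n) == f
        print(f"{f!r} is integral, int value {n}")
    else:
        print(f"{f!r} is not integral")
```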

This is not taking into account that the underlying sets of values are not all that is important for an algebraic structure, or a type. The operations, their behavior, and the algebraic expressions or logical propositions that they satisfy are also important. If you take the floats for which is_integer() returns True, with the operations +, -, and *, it is already not a sub-structure of int.

It says

The a//b operation was initially described, in :pep:??? , as being the equivalent of math.floor(a/b).

Perhaps that “equivalent” was only an oversimplification that accidentally got into that text.
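For floats, the two expressions do in fact differ in current CPython in some cases: a // b works from the operands themselves, while math.floor(a / b) floors a quotient that has already been rounded. A well-known case:

```python
import math

a, b = 1.0, 0.1

print(a // b)               # 9.0
print(a / b)                # 10.0
print(math.floor(a / b))    # 10
```

The float 0.1 is slightly larger than one tenth, so the exact quotient is just under 10, while the true division rounds up to 10.0 before the floor is taken.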

It says

It fails (raising an exception) whenever passed 0 as the second operand, as well as when passed float("nan") , float("inf") or float("-inf") as either operand.

I don’t think we want x // y to fail with x finite and y = float('inf') ? The result n is expected to give r = x - y * n where r is x % y. The latter being the remainder operation of floating point numbers, as in page 19.

Note: Shouldn’t the change also change float_divmod for consistency? And well, I already gave my opinion on // returning an int. I have the same opinion of divmod returning a tuple of an int and a float.

I think it is. What operation(s), expression, or logical proposition can you make with an integral-float that you can’t with an int ? As far as I know, only .hex() and direct typechecks.

I think we should reserve this question for until the rest is accepted, but yes, that’s a good point. I wouldn’t want this change to make the same kind of oversight I’m accusing 3141 of.
I’ll note, ironically, that divmod’s documentation describes it (in the case of floats) as “usually math.floor(a / b)”, so even there the return type is not explicit.

I can get behind that, sounds good to me. Though I don’t understand why it sometimes returns -1 instead of 0, but that’s not my problem here.
I’ll integrate that in my draft, I think - separating cases where inf or nan is passed from cases where it’s returned by the true division.
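For reference, this is what current CPython does with a finite x and an infinite divisor. The -1 seems to follow from Python’s % taking the sign of the divisor: for a negative x the remainder is pushed up to inf, and the quotient drops to -1 to match. That explanation is my reading of the behavior, not something stated in the thread.

```python
inf = float("inf")

for x in (5.0, -5.0):
    quotient = x // inf
    remainder = x % inf
    print(f"{x} // inf = {quotient}, {x} % inf = {remainder}")

# 5.0 // inf = 0.0, 5.0 % inf = 5.0
# -5.0 // inf = -1.0, -5.0 % inf = inf
```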

Some examples:

  1. In int the equation x+1=x doesn’t have solutions. In float for which is_integer() returns True (let me call them “fint”) it does have solutions. Note that this example is not about one peculiar algebraic equation. This phenomenon happens for lots of equations.
  2. +, -, * give int outputs for all int inputs. That is not the case for fint.
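Both examples can be reproduced as follows, assuming the usual IEEE 754 double-precision float (53-bit mantissa). The specific values are picked only for illustration:

```python
# (1) x + 1 == x has solutions among integral floats, none among ints.
big = 2.0 ** 53
assert big.is_integer()
print(big + 1 == big)            # True
print(2 ** 53 + 1 == 2 ** 53)    # False

# (2) An operation on integral floats can leave the integral floats.
huge = 1e308
assert huge.is_integer()
product = huge * 10.0            # overflows to inf
print(product, product.is_integer())   # inf False
```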

Your (2) yields a difference in a direct type-check (direct as including isinstance and type/__class__, and excluding duck-typing), so it’s just that it propagates other API (= duck-typed) differences. But it doesn’t add any one situation, expression or behavior to the list of these differences.

Your (1) is a limitation, not a feature you can rely upon. Why do I say that? If you really need factual points: direct, exact comparison between floats is advised against in the docs, so testing a+1 == a is about as close as it gets to relying on undocumented behavior.
What’s more, the mantissa and exponent sizes are not constant and not documented; they can be read from sys.float_info and may change depending on the install, the system, the implementation and who knows what else (as far as the documentation extends), so the values for which the equation is true will change with these sizes. If a given value can stop satisfying that equation when moved to another computer, it’s not a much more breaking change if we make it an int and it never satisfies the equation.
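The sizes in question can be inspected as below. The threshold variable is just an illustrative name for the first power of the radix at which adding 1 no longer changes a float:

```python
import sys

info = sys.float_info
print(info.radix, info.mant_dig, info.max_exp)

# With mant_dig digits of precision, radix ** mant_dig is the first power of
# the radix where adding 1 is lost to rounding.
threshold = float(info.radix) ** info.mant_dig
print(threshold, threshold + 1 == threshold)
```

On a typical build with IEEE 754 doubles, mant_dig is 53 and the threshold is 2.0 ** 53, but the code reads the values rather than assuming them.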

So, these wouldn’t be documented, reliable behavior differences between “fints” (why not) and ints.
I’ll stick with .hex() and direct typechecks.