Make float.__(i/r)floordiv__ return an int

The // operation is made to return an integer value. However, when either operand is a float, the result is an integer value with the type float.
I think there are several advantages to making this return an int.

For one, there is no circumstance with the standard types where // returns anything other than an integer value, so having it return an int would not create a weird situation where the value of the result can change its type (as would happen if we did that to /, for example, with 4/2 being an int and 3/2 a float; that is not the case here).

For another, there is a considerable number of situations in which integers are useful and floats fall short, for example range, or indexing a list or tuple (see the snippet below). In fact, these two are the use cases where I stumble on that idiosyncratic behavior most often.
Methods on the int object (not that they are often used) are more numerous than those on the float object. Among the float methods, two relate to converting it to an integer and checking whether it has an integer value (neither of which would be useful for the result of a floordiv), and the other two relate to hex representation; while the hex/bytes methods on int have different names, I think the same purpose and roughly the same features are covered on the int type as well.
As a last but considerable advantage, ints are arbitrarily big and precise, while floats, even integer-valued floats, have precision limits, mandatory rounding and information loss.
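For example, the int result of 6 // 2 can be used directly, while the float result of 6.0 // 2 cannot:

>>> liz = ["a", "b", "c", "d"]
>>> liz[6 // 2]
'd'
>>> liz[6.0 // 2]
Traceback (most recent call last):
  ...
TypeError: list indices must be integers or slices, not float
>>> range(6.0 // 2)
Traceback (most recent call last):
  ...
TypeError: 'float' object cannot be interpreted as an integer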

I realize the change cost itself is a big thing to consider for a math operation’s behavior on builtin types, but other than that, what would be the advantage of the current behavior over having it return ints?

2 Likes

I’ve always been confused as to why floor division doesn’t return an int. I took a look at the original PEP (PEP 238 – Changing the Division Operator | peps.python.org) and all it says about it is:

“For floating point inputs, the result is a float.” with no explanation. I suspect it’s partly because the operation has to be done with floats, so it would require an extra coercion to an int.

But I do think it’s a bit of a wart – I know I use floor division more often than not when I want an int – and need to use it as an index or something.

In fact, I’d love to see math.floor() and math.ceil() return ints too :slight_smile:

But there were some smart folks involved, so I’m sure there is a good reason.

2 Likes

There are a number of problems here.

  1. The result of dividing a large float by a small float can be large. The size of any float object is 24 bytes; the size of the equal int object depends on its value and can be much larger (sys.getsizeof(int(1e300)) == 160). Creating a larger object takes more memory and time.
  2. Currently the floor division is implemented as a call to the C library function fmod(). Converting a C double to a Python int takes extra time.
  3. float has limited precision. If you want a precise result, you cannot use the intermediate value returned by fmod() (which most likely translates to one or a couple of FPU instructions); you have to implement the floor division yourself. It would be complicated code, and it would be much slower.
>>> int(1e40 // 3e20)
33333333333333336064
>>> int(1e40) // int(3e20)
33333333333333334345

But taking into account that the operands have limited precision, it is not always worth spending all this effort to produce a result with more precision.
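To make the memory point in 1. concrete (on a typical 64-bit build):

>>> import sys
>>> sys.getsizeof(1e300)        # every float object has the same fixed size
24
>>> sys.getsizeof(int(1e300))   # the equal int needs much more memory
160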

8 Likes

Precision is really the argument here as to why float operations should not return integers outside of explicit conversion. You don’t want to promote the less precise type to a more precise type implicitly, and get hit by unexpected behaviour. This has nothing to do with floordiv in particular.

>>> from math import nextafter
>>> a = nextafter(1., 2.)
>>> a
1.0000000000000002
>>> (10*a)//a
9.0
1 Like

Unless you’re using Python 2, they already do.

1 Like

I understand that, but in my opinion that’s what you sign up for when you use // instead of truediv: I want an integer, so let it be precise. And even more to the point:

That’s the problem. In “my” world, you can implement the old behavior using math.floor(a/b), while (if I understood correctly) there’s no way at all to get a correct floordiv in vanilla Python + stdlib (the only ways I could think of would be a) using a ctypes implementation based on undocumented stuff, or b) manipulating the bytes/hex representation of the operands).

I think it does. I get your point which I’d rephrase as “if you cast the result to int without changing anything else, you’ll lose time and memory while not regaining any lost accuracy”. I think we should implement the correct calculation instead.

The easiest way, if you don’t care too much about performance, is to go via fractions.Fraction:

>>> import random
>>> from timeit import Timer
>>> from fractions import Fraction as F
>>> x, y = 1e40, 1e10
>>> F(x) // F(y)
1000000000000000030378602842700
>>> int(x // y)  # for comparison
1000000000000000019884624838656
>>> Timer("F(x) // F(y)", "x=random.random()*1e40;y=random.random()*1e40", globals=dict(random=random,F=F)).autorange()
(100000, 0.2973314999999843)
>>> Timer("F(x) // F(y)", "x=random.random()*1e40;y=random.random()*1e40", globals=dict(random=random,F=F)).autorange()
(100000, 0.29095069999999623)
>>> Timer("int(x // y)", "x=random.random()*1e40;y=random.random()*1e40", globals=dict(random=random,F=F)).autorange()
(5000000, 0.42125029999999697)
>>> Timer("F(x) // F(y)", "x=random.random()*1e40;y=random.random()*1e40", globals=dict(random=random,F=F)).timeit(5000000)
15.008783200000039
>>> Timer("int(x // y)", "x=random.random()*1e40;y=random.random()*1e40", globals=dict(random=random,F=F)).timeit(5000000)
0.42941569999999274

That’s more than 30 times slower, so the concern is very understandable (if it can’t be made any better, it should probably not be “fixed” as I intended), but I suspect the Fraction class adds some overhead of its own.
I’m not an expert in C math, but is there a fundamental reason why building the result directly as an int would take significantly more time than the current implementation? @storchaka mentioned the fmod call, which (if I understand correctly) returns a C double, but could we circumvent that by doing it another way? One closer to int-int floordiv maybe? Why would it be slower then?
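For what it’s worth, here is a rough pure-Python sketch of what I mean by “another way”: exact, using only the fact that every finite float is an exact ratio of integers (a real implementation would be in C and would need to handle zeros, infinities and NaNs):

>>> def exact_floordiv(x, y):
...     # sketch only: exact floor division of two finite floats, done
...     # entirely with arbitrary-precision ints (as_integer_ratio() is
...     # exact, and int // int already floors)
...     xn, xd = x.as_integer_ratio()
...     yn, yd = y.as_integer_ratio()
...     return (xn * yd) // (yn * xd)
...
>>> exact_floordiv(1e40, 3e20)   # matches int(1e40) // int(3e20) above
33333333333333334345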

I would also like to point out that in the typical use case, I don’t really care about precision for large values: they are somewhat incompatible with list indexing, and building a range() with them would be (b/r)arely useful. If the precision isn’t increased but we get an int at the end, I’m satisfied.

>>> float("inf")//2.5
nan
1 Like

I would argue that this already breaks the invariant of (a//b).is_integer() being True. So I would support continuing to break an invariant here, either by having it return a float nan, or by raising an exception (OverflowError seems cut out for it).
I understand if some are frosty about making native types raise a new exception, but if any situation warrants OverflowError and deserves to raise it, this is it. That’s what it’s there for, after all.

We can’t change the type of a_float // b_float without breaking a ton of code. So you’re left with coming up with another spelling for the proposed behavior.

2 Likes

That, or from __future__ import integer_division.
But I challenge the “a ton of code”: I doubt a ton of code calls the hex method on the result of a floordiv, or floordivs an inf/nan. Especially since its returning a float is not currently documented: there’s only a warning about floordiv that “the result’s type is not necessarily int”.

The future import just means you’ll break the code in the future when it becomes the default.

I have code that dispatches based on the types of values. I don’t want the types of existing calculations to change.

1 Like

Yes, wasn’t that the purpose of the whole __future__ feature? It’s the same as the division future, only in a much, much less impactful way.

Code relying on undocumented implementation details gets broken as the language evolves; I think those are the rules of the game. If there’s a future import, you will have time to fix it.

I think you’re grossly underestimating the amount of work and machinery involved in the original from __future__ import division, and it’s not at all clear how a future import would work in this case. If you followed the PEP 238 model, you’d create a new bytecode instruction that would be emitted for occurrences of // in modules with the relevant future import enabled, and then you’d have to add machinery to handle that bytecode, possibly involving new dunder methods.

If you really want to go this route, then a PEP would certainly be needed. But I think this has very little chance of getting off the ground - the amount of work and code churn (not just to CPython, but to 3rd party projects) required way exceeds the possible benefits.

If all you want is an efficient way to do floor division of two floats, then that’s a much more reasonable proposition, and one that could be satisfied by (for example) a new math.quorem function that divides two floats and returns a pair (quotient, remainder), where the quotient has type int and the remainder has type float. As a bonus, we could make the remainder have the same sign as the dividend (which is generally what you want for floating-point operations), in which case both quotient and remainder can be provided exactly, with no floating-point precision issues.
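For concreteness, a rough pure-Python sketch of that idea (a hypothetical function, not an existing math API, and without the special handling a real C implementation would need for zeros, infinities and NaNs):

from math import fmod

def quorem(x, y):
    # Sketch for finite floats: the quotient is an exact int, truncated
    # toward zero; the remainder is a float with the sign of x (math.fmod
    # is exact in IEEE 754), so mathematically x == q*y + r.
    xn, xd = x.as_integer_ratio()
    yn, yd = y.as_integer_ratio()
    q = abs(xn * yd) // abs(yn * xd)
    if (xn < 0) != (yn < 0):
        q = -q
    return q, fmod(x, y)

>>> quorem(7.5, 2.0)
(3, 1.5)
>>> quorem(-7.5, 2.0)    # the remainder keeps the sign of the dividend
(-3, -1.5)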

3 Likes

Ok, maybe I was a bit too enthusiastic in saying it’s the same as the division future. That would definitely be too massive a change, as you describe it. But I’m not even convinced it actually deserves a future at all, since to my knowledge those only applied to new syntax (with_statement, reserving a keyword) or to changes in documented behavior (unicode_literals, for example). This, on the contrary, is only an implementation detail.

Your idea of a math function for that is interesting as a middle ground, though I’m not sure how workable it is in practice. What I’m after is a syntax comparable to, and if possible as simple as, liz[val//k] and range(val//k). I don’t find that in liz[quorem(val, k)[0]], but maybe there’s room to improve it?
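A small wrapper (hypothetical name, based on the Fraction approach shown earlier) gets most of the ergonomics back, at the cost of a function call:

>>> from fractions import Fraction
>>> def ifloordiv(x, y):
...     # exact floor division with an int result
...     # (Fraction // Fraction already returns an int)
...     return Fraction(x) // Fraction(y)
...
>>> liz = list("abcdefgh")
>>> liz[ifloordiv(7.0, 2.0)]
'd'
>>> range(ifloordiv(7.0, 2.0))
range(0, 3)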

Apart from all the issues pointed out here, I think your premise isn’t correct. The // operator is “floor division”, not integer division. And floor, as a floating-point operation, returns a floating-point number. What you are proposing would more fundamentally need to change floor.

3 Likes

Yes, I’m aware.

Yes, that’s the issue, it’s what I’m saying we should change. That’s what we’re discussing.

I don’t understand, what’s your point?

You are asking to change the // operation to produce integer output, because it “is made to return an integer value”. That is not true: the operation is divide and floor. Since a//b means floor(a/b), you are either

  • implicitly asking to change the floor() function to produce integer output, not the // operator, or
  • implicitly asking to change // to mean “integer division” instead of “floor division”.

Either of these would be an enormous change to the language.

Can you explain what your use-case actually is?

If your end-goal is to have an int that you can use to index into a sequence, or pass to range, why are you starting with a float?

sequence = list(range(100.0//2))
sequence[103.0//4]

Obviously you’re not doing literally that, but I don’t understand why and how this issue of starting with a float or a Decimal comes up, if the end result is to turn it into an index.