Translate in the bytecode 1e23 into 1 followed by 23 zeros

Marco_Sulla · February 5, 2023, 3:47pm

(from https://discuss.python.org/t/let-int-accept-scientific-notation-strings/)

>>> int(1e23)
99999999999999991611392

What if the bytecode compiler translated NeM into N followed by M zeros and .0, where N is an integer and M is a positive integer? So

1e23 → 100000000000000000000000.0

PS: this doesn’t solve 0.2e24

storchaka · February 5, 2023, 4:05pm

I don’t see what bytecode has to do with this, but 1e23 and 100000000000000000000000.0 are the same in Python.

>>> 1e23 == 100000000000000000000000.0
True

But float has limited precision.

>>> int(100000000000000000000000.0)
99999999999999991611392

Perhaps you need to look at the decimal module.
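
A minimal sketch of the difference, assuming the value is written as a string so that it never passes through a binary float (the variable names are only for illustration):

```python
from decimal import Decimal

# Parsing the literal from a string keeps the decimal value exact.
exact = Decimal("1e23")

# Converting from the float inherits the float's rounding error.
from_float = Decimal(1e23)

print(int(exact))       # 100000000000000000000000
print(int(from_float))  # 99999999999999991611392
```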

Marco_Sulla · February 5, 2023, 4:17pm

@storchaka Hmm, I see. What if 1e23 were stored internally as a C integer? Not for all floats, only for floats that have integer values. It seems to me too difficult to use decimal for a simple scientific literal.

storchaka · February 5, 2023, 5:28pm

I don’t even know where to start. At best, what you’re suggesting is replacing native binary floating point numbers with decimal floating point numbers, which would have to be built into the interpreter core rather than provided by an external module. Ignoring the enormous complexity of such a project, the compatibility breakage, the performance hit and the maintenance cost, this would just replace one set of binary floating point “gotchas” with a different set of decimal floating point “gotchas”, which are even weirder and less well known.

But I think you haven’t really thought it through and are suggesting something less coherent.

Rosuav · February 5, 2023, 5:32pm

Worth noting that this is a legal integer literal:

>>> 100_000_000_000_000_000_000_000
100000000000000000000000
>>> type(_)
<class 'int'>

That might not be sufficient for e+23, but for mid-range numbers it might be sufficient.

Marco_Sulla · February 5, 2023, 6:01pm

Well, no. As you said, this would be overkill X-D

I’m asking if it’s possible to create an internal PyFloatIntegerObject that acts as a PyFloatObject but internally stores the value as an integer. This is only for floats that have an integer value. I don’t know if this is practical, or if this is a special case which is not special enough.

I call it “brainstorming” X-D

Rosuav · February 5, 2023, 6:03pm

If the float has an integer value, it is equal to that integer, and will mostly behave identically. The float value 1e23 has an integer value:

>>> 1e23 == 99_999_999_999_999_991_611_392
True
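
A small sketch of how to inspect that exact integer value, using only built-in float methods:

```python
x = 1e23

# The float holds an exact integer value, just not 10**23.
print(x.is_integer())        # True
print(x.as_integer_ratio())  # (99999999999999991611392, 1)

# Python compares floats and ints by exact value.
print(x == 10**23)           # False
```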

Marco_Sulla · February 5, 2023, 6:08pm

Chris Angelico:

If the float has an integer value, it is equal to that integer, and will mostly behave identically. The float value 1e23 has an integer value:
>>> 1e23 == 99_999_999_999_999_991_611_392
True

Let me try to explain it better. I mean that the value of a PyFloatIntegerObject would be represented not by an exponent and a mantissa, but by a single integer.

PythonCHB · February 6, 2023, 12:15am

But what is the advantage of integer values being represented as integers rather than as plain old floats? Python floats are IEEE 754 doubles. That format was chosen because it’s a good, well-vetted format and, more importantly, because it is supported directly by hardware. Making a different kind of float would add a lot of overhead, so what would we gain?

apalala · February 6, 2023, 12:47am

>>> 10**23
100000000000000000000000
>>>

NeilGirdhar · February 6, 2023, 1:19am

I’m asking if it’s possible to create an internal PyFloatIntegerObject, that acts as a PyFloatObject but that internally stores the value as an integer.

I’m not sure you need a new type. All integers support the float interface, since numbers.Integral is a subclass of numbers.Real.

I think what you want is exact integer constants that are expressed in e-notation. It’s too late to change the definition of e-notation in Python to produce that. However, Juancarlo’s solution is probably the best you’re going to be able to do. Instead of 138e35, you can write 138 * 10 ** 35, which is legible. It probably comes up rarely enough that the lack of compactness isn’t such a big deal.

Does that solve your problem?
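
As a rough sketch of how the two spellings differ (the names `exact` and `approx` are just for illustration):

```python
import numbers

exact = 138 * 10**35   # int, exact
approx = 138e35        # float, rounded to the nearest binary64 value

# The int still satisfies the Real interface.
print(isinstance(exact, numbers.Real))  # True

# The float cannot hold every digit, so it is not the same integer...
print(int(approx) == exact)             # False

# ...but it is the float closest to the exact value.
print(float(exact) == approx)           # True
```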

steven.daprano · February 6, 2023, 2:26am

In CPython, expressions of the form a*10**p may be evaluated by the peep-hole optimiser into a constant. According to my tests, this occurs up to p=32:


>>> dis.dis('2*10**32')
  1           0 LOAD_CONST               0 (200000000000000000000000000000000)
              2 RETURN_VALUE
>>> dis.dis('2*10**33')
  1           0 LOAD_CONST               0 (2)
              2 LOAD_CONST               1 (10)
              4 LOAD_CONST               2 (33)
              6 BINARY_POWER
              8 BINARY_MULTIPLY
             10 RETURN_VALUE

This implies that for smallish numbers like 1x10^23, there is no runtime cost in writing them as 10**23.

And for largish numbers like 1x10^50, the runtime cost of computing it is probably negligible compared to whatever work you do on it next. Maybe.
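
A quick way to check this on your own interpreter, as a sketch: compile a small expression and look at the constants stored in the code object. This assumes a CPython version that folds constants; the exact cutoff and the bytecode layout are implementation details and may differ between versions.

```python
import dis

# Compile a small constant expression and inspect the resulting code object.
small = compile("2*10**3", "<example>", "eval")

# On CPython versions that fold constants, the product appears
# directly among the code object's constants.
print(small.co_consts)
dis.dis(small)
```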

Marco_Sulla · February 6, 2023, 8:01am

No, because 10**23 is an integer, not a float.
If 1e23 is a float but is represented under the hood as an integer, you have speed and accuracy at the same time.

For the sake of simplicity, this can’t be applied to 10e-1, 0.2e2 and so on. This can be applied only to NeM, where N is an integer and M is a positive integer (or non-negative?)

This in theory can be applied also to 10.0, float(10) and so on. 10 is only a brief example.

PS: this is not a reply to Neil only, I have replied to all in a brief way.

NeilGirdhar · February 6, 2023, 12:29pm

It’s impossible to have the speed of a float and the accuracy of an integer at the same time. You either store the integer (which is long) or the float (which is fast).

And I don’t understand why the negligible speed difference is important to you at all. When would that even come up and be significant? It’s only significant when you have a huge quantity of such numbers, e.g., a numpy array. In which case, you can push for wider numpy “dtypes” if you need more accuracy.

steven.daprano · February 6, 2023, 12:34pm

How? By magic? You can’t just declare that “if we change the data structure, we will have speed and accuracy” as if that happened automatically.

The float 1e23 uses 64 bits, and can be operated on by your platform’s floating point libraries, which Python uses.

The integer 100000000000000000000000 requires a minimum of 77 bits. That makes it 20% larger than a float, but most importantly, not being an IEEE-754 binary64 value, your platform floating point libraries cannot operate on it. You would need to re-implement all the floating point libraries to work with arbitrary precision integers. Good luck with that! (Especially the trig functions. Have fun reimplementing them.)

Let’s take an example: suppose we did what you want. Now the float x = 1e23 is stored internally as the int 100000000000000000000000. Great! To check the accuracy, we calculate 1/x and see if it is exactly 0.0000…01 (mathematically exact 10**-23).

Ah, now the problem is that the float 1/100000000000000000000000 doesn’t exist. The closest we can get is the binary float 1e-23, which equals 0.00000000000000000000000999999999999999960434698015 (approximately).

So we did a huge amount of work to get a result which is no more accurate than floats for many operations, but slower.

Perhaps you should try Decimal instead? It is almost as fast as float, and almost as precise, but has better accuracy when trying to represent exact decimal values (both whole numbers and fractional).
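
Here is a small sketch of the two points above, the size of the integer and the inexact reciprocal, using only the standard library:

```python
from fractions import Fraction

n = 10**23

# Number of bits needed to store the integer.
print(n.bit_length())  # 77

# True division of ints gives the nearest float to the exact quotient.
inv = 1 / n
print(inv == 1e-23)    # True

# That float is not mathematically equal to 1/10**23.
print(Fraction(inv) == Fraction(1, n))  # False
```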

Marco_Sulla · February 6, 2023, 1:10pm

That’s what I had in mind: a “peep-hole optimization” for floats. I use the quotes since it’s not for speed, but for the accuracy of integer-valued literals.

This is probably due to memory (32 bits). It’s faster to load a 32-bit value into memory than to compute it. By contrast, 100 bits are costly to load directly, and it’s better to build the value step by step.

By using a PyLong as the value of a PyFloatIntegerObject. A PyFloatIntegerObject would be created only by the peep-hole optimizer, from expressions like 1e23, float(100000000000000000000000), 100000000000000000000000.0. Not sure this can work well for n>32, as you pointed out.

Is 1/x slower if x == 10**23 instead of 1e23? You have to time only the operation, not the creation of the object.
And have you considered integer operations?

Rosuav · February 6, 2023, 1:18pm

Peephole optimization is a very specific term. Like “inconceivable”, I do not think it means what you think it does.

steven.daprano · February 6, 2023, 3:42pm

Hell yes. About four times slower.


[steve ~]$ python3.10 -m timeit -s "x = 1e23" "1/x"
10000000 loops, best of 5: 32.9 nsec per loop
[steve ~]$ python3.10 -m timeit -s "x = 10**23" "1/x"
2000000 loops, best of 5: 139 nsec per loop

If you want integer operations, why aren’t you working with the integer 10**23 instead of a float? But for the record, floats are about twice as fast as ints on my PC:


[steve ~]$ python3.10 -m timeit -s "x = 1e23" "3.0*x - 2345.0"
10000000 loops, best of 5: 35.9 nsec per loop
[steve ~]$ python3.10 -m timeit -s "x = 10**23" "3*x - 2345"
5000000 loops, best of 5: 68.6 nsec per loop

The reason they are faster and smaller is that they compromise by having less precision.

Marco_Sulla · February 6, 2023, 3:45pm

I agree with you.

Marco_Sulla · February 8, 2023, 8:44pm

On second thought, I’ve done some timing:

>>> min(timeit.repeat("x * x", setup="x = 10**22", number=10000000))
0.49673155788332224
>>> min(timeit.repeat("x * x", setup="x = 10e22", number=10000000))
0.14488742407411337
>>> min(timeit.repeat("x * x", setup="x = 100", number=10000000))
0.21661818190477788
>>> min(timeit.repeat("x * x", setup="x = 100.0", number=10000000))
0.14479235606268048

How is it possible that integers are so much slower than floats, if floats are represented by 2 integers?

EDIT: even stranger:

>>> dis.dis("100 * 100")
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (10000)
              4 RETURN_VALUE
>>> dis.dis("100.0 * 100.0")
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (10000.0)
              4 RETURN_VALUE