Let int accept scientific notation strings

user0 · February 4, 2023, 8:39pm

Let int accept scientific notation strings, like float does (but with non-negative exponents only). This would allow scientific notation strings (e.g., command-line arguments specifying big integers) to be parsed directly without the loss of precision incurred by converting via float (e.g., int(float('1e23')) != 10**23).
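
A quick illustration of the precision loss described above, plus a minimal sketch of what exact parsing could look like. The helper name `parse_sci_int` is hypothetical and not part of any proposal in this thread; it assumes an integer mantissa and an optional non-negative exponent.

```python
import re

# Hypothetical format: integer mantissa, optional non-negative exponent.
_SCI_INT = re.compile(r"([+-]?\d+)(?:[eE]\+?(\d+))?")

def parse_sci_int(text):
    """Parse strings such as '1e23' or '42' into an exact int."""
    match = _SCI_INT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid integer in scientific notation: {text!r}")
    mantissa, exponent = match.groups()
    # Pure integer arithmetic, so no rounding can occur.
    return int(mantissa) * 10 ** int(exponent or 0)

print(int(float("1e23")) == 10**23)     # False: the float is only an approximation
print(parse_sci_int("1e23") == 10**23)  # True
```

Strings with a negative exponent or a decimal point are rejected with ValueError in this sketch, matching the "non-negative exponents only" restriction in the proposal.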

PythonCHB · February 5, 2023, 12:32am

I’m +1 on this – I just noticed this (probably not for the first time) when prototyping a significant figures function in another ideas thread.

Yes, it’s not a common use, and maybe you are using the "wrong" type, or the wrong input method – but as ints can be passed to the “e” formatter, it would be nice to round trip.

guido · February 5, 2023, 12:53am

And why not allow 10e-1?

More seriously, this feels like it might break consistency between the literals supported by the language (where 1e1 is a float) and those supported by int().

It also seems potentially confusing with hex numbers.

Maybe it should require a flag? Or be a different function altogether (maybe in math)?

PythonCHB · February 5, 2023, 4:41am

Actually, I think “10e-1” would be fine – as long as the value is an integer. Though maybe there is no other place where the value, rather than the form, of a numeric string would cause a ValueError: e.g. 1e4444 doesn’t give you a ValueError, it gives a float with the value of inf.

Yeah, this is of greater concern – though I’m having trouble coming up with any way this inconsistency would lead to actual confusion or incorrect behavior. Though my not thinking of it doesn’t mean much …

I don’t know that either of these would be worth it – I don’t know about the OP, but I think for the most part, you wouldn’t know that you’re getting exponential form when you write the code.

PythonCHB · February 5, 2023, 4:54am

hmm – there is already some inconsistency on how literals and int string parsing are interpreted:

In [9]: int('012')
Out[9]: 12

In [10]: 012
  Cell In[10], line 1
    012
      ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

granted, that’s an Error rather than a different interpretation, but IIRC, it wasn’t always an error.

so maybe:

1.2e3 being evaluated as 1200.0 and int('1.2e3') being evaluated as 1200 wouldn’t be any more surprising – and they would have equal values, as long as it was within the float range.

storchaka · February 5, 2023, 8:04am

I am -1. It will break code like:

try:
    x = int(s)
except ValueError:
    x = float(s)

Note also that 1e23 != 10**23, so int('1e23') would not be equal to float('1e23').
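
To make the compatibility concern concrete, here is that fallback wrapped in a function (the name `parse_number` is made up for illustration). Today a string like '1e3' takes the float branch; if int() accepted it, the same call would silently start returning an int instead.

```python
def parse_number(s):
    # Common idiom: prefer int, fall back to float.
    try:
        return int(s)
    except ValueError:
        return float(s)

print(type(parse_number("1000")))  # int
print(type(parse_number("1e3")))   # float, under current int() behaviour
print(float("1e23") == 10**23)     # False: the nearest float is not 10**23
```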

pf_moore · February 5, 2023, 8:21am

Surely that doesn’t matter? If you want to allow exponential notation, use the new function. If you don’t, use int. What to allow should definitely be your choice.

Of course, once it’s a separate function, we have the debate of why not write your own, why not publish it on PyPI rather than in the stdlib, etc.

tjreedy · February 5, 2023, 8:25am

Currently, int(s) is int(s, base=10) and int('1e23', base=n) is invalid for n < 15 and valid, with e as the digit 14, for n >= 15. Both should remain true. A new flag sci=True, mutually exclusive with base, could work.
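
The current base behaviour is easy to check, and a flag along these lines can be prototyped as a wrapper. The function `int_sci` and its `sci` keyword are purely hypothetical; the real int() has no such parameter. This sketch only handles an integer mantissa with a non-negative exponent.

```python
def int_sci(text, base=None, *, sci=False):
    """Hypothetical wrapper; `sci` and an explicit `base` are mutually exclusive."""
    if not sci:
        return int(text) if base is None else int(text, base)
    if base is not None:
        raise TypeError("sci=True cannot be combined with an explicit base")
    mantissa, sep, exponent = text.lower().partition("e")
    exp = int(exponent) if sep else 0
    if exp < 0:
        raise ValueError("negative exponents are not handled in this sketch")
    return int(mantissa) * 10 ** exp

print(int("1e23", 16))            # 7715: 'e' is an ordinary digit in base 16
print(int_sci("1e23", sci=True))  # 100000000000000000000000
# int("1e23") with the default base 10 raises ValueError.
```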

Marco_Sulla · February 5, 2023, 2:35pm

Well, this is horrible, but works:

>>> int(eval("1e1"))
10

but

>>> int(eval("1e23")) == 10**23
False

anyway.

Rosuav · February 5, 2023, 3:06pm

Except that, no, it doesn’t work. That’s just int(float(x)) from the OP but in a slower and more risky way.

Marco_Sulla · February 5, 2023, 3:30pm

I agree a flag is a more practical choice, since it avoids breaking existing code.

Rosuav · February 5, 2023, 5:26pm

I really dislike the editing feature. I can’t quote the entire message because it’ll get removed, and if I quote less than all of it, nobody knows which version I replied to.

But I was actually responding to the post-edited version. You came up with something that’s exactly as wrong as the original int(float(x)), but with a new set of problems since it uses eval. So you were half right. It is horrible. It just doesn’t work.

Marco_Sulla · February 5, 2023, 6:26pm

########
EDIT2
########
Christopher has a more elegant solution for checking if the new flag is correct: if sci and base >= 15: raise

########
EDIT
########
Anyway, maybe this flag is too specialized. Maybe a new math function would be better, as suggested by the BDFL.

########
Original post:
########

As I said, I feel the idea is good, but if sci has to be mutually exclusive with base, this means that

int("1e23", base=16, sci=False)
int("1e23", base=10, sci=True)

are both illegal? Isn’t it enough to check if sci and base != 10?

PythonCHB · February 5, 2023, 11:38pm

Is there anything fundamentally wrong with using scientific notation with other bases? weird maybe, but wrong?

OK – base 15 and above use ‘e’ as a digit, so not good – so yeah, disallow it – I suppose you could check for sci and base < 15.

NOTE: for those to whom it’s not obvious, two reasons that int(float(a_string)) is not a good solution are:

truncation of non-integer values:

In [9]: int(float('1e-3'))
Out[9]: 0

overflow to inf for very large numbers:

In [10]: int(float('1e500'))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-10-540b4755c49c> in <module>
----> 1 int(float('1e500'))

OverflowError: cannot convert float infinity to integer

I think the truncation is worse – or at least would be hit more often.

-CHB
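
Both failure modes can be avoided with exact arithmetic from the standard library. The sketch below uses fractions.Fraction, which parses exponent notation without going through float; the function name `exact_int` is an assumption made for illustration. Note that Fraction also accepts forms such as '3/2' and '1.5', so this is more permissive than anything proposed in this thread.

```python
from fractions import Fraction

def exact_int(text):
    """Convert a numeric string to int, refusing to truncate."""
    value = Fraction(text)  # exact rational, no float involved
    if value.denominator != 1:
        raise ValueError(f"{text!r} is not an integer value")
    return value.numerator

print(exact_int("10e-1"))            # 1
print(len(str(exact_int("1e500"))))  # 501 digits, no OverflowError
# exact_int("1e-3") raises ValueError instead of silently returning 0.
```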

malemburg · February 6, 2023, 10:30am

guido wrote (February 5):
> Maybe it should require a flag? Or be a different function altogether (maybe in math)?

I can see the point in having a better way to express very large
integers, but the E-notation is closely tied to floating point numbers
and so people who write 1e23 expect to get a float and not an integer.

As a result, having int('1e6') work and int('1000000.0') fail would be
inconsistent.

There doesn’t appear to be a notation similar to the E-notation for
large integers and inventing one for Python (e.g. “1L23”) would again
confuse people.

So why not simply use a helper function, e.g.

def largeint(x, e):
    return x * 10 ** e

>>> largeint(1, 23)
100000000000000000000000

pf_moore · February 6, 2023, 10:46am

For that matter, why not simply use an expression? It avoids the function call overhead, and as was pointed out earlier, the peephole optimiser even removes the overhead of doing the calculation at runtime where possible:

BIG_LIMIT = 10 ** 23

(Anyone claiming that 10 ** 23 is less readable than 1e23 is drifting very much into subjective opinion territory).

Rosuav · February 6, 2023, 10:59am

It gets a little harder when you’re not working with a plain power of ten though.

MASS_OF_EARTH = 5_972 * 10 ** 21
MOLE_QUANTITY = 602_214_129 * 10 ** 15

def mole_of_moles():
    print("If one small furry animal weighs 75g...")
    mole_mass = MOLE_QUANTITY * 75 // 1000
    print(mole_mass, "kg of moles")
    print(MASS_OF_EARTH, "kg of earth")
    print("This planet weighs", MASS_OF_EARTH // mole_mass, "moles of moles.")

… if anyone asks, I did not tell you it was ok to write code like this.

pf_moore · February 6, 2023, 11:22am

How would it be easier with some sort of 5972e21 notation, though? Are you assuming that 5.972e24 should be treated as an int?

Rosuav · February 6, 2023, 11:30am

No, I’m not, because that includes a decimal point. (I suppose you could argue that, if the exponent exceeds the number of digits after the decimal, it could be stored as an int, but I’m not proposing that.) But if it were written as 5_972e21 then perhaps yes, it could be stored as an int.

pf_moore · February 6, 2023, 11:31am

So what’s “a little harder” with the version using 5_972 * 10 ** 21 then? I feel like I’m missing your point.