Let int accept scientific notation strings

I guess “a little less readable” is the point. You said:

and I absolutely agree that the difference between 10**23 and 1e23 is close enough to ignore, but the difference between 5972 * 10 ** 21 and 5972e21 is a bit more notable. Having spurious digits in there that are neither part of the mantissa nor part of the exponent (the 10 that simply defines the base) isn’t helpful.

BTW: If the letter e is a problem, I wouldn’t be averse to using some completely different notation when dealing with large integers, even if it’s something that looks like an operator.

OK, thanks for clarifying. I still say it’s subjective, and the e-notation says “float” to me too strongly to work for me, but whatever. Everyone has opinions :slightly_smiling_face:


Yeah, definitely subjective.

If 1e23 is strongly a float, why is -10**23 strongly an integer? And why is 10.0**23 strongly a float?

The answer is that 1 and -10 are integers, 23 is a positive integer, and 10.0 is a float.

So 1e23 is an integer.

Hmm, I’m thinking about “do the right thing” and “precision”, and coming up with: int("1e2"), int("1.0e2"), and int("1.00e2") should all equal 100; but int("1.000e2") should give an error.

Why not a warning? X-D

Speaking seriously, in physics it is perfectly fine to write 1.000e2. It means it’s 100 with an accuracy to the third decimal digit. Why should it be an error?

But that accuracy you mention cannot then be represented by the int? (Question mark because I’m skating on the thin ice of my knowledge here, but it seems contradictory to specify a precision that cannot be expressed in ints, whilst asking for an int conversion.)

P.p.s. I remember the last time in a forum where mathematicians laid into the maths butchery they say Physicists perform – I did a Physics degree, but I think I will now duck :slight_smile:

The question is wrong. I didn’t say you must have three-digit precision with 1.000e2. I said that I don’t know why int("1.000e2", sci=True) should be considered an error.

PS: mathematicians are three points (quote)

I think this has gotten a bit off-track – the OP was asking for a string representation using “e notation” to be parsed by int, e.g.:

int("123e10")

In that case, it’s calling int, so there is no ambiguity about what type is wanted, and there is also not the option of using an expression or function (other than one that takes a string).

The example provided was command-line arguments. I also have fairly common use for input from text files, or a Web API, or … so I think it would be useful (we science geeks do often use “e notation” for large numbers that may very well be ints). The point is that the person typing into the command line may not know whether it’s supposed to be an int or a float. In practice, I’ve generally used int(float(a_string)), which works fine but does have its limitations.

Ok – a bit more, and hopefully narrowing down the discussion here:

As far as I can tell, the OP hasn’t posted since the initial request. So who knows what they think now? But they did ask for int to accept e notation as a string, not as a literal, with command-line user input as the example use case.

But this caught my eye for two reasons:

  1. I do fairly often want to input large integers in “e-notation”[*] – honestly, they are usually converted to float anyway, but not always.

  2. As it happens, right before this thread was started, I had proposed a polymorphic function for rounding to significant figures:

def sigfigs(x, n):
    # Round x to n significant figures by formatting it with the "g"
    # presentation type, then parsing the result back as the same type.
    return type(x)(f"{x:.{n}g}")


This works for float and Decimal (and will work for Fraction in the next Python version) but not for int, because the int string parser does not accept e-notation. And it could work for any type that supports “g” formatting and can parse e-notation.
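A quick illustration of that asymmetry, using the sigfigs definition above (the exact Decimal output string may differ slightly by context, but the int failure is the point):

from decimal import Decimal

sigfigs(123456.0, 3)           # 123000.0
sigfigs(Decimal("123456"), 3)  # Decimal('1.23E+5')
sigfigs(123456, 3)             # ValueError: invalid literal for int() with base 10: '1.23e+05'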

Is this an important use case? Maybe not, but that’s why this caught my attention. Also note that in this use case, having a separate flag or function wouldn’t help, since the polymorphic type(x)(...) call can’t pass type-specific arguments. Granted, there are only a few numeric types built into Python, so it wouldn’t be a big deal to special-case them all (or only int), but still, I like the simplicity of that version. (I could also add another trip through Decimal to make it work for int, but again, not so simple.)

So here’s a specific tiered proposal:

1st choice:

int() parses an e-formatted string (see the sketch after this list), and:

  • if the result is an integer, creates an integer with that value
  • if the result is not an integer, raises a ValueError
  • if base > 14, raises a ValueError, since “e” is a valid digit in bases 15 and up (or maybe just require base == 10)
  • if it can’t be properly parsed, raises a ValueError (of course)
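As a rough pure-Python sketch of those semantics (a hypothetical int_sci helper using Decimal for exact parsing – an illustration, not a proposed implementation):

from decimal import Decimal, InvalidOperation

def int_sci(s):
    # Parse a decimal string, allowing e-notation, but accept only
    # values that are exactly integral.
    try:
        d = Decimal(s)
    except InvalidOperation:
        raise ValueError(f"invalid literal: {s!r}")
    if d != d.to_integral_value():
        raise ValueError(f"not an integral value: {s!r}")
    return int(d)

int_sci("1e23")     # 100000000000000000000000 – exact, unlike int(float("1e23"))
int_sci("1.23e2")   # 123
int_sci("1.234e2")  # ValueError: not an integral value: '1.234e2'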

2nd choice:

If the above is too much overhead to do for every int string parsing, OR it’s decided that it’s too backward incompatible, then:

  • have a boolean flag, sci, that activates all of the above.
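Under that option the API would look like this (hypothetical – no such flag exists today):

int("1e23")            # ValueError, unchanged from today
int("1e23", sci=True)  # 100000000000000000000000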

Note on backward compatibility – I think it’s not a big deal, but as it is an idiom to do:

try:
    value = int(input_string)
except ValueError:
    print(f"input {input_string}: not valid, must be an integer")

Maybe we shouldn’t change the behavior of default string parsing. Personally, I’d love to see e-notation NOT get rejected by this code – that’s why I’m supporting this in the first place – but others may not agree.

3rd choice:

One of the above two, but accept only:

[an_integer]e[a_positive_integer]

Frankly, I don’t see the need for this restriction, except maybe to make parsing easier. While it’s been said in this thread that e-notation is inherently for float, I think that’s only the perspective of programmers who think in terms of types – which is not what the use cases presented so far are about. In fact, the g and e formatters don’t follow that convention, so my second use case would be dead if that restriction were in place anyway.

-CHB

[*] if you’re curious, for our oil spill model, oceanic turbulence generally has a diffusion coefficient of around 10^5 cm^2/s – so 1e5 is a lot easier to read and write than 100000 or 100_000. And the values are always of that magnitude, so no need for a float.

The 1st and 3rd choices are not good. The first is too slow. The 3rd is backward incompatible.

I think the sci flag (2nd choice) is the most practical solution.

Note: sci does not need to be mutually exclusive with base. Mathematically, you can have scientific notation in whatever base you choose. sci is False by default, so it’s perfectly backward compatible.

If you restrict sci to be used only for base 10, a flag is useless; a new function would be better.
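For example (hypothetical semantics, just to illustrate the point about bases), a sci flag combined with base would read the mantissa and exponent in that base:

int("2e3", base=12, sci=True)  # 2 * 12**3 == 3456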

PS: 4th choice: make float and int the same object.
PPS: what about a thread in the Help section, so as not to spam the devs? Can a moderator move the “off topic” posts?

That’s a perfectly fine way as well and probably even better, since it removes the need to add context to those numbers.

I think you’re a bit off topic – this is NOT about literals, nor is it in any way related to making a new type.

As to too slow – maybe, but I doubt it. Successfully parsing a regular old int string would be exactly the same, and I can’t imagine the overhead of adding some more logic to the parsing of e-notation strings would be that large, or that parsing int strings would be a bottleneck in any code. We’d never know without some analysis and experimentation, but it can’t be huge (assuming it would be written in C). Remember, this is NOT about literals.

No, I was referring to int("nEm").

I misunderstood, since I interpreted “result” as the result of a calculation, not the result of parsing.
Anyway, it’s not backward compatible. int("nEm") currently raises an exception, and if you change this behavior, you can potentially break code. This is what Terry said.

even worse (though fundamentally the same problem)

In [23]: int(float("1e23")) == 10**23
Out[23]: False

passing through float to get a large integer is a bad idea.
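The root cause is that a float carries only 53 bits of mantissa, so not every integer above 2**53 is representable exactly:

>>> int(float("9007199254740993"))  # 2**53 + 1
9007199254740992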

Yes, and so did I. But there’s backward incompatible, and there’s backward incompatible – in this case, certain values currently raise an Exception – those would no longer raise.

We know that using int(a_string) is a common idiom to validate input, so some code would certainly change its behavior – but in many cases, things would “just work” that didn’t before, rather than “stop working” when they used to work. The question is: are there cases where folks want e-notation to fail? I suspect not often, but there’s no way to know without polling everyone in the world :frowning:

There’s a little problem. While I was playing around, I discovered that

>>> int("1.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.0'

while

>>> int("1")
1

So it seems it’s invalid to convert a string that represents a float to an integer with int(), even if the float is clearly an integer. I think “this special case is not special enough”.

Yes, this proposal is to change the behavior. If folks think it’s more consistent, you could certainly limit it to [an_int]E[an_int] – which was my option 3 (I think). However, that would mean that the output of the “e” and “g” formatters wouldn’t be legal:

In [27]: f"{123_000_000:.3e}"
Out[27]: '1.230e+08'

The fact is that it is the very nature of scientific notation to use non-integers for the mantissa.

I now understand what you meant. 1.0e2 is 100, but 1.000e2 is 100.0.

IMHO there’s no difference. For a human being, 100.0 is an integer. This should be true also for a machine, but the world is not perfect. Accepting a float mantissa before the “e” is much simpler.

You can always write m * 10 ** n. Less elegant than mEn but effective.

I don’t think it will ever be accepted… :slight_smile:

That won’t work for int(input())