Why is int unbounded?

I know that int is unbounded: you can write a very large positive or negative integer, and it’s only limited by your machine’s resources. By contrast, float is not; AFAIK it’s a wrapper around a C double.
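A quick illustration of what I mean:

```python
import sys

print(10 ** 400)           # arbitrarily large int, no problem
print(sys.float_info.max)  # largest representable float, about 1.8e308

try:
    float(10 ** 400)       # too large to fit in a C double
except OverflowError as exc:
    print(exc)             # int too large to convert to float
```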

The question is: why wasn’t int designed as a wrapper around a C long too?

My guess is that, this way, sequence sizes can be virtually infinite.

Python originally had 2 integer types, int and long. int was bounded, long wasn’t.

It used to be the case that int would raise an exception when calculations using it overflowed, but that was later dropped in favor of automatic promotion to long, so the distinction between the two became blurred.

One of the changes made during the cleanup for the switch from Python 2 to Python 3 was that int was removed and long was renamed to int.
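You can see the unified behaviour in Python 3: there is only one int type, and sys.maxsize (the largest value a C-level Py_ssize_t can hold) is not a ceiling on int values:

```python
import sys

n = sys.maxsize      # largest value of the C-level Py_ssize_t
print(type(n))       # <class 'int'>
print(type(n + 1))   # still <class 'int'> -- no separate long type
print((n + 1) ** 2)  # keeps growing far past the machine word size
```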


See PEP 237 for more details on this change (found via the Python 3.0 release notes).


Thanks. May I assume the strongest motivation for dropping “short” ints was this?

Having the machine word size exposed to the language hinders portability. For example, Python source files and .pyc’s are not portable between 32-bit and 64-bit machines because of this.

I suppose that limiting “short” ints to 32 bits everywhere was not doable?

Restricting to 32 bits everywhere would be possible, but to what benefit? Ideally, the user shouldn’t have to worry about how the basic integer type is implemented and should just be able to do the calculations they need. Beyond syntax support, there’s nothing actually special about the int type; a short type could be implemented just fine in a third-party library (which I believe numpy actually does).
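For instance, NumPy’s fixed-width integer types give you the old C-style behaviour, wrap-around included (assuming NumPy is installed; the values here are just an illustration):

```python
import numpy as np

a = np.array([2**31 - 1], dtype=np.int32)  # largest 32-bit signed value
print(a + 1)    # wraps around to [-2147483648], like a C int32_t would
print(a.dtype)  # int32 -- the width is fixed, unlike Python's int
```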


C originally had 4 integer sizes: char, short, int and long. The only stipulation was that sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).

I remember a time when sizeof(short) was 2 and sizeof(long) was 4.
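You can still see how platform-dependent those sizes are from Python, via the ctypes module (the exact numbers depend on your platform’s ABI):

```python
import ctypes

# Sizes in bytes of the underlying C types on this platform.
# c_long is typically 8 on 64-bit Linux/macOS but 4 on 64-bit Windows.
for name, ctype in [("char", ctypes.c_char), ("short", ctypes.c_short),
                    ("int", ctypes.c_int), ("long", ctypes.c_long)]:
    print(name, ctypes.sizeof(ctype))
```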

There are, broadly speaking, two reasons to want a short integer type:

  1. You specifically want wrap-around semantics - that is, all arithmetic is automatically performed modulo some power of two
  2. Performance.

The first one is definitely of value (for example, try reimplementing a hashing algorithm like SHA256) but is quite rare, and should definitely not be the default. It’s also probably fine if it’s implemented in userspace.
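If you do want wrap-around semantics in pure Python, a mask is enough; this is roughly what a userspace implementation of SHA256’s 32-bit operations ends up doing (a minimal sketch, helper names are my own):

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, i.e. arithmetic mod 2**32

def add32(a, b):
    """Add two 32-bit words with wrap-around, like C uint32_t addition."""
    return (a + b) & MASK32

def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (SHA256 uses this a lot)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

print(hex(add32(0xFFFFFFFF, 1)))  # 0x0 -- wrapped around
print(hex(rotr32(0x1, 1)))        # 0x80000000
```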

The second, though, is a consideration that should not affect userspace at all. The best way to handle this is completely invisibly. There are ways to make it such that (say) all integers between -2**62 and 2**62 are stored “unboxed”, without making any change to the way you actually write your code - other than that it’s faster and uses less memory. PyPy does a variety of tricks like this. I’m not sure if it’s something that CPython has planned or not, but the beauty is, since it’s an optimization with no semantic impact, it can be done in any version without backward compatibility concerns.
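For a rough sense of what such an optimization would buy, here’s the per-object cost of CPython’s current boxed representation (exact numbers vary by version and platform):

```python
import sys

# Today every int is a full heap object, even a tiny one; an "unboxed"
# representation would avoid this per-object overhead for small values.
for n in (0, 1, 2**30, 2**62, 2**100):
    print(n, sys.getsizeof(n), "bytes")
```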

Unifying the int and long types was without question a good move. Writing code in Python does not require you to think about how big your integers are or what CPU you’re running on. But I would love to see the invisible optimization done.
