Make int iterable returning itself?

Consider

tuple('a') # tuple ('a',)
('a',) # tuple
tuple(1) # TypeError: 'int' object is not iterable
(1,) # tuple
(1)  # 1 (expression)
[1]  # list
list(1) # TypeError: 'int' object is not iterable
list('a') # list ['a']

Is there a reason why tuple() and list() accept only iterables?
Why not test whether the parameter is an instance of Iterable and, if it is not, return a one-item tuple/list instead?

Or why is a character (a scalar, single-character string) iterable while a scalar int is not?

Make int iterable:

class IterableInt(int):
    def __iter__(self):
        yield int(self)  # or: yield self

i = IterableInt(7)
list(i)     # [7]
tuple(i)    # (7,)
sum(i)      # 7
for x in i:
  print(x)  # 7

The idea came from filling data for api request from any user input:

month = 8 # or None or (7,8)
request = {
  "month": [f"{m:02}" for m in ((month,) if month else range(1, 13))],
                        # Ideally: … in (month if month else …
# [f"{m:02}" for m in (month if isinstance(month, Iterable) else (month,) if …
}
# {"month": ['08']}
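For the request-building case above, the conditional can be moved into a small helper without changing int at all. This is only a minimal sketch: the names as_tuple and month_field are hypothetical (not from the post), and it assumes that strings should count as scalars and that None means "all months".

```python
from collections.abc import Iterable

def as_tuple(value):
    """Wrap a scalar in a 1-tuple; turn any other iterable into a tuple."""
    # Assumption: str and bytes are treated as scalars, not as sequences.
    if isinstance(value, (str, bytes)) or not isinstance(value, Iterable):
        return (value,)
    return tuple(value)

def month_field(month=None):
    """Build the "month" list for the request; None selects all twelve months."""
    months = range(1, 13) if month is None else as_tuple(month)
    return [f"{m:02}" for m in months]

print(month_field(8))       # ['08']
print(month_field((7, 8)))  # ['07', '08']
print(len(month_field()))   # 12
```

The explicit `is None` test is a deliberate choice here: it keeps "no value given" separate from other falsy inputs.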

Is there a reason why int is not iterable returning itself? In contrast to iterable single character…
Thank you in advance.

A string can conceptually be said to be a sequence of characters, so iterability makes an amount of sense. What is an integer a sequence of?

You’re able to use a custom type as you illustrate to do this, but it won’t be happening in int itself.

A

5 Likes

A string is a sequence of characters, an integer is not a sequence.

Python doesn’t have a character type, it uses a string of length 1 instead, and that’s iterable.

3 Likes

In mathematics, the Peano axioms can be interpreted as defining the (unsigned) natural numbers as sets or sequences of the smaller numbers: 0 == [], 1 == [0] == [[]], 2 == [0, 1] == [[], [[]]], …, and this remains the standard definition of numbers in set theory. In Python, this would be n == range(n). Guido rejected the proposal that iter(n) return iter(range(n)) as too obscure for non-mathematicians. Better to spell it out. (-n would be equivalent to range(0, -n, -1).)
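To make the "a number is the sequence of the smaller numbers" reading concrete, here is a small sketch using nested lists. The function name ordinal is made up for illustration; nothing like it exists in Python itself.

```python
def ordinal(n):
    """Return the nested-list encoding of n: [ordinal(0), ..., ordinal(n - 1)]."""
    return [ordinal(k) for k in range(n)]

print(ordinal(0))  # []
print(ordinal(1))  # [[]]
print(ordinal(2))  # [[], [[]]]

# Under this encoding, n has exactly as many elements as range(n).
print(len(ordinal(5)) == len(range(5)))  # True
```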

7 Likes

A vector (tuple) can conceptually be said to be a sequence of numbers, so iterability makes an amount of sense. What is a character a sequence of?

Why distinguish so “pedantically” between ‘one’ and ‘more’/‘many’?

Iterating over many items returns many items; iterating over one item should return one item…

1 Like

Thank you for the explanation, I’ve studied math…
My suggestion is that iter(n) should return n, in the sense of iter((n,)).
Iterating over a single item should return that one item, one time.

You have described the state.
But why is it this way?
What could go wrong if iterating over an integer (or any number) returned that number, in the same sense as iterating over a single character?
Do we really need to distinguish so strictly between n and (n,)?
“When there is no character type for a single character, why is there an integer type? Could we not operate only with single-item tuples (n,)?”

A one-character string is iterable because it is also a sequence: all sequences are iterable, including one-element sequences. Whereas single integers (or numbers) are not sequences. I think you could also think of it this way.
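A quick check of that difference at the interpreter, as a sketch using only built-ins:

```python
# A one-character string is a one-element sequence of itself.
print(list(iter("a")))  # ['a']
print("a"[0] == "a")    # True
print(len("a"))         # 1

# An int offers no iteration protocol at all.
try:
    iter(1)
except TypeError as exc:
    message = str(exc)
    print(message)      # 'int' object is not iterable
```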

3 Likes

Yes, that distinction saves countless hours of frustration.

It’s this way because a mistake was made[1] with the design of strings in Python, and nobody would wish it upon another built-in type.

The fact that there is no character type causes bugs. Here’s one example:

def greet(names):
    for name in names: 
        print(f"Hello {name}!")

greet("Hans")
# Hello H!
# Hello a!
# Hello n!
# Hello s!

And here’s another:

from collections.abc import Container

def lookup_path(data, path):
    cursor = data
    for key in path:
        if not isinstance(cursor, Container):
            raise LookupError("bad path, got scalar")
        cursor = cursor[key]
    return cursor

# good, traverse some structure
lookup_path([["tada"]], [0, 0])  # "tada"

# good, errors
lookup_path({"nested": 100}, ["nested", 0])

# questionable at best
lookup_path("tada", [1])  # "a"

The fact that Python doesn’t have a character type is nice for the very earliest beginners, harmful and dangerous to them as their knowledge grows, and burdensome for proficient users of the language.

Why that definition rather than the countless other viable definitions? Because in one particular usage it would save you a conditional?

A good feature should generalize well to many use cases.


  1. I consider it a serious mistake, anyway. And I know some others do. I’m actually unsure what the broad consensus is, since the discussions of it tend to be centered on type checking and that limits the participant pool.

3 Likes

Your examples illustrate the issues with strings being sequences. To see them, you need to iterate twice:

def f(strings):
    for string in strings:
        print(f"Characters of {string!r}: {[ord(c) for c in string]}")

f(["foo", "bar"])
# Characters of 'foo': [102, 111, 111]
# Characters of 'bar': [98, 97, 114]
f("abc") # should raise an error
# Characters of 'a': [97]
# Characters of 'b': [98]
# Characters of 'c': [99]
lookup_path("tada", [1, 0])  # "a"

1 Like

Yeah, to your point, the fact that strings contain themselves allows you to index or iterate on them infinitely deeply.

I almost included an example like yours, e.g.

lookup_path("tada", [1, 0, 0, 0, 0])  # "a"

I thought it might be missing the fact that being iterable at all exposes some sharp corners. But maybe I focused too much on that and not enough on how it gets weird.
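To show how deep "infinitely deeply" goes in practice, here is a tiny sketch. The helper name descend is invented for this illustration.

```python
def descend(value, depth):
    """Index position 0 of value, depth times in a row."""
    for _ in range(depth):
        value = value[0]
    return value

# A real nested structure bottoms out after a fixed number of steps...
print(descend([["tada"]], 2))  # tada

# ...but a string never does: each step yields another one-character string.
print(descend("tada", 1000))   # t
```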

The functions used here to show the potential problems with strings being sequences are all untyped and perform no checks of any kind. Speaking as a developer, this kind of code would never be deployed in a real-world situation: if, for example, greet were annotated with list[str] and mypy were used, the bad inputs would be caught.

It is interesting to consider the consequences of strings being sequences in Python (and therefore self-iterable), but my guess is that to most users such issues must appear as a matter of pure academic interest only. There’s a middle-ground approach that users can take, which is to take the language as a given, and work with how it is.

1 Like

I find this to be an extremely narrow view in many ways.

I’ll take your premise that all production Python code is type annotated as a given – it is false, but even accepting it does no good.

import typing

def greet(names: typing.Sequence[str]) -> None:
    for name in names: 
        print(f"Hello {name}!")

Same problem. Because str is a Sequence. It is an Iterable. It is a Container.
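The runtime view of the same claim, as a short sketch using the abstract base classes from collections.abc:

```python
from collections.abc import Container, Iterable, Sequence

# For each ABC, print whether a str and an int count as instances of it.
for abc in (Sequence, Iterable, Container):
    print(abc.__name__, isinstance("Hans", abc), isinstance(7, abc))
# Sequence True False
# Iterable True False
# Container True False
```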

And if you don’t think this is an issue, then why would pytype have made the judgement call to treat str as not being a subtype of Sequence[str] (even though it is)?

You can work around it, I’m not saying you can’t. Most typically I reach for:

def greet(names: typing.Sequence[str]) -> None:
    if isinstance(names, str):
        names = [names]
    for name in names: 
        print(f"Hello {name}!")

That “solves it”, but you need to know that there is a problem to be solved here.

It requires expert knowledge to construct a sequence type containing strings and exclude strings. You can see how this is handled in useful_types as well:


Also, lookup_path was simplified from some strictly typed code I was writing just days ago using tomlkit for the first time.

Simplified, it looks something like this:

import collections.abc
import pathlib

import tomlkit
import tomlkit.items

def read_toml_value_from_path(file_path: pathlib.Path, *path: str | int) -> object:
    with file_path.open("rb") as fp:
        data = tomlkit.load(fp)

    cursor: tomlkit.items.Item | tomlkit.TOMLDocument = data
    for subkey in path:
        if not isinstance(
            cursor, collections.abc.Container
        ):
            raise LookupError(
                f"Could not lookup '{path}'. "
                "Terminated in a non-container type."
            )
        cursor = cursor[subkey]

    return cursor

Do you see the bug? It’s obvious after I’ve made such a big deal about str being a Container, but it wasn’t obvious to me – even with many years of experience writing production applications – until my tests for this function failed.

2 Likes

But there is no reason to do that. If you want a single-item tuple, add the comma that makes one.

>>> a, *b = 3,
>>> a
3
>>> b
[]

1 Like

This is probably becoming peripheral to the OP’s post, and it’s not my intention to distract from the OP’s question. But, firstly, I am not at all dismissing your point. It’s an important point, and I agree that strings being iterable over their elements can cause serious problems. I can’t think of a recent example from my work from memory, but I’m sure I have bumped into it at some point.

For your greet function case, I would follow the kind of option you’ve opted for: an appropriate type hint (preferably narrow), and checks within the function. Yes, if I were faced with this function, I would be aware of the problem: it would come up during writing test cases. I mentioned the checks in my original reply, and wasn’t suggesting that type hinting alone would fix the bug. It’s the way the language works, which is fine with me, but therefore checks / validations are essential.

For your second example, I am not that familiar with TOML parsing and the problem domain, but at a glance it seems the bug can be caused by path being a single string: as far as I know TOML permits dotted keys, and you’re not splitting path on "." when it is a string, which I suppose could be a check before the loop.

The TOML example is generally about how you need to exercise extra care when you traverse structured data. If you don’t, you’ll accidentally traverse the strings inside of it, which you probably meant to treat as scalars.

The buggy case in question is

read_toml_value_from_path(pathlib.Path("foo.toml"), "a", "b", 0)

on

# foo.toml
[a]
b = "hi"

which gives you "h" rather than an error.
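The same behaviour can be reproduced without tomlkit. In the sketch below a plain dict stands in for the parsed file, and lookup_path_strict is a hypothetical variant (not code from the thread) that treats str and bytes as scalars:

```python
from collections.abc import Container

def lookup_path(data, path):
    """The traversal from earlier in the thread."""
    cursor = data
    for key in path:
        if not isinstance(cursor, Container):
            raise LookupError("bad path, got scalar")
        cursor = cursor[key]
    return cursor

def lookup_path_strict(data, path):
    """Hypothetical variant: str and bytes count as scalars, not containers."""
    cursor = data
    for key in path:
        if isinstance(cursor, (str, bytes)) or not isinstance(cursor, Container):
            raise LookupError("bad path, got scalar")
        cursor = cursor[key]
    return cursor

doc = {"a": {"b": "hi"}}  # stands in for the parsed foo.toml

print(lookup_path(doc, ["a", "b", 0]))  # h

try:
    lookup_path_strict(doc, ["a", "b", 0])
except LookupError as exc:
    print(exc)  # bad path, got scalar
```

The extra isinstance test is exactly the kind of check you only write once you already know about the problem.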

If we had a proper character type, strings might still be iterable, so it’s not like this just “goes away”, but we’d have a clearer paradigm for reasoning about what string iteration is and does.

This is all relevant to the idea of making integers iterable, in that doing so gives int some of the same footguns. OP asked “why isn’t int iterable?”, and my answer is “because that would be bad, here’s why.”
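One concrete version of such a footgun, as a sketch: a generic recursive flatten (a made-up helper, written here only for illustration) works for ordinary nested data, but never bottoms out on the `yield self` variant of IterableInt from the opening post, for the same reason it would never bottom out on a string.

```python
class IterableInt(int):
    """The subclass from the opening post, using the `yield self` variant."""
    def __iter__(self):
        yield self

def flatten(obj):
    """Recursively yield the leaves of arbitrarily nested iterables."""
    try:
        items = iter(obj)
    except TypeError:
        yield obj  # not iterable, so treat it as a leaf
        return
    for item in items:
        yield from flatten(item)

print(list(flatten([1, [2, [3]]])))  # [1, 2, 3]

try:
    list(flatten([1, IterableInt(7)]))
except RecursionError:
    # An IterableInt contains itself, so the recursion never reaches a leaf.
    print("never reaches a leaf")
```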

1 Like

This is very much off topic, so I’ll say this and then stop. The issue isn’t so much that there’s no character type (which means that strings are containers of strings) but more fundamentally, that strings are both a fundamental data type and a container. That’s a conceptual problem, not a technical one, and while Python’s solution isn’t wonderful, I don’t think there’s any other solution that’s going to be any better. Suppose we did have a character type (and let’s say c"a" was a character literal). Then we still have a problem, because we can’t distinguish between "abc" and [c"a", c"b", c"c"] without a data structure schema, or an explicit type test.

Anyway, if people want to pursue this side discussion, I suggest asking the moderators to split it into its own topic (probably in the “Help” category). Personally, I think we should just let it drop.

5 Likes

Python has a near equivalent to that. There’s no “byte” type, so the type involved is a regular integer, but we have b"abc" and [97, 98, 99] functioning nearly identically. IMO this is a feature, not a bug. If Python had a “character” type, a string WOULD be a sequence of characters, just as a list[character] would be, and the two should be able to be used interchangeably.
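A short sketch of how close the two are, using only built-ins:

```python
data = b"abc"
codes = [97, 98, 99]

print(list(data) == codes)   # True: iterating bytes yields ints
print(bytes(codes) == data)  # True: and ints can be turned back into bytes
print(data[0], codes[0])     # 97 97
print(data == codes)         # False: "nearly" identical, not equal
```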

1 Like

Agreed, I wasn’t really trying to pull the discussion down this rabbit hole, but since my first post was imprecise about “the problems which arise from strings being a sequence of strings”, we’ve gone a bit OT.

I have various thoughts about “how I would redesign strings” but since they’re massively breaking changes, I don’t think it’s that interesting to explore.

It would probably be useful for the type system to be able to express ‘collection of str/bytes/bytearray’ in such a way that the text/binary sequence types are not themselves permitted. That would presumably be doable without any changes to the runtime?

A