Guarantees on sum() for non-numeric types

The documentation of sum() (Built-in Functions, Python 3.10.8 documentation) reads:

“Sums start and the items of an iterable from left to right and returns the total. The iterable’s items are normally numbers, and the start value is not allowed to be a string.”

However, its docstring has a much stronger wording on summing values of non-numeric types. pydoc sum gives

“This function is intended specifically for use with numeric values and may reject non-numeric types.”

(and doesn’t mention the restriction that the start value mustn’t be a string, BTW).
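For reference, the two behaviours quoted above can be checked in a small sketch. The variable names are only illustrative, and the behaviour shown is what CPython does today rather than a documented guarantee for non-numeric types:

```python
# sum() refuses a str start value, even though str supports "+".
try:
    sum(["a", "b", "c"], "")
except TypeError as exc:
    error = exc  # CPython raises TypeError and points the caller at ''.join()
else:
    error = None

# Lists are not rejected: with start=[] the items are concatenated left to right.
flattened = sum([[1, 2], [3], [4, 5]], [])
print(type(error).__name__, flattened)
```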

In the code, I see that summing an iterable of lists has quadratic behavior (cpython/bltinmodule.c at main · python/cpython · GitHub). It is thus a bad thing to use sum() on iterables of lists in general. My question, though, is: what compatibility guarantees can be expected for it? Should I dig into my code bases for usage of this idiom because it might break in the next version of Python? (I suspect not, but I also suspect some people have irrational fears about Python backwards compatibility due to Python 3.) Should I expect that it might be deprecated in the future (with a deprecation warning during a period of a few releases)? Or does the warning perhaps mean that it might not work in alternate implementations? In general, could the documentation be brought in line with the docstring or the opposite? (I can easily open a PR for that but I would need to know what should be documented first.)
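To make the quadratic behaviour concrete, here is a minimal sketch comparing the idiom with itertools.chain.from_iterable, which is one common linear-time alternative (my choice for illustration, not something the docs prescribe). The sample data is made up:

```python
from itertools import chain

# Hypothetical sample data: five short lists.
lists = [[i, i + 1] for i in range(0, 10, 2)]

# Each "+" builds a brand-new list, so the items accumulated so far are
# copied again on every step; total work grows quadratically.
via_sum = sum(lists, [])

# chain.from_iterable visits every item exactly once.
via_chain = list(chain.from_iterable(lists))

print(via_sum == via_chain)
```

Both produce the same flat list; only the amount of copying differs.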

I think the wording of both docs and docstring is clear - you should use sum() for numeric values. If you use anything else, it might work, but it’s not intended use, so you accept that risk. It may simply perform badly, or it may error. Or it may work fine, of course…

More explicitly, if you use sum() with non-numeric types, you may get an error. Whether you do or not isn’t specified, so I’d say yes, it could change between Python versions. (In reality, though, it’s not likely to change without good reason, as we take backward compatibility seriously.)

Why bother? If you have (presumably accidentally) used sum() for non-numeric types, just fix it when it breaks. It doesn’t seem worth the effort to scour your code for things that might fail (of which this is likely only one possibility). But I guess it’s up to you…


I think that:

  • The status quo for sum won’t change without plenty of advance notice, including a deprecation period; this is just Python’s standard policy of not breaking backwards compatibility without good reason.

  • We should continue to discourage but not prohibit use of sum with non-numeric values such as lists;

  • such uses are occasionally useful, especially in the interactive REPL, where efficiency is not such a high concern and for small enough N, O( N^2 ) is fast enough;

  • only summing strings is likely to be problematic in practice, which is why sum special-cases strings and prohibits them;

  • but even that can be worked around if you try. (But apart from doing it to show off, why would you? Just use ''.join().)

I don’t think anything here needs to change. Both the main docs and the docstring say the same thing, just in slightly different ways.

I don’t think you should remove sum(bunch_of_lists) because it might break in the future. It probably won’t break. But you should consider replacing it because it is probably slow.

I suppose that other implementations might choose to diverge from CPython’s behaviour here. It is unlikely though.

Agreed with what was already said. We should be careful not to rule out summing data structures that can be summed efficiently while also acting like sequences (e.g. certain trees).