PEP 8 says “For sequences, (strings, lists, tuples), use the fact that empty sequences are false” but doesn’t clarify whether this means explicit len checks are discouraged or not (in particular for len(x) == 0 / len(x) < 1).
I micro-benchmarked and, as I expected, using bool(x) is faster than using len(x) > 0, but is it actually “more Pythonic”? As it is, reading PEP 8 does not settle the debate. Our primer shows that the pattern exists in a lot of popular projects, but it’s hard to say whether it’s a common oversight or deliberate.
So what is the spirit of PEP 8? And should it become more explicit about this particular issue?
Apparently the true spirit of PEP 8 was missed again. Its contents are just suggestions, and you can’t treat it as a defining document where every sentence must have a precise meaning or else be corrected. The pylint folks need to make up their own minds about this; I don’t want to weigh in and neither should PEP 8.
PS. The sentence you quote clearly implies that you ought to prefer ‘if seq:’ over ‘if len(seq):’ – but I find it more useful to think about which one is clearer in the context of the code, and the answer varies a lot.
Definitely micro-benchmarks are not the way to decide between the two, unless you have very clear evidence that they are a performance problem in your program.
My view is that if the exact number isn’t important, but only whether it’s non-zero, then asking for the length might seem excessive because you’re asking for more than you really want.
I don’t know if PEP 8 intended to differentiate between if len(seq) and if len(seq) > 0, but it looks like pylint has been differentiating between them since 2019. It also explicitly allows the latter in the docs:
Empty sequences are considered false in a boolean context. You can either remove the call to 'len' (``if not x``) or compare the length against a scalar (``if len(x) > 1``).
I think it would simply be too disruptive to extend its scope now.
Wow, I thought if collection_object: was the clear and Pythonic way to test for a non-empty collection type. This seems like a roll-back to me?
Not that I suspect it is extremely common, but it’s not true that all sequence-like objects have this bool behavior. N-D arrays typically look a lot like sequences (they have a len() and allow iteration), but the useful definition of bool() for them is to look at the value, and to only allow that if they are 0-D.
So “sequence” is a fuzzy term there. Now, is this code common? I am not sure; you might have to survey scientific projects.
But someone used to working with (N-D) data may want to add the len() for very good reasons.
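To make the point above concrete without depending on NumPy, here is a minimal sketch built around a hypothetical ArrayLike class. It is a toy stand-in, not NumPy's implementation; it only mimics the idea that truth testing looks at the value and refuses to answer when more than one element is present, so an explicit len() check is the spelling that works for both lists and array-likes.

```python
class ArrayLike:
    """Toy stand-in for an N-D array (hypothetical, not NumPy's code)."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __bool__(self):
        # Truthiness is about the *value*, and only defined for one element.
        if len(self._items) != 1:
            raise ValueError("truth value is ambiguous for this many elements")
        return bool(self._items[0])


def is_empty(seq):
    # An explicit length check works for lists and for the array-like alike.
    return len(seq) == 0


data = ArrayLike([0, 0, 3])
print(is_empty(data))  # False
print(is_empty([]))    # True

try:
    if not data:  # the truthiness spelling does not answer "is it empty?"
        print("empty")
except ValueError as exc:
    print(f"ValueError: {exc}")
```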
Another example is:
import random
random.choice(0)
which would at least raise a less precise error if this was replaced with if not seq:
--> 372 if not len(seq):
373 raise IndexError('Cannot choose from an empty sequence')
(And, as pointed out in the comment, this choice there is explicitly to make it work correctly with N-D array objects. But I think the better error type is also a small reason.)
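The difference in error type can be sketched with two hypothetical helpers (first_or_error_len and first_or_error_truthy are illustrative names, not part of the random module). They differ only in how the emptiness check is spelled, and they return the first item rather than a random one so the result is deterministic.

```python
def first_or_error_len(seq):
    # Mirrors the `if not len(seq)` spelling quoted above.
    if not len(seq):
        raise IndexError("Cannot choose from an empty sequence")
    return seq[0]


def first_or_error_truthy(seq):
    # Same function with the check rewritten as `if not seq`.
    if not seq:
        raise IndexError("Cannot choose from an empty sequence")
    return seq[0]


for func in (first_or_error_len, first_or_error_truthy):
    try:
        func(0)  # 0 is falsy, but it is not a sequence at all
    except (TypeError, IndexError) as exc:
        print(f"{func.__name__}: {type(exc).__name__}")
```

With the len() spelling the non-sequence argument surfaces as a TypeError; with the truthiness spelling the same mistake is reported as an IndexError about an empty sequence.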
numpy extends its type to zero dimension arrays that act like scalars and have no len. np.ndarray type is also not part of the std lib.
So an np.ndarray type is both a collection and not a collection type depending on other attributes. Any rule of the type “collections of length zero are False in a boolean context”, could not apply when the ndarray can have a shape of ().
Explicit length checks can be semantically different from a bool() check and I don’t think this is necessarily a matter that should be handled by PEP8.
Called to implement truth value testing and the built-in operation bool(); should return False or True. When this method is not defined, __len__() is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither __len__() nor __bool__() (which is true of the object class itself), all its instances are considered true.
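A short sketch of the fallback order described in that passage, using two hypothetical classes (OnlyLen and Neither are illustrative names): one defines only __len__(), the other defines neither method.

```python
class OnlyLen:
    """Defines __len__() but not __bool__()."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


class Neither:
    """Defines neither __len__() nor __bool__()."""


print(bool(OnlyLen(0)))  # False: falls back to __len__(), which returns 0
print(bool(OnlyLen(3)))  # True: __len__() is nonzero
print(bool(Neither()))   # True: instances are considered true by default
```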
The wording of the language reference states that overriding __bool__() is permitted for sequence types, and that bool(seq) and len(seq) > 0 are not exactly equivalent operations. Whilst the builtin sequence types all exhibit the bool(seq) == (len(seq) > 0) behavior, and non-builtin sequences are expected to follow the convention, I can’t find docs that state this is a hard requirement.
In other words, the semantics of if seq: and if len(seq): are allowed to intentionally differ. Making the bool() behavior non-standard is a decision vested in the API designer. The example below is contrived; it only shows that differing bool() and len() behavior is not prohibited by the runtime (the only restriction CPython places on __len__() is 0 <= len() <= sys.maxsize). There are legitimate use cases for this, such as numpy (and probably others).
from sys import maxsize
from collections.abc import Sequence

class Foo[T](Sequence[T]):
    __slots__ = ("_num", "_value")

    def __init__(self, num: int, value: T | None = None):
        if not 0 <= num < maxsize:
            raise IndexError
        self._num = num
        self._value = value

    def __len__(self):
        return self._num

    def __getitem__(self, index: int):
        # if isinstance(index, slice):
        #     raise TypeError("Slicing not supported yet")
        if 0 <= index < self._num:
            return self._value
        raise IndexError

    def __contains__(self, obj):
        return self._num and obj == self._value

    def __iter__(self):
        for _ in range(self._num):
            yield self._value

    # This is not needed, but not prohibited either
    def __bool__(self):
        return True

foo = Foo(0, "bar")
print(f"foo is an empty sequence of {len(foo)=}")
print(f"and can be converted to an empty {tuple(foo)=}")
print(f"but can still have a wonky {bool(foo)=}")
Output:
foo is an empty sequence of len(foo)=0
and can be converted to an empty tuple(foo)=()
but can still have a wonky bool(foo)=True
PEP 8 tells you to write Pythonic code, not just to translate code from other languages.
In Java you write a.length != 0 (or a.length > 0) to test that an array a is not empty. You cannot write a.length or a – this does not work in boolean context. In Python you can write len(a) != 0, len(a) > 0 (if len was not overridden), len(a) and just a (if it is a sequence like string, list, tuple, etc). This is an intentional feature of Python. It was specifically added for you to use. Use Python features in Python code.
There are cases when these variants are not equivalent. Then use the appropriate one.
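As a quick illustration of the equivalent case, the sketch below evaluates the four spellings on builtin strings, lists and tuples. The helper name spellings is hypothetical; for these builtin types all four results agree.

```python
def spellings(a):
    # The four ways of asking "is a non-empty?" mentioned above.
    return (len(a) != 0, len(a) > 0, bool(len(a)), bool(a))


for value in ("", "abc", [], [0], (), (None,)):
    results = spellings(value)
    # For builtin sequences every spelling gives the same answer.
    print(repr(value), results, len(set(results)) == 1)
```

Note that [0] and (None,) are non-empty and therefore true, even though their only element is falsy.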
This pylint check is triggered only for classes that inherit directly from list, dict, or set and do not implement the __bool__ function, or for generators like range or list/dict/set comprehensions, so it’s not going to raise false positives on numpy’s arrays, pandas’s DataFrames, or anything similar.
I now understand that it was dumb to even ask to modify PEP 8 and that this is probably not the first time that Guido has been bothered with such a question. That being said, I’m also bothered by the question at my level and, as Guido said, pylint has to make a decision on whether to raise programmatically or not. I have an opinion about this, but a community opinion would be more authoritative when telling new issue openers that this particular issue is settled and won’t be discussed further.
So really I should have asked “should pylint raise on len(seq) equivalents like len(seq) < 1, len(seq) > 0, len(seq) != 0, or len(seq) == 0 according to you”?
Well, some want the check only for pure len(seq). It’s possible to add an option so that they have what they want and others also have what they want, but then we have to choose the default and we’re back to answering the initial question (or the software is unusable without a configuration).
“should pylint raise on len(seq) equivalents like len(seq) < 1, len(seq) > 0, len(seq) != 0, or len(seq) == 0 according to you”?
I looked through a module in my package that does a lot of low-level manipulation of collections, and most germane objects were of type Sequence | None.
This one is copied for the most part from numpy’s guide to subclassing, where the behavior is the same for an empty list as for None.
A second conditional checked for explicit None where there might be an assumption of a nonzero-length sequence(str), so theoretically they could also be made if param.
The third checked for None when numpy already guarded against length-zero sequences. If it hadn’t guarded, I would have. if axes is not None and not axes reads weirdly IMO, whereas if axes is not None and len(axes) == 0 makes more sense.
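A minimal sketch of why the two spellings read differently for a Sequence | None parameter. The function names classify_axes and classify_axes_truthy are hypothetical and the return strings are only for illustration; the point is that an explicit None test plus a length test keeps “not given” and “given but empty” apart, while a bare truthiness test merges them.

```python
from collections.abc import Sequence
from typing import Optional


def classify_axes(axes: Optional[Sequence[int]]) -> str:
    if axes is None:
        return "not given"
    if len(axes) == 0:
        return "given but empty"
    return f"axes={tuple(axes)}"


def classify_axes_truthy(axes: Optional[Sequence[int]]) -> str:
    # `if not axes` cannot tell None apart from an empty sequence.
    if not axes:
        return "not given or empty"
    return f"axes={tuple(axes)}"


for value in (None, [], [0, 2]):
    print(repr(value), "|", classify_axes(value), "|", classify_axes_truthy(value))
```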
So it feels like there are cases for either.
Beyond that
Most conditional checks in that library are for if len(foo) == 1. I can see how, for readability, it might be nicer to see elif len(foo) == 0.
I use numpy enough that @seberg’s point about how numpy arrays can be sequence-like but have a different meaning for __bool__ matters.
if len(foo)==0 might still have use cases with a walrus operator, i.e. if (len_foo:= len(foo)) == 0
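The points above can be sketched together: when a function already branches on specific lengths, binding the length once with the walrus operator and comparing explicitly keeps the branches uniform. The function name describe is hypothetical.

```python
def describe(foo):
    # Bind the length once, then compare it explicitly in each branch.
    if (len_foo := len(foo)) == 0:
        return "empty"
    elif len_foo == 1:
        return "exactly one item"
    return f"{len_foo} items"


print(describe([]))         # empty
print(describe(["a"]))      # exactly one item
print(describe("abc"))      # 3 items
```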
OTOH I recall decades ago we had a bug where there was a database-table object whose length was the table’s record count, and there was code that checked whether the table existed using ‘if table:’ and did the wrong thing when it existed but was empty. So this is really type-specific.
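A minimal reconstruction of that kind of bug, under stated assumptions: Table, TABLES, exists_buggy and exists_fixed are all hypothetical names invented for this sketch, not the original code. Because Table defines __len__() as the record count, an existing but empty table is falsy.

```python
class Table:
    """Hypothetical table object whose length is its record count."""

    def __init__(self, name, records=()):
        self.name = name
        self.records = list(records)

    def __len__(self):
        return len(self.records)


# The "users" table exists but has no records yet; "orders" does not exist.
TABLES = {"users": Table("users")}


def exists_buggy(name):
    table = TABLES.get(name)
    # Truthiness falls back to __len__(), so an empty table looks missing.
    return bool(table)


def exists_fixed(name):
    table = TABLES.get(name)
    # Test for None explicitly; the record count is a separate question.
    return table is not None


print(exists_buggy("users"))   # False, although the table exists
print(exists_fixed("users"))   # True
print(exists_fixed("orders"))  # False
```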
I don’t think anyone really wants to change PEP8. I just think it’s good to realize that this may be opinionated and that opinion may sometimes be wrong. I don’t even have a strong opinion about including it or not beyond maybe the docs mentioning that there are clear reasons to ignore it.
(Similar to except Exception: which is good to lint but also has use-cases where it is needed.)
I personally avoid truthy conditionals unless the variable could be of multiple different types. For example, if seq really is a sequence, then I prefer if len(seq) == 0:. But if it could be a sequence or None, then if not seq: would be preferable.
I’m sure there’s a variety of opinions on this, and PEP 8 is a little more ambiguous because it is showing the return value of len(seq) still used in a truthy context[1].
So for me, I don’t like the recommendation in the PEP, but I also don’t think it’s worth changing.
[1] i.e. it is not explicitly testing the value of len(seq)
I don’t use pylint but based on my general experience of using linters my preference is that any rules about this should be disabled by default. I haven’t seen any suggestion that any lint rule here would help with fixing any real issues in code and I prefer all opinionated style rules to be off by default.