`None` as index

It is a little-known fact that the str, bytes and bytearray methods that take the index arguments start and end (find(), rfind(), index(), rindex(), count(), startswith(), endswith()) also accept None. This is not explicitly documented.
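A quick way to see the behaviour described above is to pass None explicitly and compare with the short forms. This is only a sketch; the sample strings are made up for illustration.

```python
text = "spam and eggs and spam"

# None in the start/end positions behaves like an omitted argument.
same_find = text.find("spam", None, None) == text.find("spam")
same_rfind = text.rfind("spam", None, None) == text.rfind("spam")
same_count = text.count("spam", None, None) == text.count("spam")
same_starts = text.startswith("spam", None, None) == text.startswith("spam")
same_ends = text.endswith("spam", None, None) == text.endswith("spam")

# None can be mixed with real indices: search from index 1 to the end.
second_spam = text.find("spam", 1, None)

# bytes and bytearray behave the same way.
bytes_hit = b"abcabc".find(b"c", None, None)
bytearray_hit = bytearray(b"abcabc").rfind(b"c", None)

print(same_find, same_rfind, same_count, same_starts, same_ends)
print(second_spam, bytes_hit, bytearray_hit)
```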

.. method:: str.find(sub[, start[, end]])

   Return the lowest index in the string where substring *sub* is found within
   the slice ``s[start:end]``.  Optional arguments *start* and *end* are
   interpreted as in slice notation.  Return ``-1`` if *sub* is not found.

Yes, it says “as in slice notation”, and None is accepted in slices (also a not well-known fact), but it is not obvious that the analogy goes that far. The documentation of startswith() is even more subtle:

.. method:: str.startswith(prefix[, start[, end]])

   Return ``True`` if string starts with the *prefix*, otherwise return ``False``.
   *prefix* can also be a tuple of prefixes to look for.  With optional *start*,
   test string beginning at that position.  With optional *end*, stop comparing
   string at that position.

Nothing starts or stops at the None position (until you specify the meaning of None in this context).

To add to the confusion, list.index() and the corresponding methods of other sequences do not accept None.
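The inconsistency can be probed with a small helper. `accepts_none` is a hypothetical name introduced here, and the results for list and tuple are what current CPython is expected to give according to the post above; other versions or implementations may differ.

```python
def accepts_none(method, *args):
    """Return True if calling method(*args) does not raise TypeError."""
    try:
        method(*args)
    except TypeError:
        return False
    return True

str_ok = accepts_none("abc".index, "b", None, None)
list_ok = accepts_none(["a", "b", "c"].index, "b", None, None)
tuple_ok = accepts_none(("a", "b", "c").index, "b", None, None)

print("str.index accepts None:  ", str_ok)
print("list.index accepts None: ", list_ok)
print("tuple.index accepts None:", tuple_ok)
```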

I do not know whether support for None was added intentionally, or is just a side effect of using the same _PyEval_SliceIndex() helper that is used for slice notation and the slice() constructor (I later added the _PyEval_SliceIndexNotNone() helper for use in Argument Clinic when None is not accepted). In any case, such inconsistency is confusing. I see two options:

  1. Explicitly document that None is accepted as an index wherever it is accepted. In particular, writing the signature as str.find(sub, start=None, end=None, /) instead of str.find(sub[, start[, end]]) would help. In the future we could add support for None in list.index(), re.Pattern.match(), and other places.
  2. Deprecate support for the None index in these methods. It would still be valid in slice notation and slice(). Being undocumented, this feature may be very little used. The signatures would then need multi-signature support.

IMO there’s a 3rd option, which is to just leave things as they are. None as “the thing you can use as an explicit form of omitting an optional argument” is a common convention, and it might sometimes be useful to be able to do this, but not enough to warrant explicitly documenting it.

I don’t think there’s much to choose between any of the 3 options, so for me the status quo wins. Is there an actual problem being caused by this behaviour? If there is, that might provide a reason for preferring a different option.


As a user, the only reason to document this is to make it clearer that I can do something like

    for start, end in zip([None, 5, 10], [5, 10, None]):
        print(my_str.find(substr, start, end))

Otherwise I might think I need to write `if start is None` clauses and the like to deal with each situation.
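To make the comparison concrete, here is a sketch of both approaches side by side. The sample values for `my_str` and `substr` and the helper `find_with_branches` are invented for illustration.

```python
my_str = "abcabcabcabcabc"
substr = "ca"
starts = [None, 5, 10]
ends = [5, 10, None]

def find_with_branches(s, sub, start, end):
    """What one might write if None were not accepted by str.find()."""
    if start is None and end is None:
        return s.find(sub)
    if end is None:
        return s.find(sub, start)
    if start is None:
        return s.find(sub, 0, end)
    return s.find(sub, start, end)

# Passing None straight through gives the same answers with no branching.
direct = [my_str.find(substr, a, b) for a, b in zip(starts, ends)]
branched = [find_with_branches(my_str, substr, a, b) for a, b in zip(starts, ends)]

print(direct)
print(branched)
```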


This confuses me in multiple ways. I will have to ask a question in Help.

IMO, changing the signature in the documentation from [, end] to , end=None would make things a bit more correct, with virtually no downsides.

As James pointed out, allowing an explicit value that works the same as a “missing argument” is very convenient. In addition to loops, it’s quite useful for writing wrappers and adapters – things like:

    def find_needle(self, haystack, start=None, end=None):
        return haystack.find(self.needle, start, end)
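For completeness, a self-contained version of that wrapper might look like this. The class name `NeedleFinder` and the sample haystack are assumptions; only the method body comes from the snippet above.

```python
class NeedleFinder:
    """Hypothetical adapter that remembers the needle to search for."""

    def __init__(self, needle):
        self.needle = needle

    def find_needle(self, haystack, start=None, end=None):
        # None defaults are forwarded unchanged to str.find().
        return haystack.find(self.needle, start, end)

finder = NeedleFinder("needle")
haystack = "hay needle hay needle"

first = finder.find_needle(haystack)            # whole string
second = finder.find_needle(haystack, 5)        # only start given
missing = finder.find_needle(haystack, end=8)   # only end given

print(first, second, missing)
```

The wrapper does not need to know what the "real" defaults of the wrapped method are, which is the convenience being described.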

I asked because I am currently working on multi-signature support, which allows expressing the signatures of all builtins that previously were not parsable. find(sub[, start[, end]]) is not parsable, but it can be written as a union of signatures:

    find(sub, /)
    find(sub, start, /)
    find(sub, start, end, /)

But since both start and end accept None as the default value, it does not even need multi-signature support:

    find(sub, start=None, end=None, /)

That is, if None is officially acceptable. If it is just an artifact of the implementation, the signature can only be expressed as a multi-signature.
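As an illustration of the single-signature form, here is a pure-Python stand-in (the module-level function `find` below is hypothetical, not the builtin) whose signature `inspect` can represent directly, and which covers all three call shapes from the union above.

```python
import inspect

def find(s, sub, start=None, end=None, /):
    """Hypothetical stand-in for str.find with None defaults."""
    return s.find(sub, start, end)

sig_text = str(inspect.signature(find))
print(sig_text)  # (s, sub, start=None, end=None, /)

# One signature covers find(sub), find(sub, start) and find(sub, start, end).
results = [
    find("hello", "l"),
    find("hello", "l", None),
    find("hello", "l", None, None),
]
print(results)
```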

So what is it? An official feature or an artifact of implementation?

Since e.g. str.find() refers to slice(), and slice() documents the None behaviour, I would argue it’s an official feature. I have relied on it many times.
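The analogy with slices can be checked directly. A small sketch, with a made-up sample string:

```python
s = "hello world"

# None in a slice means "use the default bound".
whole = s[None:None]
bounds = slice(None, None).indices(len(s))

# str.find() treats None the same way for start and end.
same_as_default = s.find("o", None, None) == s.find("o")
from_five = s.find("o", 5, None)

print(whole == s, bounds, same_as_default, from_five)
```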

Why not just go all the way to find(sub, start=None, end=None)?[1] These are AC functions, which means making them support names is trivial (whereas when we started these were doing manual argument parsing and so it was not trivial), and “sub”, “start” and “end” are meaningful names.

FTR, I don’t have a problem with going to =None in this case (range is still special enough to leave alone).


  1. That is, delete the / that makes them positional-only.
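The difference the / makes can be shown with two pure-Python stand-ins. Both function names are hypothetical; they only model the two candidate signatures and say nothing about how the builtins behave today.

```python
def find_positional_only(s, sub, start=None, end=None, /):
    """Models find(sub, start=None, end=None, /)."""
    return s.find(sub, start, end)

def find_with_names(s, sub, start=None, end=None):
    """Models find(sub, start=None, end=None), without the /."""
    return s.find(sub, start, end)

# Without the /, callers can name just the argument they care about.
named_result = find_with_names("hello", "l", end=3)

# With the /, the same call is rejected.
try:
    find_positional_only("hello", "l", end=3)
    keyword_rejected = False
except TypeError:
    keyword_rejected = True

print(named_result, keyword_rejected)
```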
