Strange behavior when using String.startswith() with an empty string and a big start

luc10921 · September 29, 2021, 6:52pm

'test'.startswith('') returns True. I was under the impression that startswith() did something like string[start:end] == value, but whenever start is greater than the length of the string the result is False. So I assumed that arguments that would be out of bounds in other languages would all return False, but 'test'.startswith('', 0, 99), 'test'.startswith('', -99, 99) and 'test'.startswith('', -99, -99) all return True.

So my question is: what is going on with string.startswith('', len(string))?

tjol · September 29, 2021, 7:53pm

Hi Lucas,

Interesting behaviour.

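What's going on here internally is that the indices are first fixed up: a negative start or end is counted from the end of the string, anything that still points before the start of the string is treated as 0, and an end beyond the end of the string is treated as the length of the string (start, however, is not clamped to the length). Then the function checks that the prefix fits between start and end, i.e. start + len(prefix) <= end. For an empty prefix and the default end, that boils down to checking that start is less than or equal to the length of the string.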
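A minimal pure-Python sketch of that fix-up step, assuming CPython works as described above. The helper names adjust_indices and prefix_fits are hypothetical and not part of Python:

```python
def adjust_indices(length, start=0, end=None):
    """Normalise start/end roughly the way str.startswith does (hypothetical helper).

    Negative indices count from the end and are clamped at 0; end is clamped
    to the length. start is NOT clamped to the length.
    """
    if end is None or end > length:
        end = length
    elif end < 0:
        end = max(end + length, 0)
    if start < 0:
        start = max(start + length, 0)
    return start, end


def prefix_fits(length, prefix_len, start=0, end=None):
    """True if a prefix of prefix_len characters fits between the adjusted indices."""
    start, end = adjust_indices(length, start, end)
    return start + prefix_len <= end


# The cases from the question, for the 4-character string 'test':
for args in [(4,), (5,), (0, 99), (-99, 99), (-99, -99)]:
    print(args, adjust_indices(4, *args), prefix_fits(4, 0, *args))
```

For an empty prefix, prefix_fits reduces to start <= end after the fix-up, which with the default end is the "start <= len(string)" check. Because start is never clamped, 'test'.startswith('', 5) is False even though 'test'[5:] equals ''.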

I don’t know why this particular behaviour was chosen, but I did discover a couple of things that might be interesting:

In Python 2, unicode strings behaved differently from byte strings (this was fixed in Python 3):

Python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'test'.startswith('', 4)
True
>>> 'test'.startswith('', 5)
False
>>> u'test'.startswith(u'', 4)
True
>>> u'test'.startswith(u'', 5)
True
>>>

Python 3.8.12 (default, Aug 31 2021, 01:23:42) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b'test'.startswith(b'', 4)
True
>>> b'test'.startswith(b'', 5)
False
>>> 'test'.startswith('', 4)
True
>>> 'test'.startswith('', 5)
False
>>>
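To check the Python 3 claim more broadly than two calls, a short loop can compare str and bytes for an empty prefix over a range of start positions, including negative and out-of-range ones (a sketch; only the empty-prefix, start-only case is covered):

```python
# Compare str and bytes for an empty prefix at many start positions,
# including negative and out-of-range ones.
text, data = 'test', b'test'
starts = range(-6, 7)

mismatches = [s for s in starts
              if text.startswith('', s) != data.startswith(b'', s)]
false_starts = [s for s in starts if not text.startswith('', s)]

print('str/bytes mismatches:', mismatches)
print('starts giving False:', false_starts)
```

On Python 3 the two types should agree everywhere, and only the starts past the end of 'test' (5 and 6 here) give False.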

I found this comment in the Python 2.2 code (it had gotten lost by 2.3)

	/* adopt Java semantics for index out of range.  it is legal for
	 * offset to be == plen, but this only returns true if prefix is
	 * the empty string.
	 */

Make of that what you will.
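For comparison, Java's String.startsWith(String prefix, int toffset) is documented to return false if toffset is negative or greater than the string's length, and otherwise to give the same result as this.substring(toffset).startsWith(prefix). A rough Python transcription of that documented rule (java_style_startswith is a hypothetical name, and only the offset argument is modelled) shows where Python departs from it: Java rejects a negative offset, whereas Python counts it from the end.

```python
def java_style_startswith(s, prefix, offset=0):
    """Sketch of Java's documented String.startsWith(prefix, toffset) rule.

    Hypothetical helper for comparison only; not part of Python.
    """
    if offset < 0 or offset > len(s):
        return False
    return s[offset:].startswith(prefix)


# Java-style result next to Python's own str.startswith.
for offset in (-1, 4, 5):
    print(offset, java_style_startswith('test', '', offset),
          'test'.startswith('', offset))
```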

aivarpaalberg · September 29, 2021, 8:21pm

From the Python Language Reference, section 6.10.2 (Membership test operations):

Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.

For example, what should a zero-length slice of a string return?

>>> 'test'[0:0]
''
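Taking that reasoning one step further: a zero-length slice is '' wherever it is taken, even past the end of the string, so a model based purely on slicing would always answer True for an empty prefix. For comparison, str.find (which in CPython appears to apply the same index fix-up as startswith) draws the line at the same place:

```python
s = 'test'

# The empty string is a member of every string.
print('' in s)

# A zero-length slice is '' at every position, in range or not.
print(all(s[i:i] == '' for i in range(-10, 11)))

# str.find with a start offset: the empty substring is found at
# start == len(s), but not once start is past the end.
print(s.find('', 4), s.find('', 5))
```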

steven.daprano · September 29, 2021, 11:05pm

Here is the output of help(str.startswith):

S.startswith(prefix[, start[, end]]) -> bool

Return True if S starts with the specified prefix, False otherwise.
With optional start, test S beginning at that position.
With optional end, stop comparing S at that position.
prefix can also be a tuple of strings to try.

So when you provide start and/or end, it is essentially equivalent to taking a slice of the string, except without needing to make a copy first:

"Hello world".startswith("o", 4, 8)
# like "Hello world"[4:8].startswith("o")
# or "o wo".startswith("o")

Except that no actual copy of the slice needs to be made.
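That equivalence is easy to test by brute force. The sketch below compares s.startswith(p, start, end) with s[start:end].startswith(p) for a few arbitrary prefixes and every start/end in a small range, with None standing for an omitted argument (str.startswith accepts None there, as slicing does):

```python
from itertools import product

s = 'test'
prefixes = ['', 't', 'te', 'es', 'test', 'tests', 'x']
# None means "argument omitted"; the integers cover negative,
# in-range and out-of-range positions.
positions = [None] + list(range(-6, 7))

disagreements = []
for p, start, end in product(prefixes, positions, positions):
    method = s.startswith(p, start, end)
    sliced = s[start:end].startswith(p)
    if method != sliced:
        disagreements.append((p, start, end))

print(len(disagreements), 'disagreements')
print('prefixes involved:', sorted({p for p, _, _ in disagreements}))
```

The disagreements should all involve the empty prefix: after the index fix-up the method requires start + len(prefix) <= end, whereas a slice whose start lies past its end (or past the end of the string) is simply '', and every string starts with ''.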

Then the actual startswith("o") comparison is equivalent to another slice:

"o wo".startswith("o")
# equivalent to "o wo"[0:len("o")] == "o"
# or "o" == "o" which is True

except, again, no actual slice is made.

If the prefix was bigger:

"Hello world".startswith("o world", 4, 8)
-> "o wo".startswith("o world")
-> "o wo"[0:7] == "o world"
-> "o wo" == "o world"
-> returns False

If the prefix is empty:

"Hello world".startswith("", 4, 8)
-> "o wo".startswith("")
-> "o wo"[0:0] == ""
-> "" == ""
-> returns True

Remember though that no actual string copies are made.

An untested pure Python implementation might be something like this:

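# Warning: I have not tested this.
def startswith(string, prefix, start=0, end=None):
    n = len(string)
    # Fix up the indices first: negative values count from the end
    # (clamped at 0) and end is clamped to the length of the string.
    # start is deliberately NOT clamped to the length.
    if end is None or end > n:
        end = n
    elif end < 0:
        end = max(end + n, 0)
    if start < 0:
        start = max(start + n, 0)
    if len(prefix) > end-start:
        # Prefix is too long to fit in the slice.
        # So it can't be a prefix of the slice.
        # (This also rejects a start beyond the end of the string.)
        return False
    for i in range(len(prefix)):
        if string[start+i] != prefix[i]:
            return False
    return True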
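The pure Python version above also skips the last line of the help text: prefix can be a tuple of strings to try. Each candidate is checked against the same start and end, so putting '' in the tuple does not get around the start-past-the-end rule:

```python
s = 'test'

# With a tuple, the result is True if any candidate prefix matches.
print(s.startswith(('x', 'te')))       # 'te' matches

# The empty candidate still matches at start == len(s) ...
print(s.startswith(('x', ''), 4))

# ... but not once start is past the end of the string.
print(s.startswith(('x', ''), 5))
```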