Python re lookahead doesn't ignore endpos

I am not sure if this is a bug or an intended but badly documented feature.

I am using \b for simplicity, but the same applies to the full lookahead/behind expressions.

import re

pattern = re.compile(r'\b\w+\b') # matches full words

assert pattern.fullmatch("abc")
assert pattern.fullmatch("abc", pos=0, endpos=3)
assert pattern.fullmatch(" abc ", pos=1, endpos=4)
assert not pattern.fullmatch("xabc ", pos=1, endpos=4)
assert not pattern.fullmatch(" abcx", pos=1, endpos=4)  # AssertionError: this call does match

The problem in my eyes is the last one: it shouldn’t match because there is an x after abc, but it seems that \b, and the lookahead machinery in general, don’t look beyond endpos. This conflicts with the behavior of pos, as can be seen from the "xabc " example.
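For reference, here is a minimal sketch of the same asymmetry using explicit lookarounds instead of \b. The two patterns and the variable names are made up for illustration only:

```python
import re

# Hypothetical patterns for illustration: "abc" not preceded / not followed by "x".
behind = re.compile(r'(?<!x)abc')
ahead = re.compile(r'abc(?!x)')

# The lookbehind still sees the "x" that sits before pos, so this is rejected.
behind_result = behind.fullmatch("xabc ", pos=1, endpos=4)

# The lookahead does not see the "x" that sits at endpos, so this is accepted.
ahead_result = ahead.fullmatch(" abcx", pos=1, endpos=4)

print(behind_result)
print(ahead_result)
```

The first call returns None while the second returns a match object for "abc", which mirrors the \b results.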

Is this intended? And if not, is this fixable, or are we at the point “too much code might rely on this behavior”?

This seems pretty clearly documented here?


Aha, right, missed the last sentence. I still find it confusing behavior, but I guess it matches the docs, so :person_shrugging:
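As I read the docs, endpos behaves as if the string were cut off at that index, while pos is explicitly not equivalent to slicing. A rough sketch of that reading, reusing the pattern from the first post (the variable names are only for illustration):

```python
import re

pattern = re.compile(r'\b\w+\b')  # same pattern as in the original post

# pos is not the same as slicing: the "x" before pos stays visible to \b.
with_pos = pattern.fullmatch("xabc ", pos=1, endpos=4)
sliced_left = pattern.fullmatch("xabc "[1:4])

# endpos does behave like truncating the string: the trailing "x" is gone.
with_endpos = pattern.fullmatch(" abcx", pos=1, endpos=4)
truncated_right = pattern.fullmatch(" abcx"[:4], pos=1)

print(with_pos, sliced_left)
print(with_endpos, truncated_right)
```

Only with_pos comes back as None here; the other three calls all match "abc", so the truncated string and the endpos call agree.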