I’m writing a little prefix matching function whose signature looks like:
def get_prefix_n(s, prefix, offset=0, *, n=None):
The motivating use case is matching a string like "s02". Because I want some flexibility, prefix can be a str such as "s" or any object with a .match(s, offset) method, such as an re.Pattern, though not necessarily a Pattern.
I was hoping for some agnosticism about the non-str case and thus to minimise the requirements on the object to just .match(). But of course to parse the numeric value after the prefix I need to know where the match result ends.
For an re.Pattern I can of course measure len(m.group()) or use m.end(), but that involves more requirements on the prefix object.
What I’d hoped to find was that len(m) gave len(m.group()). I realise there would be some tension there between len() being characters and the existing m[group_index] support.
I’m going to go with requiring a .end() method giving the offset beyond the prefix, which feels less opinionated than .group(), which implies more about the semantics of the match result than “it ended here”.
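A minimal sketch of that contract, for illustration only. The helper name match_prefix_number, the AnyOf prefix class and its _End result class are hypothetical, and the n= parameter of get_prefix_n is left out because its meaning is not described above. The only requirements on a non-str prefix are .match(s, offset) returning None or an object with .end():

```python
import re


def match_prefix_number(s, prefix, offset=0):
    """Return (value, end_offset) for prefix + ASCII digits at offset, else None."""
    if isinstance(prefix, str):
        if not s.startswith(prefix, offset):
            return None
        pos = offset + len(prefix)
    else:
        # Duck-typed: only .match() on the prefix and .end() on the result.
        m = prefix.match(s, offset)
        if m is None:
            return None
        pos = m.end()
    end = pos
    while end < len(s) and s[end] in "0123456789":
        end += 1
    if end == pos:
        return None  # prefix matched but no digits followed
    return int(s[pos:end]), end


class _End:
    """Hypothetical match result that knows only where the match ended."""

    def __init__(self, end):
        self._end = end

    def end(self):
        return self._end


class AnyOf:
    """Hypothetical non-regex prefix: matches the first of several literals."""

    def __init__(self, *options):
        self.options = options

    def match(self, s, offset=0):
        for option in self.options:
            if s.startswith(option, offset):
                return _End(offset + len(option))
        return None


print(match_prefix_number("s02", "s"))
print(match_prefix_number("xs02e01", re.compile(r"s"), 1))
print(match_prefix_number("season12", AnyOf("season", "s")))
```

An re.Pattern fits this unchanged, since Pattern.match(string, pos) returns an re.Match whose .end() is an absolute offset into the string.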
I corrected the title because I believe you meant re.Match and not re.Pattern.
It was discussed before. Iteration, indexing, and getting the length are related. It is expected that if the object m has a length and can be indexed, then the integers from 0 to len(m)-1 are valid indices, and iteration yields the values m[0] … m[len(m)-1].
In the case of the match object this conflicts with defining len(m) as len(m.group()). There were conflicting proposals: defining len(m) as len(m.groups()) or as len(m.groups())+1, and making iteration produce the values m.groups()[0] to m.groups()[len(m.groups())-1] or m[0] to m[len(m.groups())]. The expectations of some people would not match those of others. To avoid ambiguity, iteration was explicitly forbidden, and support for len() was not implemented.
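For illustration, a short check of how re.Match behaves in current CPython (the fails_with_type_error helper is hypothetical, added only for this sketch): indexing by group number works, while len() and iteration both raise TypeError.

```python
import re

m = re.compile(r"(s)(\d+)").match("s02")

# Indexing by group number is supported: m[0] is the whole match.
assert m[0] == "s02"
assert m[1] == "s" and m[2] == "02"
assert m.groups() == ("s", "02")

# The length of the matched text has to be obtained explicitly.
assert len(m.group()) == 3
assert m.end() == 3


def fails_with_type_error(func, arg):
    """Hypothetical helper: True if func(arg) raises TypeError."""
    try:
        func(arg)
    except TypeError:
        return True
    return False


# Neither len() nor iteration is defined on a match object.
print(fails_with_type_error(len, m))
print(fails_with_type_error(iter, m))
```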
Aye, I was coming to the same conclusion myself, almost as I typed my own sentence about this tension between len and getitem/iteration.
Thank you for the background.
My current spec requires a .match() for my prefix object and a .end() for the resulting match object, which feels a lot more concrete and specific anyway.