Supporting len() for re.Match objects

I’m writing a little prefix matching function whose signature looks like:

def get_prefix_n(s, prefix, offset=0, *, n=None):

whose motivating use case is to match a string like "s02". Because I want some flexibility, prefix can be a str such as "s" or any object with a .match(s, offset) method, such as an re.Pattern, though not necessarily a Pattern.

I was hoping for some agnosticism about the non-str case and thus to minimise the requirements on the object to just .match(). But of course to parse the numeric value after the prefix I need to know where the match result ends.

For an re.Pattern I can of course measure len(m.group()) or use m.end(), but that involves more requirements on the prefix object.
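For reference, a small sketch of the two measurements mentioned above, using the standard re module. The sample string and pattern are made up for illustration. Note that len(m.group()) is a length relative to the start of the match, while m.end() is an absolute offset into s, which matters once offset is nonzero.

```python
import re

# Illustrative inputs only: a prefix that starts at offset 3.
pattern = re.compile(r"[a-z]+")
s = "xx s02"
m = pattern.match(s, 3)  # Pattern.match() accepts a start position
assert m is not None

prefix_len = len(m.group())  # length of the matched text
end_offset = m.end()         # absolute offset just past the match in s

# The two agree only after adding the start of the match.
assert end_offset == m.start() + prefix_len
remainder = s[end_offset:]   # the text following the prefix
```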

What I’d hoped to find was that len(m) gave len(m.group()). I realise there would be some tension there between len() being characters and the existing m[group_index] support.

I’m going to go with requiring a .end() method giving the offset beyond the prefix, which feels less opinionated than .group(), which implies more about the semantics of the match result than “it ended here”.

Thoughts?

Cheers,
Cameron Simpson

Do you mean re.Match instead of re.Pattern?

Yes, I meant len() on the re.Match result object (or whatever I get back from the prefix.match() call). I can see that wasn’t clear.

I corrected the title because I believe you meant re.Match and not re.Pattern.

It was discussed before. Iteration, indexing, and getting the length are related. It is expected that if the object m has a length and can be indexed, the integers from 0 to len(m)-1 are valid indices, and iteration gives the values m[0] to m[len(m)-1].

In the case of the match object this conflicts with defining len(m) as len(m.group()). There were conflicting propositions of defining len(m) as len(m.groups()) or len(m.groups())+1, and of making iteration produce the values m.groups()[0] to m.groups()[len(m.groups())-1] or m[0] to m[len(m.groups())]. The expectations of some people would not match the expectations of others. To avoid ambiguity, iteration was explicitly forbidden, and support for len() was not implemented.
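A short sketch of the current behaviour described above, as I understand it in CPython: a match object supports indexing by group number, with m[0] being the whole match, but neither len() nor iteration. The pattern and the helper raises_type_error are made up for illustration.

```python
import re

m = re.match(r"(s)(\d+)", "s02")
assert m is not None

whole = m[0]         # the entire match, same as m.group(0)
first = m[1]         # first capture group
groups = m.groups()  # capture groups only, without the whole match

# Valid indices run from 0 to len(m.groups()) inclusive, which is
# where the off-by-one disagreement about len(m) comes from.
last = m[len(groups)]


def raises_type_error(func, arg):
    """Hypothetical helper: report whether func(arg) raises TypeError."""
    try:
        func(arg)
    except TypeError:
        return True
    return False


len_unsupported = raises_type_error(len, m)    # no len() on a match
iter_unsupported = raises_type_error(iter, m)  # iteration is refused
```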

Aye, I was coming to the same conclusion myself, almost as I typed my
own sentence about this tension between __len__ and __getitem__/iteration.

Thank you for the background.

My current spec requires a .match() for my prefix object and a
.end() for the resulting match object, which feels a lot more concrete
and specific anyway.
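To make that spec concrete, here is a minimal sketch of how such a function might look. It is not the actual implementation. The return value, a (matched_text, value, end_offset) tuple with (None, None, offset) on failure, is an assumption, as is the reading of n as "the number must equal n if given". The only requirements on a non-str prefix are .match(s, offset) and .end() on its result.

```python
import re


def get_prefix_n(s, prefix, offset=0, *, n=None):
    """Sketch: match `prefix` followed by decimal digits in `s` at `offset`.

    Assumed return shape: (matched_text, value, end_offset), or
    (None, None, offset) if nothing matches. Assumed meaning of `n`:
    when not None, the parsed number must equal it.
    """
    if isinstance(prefix, str):
        if not s.startswith(prefix, offset):
            return None, None, offset
        pos = offset + len(prefix)
    else:
        # Duck typing: only .match() here and .end() on its result.
        m = prefix.match(s, offset)
        if m is None:
            return None, None, offset
        pos = m.end()
    # Scan the run of ASCII digits after the prefix.
    end = pos
    while end < len(s) and s[end] in "0123456789":
        end += 1
    if end == pos:
        return None, None, offset  # no digits after the prefix
    value = int(s[pos:end])
    if n is not None and value != n:
        return None, None, offset
    return s[offset:end], value, end


# Usage with a str prefix and with an re.Pattern at a nonzero offset.
season = get_prefix_n("s02e03", "s")
episode = get_prefix_n("s02e03", re.compile(r"[es]"), 3)
```

Because m.end() is an absolute offset into s, the non-str branch needs no arithmetic on offset, which is part of why .end() is a lighter requirement than .group().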

Cheers,
Cameron Simpson cs@cskk.id.au