Supporting len() for re.Match objects

cameron · February 16, 2023, 10:51pm

I’m writing a little prefix matching function whose signature looks like:

get_prefix_n(s, prefix, offset=0, *, n=None):

whose motivating use case is to match a string like "s02". Because I want some flexibility, prefix can be a str such as "s" or any object with a .match(s, offset) method, such as an re.Pattern, but not necessarily a Pattern.

I was hoping to stay agnostic about the non-str case, and thus to keep the requirements on the object down to just .match(). But of course, to parse the numeric value after the prefix, I need to know where the match result ends.

For an re.Pattern I can of course measure len(m.group()) or use m.end() on the resulting match m, but that places more requirements on what prefix.match() returns.
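For a real re.Match both routes work. The sketch below (sample string and offset chosen purely for illustration) shows that len(m.group()) needs .group(), while m.end() alone already says where to resume parsing. Because Pattern.match(s, pos) anchors the match at pos, m.end() - pos equals len(m.group()).

```python
import re

s = "xs02"
pat = re.compile(r"s")
m = pat.match(s, 1)           # Pattern.match(string, pos): anchor the match at pos 1

matched_len = len(m.group())  # needs .group(): length of the matched text
end = m.end()                 # needs only .end(): index just past the match

print(matched_len, end)       # 1 2
print(end - 1 == matched_len) # True: the match started at pos 1
```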

What I’d hoped to find was that len(m) gave len(m.group()). I realise there would be some tension there between len() being characters and the existing m[group_index] support.
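A quick check of the current behaviour, using a made-up pattern: re.Match supports indexing (m[0] is the whole match, m[1] the first group; this has been available since Python 3.6) but does not support len().

```python
import re

m = re.match(r"s(\d+)", "s02")

print(m[0])   # s02  (whole match, same as m.group(0))
print(m[1])   # 02   (group 1)

# len() is not defined for match objects.
try:
    len(m)
    has_len = True
except TypeError:
    has_len = False
print(has_len)  # False
```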

I’m going to go with requiring a .end() method giving the offset beyond the prefix, which feels less opinionated than .group(), which implies more about the semantics of the match result than “it ended here”.
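A minimal sketch of that contract, assuming the helper returns (value, end_offset) or None. The name match_prefix_number is hypothetical, and the thread does not say what the n parameter does, so it is left out. The only things asked of a non-str prefix are .match(s, offset) and, on the result, .end().

```python
import re


def match_prefix_number(s, prefix, offset=0):
    """Hypothetical helper: match `prefix` at `offset` in `s`, then parse digits.

    Returns (value, end_offset), or None if the prefix or the digits are missing.
    """
    if isinstance(prefix, str):
        # Plain string prefix: a literal comparison at `offset`.
        if not s.startswith(prefix, offset):
            return None
        pos = offset + len(prefix)
    else:
        # Any other object: require only .match(s, offset) -> result with .end().
        m = prefix.match(s, offset)
        if m is None:
            return None
        pos = m.end()
    end = pos
    while end < len(s) and s[end] in "0123456789":
        end += 1
    if end == pos:
        return None  # prefix matched, but no digits followed it
    return int(s[pos:end]), end


print(match_prefix_number("s02", "s"))                      # (2, 3)
print(match_prefix_number("s02", re.compile(r"s")))         # (2, 3)
print(match_prefix_number("ep12x", re.compile(r"[a-z]+")))  # (12, 4)
```

Nothing in the non-str branch depends on re, so any object following the same two-method shape would work.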

Thoughts?

Cheers,
Cameron Simpson

Jelle · February 17, 2023, 1:57am

Do you mean re.Match instead of re.Pattern?

cameron · February 17, 2023, 2:05am

Yes, I meant len() on the re.Match result object (or whatever I get back from the prefix.match() call). I can see that wasn’t clear.

storchaka · February 17, 2023, 7:33am

I corrected the title because I believe you meant re.Match and not re.Pattern.

This was discussed before. Iteration, indexing, and getting the length are related. If an object m has a length and can be indexed, it is expected that the integers from 0 to len(m)-1 are valid indices, and that iteration gives the values m[0] … m[len(m)-1].
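The convention can be seen with a toy class (Triple is hypothetical, not from the thread). Defining only __len__ and __getitem__ is enough for Python's legacy iteration protocol, which calls __getitem__ with 0, 1, 2, … until IndexError, so iteration and len() agree.

```python
class Triple:
    """Minimal sequence: __len__ plus __getitem__ over indices 0..len-1."""

    def __init__(self, a, b, c):
        self._items = (a, b, c)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]  # raises IndexError past the end


t = Triple("x", "y", "z")
print(len(t))                                    # 3
print(list(t))                                   # ['x', 'y', 'z']
print(list(t) == [t[i] for i in range(len(t))])  # True
```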

In the case of the match object, this conflicts with defining len(m) as len(m.group()). There were conflicting proposals to define len(m) as len(m.groups()) or len(m.groups())+1, and to make iteration produce the values m.groups()[0] … m.groups()[len(m.groups())-1] or m[0] … m[len(m.groups())]. What some people expect would not match what others expect. To avoid ambiguity, iteration was explicitly forbidden, and support for len() was not implemented.
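For one concrete match (pattern chosen for illustration), the three plausible meanings of len(m) all give different answers: the number of groups, the number of groups plus one (so that m[0] is included), and the number of matched characters.

```python
import re

m = re.match(r"s(\d+)", "s02")

n_groups = len(m.groups())                       # 1: only group 1
indexable = [m[i] for i in range(n_groups + 1)]  # m[0] (whole match), m[1]
chars = len(m.group())                           # 3: characters in "s02"

print(n_groups, n_groups + 1, chars)  # 1 2 3
print(indexable)                      # ['s02', '02']
```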

cameron · February 17, 2023, 9:45am

Aye, I was coming to the same conclusion myself, almost as I typed my
own sentence about this tension between len and getitem/iteration.

Thank you for the background.

My current spec requires a .match() for my prefix object and a
.end() for the resulting match object, which feels a lot more concrete
and specific anyway.

Cheers,
Cameron Simpson cs@cskk.id.au
