Support \a and \z in regular expressions

Stefan2 · February 25, 2025, 11:07am

There are:

\A: Matches only at the start of the string.
\Z: Matches only at the end of the string.

And there already are pairs of “opposites”:

\b and \B
\d and \D
\s and \S
\w and \W

I propose adding \a and \z as opposites of \A and \Z, matching everywhere except at the start and the end of the string, respectively.

A case where I’d like to use that: inserting a character after every 2 characters of a string (except at the end), turning, for example, 'aabbccdd' into 'aa-bb-cc-dd'. With \z, it could be done as re.sub(r'(..)\z', r'\1-', s).

JamesParrott · February 25, 2025, 11:37am

Can this idea do anything that "-".join over a generator expression can’t?

str_ = "aabbccdd"; '-'.join(str_[i:i+2] for i in range(0, len(str_), 2))
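A runnable version of the join one-liner above, wrapped in a small function. The name insert_every_two is hypothetical (not from the thread); the slicing simply leaves a trailing odd character as its own chunk.

```python
def insert_every_two(s: str, sep: str = "-") -> str:
    """Join consecutive 2-character chunks of s with sep (hypothetical helper)."""
    # range(0, len(s), 2) yields the start index of each chunk;
    # the last slice holds a single character if len(s) is odd.
    return sep.join(s[i:i + 2] for i in range(0, len(s), 2))


print(insert_every_two("aabbccdd"))  # aa-bb-cc-dd
print(insert_every_two("aabbc"))     # aa-bb-c
```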

It would help write an implementation using re.split though.

storchaka · February 25, 2025, 11:50am

\a is already supported. It does not mean what you expect.
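To illustrate what \a currently means: in a pattern it is an escape for the ASCII bell character (BEL, code point 7), so it matches that character rather than a position. A quick check:

```python
import re

# In a regex pattern, \a is the BEL control character (U+0007), not an anchor.
bell = re.compile(r"\a")

print(bool(bell.fullmatch("\x07")))  # True: matches the bell character itself
print(bool(bell.search("abc")))      # False: no bell character present
```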

Negation of \Z can be expressed as (?!\Z).
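Applied to the original example, a sketch of the substitution with the lookahead standing in for the proposed \z. The pattern (..)(?!\Z) captures two characters and only matches when they are not the last two in the string; the helper name dash_pairs is hypothetical.

```python
import re


def dash_pairs(s: str) -> str:
    # (..)    capture two characters (newlines excluded, since DOTALL is off)
    # (?!\Z)  ...but only if the match does not end at the very end of s
    return re.sub(r"(..)(?!\Z)", r"\1-", s)


print(dash_pairs("aabbccdd"))  # aa-bb-cc-dd
print(dash_pairs("aabbc"))     # aa-bb-c
```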

blhsing · February 26, 2025, 1:58am

In many Python-based frameworks, regex search and/or substitution is offered as part of their DSL, but arbitrary Python code execution is not.

But as @storchaka already pointed out, the proposed \a and \z can be expressed rather trivially with negative lookarounds (the negation of \A can be expressed as (?<!\A)). And given that \a is already interpreted as the bell character, picking another equally terse escape sequence isn’t worth the loss of mnemonic quality of escape sequences, IMHO.
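For the "not at the start" direction, a sketch using the negative lookbehind (?<!\A). The camelCase-to-snake_case conversion below is an illustrative use case, not one from the thread, and to_snake is a hypothetical name: an underscore is inserted before every uppercase letter except one at the very beginning.

```python
import re


def to_snake(name: str) -> str:
    # (?<!\A)    not at the start of the string (the role of the proposed \a)
    # (?=[A-Z])  followed by an uppercase ASCII letter
    # The whole pattern is zero-width, so re.sub inserts "_" without consuming anything.
    return re.sub(r"(?<!\A)(?=[A-Z])", "_", name).lower()


print(to_snake("CamelCaseName"))  # camel_case_name
print(to_snake("camelCase"))      # camel_case
```

Since \A itself is zero-width, the lookahead form (?!\A) behaves the same way here.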

MRAB · February 26, 2025, 2:28am

In Perl, \z matches at the end of the string like \Z does in Python’s re, while \Z matches at the end of the string or just before a newline that’s at the end of the string, much like $ does when you don’t use the MULTILINE flag. So \z meaning the opposite of \Z could be confusing anyway.
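The difference described above shows up with a string that ends in a newline: Python’s \Z (like Perl’s \z) only matches at the true end, while $ without MULTILINE (much like Perl’s \Z) also matches just before a final newline.

```python
import re

s = "abc\n"

# \Z: only at the absolute end of the string, so the trailing "\n" blocks it.
print(re.search(r"abc\Z", s))             # None
# $ (no MULTILINE): end of string OR just before a newline at the end.
print(re.search(r"abc$", s) is not None)  # True
# Without a trailing newline, both anchors agree.
print(re.search(r"abc\Z", "abc") is not None, re.search(r"abc$", "abc") is not None)  # True True
```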

Stefan2 · February 26, 2025, 12:07pm

Looks like I’m 22 years too late then… (Python 2.3 is the first version mentioning that.)

I don’t really like (?!\Z), but supporting \z without a corresponding \a doesn’t seem right. And it would be backwards compared to the other pairs of opposites anyway, where the uppercase version is the negated one, which would not be nice. Oh well, I guess I’ll use (?!$).
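One caveat about the (?!$) fallback: because $ also matches just before a trailing newline, it behaves differently from (?!\Z) when the input ends in "\n". A sketch comparing the two on the thread’s example string with a newline appended:

```python
import re

s = "aabbccdd\n"  # the thread's example string plus a trailing newline

# (?!\Z): the last pair "dd" is followed by "\n", so it is not at the end and gets a dash
print(repr(re.sub(r"(..)(?!\Z)", r"\1-", s)))  # 'aa-bb-cc-dd-\n'
# (?!$): $ also matches just before the final "\n", so "dd" is left alone
print(repr(re.sub(r"(..)(?!$)", r"\1-", s)))   # 'aa-bb-cc-dd\n'
```

Without the trailing newline, both patterns give 'aa-bb-cc-dd'.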