Partial string matches in structural pattern matching

This is a very crude proof of concept that performs custom match logic in a match statement.

Usage example:

match Matcher('Hello, Python!'):
    case StartsWith('Goodbye'):
        print('Farewell')
    case Search(r'Hello, (.*)!') as m:
        print(f'Greetings to {m.match[1]}')

which prints:

Greetings to Python

The basic idea is to make the attribute named in __match_args__ a dynamic property that returns an object whose custom equality test performs the desired match logic.

import re

class MatcherBase:
    def __init__(self, string):
        self.string = string

    def __init_subclass__(cls):
        @property
        class matcher:
            __eq__ = cls.__eq__

            def __init__(self, instance):
                self.instance = instance
        setattr(cls, cls.__name__, matcher)
        cls.__match_args__ = cls.__name__,

class StartsWith(MatcherBase):
    def __eq__(self, pattern):
        return self.instance.string.startswith(pattern)

class Search(MatcherBase):
    def __eq__(self, pattern):
        self.instance.match = match = re.search(pattern, self.instance.string)
        return match

class Matcher(StartsWith, Search):
    pass

It still needs features such as support for regex flags, but for easier understanding of the core idea I’ll leave it as is for now.


Support for optional parameters such as regex flags is rather tricky because there’s no easy way to tell which attributes are specified in a class pattern and which are omitted.

But there’s actually a clue! Only the attributes named in a class pattern are looked up on the subject; those that aren’t specified are skipped. So we can keep track of which attributes are being looked up by adding their names to a todo_args set on the Matcher instance when their properties are accessed.

Then, when the equality tests for those attributes are performed one by one, don’t run the match logic right away. Instead, collect the value of each specified attribute into an args dict on the Matcher instance and simply return True to keep the match statement happy. At judgment time, once every attribute in the todo_args set has been collected, all the arguments are ready and the actual match logic can run:

class MatcherBase:
    def __init__(self, string):
        self.string = string
        self.args = {}
        self.todo_args = set()

    def __init_subclass__(cls):
        for arg in cls.__match_args__:
            @property
            class MatcherProperty:
                name = arg

                def __init__(self, matcher):
                    self.matcher = matcher
                    matcher.todo_args.add(self.name)

                def __eq__(self, value):
                    (matcher := self.matcher).todo_args.remove(self.name)
                    matcher.args[self.name] = value
                    if matcher.todo_args:
                        return True
                    match = cls._match(matcher)
                    matcher.args = {}
                    matcher.todo_args = set()
                    return match

            setattr(cls, arg, MatcherProperty)

class StartsWith(MatcherBase):
    __match_args__ = 'prefix',

    @staticmethod
    def _match(matcher):
        return matcher.string.startswith(matcher.args['prefix'])

class Search(MatcherBase):
    __match_args__ = 'pattern', 'flags'

    @staticmethod
    def _match(matcher):
        matcher.match = match = re.search(
            matcher.args['pattern'],
            matcher.string,
            matcher.args.get('flags', re.NOFLAG)
        )
        return match
    
class Matcher(StartsWith, Search):
    __match_args__ = ()

So that:

match Matcher('Goodbye, Python!'):
    case StartsWith('Goodbye'):
        print('Farewell')
    case Search(r'Hello, (.*)!') as m:
        print(f'Greetings to {m.match[1]}')

match Matcher('Hello, Python!'):
    case StartsWith('Goodbye'):
        print('Farewell')
    case Search(r'Hello, (.*)!') as m:
        print(f'Greetings to {m.match[1]}')

match Matcher('Hello, Python!'):
    case StartsWith('Goodbye'):
        print('Farewell')
    case Search(r'hello, (.*)!', re.IGNORECASE) as m:
        print(f'Greetings to {m.match[1]}')

outputs:

Farewell
Greetings to Python
Greetings to Python

Regex flags specifically are a poor example, since it’s far simpler to just include them as part of the pattern.
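For reference, a quick sketch of what including the flag in the pattern looks like, using the inline `(?i)` syntax from the re module. The variable names are just for illustration.

```python
import re

text = 'Hello, Python!'

# Flag passed as a separate argument.
with_flag = re.search(r'hello, (.*)!', text, re.IGNORECASE)

# Same flag written inline at the start of the pattern.
inline = re.search(r'(?i)hello, (.*)!', text)

print(with_flag[1], inline[1])  # Python Python
```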

Maybe off-topic, but wouldn’t t-strings be ideal for this, with some parse method?

temp = t"{greeting} {last_name} !"
string = "Hello Python !"

match temp.parse(string):
    case ('Hello', 'Python'):
        print('works as expected')
    case _:
        print('input error')

No, because you are imagining semantics implied by the name “Template” that don’t actually exist. The code snippet would fail at the first line because greeting and last_name are not defined.

(I would go so far as to say that the PEP was a mistake since it didn’t go far enough and is misnamed, but that’s not on-topic here)


The whole idea of a match statement is to improve readability by expressing multiple tests on the same subject in a more declarative way. My demo offers a framework to adapt arbitrary calls into patterns allowed by the match statement so some may find it useful where expressiveness is more important than outright performance. Regex happens to be a good solution to the OP’s problem and I personally find spelling out re.IGNORECASE as a flag more expressive than an inline (?i), hence the choice of the example. You’re free to use the framework as a wrapper for other more useful calls with optional parameters as you see fit.

This third version aims to allow matches of sub-patterns in named groups to be more intuitively assigned to names using capture patterns so there’s no more need to explicitly access the re.Match object.

Usage example:

match Matcher('Hello, Python!'):
    case Search(r'Hello, (?P<subject>.*)!', subject=subject):
        print(f'Greetings to {subject()}')

which outputs:

Greetings to Python

Unfortunately CPython’s match implementation retrieves values for capture patterns before equality tests are performed for attributes with specified values, which means I have to feed capture patterns some not-yet-existent values before a regex search can be performed to produce those values. I end up returning lambda functions to capture patterns so they can defer evaluation.

Support for arbitrary keyword-based capture patterns relies on the __getattr__ method of a Matcher instance, but the tricky part is that __getattr__ has no easy way to tell which parent class (StartsWith, Search, etc.) is currently being used since all that it is given is the main Matcher instance and the attribute name being looked up. I worked around this by making property accesses set the current class in a new cls attribute on the matcher instance so later __getattr__ would be able to call the getter function specific to this class.

I also realized that todo_args doesn’t need to be a set. A simple counter will do:

class MatcherBase:
    def __init__(self, string):
        self.string = string
        self.args = {}
        self.todo_args = 0

    def __init_subclass__(cls):
        for name in cls.__match_args__:
            class MatcherProperty:
                def __init__(self, matcher):
                    self.matcher = matcher
                    matcher.cls = cls
                    matcher.todo_args += 1
                def __eq__(self, value, name=name):
                    matcher = self.matcher
                    matcher.todo_args -= 1
                    matcher.args[name] = value
                    if matcher.todo_args:
                        return True
                    match = cls._match(matcher)
                    matcher.args = {}
                    matcher.todo_args = 0
                    return match
            setattr(cls, name, property(MatcherProperty))

    def __getattr__(self, name):
        return self.cls._getattr(self, name)

class Search(MatcherBase):
    __match_args__ = 'pattern', 'flags'

    @staticmethod
    def _match(matcher):
        matcher.match = match = re.search(
            matcher.args['pattern'],
            matcher.string,
            matcher.args.get('flags', re.NOFLAG)
        )
        return match

    @staticmethod
    def _getattr(matcher, name):
        return lambda: matcher.match[name]

class Matcher(StartsWith, Search):
    __match_args__ = ()

Here’s a working hack that allows matching a value using regex. It can be used interchangeably with other matching patterns, and applied recursively to any JSON-like type. It’s more verbose than the proposed new syntax, of course.

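import re
from collections.abc import Iterator, Mapping
from types import NoneType
from typing import overload

JsonSimple = NoneType | bool | int | float | str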
type JsonLike = JsonSimple | list[JsonLike] | dict[str, JsonLike]

# inherit from str so that you can use it anywhere
class RegMatch(str):
    __match_args__ = ('_match_magic_', '_group_magic_')
    _last_match_: re.Match[str]

    @overload
    def __new__(cls, value: str) -> RegMatch: ...
    @overload
    def __new__[T](cls, value: T) -> T: ...
    def __new__(cls, value): # type: ignore
        if isinstance(value, str):
            return str.__new__(cls, value)
        elif isinstance(value, list):
            return [RegMatch(v) for v in value]
        elif isinstance(value, dict):
            return {k: RegMatch(v) for k, v in value.items()}
        elif isinstance(value, JsonSimple):
            return value
        raise TypeError('RegMatch only works with JSON-like types')
    
    def __init__(self, value: JsonLike) -> None:
        self._match_magic_ = _MatchMagic(self)
        self._group_magic_ = _GroupMagic(self)
    

class _MatchMagic:
    def __init__(self, regmatch: RegMatch):
        self.regmatch = regmatch
        
    def __eq__(self, other):

        # use a cache?
        if match := re.match(other, self.regmatch):
            self.regmatch._last_match_ = match
            return True
        return False
    

class _GroupMagic(Mapping[str|int, str|None]):
    def __init__(self, regmatch: RegMatch) -> None:
        self.regmatch = regmatch
    
    def __iter__(self) -> Iterator[str|int]:
        yield from range(len(self.regmatch._last_match_.groups()))
        yield from self.regmatch._last_match_.groupdict()

    def __len__(self) -> int:
        return (
            len(self.regmatch._last_match_.groups())
           +len(self.regmatch._last_match_.groupdict())
        )
        
    def __getitem__(self, val: str|int) -> str|None:
        return self.regmatch._last_match_.group(val)


# any json-like can be safely wrapped in RegMatch()
match RegMatch("Hello world"):

    # RegMatch(PATTERN_LITERAL, GROUPS)
    case RegMatch(r'(\w+) (?P<second_word>\w+)', {1: g1, 'second_word': g2}):
        print(f'Matched! {g1=}, {g2=}') # g1='Hello', g2='world'
    

value = ['hello world', 125]
match RegMatch(value):
    case [RegMatch(r'hello .+'), number]:
        print(number)

value = {'1': 'hello world', '2': 10}
match RegMatch(value):
    case {'1': RegMatch(r'hello .+')}:
        print('dict!')