Partial string matches in structural pattern matching

Hi there,
I’d love to be able to use partial string matches:

match text:
    case "prefix_" + cmd:  # checking prefix
        print("got", cmd)
    case "Hello " + last_name + ", " + first_name + "!":  # more complex example
        print(last_name, first_name)

I know the various tricks using split() or guards (case s if s.startswith("prefix_")), but this would be so much more intuitive for simple checks.

Any thoughts? Cheers! :beers:

You are asking that match becomes a parser.
I am not sure that is the right tool for parsing or that you could define the parsing semantics for general use.

Coming from C/C++, I was surprised to see the structural aspect of match/case in Python (when it came out) like

match val:
    case ("a", x):
        print(x)

and this structural aspect gave me the idea of having something similar for strings.

Not wild about case "Hello " + last_name + ", " + first_name + "!".

Using f"{strings}" might work better.

match text:
    case f"prefix_{cmd}":
        print("got", cmd)
    case f"Hello {last_name}, {first_name}!":
        print(last_name, first_name)

But this still isn’t very precise. Can first_name contain a comma if there are two? This is a job for regular expressions!

match text:
    case re.compile(r"prefix_(?P<cmd>\w+)"):
        print("got", cmd)
    case re.compile(r"Hello (?P<last_name>\w+), (?P<first_name>\w+)!"):
        print(last_name, first_name)

But that looks far too complex, and doesn’t answer whether we want match, fullmatch, or search.
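To make the match / fullmatch / search question concrete, here is a small sketch of how the three existing `re` methods differ on the prefix pattern. The helper name `try_all` is just for illustration.

```python
import re

pattern = re.compile(r"prefix_(?P<cmd>\w+)")

def try_all(text):
    """Return the captured cmd (or None) for match, fullmatch and search."""
    results = {}
    for name in ("match", "fullmatch", "search"):
        m = getattr(pattern, name)(text)
        results[name] = m["cmd"] if m else None
    return results

print(try_all("prefix_go"))          # all three methods agree here
print(try_all("prefix_go now"))      # fullmatch rejects the trailing text
print(try_all("say prefix_go now"))  # only search finds the pattern
```

Any syntax for regexes in a case would have to pick one of these behaviours, or offer a way to choose.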

Add regular expressions to the f"{string}" mini-language syntax?

match text:
    case rf"prefix_{cmd!(\w+)}":
        print("got", cmd)
    case rf"Hello {last_name!(\w+)}, {first_name!(\w+)}!":
        print(last_name, first_name)

Yeah, and name-wise I’m not a fan of using “compile” as a matching feature. So here’s a thought. Can we make some sort of helper class called, say, re.RegExp(), which can be constructed with a pattern and which returns a type suitable for a match case statement?

I would also be in favour of having a similar function for sscanf.

These would actually end up being “type specializations” rather than actual type constructions. So re.RegExp and sscanf themselves would either be metatypes, or types that take parameters. Which might mean a viable spelling would be re.Pattern["pattern-goes-here"] but that might get in the way of type checkers.

Building in more special cases like this seems unpleasant. There will continue to be demands for more, and it’s not at all clear what should be special about compiled regexes - or any other class produced by the re standard library module - that would enable special behaviour with match.

I can, however, imagine a general extension: extend the match syntax to allow as <NAME> to capture the expression in a variable (necessary since it won’t necessarily already be a name), and extend the case syntax to accept a from <EXPR> clause (before any if clause) that can compute some other value and unpack and match that. Thus:

match text as t:
    case ("prefix", cmd) from t.split('_'):
        print('got', cmd)
    case (last_name, first_name) from re.match('Hello (.*?), (.*?)!', t).groups():
        print(last_name, first_name)
    case unknown: # default; equivalent to "case unknown from t"
        print("don't know what to do with", unknown)

I’m not very enthusiastic about this idea, but I wasn’t very enthusiastic about match in the first place.

Minor quibble: case _ does not bind to underscore, and is NOT equivalent to case t which does. I’m not sure what case _ as t would do but it should still not bind to underscore.

Fixed it - the comment didn’t properly, er, match my own proposal anyway :sweat_smile:

I love the f-string approach!

I’m not saying that this syntax needs to cover all the complex cases where a full-blown regular expression makes sense. Let’s aim for 80% coverage of simple but common cases (just like the list or tuple expressions in case can only handle simple but common cases).

And yes, matching a "Hello Doe,, Jane" with f"Hello {last_name}, {first_name}" should give a trailing comma in last_name IMHO.

If you want a search rather than a match, you could throw in a {_} at the beginning (and/or at the end if you want to eat trailing stuff).
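Here is a rough sketch of those semantics as a plain function, not as syntax. It assumes every `{name}` becomes a non-greedy `(.*?)` group, the whole string must match, and `{_}` matches without capturing. `template_to_regex` and `match_template` are hypothetical names, and format specs and conversions are simply ignored.

```python
import re
from string import Formatter

def template_to_regex(template):
    """Turn 'Hello {a}, {b}' into a regex with one non-greedy group per field."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is None:
            continue  # trailing literal text with no field after it
        if field == "_":
            parts.append("(?:.*?)")  # wildcard: match but do not capture
        else:
            parts.append(f"(?P<{field}>.*?)")
    return re.compile("".join(parts), re.DOTALL)

def match_template(template, text):
    """Return a dict of captures, or None if the text does not fit."""
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None

print(match_template("Hello {last_name}, {first_name}", "Hello Doe,, Jane"))
print(match_template("{_}prefix_{cmd}", "say prefix_go"))
```

With non-greedy groups the first call puts the extra comma into last_name, as described above.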

If any extension was made I also feel that general is better than specific. The match / case syntax already feels relatively complex.

When thinking about adding case NAMES from EXPR, this strikes me as adding the match statement to the match statement, if that makes sense? Since we can already do this:

match EXPR:
    case NAMES:
        …

So even though it is perfectly general, adding ‘from’ might miss the original use case requested, which seemed to be something more like glob matching?

With regard to some of the other ideas like using f-strings: Python already has a lot of “mini languages” and I would argue against adding another one, especially if it was a subset of an existing one. Things that look the same as something else in the language but do something different, and don’t support all the same uses, are a design red flag.

This sounds fun! I might try to make that if I have time.

I don’t think match expression as <NAME> is necessary, since we’ve got the walrus, which can bind the expression to a variable.

match (t:= some + expression):
   case ("prefix", cmd) from t.split('_'):
      print('got', cmd)
   ...

In the possibly more likely case where a variable already exists, no additional capture is needed.

match t:
   case ("prefix", cmd) from t.split('_'):
      print('got', cmd)
   ...

So really, we just need the from EXPR: extension for case.

Structural pattern matching in Python comes with a bunch of patterns. My idea would be to add a very simple and basic pattern to decompose strings without the need for full-blown REs.

While I enjoy the brainstorming part about making it more general, I really like the “keep it simple, cover the basics” aspect about my initial proposal. But then again, I might be biased. :stuck_out_tongue_winking_eye:

If you have a proposal, then you need to make a proposal. See how other PEPs are written to see what level of effort you need to put in. Otherwise, this Idea thread is just going to continue to be undirected speculation. Right now you’ve asked for “any thoughts” and that’s what you’re getting.

Don’t get me wrong, I’m not complaining (quite the opposite: “enjoy the brainstorming”), so thanks to all of you for your input. :heart:

When thinking about adding case NAMES from EXPR, this strikes me as adding the match statement to the match statement, if that makes sense?

The issue with the original example’s suggestion is there is way too much variability in the case handling. Implementing the desired behaviour with an if...elif... ladder and regular expressions is pretty clear and straightforward:

import re

def test(text):
    if m := re.fullmatch("prefix_(.*?)", text):
        cmd, = m.groups()
        print("got", cmd)
    elif m := re.fullmatch("Hello (.*?), (.*?)!", text):
        last_name, first_name = m.groups()
        print(last_name, first_name)
    else:
        print(f"Don't know what to do with {text}")
        
test("prefix_Order66")
test("Hello Phol, Tom!")
test("foo bar")

Unlike a match statement, this will blow up if passed a non-string, like test(100).

Otherwise, this looks kinda like a match statement, if f"{var}" in a case was matched by (.*?).

def test(text):
    match text:
        case f"prefix_{cmd}":
            print("got", cmd)
        case f"Hello {last_name}, {first_name}!":
            print(last_name, first_name)
        case _:
            print(f"Don't know what to do with {text}")

Except … interpolation of f"prefix_{cmd}" doesn’t require cmd to be a string; it could be an integer!

match text:
    case f"prefix_{cmd:d}":
        print("Integer command", cmd)
    case f"prefix_{cmd:f}":
        print("Fractional command", cmd)
    case f"prefix_{cmd}":
        print("Vanilla string command", cmd)

But this opens a whole new can of worms, and that is probably a path we don’t want to head down. For example, could prefix_(1+1j) be matched by case f"prefix_{cmd!r}" and assign cmd = 1 + 1j?

I guess my instinct here is that if one is trying to handle text with a match such that the cases are different patterns (vs one pattern and the cases are on the regex group contents), this isn’t “structural” pattern matching anymore. Pattern matching on strings is parsing, and I would parse the text into objects then use match on the structure of the objects.

That’s not an actual rebuttal of the proposed idea, since it reduces down to “lol y do u want to do that”, but I think it’s why I’m not enthusiastic about it. Most of the proposed uses are things where above a certain scale I’d be tokenizing input and using a PEG / parser combinator library - but ultimately that’s just a personal bias.

The existing pattern types do handle list elements though, and being able to match on substrings does feel like a natural extension of that.

That’s exactly where I’m coming from, mate. :smiley:

…juuuust kidding, took a look at the Structural Pattern Matching PEP and with the way Class patterns currently work this seems more difficult than I first thought.

I also stumbled upon the problem of partial string matching before and tried all kinds of workarounds, but unfortunately they all seem a bit tedious or overly complex.

I definitely like the f-string approach that was suggested earlier on and would support adding it to improve the match statement. :+1: