Next and filter patterns

Intro/Problem

Recently I used pattern matching for the first time to parse JSON documents, and I enjoyed it very much; it felt natural for the problem at hand.

I did however find myself trying to do something with a sequence pattern that just isn’t possible according to PEP 634. What I was trying to do was find the first occurrence of a subpattern within a sequence and bind it.

Example

The JSON document I was trying to parse takes one of the following forms:

# sources is a single string that is a url
{"sources": "some string"}
# sources is an object that has an attribute called url containing a string
{"sources": {"url": <some string>} }
# sources is an array of a single string
{"sources": [ "some string"]}
# sources is an array of multiple strings
{"sources": [ "some strings1",  "some strings2", ...,  "some stringsN"]}
# sources is an array containing documents, at least one of which has
# an attribute called "url" containing a url string
{"sources": [{"url": "some string"}]}

Current Solution

This is copied and pasted from what I am currently working with:

# NOTE need to determine how to check if the string is a valid url within the pattern
match sources:
    case str() as url:
        ...
    case {'url': str() as url}:
        ...
    case [str() as url]:
        ...
    case list():
        # can't pattern match any further into the list :(
        # assume the first item with "url" in it is what we want and go with that
        search = next((x for x in sources if "url" in x), None)
        match search:
            case {'url': str() as url}:
                ...
            case _:
                # anything else means it's a string, list of strings, or None
                # and I doubt the homepage has "url" in it directly
                raise KeyError
    case _:
        raise KeyError

For the first 3 cases, pattern matching works wonderfully, but it was less than ideal when trying to match a sequence pattern. When I first got into this, I naively thought I would be able to check whether any item in a sequence matched a subpattern and bind it. Reading PEP 634 more closely, I realized the sequence pattern depends heavily on absolute position, and as such I had to do what I did above.

Proposal

Add a next and filter pattern that would respectively

  • Match on the first item in the sequence that matches the subpattern(s), else go to the next case
  • Check all items in the sequence against the subpattern(s); if any of them match, the pattern matches, else go to the next case

I believe this would be a very natural and useful extension of pattern matching on sequences.


So I see what you’re trying to say, but for the sake of enlightenment, imagine the syntax existed in a way you wanted it, how would you use it in this case?

Hi Tobias, to answer your question, let’s define some syntax here.

  • next(<pattern>|<pattern>|<pattern>|...)
    • The requirement for this pattern is that the match subject be a sequence. Upon finding the first item in the sequence that matches a subpattern, it stops and enters the case block; otherwise the next case block is tried.
  • filter(<pattern>|<pattern>|<pattern>|...)
    • The requirement for this pattern is that the match subject be a sequence. It goes over the entire sequence, and as long as at least one item matches a subpattern the case block is entered; otherwise the next case block is tried.

Now in my example, there are 2 cases that pattern matching in its current form cannot fully satisfy.

Let’s assume that there is some function is_url that will determine if a str is a valid url.

Given that the following 3 cases fail, what we are looking at is a list. That list contains either dictionaries (the final example listed in the OP) or strings (the second-to-last example listed in the OP).

match sources:
    case str() as url:
        ...
    case {'url': str() as url}:
        ...
    case [str() as url]:
        ...

With the proposed syntax, this code

case list():
    # can't pattern match any further into the list :(
    # assume the first item with "url" in it is what we want and go with that
    search = next((x for x in sources if "url" in x), None)
    match search:
        case {'url': str() as url}:
            ...
        case _:
            # anything else means it's a string, list of strings, or None
            # and I doubt the homepage has "url" in it directly
            raise KeyError

would then become

case next({"url": str() as url}):
    # gives us the first dictionary containing a 'url' key mapping to a str
    ...
case filter(str() as url if is_url(url)) as urls:
    # gives us all items in the sequence that are strings which are valid urls
    ...

I do not know why you decided to go with next and filter, as those are already Python built-in functions and they perform totally unrelated tasks. Also, are the next and filter elements you’ve introduced functions or some other construct? And if they are functions, how do you plan on them coexisting with the current built-in functions of the same names?
Thirdly, I feel like what you’re trying to achieve can be done by using the walrus operator to bind a temporary variable to a dynamically computed iterable, though the syntax would be quite long and not simplified at all.
I haven’t looked into it, but I’m sure that with some tinkering you can pull off a similar effect with a class that declares and uses __match_args__, perhaps computed dynamically.
Finally, I think this is a matter of convenience that isn’t pressing enough to justify such a large modification to the language, unless you can provide a pretty solid reason why it should be included.

I used filter and next as a matter of convenience because I felt they related most closely to how the patterns operate. The names aren’t the most important thing here, in my opinion; what matters is how they operate in the scope of pattern matching.

I’m not sure what you mean by next and filter being used for completely unrelated tasks. Using next is an idiomatic way to find the first item in a list that meets some condition.

I.e. next(x for x in sequence if x <meets some condition>)

The parallel is that we’re using next to find the first matching subpattern in the sequence.

Likewise, filter selects the items from a sequence that meet some condition. Again, the parallel is that we would match on the items in the sequence that match some subpattern(s).
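For readers less familiar with the two idioms being referred to, a short illustration with made-up data:

```python
numbers = [3, 8, 5, 12, 7]

# First item meeting the condition, with None as the fallback when nothing does.
first_big = next((n for n in numbers if n > 6), None)
nothing = next((n for n in numbers if n > 100), None)

# Every item meeting the condition.
all_big = list(filter(lambda n: n > 6, numbers))

print(first_big, nothing, all_big)  # 8 None [8, 12, 7]
```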

These builtins operate more generally on iterables rather than sequences, but am I doing that poor of a job explaining myself?

For what it’s worth, PEP 622 has a filter pattern in its deferred section, so others must see some value in it.

There’s already an existing built-in function called next, which takes an iterator and returns its next value. The iterator may well be a generator, and next is meant to be called until the iterator raises StopIteration or the default is returned. Its signature is next(iterator[, default]). However, you’re suggesting a next that is different in that it takes a sequence. Will there be a need to make repeated calls to the next you’re suggesting? If not, it’s different from the existing next and may collide with it in some way.

Python also has a built-in function filter(function, iterable), which creates an iterator yielding the values for which the function returns true. This is also different from the filter you’re suggesting, which takes a sequence.
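The behaviour of the two built-ins described here can be seen in a few lines (sample values are made up):

```python
it = iter(["a", "b"])
first = next(it)             # consumes one item per call
second = next(it)
fallback = next(it, "done")  # exhausted: the default is returned instead

try:
    next(it)                 # exhausted and no default given
    exhausted = False
except StopIteration:
    exhausted = True

lazy = filter(str.isdigit, ["1", "x", "22"])
is_iterator = iter(lazy) is lazy  # filter returns an iterator, not a list
digits = list(lazy)

print(first, second, fallback, exhausted, is_iterator, digits)
```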

So what I’m asking is this, wouldn’t this introduce inconsistency in the semantics of the functions in the first place, having two functions with the same name that work differently?
And that’s if you’re intending for them to be some sort of built-in function. If you’re intending to introduce next and filter as hard keywords in the language, that’s another story.

As I said previously, I used next and filter because I felt they are the idiomatic way to iterate over a sequence to find the first item matching some condition, or all items matching some condition.

I’m not attempting to write a PEP-level description. A different syntax could and should be used. I was just trying to convey my idea of a useful extension of pattern matching for handling sequences when you don’t care about absolute position but merely about the existence of something within that sequence…