Allow leading `|` in `match` or-patterns for better git diffs

Problem

I recently converted a large project that heavily uses unions from if chains to match/case. The project now has ~150 match statements. Some cases have many possible values in the same branch and see a lot of churn, so to keep diffs minimal and clean when a value is added or removed, I put each value on its own line with a trailing comma. Before match, I would have written, e.g.:

if color in (
    Color.RED,
    Color.BLUE,
):
    return 'nice'

if isinstance(shape, (
    Square,
    Trapezoid,
)):
    return 4

and adding new values at the start or end produces a clean diff:

 if color in (
     Color.RED,
     Color.BLUE,
+    Color.GREEN,
 ):
     return 'nice'

 if isinstance(shape, (
+    Kite,
     Square,
     Trapezoid,
 )):
     return 4

With match I now write it like this (using two different styles for demonstration purposes):

match color:
    case (
        Color.RED |
        Color.BLUE
    ):
        return 'nice'

match shape:
    case (
        Square()
        | Trapezoid()
    ):
        return 4

Besides the asymmetry between the first alternative and the rest, the diffs are not as nice, since they touch unrelated lines:

 match color:
     case (
         Color.RED |
-        Color.BLUE
+        Color.BLUE |
+        Color.GREEN
     ):
         return 'nice'
 
 match shape:
     case (
-        Square()
+        Kite()
+        | Square()
         | Trapezoid()
     ):
         return 4

Proposed solution

Allow a leading | in or-patterns, so that I can write:

match color:
    case (
        | Color.RED
        | Color.BLUE
    ):
        return 'nice'

match shape:
    case (
        | Square()
        | Trapezoid()
    ):
        return 4

and the diff:

 match color:
     case (
         | Color.RED
         | Color.BLUE
+        | Color.GREEN
     ):
         return 'nice'
 
 match shape:
     case (
+        | Kite()
         | Square()
         | Trapezoid()
     ):
         return 4

In practical terms, it means altering the grammar like so:

diff --git a/Grammar/python.gram b/Grammar/python.gram
index 51f846a57f4..51e4174f1aa 100644
--- a/Grammar/python.gram
+++ b/Grammar/python.gram
@@ -470,7 +470,7 @@ as_pattern[pattern_ty]:
     | invalid_as_pattern
 
 or_pattern[pattern_ty]:
-    | patterns[asdl_pattern_seq*]='|'.closed_pattern+ {
+    | '|'? patterns[asdl_pattern_seq*]='|'.closed_pattern+ {
         asdl_seq_LEN(patterns) == 1 ? asdl_seq_GET(patterns, 0) : _PyAST_MatchOr(patterns, EXTRA) }
 
 closed_pattern[pattern_ty] (memo):

TypeScript (in type positions) and other languages allow this. Hopefully auto-formatters like Black will adopt this style when the alternatives are written one per line.

Side note

BTW, there is a similar problem with multi-line unions – before:

Shape: TypeAlias = Union[
    Kite,
    Square,
    Trapezoid,
]

After:


Shape: TypeAlias = (
    Kite
    | Square
    | Trapezoid
)

But this is not the same part of the grammar and I am not proposing any changes for it in this proposal.


Would trailing pipes after the last element achieve the same thing? That would be my (slight) preference, and it’s similar to how trailing commas are allowed in lists and such.


I think trailing pipes should not be preferred, because PEP 8 suggests breaking lines before binary operators, so that the operator starts the continuation line rather than ending the previous one.


I don’t think PEP 8 should inform language design too much. Also, is the pipe in a match pattern even treated as an operator in Python?

I prefer leading | because it aligns more nicely (the pipes form one long vertical line), and because it’s more common in my experience. For example, the Python grammar itself uses this style. But I am not attached to it: if people prefer trailing |, and it does not cause any grammar ambiguities (I didn’t check), then it’s fine by me.


The pipe operator is not a separator. Whether we are talking about the actual operator | (bitwise-or) or the match syntax |, it still represents a binary operator and not a separator.

We don’t say “If the sandwich filling is ham or cheese or salad or, then …” and we should not allow a trailing pipe following the last operand.

Nor do we say “if the filling is or ham or cheese or salad, then”.

There are precedents for this in grammar notations. Python’s PEG grammar, for example, supports leading | (see PEP 617 and python.gram), and I am somewhat supportive of allowing this in match patterns. Someone would have to take the lead on feature development for this, writing a PEP, and I’m not going to be that someone. (If someone has a decent draft I could see myself sponsoring it though.)

For the Union notation I recommend just sticking to the old Union[...] notation, it’s not going anywhere. Messing with actual binary operators would be too controversial.


I realize it doesn’t solve all cases, but this shape example reminds me of SQL, which has never allowed trailing commas. You learn pretty early on to always expand sequences at the end. Can’t you just add | Kite at the end?


I’ve written a draft PEP for this: peps/pep-9999.rst at leading-pipe-in-or-pattern · bluetech/peps · GitHub
I tried to keep it short, as befits the fairly trivial matter, but long enough to cover everything I could think of.

And a draft PR: Allow leading `|` in match-case OR patterns by bluetech · Pull Request #1 · bluetech/cpython · GitHub

If I understand the process correctly, I now need a core developer sponsor. @guido, would you be willing to sponsor this PEP?

I still think “do nothing” is a valid option. You just change the way you add lines when expanding the set of cases:

match color:
    case (
        Color.RED
        | Color.BLUE
    ): ...

match color:
    case (
        Color.RED
        | Color.BLUE
        | Color.GREEN
    ): ...

You should discuss it, and explain why you’d reject it.


Agreed, it should be discussed. Skip, do you want to mentor Ran through the PEP process? I’m sure you still know how it goes. :slight_smile:

The reason I didn’t list “do nothing” in “Rejected Ideas” is that it’s implied: I figure it’s the job of the “Motivation” section to reject the “do nothing” idea.

The reasons I give in the PEP to reject “do nothing”/“add at the end” are:

  • The values do not align visually; with the leading | it looks nicer, in my opinion
  • If Color.RED is removed, the Color.BLUE line must be modified too
  • If a value is added before Color.RED (for whatever reason), the Color.RED line must change
  • The lines can’t be freely transposed or sorted
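
The diff arguments in the list above can be checked mechanically. This sketch uses difflib.SequenceMatcher to count how many lines a line-based diff marks as removed or added; changed_lines is a hypothetical helper, and the snippets contain only the alternatives, not the surrounding case ( and ): lines. It compares the “add at the end” style with the proposed leading-| style when the first value is removed.

```python
from difflib import SequenceMatcher


def changed_lines(before: list[str], after: list[str]) -> int:
    """Number of lines a line-based diff marks as removed or added."""
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# "Add at the end" style: removing the first value also rewrites the next line.
end_style = ["Color.RED", "| Color.BLUE", "| Color.GREEN"]
print(changed_lines(end_style, ["Color.BLUE", "| Color.GREEN"]))  # 3

# Proposed leading-| style: removing any value is a one-line change.
lead_style = ["| Color.RED", "| Color.BLUE", "| Color.GREEN"]
print(changed_lines(lead_style, ["| Color.BLUE", "| Color.GREEN"]))  # 1
```

Counting removed plus added lines (rather than diff hunks) matches what a reviewer actually sees highlighted in a diff.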

I’ll add: you had previously mentioned SQL. I write a lot more SQL than I do match statements. In SQL, I actually find the issue a lot more frequent and annoying. In SQL there’s always a special case in the SELECT list (no trailing comma), the FROM list (no trailing comma, although usually the lines are JOINs not comma-separated), the WHERE (no trailing/leading AND/OR), the ORDER BY and all the rest. Whenever I interactively edit SQL I inevitably want to temporarily comment out e.g. some SELECT value or a condition using the -- line comment syntax, but then that special case hits and I need to either use a clumsy /* .. */ comment, or edit adjacent lines. That at least is my experience.

Also, “add at the end” still requires modifying two lines when adding the second value.

I wish standard SQL would allow trailing commas but that’s probably not going to happen. At least newer SQL variants like EdgeQL seem to get it right.

All of these seem like they could equally justify allowing a leading operator at the beginning of a boolean or arithmetic expression, which seems wrong.

if (
    thing1
    and thing2
    or thing3
):
    ...

Agreed. To put it another way, while the | in a match statement may syntactically be different from an operator, it looks like one, and people will naturally expect it to work the same.

I’m sympathetic with the idea here (I worked for years with SQL, and the inconsistency is genuinely inconvenient) but two things kill this proposal for me:

  1. The discrepancy between match-| and operators.
  2. The fact that the extra | is at the beginning; the optional trailing comma in lists, etc., is at the end, and I find that far more natural than starting the construct with an extra character.

You haven’t worked much with the Python PEG grammar I take it…

Sure. Happy to help. I will need to review the process myself…

I’m not sure if this comment was directed at me, but if it was, I don’t see how it’s relevant…? I was talking conceptually, not in terms of implementation.

But it’s not that important either way. I’m somewhat against the idea as described, because I don’t think it “fits well” with the rest of the language, but I’m happy enough to simply ignore it if it gets implemented, so I’m no worse than -0 on it.


I meant that the Python PEG grammar has this same feature (you can start with a leading ‘|’) and it makes a big (positive) difference for readability of the grammar. IIRC I mentioned this before in this thread. My snide remark was implying that if you’d worked with the PEG grammar file you’d probably appreciate the proposed feature more – I was not implying anything about the implementation of the feature, although I can see how you’d think that, since you would indeed have to work with that file to implement it… :slight_smile:


I updated the draft PEP slightly to include references to a few other languages: OCaml, Rust, Haskell.

Great, thanks!

If I’m understanding things correctly, the next steps are:

  • You (the Sponsor) should deem the PEP ready for submission
  • I submit the PEP to the peps repository
  • After it’s merged, I submit it for discussion to the PEPs category
  • Then it’s accepted :slight_smile: