PEP 677 with an easier-to-parse and more expressive syntax

(This is a continuation of a discussion in PEP Idea: Extend spec of Callable to accept unpacked TypedDicts to specify keyword-only parameters, where the thread-starter wished to focus back on the original topic.)

PEP 677 defined a new callable type syntax:

def flat_map(
    func: (int) -> list[int],
    values: list[int],
) -> list[int]:
    out = []
    for element in values:
        out.extend(func(element))
    return out
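For reference, the same signature is spelled roughly like this today with Callable:

from collections.abc import Callable

def flat_map(
    func: Callable[[int], list[int]],
    values: list[int],
) -> list[int]: ...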

But the PEP was rejected. Guido summarized the rejection note here: PEP Idea: Extend spec of Callable to accept unpacked TypedDicts to specify keyword-only parameters - #17 by guido

One way to read it is that in order to get something like PEP 677 accepted, the following needs to be true:

  1. The syntax should be parsable without the new PEG parser.
  2. The syntax should allow expressing things that were previously not possible.

He then suggested a syntax which re-uses lambda expressions:

def flat_map(
    func: lambda(x: int) -> list[int]: ...,
    values: list[int],
) -> list[int]:
    out = []
    for element in values:
        out.extend(func(element))
    return out

The idea is essentially to allow type annotations on lambda parameters, and then to use a lambda whose body is just Ellipsis as a callable type. This is easier to parse due to the lambda keyword, and it allows expressing things that were previously not possible, like keyword-only and optional arguments:

def f(func: lambda(y: float, *, z: bool = ...) -> bool: ...) -> None:
    pass
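Today, keyword-only and defaulted parameters can’t be expressed with Callable at all; a common workaround is a callable Protocol, sketched here (the name _Func is just for illustration):

from typing import Protocol

class _Func(Protocol):
    # what lambda(y: float, *, z: bool = ...) -> bool would spell:
    # a positional-or-keyword float and a keyword-only bool with a default
    def __call__(self, y: float, *, z: bool = ...) -> bool: ...

def f(func: _Func) -> None:
    pass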

It’s probably uncontroversial to say that most people will not like the need for the Ellipsis return, which will also look quite confusing when a function returns a callable. But if it were possible to remove that part (and also make function parameter names optional), this seems to me like a good syntax:

def flat_map(
    func: lambda(int) -> list[int],
    l: list[int],
) -> list[int]: ...

type StrTransform = lambda(str) -> str

def f() -> lambda(int, str) -> bool:
    ...

Why lambda instead of def?

At first I was skeptical, but I now think there are good reasons to choose lambda over def for this:

  • def really sounds like you’re defining something, which you aren’t really doing here
  • def looks a bit confusing in the annotation of a function parameter because there’s already a def there from the function itself
  • currently, def always expects a function name; lambda doesn’t, so lambda fits the job a bit better

What are potential downsides?

The major downside is probably that at first, function signatures with a lambda callable type will be harder to parse for the human eye. Especially when there is also a lambda expression default argument:

def flat_map(
    l: list[int],
    func: lambda(int) -> list[int] = lambda x: [x],
) -> list[int]: ...

But I think the parentheses make these sufficiently visually distinct, such that one will get used to it pretty quickly.
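For comparison, here is a minimal sketch of how the same signature reads with today’s Callable, where an annotation and a lambda default already sit side by side:

from collections.abc import Callable

def flat_map(
    l: list[int],
    func: Callable[[int], list[int]] = lambda x: [x],
) -> list[int]: ...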

6 Likes

I like this. I’m quite heavy on generics, so an example for Concatenate :

type ExtraFnArg[**P, R] = Callable[Concatenate[str, P], R]

could be replicated with the lambda style syntax:

type ExtraFnArg[**P, R] = lambda(str, *args: P.args, **kwargs: P.kwargs) -> R

which is a bit more verbose but still works

Can you talk more about how you want parameters to work here? In your first example I see lambda(x: int) -> list[int]. On the face of it, that means a positional-or-keyword parameter named x, so only callables where the parameter is named x should be accepted. Which is a bit unfortunate, because usually when you accept a callable, you don’t care about the names of non-keyword arguments. So you should probably write lambda(x: int, /) -> list[int]: ..., but that still adds some noise (the parameter name and /) over the current syntax.
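A minimal sketch with today’s callable Protocols shows why the name matters (the class and function names here are just illustrative):

from typing import Protocol

class TakesNamedX(Protocol):
    # positional-or-keyword: callers may write f(x=1), so the name x is part of the type
    def __call__(self, x: int) -> list[int]: ...

class TakesAnyInt(Protocol):
    # positional-only: the parameter name is irrelevant
    def __call__(self, x: int, /) -> list[int]: ...

def wrap(value: int) -> list[int]:
    return [value]

g: TakesAnyInt = wrap  # accepted: names of positional-only parameters don't matter
h: TakesNamedX = wrap  # rejected by a type checker: the parameter is named value, not x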

But then later you suggest making function parameter names optional: lambda(int) -> list[int]. Can you clarify more about how you intend that to work in the grammar? It’s not clear to me how it interacts with unannotated parameters. For example, could I write lambda(x, y: int) -> str and leave one parameter unannotated? But then if I remove the y parameter I’m left with lambda(x) -> str which looks like I have a parameter of type x.

The examples only show simple types like int and str, but in practice, people of course will often need more complex types, like list[int] or int | str. So would lambda(int | str) -> list[str] be allowed? If so, I’m not sure it can be parsed without the PEG grammar.

6 Likes

Jelle:

Can you talk more about how you want parameters to work here?

I’m not Thomas but I did have some thoughts about this long ago.

I had wanted the syntax to deviate from def parameter lists in one important way: If a parameter has no colon, it means an unnamed type instead of an untyped name.

In the pre-PEG-parser days we would have to specify the syntax as something like expr [':' expr], and then in a separate pass reject the first expr if it’s more than a bare NAME (this is what we did for NAME ['=' expr] too – the grammar said expr ['=' expr]).

But in the PEG parser it’s no problem to write expr | NAME ':' expr.

So in Jelle’s examples:

  • lambda(x, y: int) -> str has a first parameter that’s anonymous and of type x.
  • Same indeed for lambda(x) -> str.
  • And lambda(int | str) -> list[str] would indeed be allowed.

As I said above, in tools without a PEG parser you have to accept a slightly more lenient syntax and sort it out in a later phase, which I think is no big deal (since it had to be done for optional defaults in the past).

Thomas:

def f(func: lambda(y: float, *, z: bool = ...) -> bool: ...) -> None:

Something seems wrong here: it looks like the : ... after the second bool should be ,...?

FWIW, my own reason for not wanting def here is that I am used to grepping for def foobar( to find a function named foobar. (It’s also why async comes before def, if present. 🙂)

Note that using a variant of lambda here solves another point of (some) contention from PEP 677: relative priorities of callable types and operators like |. The existing grammar has rules for how lambda interacts with other operators: lambda is the least binding, so lambda x, y: x|y means what it looks like, x | lambda: None is a syntax error, and lambda x: lambda y: x + y is a lambda returning a lambda. It makes sense to keep the same rules for the new variant using lambda.
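Those precedence rules are easy to verify against today’s interpreter; a quick runnable check:

# lambda binds least tightly, so the | stays inside the body
combine = lambda x, y: x | y
assert combine({1}, {2}) == {1, 2}

# a lambda returning a lambda
add = lambda x: lambda y: x + y
assert add(2)(3) == 5

# x | lambda: None does not parse at all
try:
    compile("x | lambda: None", "<test>", "eval")
except SyntaxError:
    print("x | lambda: None is a syntax error")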

PS. In Python 3 we removed some obscure Python 2 syntax where you could say lambda (x, y): ... and it would mean that it expected a single tuple of length 2 which would be unpacked into x and y. This came from the earliest days where a function parameter list was always such a tuple – a misfeature borrowed from ABC and soon corrected. I specifically wanted this removed from the lambda syntax so we might eventually be able to add parenthesized parameter lists to lambda. I would be very happy if that option was used for this purpose.

4 Likes

It was meant to indicate that z has a default value. But I agree that this looks confusing.

EDIT: ah no, wait, you said “second bool”. I was still using the “lambda with empty function body” syntax at this point in my post. But yes, in the syntax I describe at the end, this : ... shouldn’t be there.

I agree with this.

I would prefer it if what goes between the parentheses of def foo(...) and lambda(...) remained identical, even if I have to write lambda(x: int, /).

2 Likes

Looking at today’s runtime behavior of lambda, I see that it disallows any kind of parentheses around the parameter specification:

>>> lambda (x): x
  File "<python-input-3>", line 1
    lambda (x): x
           ^^^
SyntaxError: Lambda expression parameters cannot be parenthesized

So, this gives us a lot of room to implement something new.


Jelle makes a good point that it’s not clear what lambda(int) is supposed to mean. (Is int the name of the function argument or a type?) There are pros and cons to both interpretations:

Option 1: lambda(int) refers to the name of the function argument

Under this option, the content of the parentheses after lambda is a normal function signature, as used in def.

This would be compatible with also enabling type annotations on lambda expressions, because it’s clear what this would mean then:

prod = lambda(x, y: int) -> int: x * y

Type annotations on lambda expressions aren’t possible today, so this might be nice. However, I’m not sure this is really that useful, considering that you can always use a def instead of a lambda. Also, if parentheses after lambda can introduce both a type-annotated lambda expression and a callable type annotation, this might be confusing and hard to implement.

Besides this, the main downside of this option seems to be increased verbosity.

For example Callable[Concatenate[bool, P], R] would have to be written

f: lambda(x: bool, *args: P.args, **kwargs: P.kwargs) -> R

instead of simply lambda(bool, **P) -> R with option 2 below.

Option 2: lambda(int) refers to the type of the function argument

Under this option, lambda(int, str) -> float straightforwardly maps to Callable[[int, str], float]. Everything described in PEP 677 would work the same here, just with the lambda in front. But in addition to what’s described in PEP 677, named arguments are also allowed:

f: lambda(int, y: float, *, z: bool = ..., **kwargs: str) -> bool

I think this option is incompatible with type annotations on lambda expressions because the example from above becomes ambiguous:

prod = lambda(x, y: int) -> int: x * y

So, to recap:

  • Option 1: verbose; perhaps confusing because it looks more like a lambda expression; would allow type annotations on lambda expressions
  • Option 2: compact notation; not compatible with typed lambda expressions

I think option 2 is a far cleaner way to do it. Being able to spell the names of keyword arguments, the positions of positional ones, and syntax like *, / and x=... could be useful. I could imagine this being useful for annotating wrappers more specifically, proving or disproving type equivalence of functions, and much more. However, I would like to ask whether these changes would affect something like typing.assert_type(foo, lambda(...) -> ...). I suppose the syntax will only be valid at type-checking time, and retrieving it from __annotations__ would return a Callable[...], since the syntax can’t (yet) be evaluated at runtime; but that would require easier ways to spell argument names, *, /, =, and more for collections.abc.Callable (or typing’s version).
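For comparison, a small sketch of what runtime introspection of a Callable annotation gives you today (the function name apply is just illustrative):

from collections.abc import Callable
from typing import get_type_hints

def apply(cb: Callable[[int], bool]) -> None: ...

print(get_type_hints(apply)["cb"])  # collections.abc.Callable[[int], bool]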

1 Like

Option 3: lambda[int] refers to the type of the function argument

As under Option 2, lambda[int, str] -> float straightforwardly maps to Callable[[int, str], float].

Named arguments would also be allowed, but this time with no ambiguity:

ProdType = lambda[x, y: int] -> int

# vs 

prod = lambda(x, y: int) -> int: x * y

2 Likes

Well in that case I suppose Option 3 is the best (imo), as type checkers don’t (yet) support function calls in type hints.

1 Like

IIRC, the problem with “y: int” inside square brackets is that the colon is interpreted as a slice, which is quite weird from a runtime perspective, but maybe it can be made to work somehow…

2 Likes

There is no reason to turn it into a slice. This is completely new syntax, we can do whatever we want.

3 Likes

Would option 3 just be making lambda some weird keyword/alias for types.LambdaType, which would then use __class_getitem__ and cause the weird slicing issues with :?

I have no idea how this would get implemented; I’m just naively leaning towards it being that thing from the types module I linked.

So maybe we just directly use LambdaType?

ProdType = LambdaType[x, y: int] -> int

(Slicing is something to be resolved, for sure.)

No. It would not have any relation to any existing constructs and can follow any rules we might or might not want. That is the central benefit of using a keyword.

2 Likes

Having just read PEP 677, I think it is a wonderful PEP, pythonic,* and should be resubmitted to the SC without modification if such a thing is possible. (Although, if the resubmission desires more features, as one of the “cryptic comments” implies, then just submit the “extended” version detailed in the same PEP.)

(I especially like its consideration that a => lambda syntax may be added to Python in the future, and how this would interact with -> syntax in the Python context where types are also values.)

The first time I tried to provide a type annotation for an argument that was a Callable, I did exactly what this PEP proposed and was disappointed it did not work. In my view, there is very little point to adding any other Callable syntax to Python unless it is PEP 677 syntax; if you want a syntax that nobody will guess, we already have Callable.

It would also be very odd if the syntax for Callable involved the keyword lambda because… many Callables are not lambdas. As pointed out above, there is already a distinct LambdaType.

*If I were designing lambda and Callable syntax from the ground up, I would personally choose something completely different. But Python exists in the context of what it has been before.

1 Like

It’s worth noting that this option would prevent type parameters:

IdentityType = lambda[T](x: T) -> T

More generally, I would expect brackets to be related to type parameters in the context of a type expression.
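For reference, a generic callable type can already be spelled with a Protocol whose __call__ carries the type parameter (a sketch assuming Python 3.12+; the names Identity and same are just illustrative):

from typing import Protocol

class Identity(Protocol):
    def __call__[T](self, x: T, /) -> T: ...

def same[T](x: T, /) -> T:
    return x

f: Identity = same  # accepted: a generic callable returning its argument's type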

1 Like

So, it seems to me that using square brackets is not a good option after all, because we surely want to support generic callable types. Of the two options I posted in the beginning, I’ve come to favor this one more:

def flat_map(
    func: lambda(int) -> list[int],
    l: list[int],
) -> list[int]:
    ...

type MyCallable[**P, R] = lambda(bool, **P) -> R

def f() -> lambda(int, *, suffix: str) -> bool:  # optionally with named args
    ...

because it’s more similar to how it works in other languages and it just results in more compact code.

One sort of interesting edge case is something like this:

type F[**P] = lambda(*, x: int, **P) -> bool

where we have keyword-only arguments and then we concatenate this with a ParamSpec. It seems a function of this type isn’t callable if I specify it as f: F[[int, int]], because I don’t know the argument names of the two int arguments. The problem here is that there is no syntax for ParamSpec that can specify argument names as well. So… ParamSpec after , *, might need to be disallowed?

(By the way, my goal was just to start a discussion on this; I don’t feel capable to be a PEP author for this, so I hope no one was waiting for me to do that.)

Ah, I remembered that we can just look at how callable protocols handle that edge case:

class F[**P](Protocol):
    def __call__(self, *, x: int, *args: P.args, **kwargs: P.kwargs) -> bool: ...

This is of course not allowed because it has two *. So, prohibiting **P after * is consistent with the existing behavior.

As mentioned in a prior discussion, I think it’s better to avoid this glaring downside by reusing the name Callable, which is already widely known for typing a callable.

This can be achieved by making Callable a soft keyword with a grammar rule that prioritizes Callable(params) -> expression (where -> is required) over the rule that parses Callable as a name.

A quick search for /(?-i)\bCallable\(/ lang:python (calling Callable as a name) in all GitHub repos returns fewer than 10k hits, so backtracking from this new parser rule should not cause a noticeable slowdown in parsing.