Expected behaviour for local function definitions in union types?

pyright outright disallows the use of union types with local function declarations: “`pyright` mishandles optional local function declaration” (microsoft/pyright issue #9114 on GitHub).

While I can turn off pyright’s live typechecking as suggested in the bug report feedback, it seems wrong to me that this would be valid behaviour for a Python typechecker to implement. def and class statements are just syntactic conveniences for easily declaring new functions and types, they’re not privileged statements that override the way the language works.

It shouldn’t matter whether a callable is defined with def, or created via a factory function (or instantiating another object with a __call__ method), but that’s what pyright is doing here.

Edit: I tried an experiment to see if using a lambda instead of a def statement would make pyright happy about a Callable|None union type. It does make the typechecking errors go away, but the code linter (correctly) complains about the use of lambda in a simple assignment statement instead of using def.
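Roughly, the experiment looked like this (a reconstruction with illustrative names, not the original code):

from typing import Any, Callable

def example(some_callback: Callable[[Any], None] | None) -> None:
    if some_callback is None:
        # pyright accepts this narrowing assignment...
        some_callback = lambda param: None
        # ...but pycodestyle's E731 tells you to use a def instead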

5 Likes

Summarising the situation, pyright will accept code like the following:

from typing import Any, Callable

def example(some_callback: Callable[[Any], None] | None) -> None:
    if some_callback is None:
        def _default_callback(param: Any) -> None:
            pass
        some_callback = _default_callback

But it will reject code that directly uses the parameter name when defining the nested function (a redeclaration error is reported against the parameter name, rather than the def statement being treated as a type-narrowing operation the way a regular assignment statement is):

def example(some_callback: Callable[[Any], None] | None) -> None:
    if some_callback is None:
        def some_callback(param: Any) -> None:
            pass

mypy accepts either form as valid (which seems like the correct behaviour).

1 Like

Agreed.

While I can see arguments that the first form is clearer, I’d argue that’s a distinction for a linter to make, not a type checker. As far as type checking (and language semantics) is concerned, a def statement (and equally, a class statement) is simply a particular form of assignment.
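Loosely sketching the runtime semantics (not any particular checker’s behaviour):

def f(x: int) -> int:    # binds the name f, just as an assignment would
    return x + 1

g = f        # the same callable, rebound via a plain assignment

class C:     # likewise binds the name C to a new class object
    pass

D = C        # C and D now refer to the same class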

6 Likes

I think what the correct decision is depends to some degree on the current scope and the type the symbol currently has. mypy also emits errors for redefining functions/methods, since in a module or class scope that is a likely source of bugs: it can indicate a function body that never gets executed, because the name is rebound to a different function before the original has any chance to run. So all type checkers have some rules that are more in the realm of linting than type safety, which is not a bad thing, since having proper semantic context can allow you to detect some things that a simple linter would not be able to detect.

But in the given example I agree that it should be allowed, especially since within that branch some_callback is None, so you’re definitely not redefining a function and the type checker can know that.

You could still make an argument for flagging it in a context where you overwrite the parameter unconditionally, but in that case it would seem weird to treat functions differently and you should just flag all redefinitions. But there are also already linter rules for detecting function parameter names being reassigned in the body of the function, so I think there’s less of a need there.

I think type checkers are in a bit of a difficult spot here.

I think it’s reasonable to make a distinction between an assignment and a type declaration. An assignment just assigns a value; a declaration is a way of asserting, via the type checker, that no incompatible type will be assigned to this symbol.

Given that distinction, clearly x = 1 is just an assignment, and x: int (or x: int = 1) is a type declaration.
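Concretely (a sketch; the exact diagnostics vary by checker):

x = 1          # just an assignment; checkers infer int for x
y: int         # a bare type declaration for y
z: int = 1     # declaration and assignment combined

z = "hello"    # rejected: the declaration restricts all later assignments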

But I don’t think it’s at all obvious that under this scheme def and class statements should be considered “just like any other assignment,” rather than being considered a type declaration. It’s very common to have many assignment statements in a scope for a single name; it’s much, much less common to do this with a name defined via def or class. I suspect that allowing accidental shadowing of a def or class would cause many more false negatives (by which I mean, cases where the name was shadowed but that wasn’t intended, causing a bug) than the false positives that would be avoided. I think it’s a defensible tradeoff to require someone to be a bit more explicit when they want to do the less common thing here.

This isn’t a question of the language semantics; obviously the language semantics allow assigning any value to any name at any time. Allowing type declarations to restrict subsequent assignments (in a way the language semantics do not) is part of the core value proposition of a type checker, and this inherently requires making some judgment calls about what qualifies as a “declaration.”

Where I’m less convinced by pyright’s behavior is the requirement that a given name can only ever have one declared type in a scope, and declarations even have retroactive effect on prior assignments. I understand the simplicity reasons for choosing this, but I think it’s perhaps too restrictive in practice; I would rather allow explicit re-declarations, and have declarations only impact subsequent assignments. I think under such a scheme, @ncoghlan’s example would have just worked.

2 Likes

Interesting point. Here’s an example that uses ints rather than function types, to make the distinction clearer:

def f(x: int | None = None):
    if x is None:
        x: int = 12
    print(x + 1)

This fails type checks for both mypy and pyright. Pyright complains about the second declaration obscuring the first; mypy does the same, but then also refuses to allow x + 1 because x might be None. That second complaint seems like a bug to me - I guess it’s ignoring the assignment in its enthusiasm to complain about the redefinition…

If I leave off the : int both checkers allow the code just fine.
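That is, this variant passes both checkers:

def f(x: int | None = None):
    if x is None:
        x = 12       # plain assignment: narrows x to int
    print(x + 1)     # accepted: x can no longer be None here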

In the light of this, my problem here is that there’s no way of defining a function without it being considered as a declaration as well. If omitting the type annotations from the def statement made it behave like the simple x = 12 assignment, I could more happily accept the “redefinition” argument, but that’s not how it works (and it’s not actually what an untyped def even means, so it’s not a serious suggestion).

So yeah, I get that it’s not as obvious as I first believed, but I still think pyright’s behaviour is unhelpful, in that it forces you to write an unnecessary and arguably less clear workaround.

1 Like

Agreed. This is a concise summation of the “difficult spot” I think type checkers are in here. There are problems with never considering def or class to be a declaration, and problems with always considering it to be one, and no obvious syntax available for the author to explicitly choose.

I do think that allowing (and respecting) re-declarations, and not applying declarations retroactively, can eliminate most of the downsides of considering a def or a class to be a declaration, though.

I also don’t think mypy should be throwing away the narrowed type information due to the extra : int annotation in your example. I suspect it might have this behavior because of cases involving inferred generic types and invariance. I think there are better ways to handle that, but that’s another whole can of worms.

Is it reasonable to suggest that if the redefinition in the if block matches the type that the variable would have when the if block is bypassed, then it should be treated as a reassignment rather than a redefinition? I think that’s the case where there’s less justification for using a different name.

The key here is that the pattern is one of “use a sentinel to mark an omitted argument” and the important invariant is that after the code to set the default, the name has the type declared in the function header but without the sentinel as a possibility.

I think that, realistically, None is a bad sentinel that people use because it’s the most available and easy option; I’d rather have seen late-bound defaults (PEP 671) solve this.

Right now, the “better” way here:

def f(x: int = 12):
    print(x + 1)

Well, that’s great for this case, but not all function defaults are immutable, and some need to depend on other values.
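For example, the classic mutable-default pitfall is exactly why the sentinel dance exists:

def append_to(item: int, target: list[int] = []) -> list[int]:
    # The default list is created once, at function definition time,
    # and shared across every call that omits target.
    target.append(item)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] - the "fresh" default has accumulated state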

The more general case that works currently:

from typing import Any

_sentinel: Any = object()

def f(x: SomeType = _sentinel):
    if x is _sentinel:
        x = SomeType()  # expensive or mutable SomeType assignment
This has the problem that _sentinel isn’t type-safe to use, and that forgetting to check for the sentinel is a bug waiting to happen (though realistically, an unlikely one for someone to forget).

There have also been ideas for a typing construct for sentinel values, but I think these cover up deficiencies at the language level.
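For completeness, there is a type-safe variant of the sentinel pattern that checkers already understand, using a single-member enum (a sketch, with list[int] standing in for SomeType):

import enum
from typing import Literal

class _Missing(enum.Enum):
    TOKEN = enum.auto()

def f(x: list[int] | Literal[_Missing.TOKEN] = _Missing.TOKEN) -> None:
    if x is _Missing.TOKEN:
        x = []        # checkers narrow x to list[int] after this
    x.append(1)       # safe: the sentinel has been ruled out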


As for this issue in particular, I’ve argued for a while that type checkers need to understand the language, and that analyzing control flow should be part of a type checker’s responsibilities in working out the possible types something can have. Even without that historical view, this kind of assignment can safely be viewed as narrowing within the scope, so long as no reassignment to the wider type happens afterward. Reassignment to the original wider type should not be an error either, and the name should not be seen as having the wider type until that assignment happens. So long as all use is consistent in the visible code paths, this remains safe.
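In other words, something like this should be accepted (a sketch of that flow-sensitive view):

def f(x: int | None) -> None:
    if x is None:
        x = 0        # narrows x to int on this path
    print(x + 1)     # safe: x is int on every path reaching here
    x = None         # widening back to the declared union is also fine
    if x is not None:
        print(x + 1) # ...provided later uses account for None again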

The problem, at least in the form @ncoghlan originally stated it, isn’t about late or early binding. It’s about the fact that you can’t even write a default value that is a function (except in trivial cases where lambda is sufficient) because we don’t have generalised anonymous functions.
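For instance (illustrative only):

from typing import Any, Callable

# Fine: a trivial default fits in a lambda expression...
def f(callback: Callable[[Any], None] = lambda param: None) -> None: ...

# ...but a default that needs statements has to be defined and named
# separately, because Python has no multi-statement anonymous functions:
def _default_callback(param: Any) -> None:
    print(param)

def g(callback: Callable[[Any], None] = _default_callback) -> None: ...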

Like you say, None is merely an expedient sentinel here. PEP 661 proposes properly typed sentinel values, but that’s been languishing for some time.

Yeah, the initial error I got was for an implicit union type where one half of an if statement assigned None, while the other half declared a function. That one seemed reasonable to me.
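A reconstruction of that first case (illustrative, not the original code):

from typing import Any

def build(use_default: bool):
    if use_default:
        callback = None
    else:
        # pyright flags this: the def declares a type for callback that
        # conflicts with the None assigned in the other branch, so callback
        # would only get an implicit union type
        def callback(param: Any) -> None:
            pass
    return callback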

The typechecking error only started feeling like a bug when the def statement overrode a prior explicit union type declaration.

I don’t think this is a clear enough situation to add MUST wording to the overall Python typing behaviour spec, so I’ll see if there’s an appropriate place for a paragraph like:

Type checkers MAY report an error when conditional execution of a def or class statement implicitly defines a union type. Type checkers SHOULD NOT report an error when conditional execution of a def or class statement narrows the inferred type of an explicitly declared union type.

and formulate that as a proposal for the typing council.

I’m afraid you’re way ahead of the current state of the Python typing specification here. Intentionally, the focus so far (in the short existence of the typing council and typing spec) has been on specifying aspects of typing that impact public APIs; the behavior of type declarations and type narrowing within a scope is really not specified at all at this point. So I think you’ll find there is no reasonable place to add your proposed paragraph, even if you could get the typing council to agree on it.

It will take some time (and a LOT of discussion) to specify any of this behavior, as there’s a lot of room for interpretation, each existing type checker behaves differently, and I’m not personally convinced any of them have found the right set of compromises yet (if there even is a “right” set of compromises). And it’s not really even a priority yet to try to specify these areas.

4 Likes

Ah, OK, that makes sense. My original question from the pyright issue has been answered: given the current state of the typing specs, this really is up to the individual tools to define (at least for now).

Separating the function declaration from the union target assignment is a reasonable enough workaround in the meantime.

And in case it wasn’t clear: thanks to all the folks that have contributed to making Python’s static type checkers as capable as they are. I know it isn’t easy to accurately and concisely describe the expected behaviour of a language with these kinds of dynamic capabilities.

2 Likes