Interoperability issue with module level inline annotations

mikeshardmind · December 1, 2024, 3:52pm

Currently, at a module level with a publically exported symbol, pyright has unsafe behavior (edit see below, there are some questions as to whether pyright is actually unsafe, but the rules pyright is using are similarly not specified and this is a point where type checkers need a way to all be on the same page for what is allowed for a symbol)

Split off from a larger thread about inference, this is an example that contains an actual unsafety at a point of interoperability.

Inference on Literal Types

I find the current pyright’s behavior useful.
It lets me define a constant which appears immutable to the import sites, while remaining mutable in the defining file, allowing for late initialization.
# names.py
__all__ = ("NAME_TO_ID", "populate")

NAME_TO_ID: abc.Mapping[str, int] = {}
reveal_type(NAME_TO_ID)  # dict[str, int]


def populate(data: abc.Mapping[str, int], /) -> None:
    NAME_TO_ID.update(data)
# mod1.py
from names import NAME_TO_ID

reveal_type(NAME_TO_ID)  # abc.Mapping[str, int]
Note that this could already be achieved by introducing a private variable/constant typed as the mutable variant, but if pyright lets me save typing I’ll take it ¯_(ツ)_/¯

This allows assignability to names.NAME_TO_ID with an interface that is less specific than names.py actually requires.

For instance

import types
import names

names.NAME_TO_ID = types.MappingProxyType({})
names.populate()

It was noted in that thread that inline annotations do not have a precise meaning in the specification.

To better match developer intent and improve interoperability in this and future such issues, I’m proposing a concrete definition for inline annotations with assignment, as well as proposing that symbols that are part of a module’s namespace MUST be treated as invariant unless further assignment is forbidden with Final. It is also not enough for pyright’s special casing of SCREAMING_CASE being equivalent to Final, as this is visible to users of other type checkers that may not honor that convention as it is outside of the typing specification.

Justification for invariance: Assignment to names at a module scope is the same as updating a module’s mutable namespace dict.

The rules I’m proposing for inline annotations with

An inline annotation annotates an expectation of consistency with the expression on the RHS of the assignment.
An inline annotation specifies that the symbol on the LHS should be treated as no more specific than the annotation at the point of assignment. Further narrowing behavior is left up to type checker implementations but should be consistent with variance.

The second point here preserves developer intent in being able to state a less-specific public guaranteed type in modules without having to resort to a stub because a typechecker narrowed at time of assignment for an inline annotation.

saaketp · December 1, 2024, 4:08pm

Inference on Literal Types

__all__ = ("NAME_TO_ID", "populate")

NAME_TO_ID: abc.Mapping[str, int] = {}
reveal_type(NAME_TO_ID)  # dict[str, int]


def populate(data: abc.Mapping[str, int], /) -> None:
    NAME_TO_ID.update(data)

pyright doesn’t seem to actually allow this, it gives an error where you call .update.

Cannot access attribute "update" for class "Mapping[str, int]"
  Attribute "update" is unknown  (reportAttributeAccessIssue)

playground link

mikeshardmind · December 1, 2024, 4:19pm

Local use seemed to allow that, and the user in question also had it allowed. It seems pyright does not allow it now, but it has some other oddities where reveal_type is no longer working as it is supposed to. I’m honestly unsure what the discrepancy is, and it seems neither does pyright given this further example with a reveal type not matching the error.

Code sample in pyright playground

from collections import abc
from typing import reveal_type

__all__ = ("NAME_TO_ID",)

NAME_TO_ID: abc.Mapping[str, int] = {}
reveal_type(NAME_TO_ID)  # dict[str, int]
NAME_TO_ID.update({"a": 1})

and corresponding outputs:

Type of “NAME_TO_ID” is “dict[str, int]”
Cannot access attribute “update” for class “Mapping[str, int]”
Attribute “update” is unknown (reportAttributeAccessIssue)

However, whether pyright is now forbidding it or not isn’t necessary for it to remain true that this isn’t a specified behavior that typecheckers can rely upon when importing names defined at a module scope with an inline annotation, and is currently only something typecheckers are coincidentally cooperating on.

saaketp · December 1, 2024, 4:34pm

Michael H:

from collections import abc
from typing import reveal_type

__all__ = ("NAME_TO_ID",)

NAME_TO_ID: abc.Mapping[str, int] = {}
reveal_type(NAME_TO_ID)  # dict[str, int]
NAME_TO_ID.update({"a": 1})

This one doesn’t show any errors in the playground when I open, only the reveal_type output.
(Maybe some browser cache issue showing you older errors?).
And no error is fine for this case, because this case is safe, the dict.update here would run before you could assign a different type that doesn’t have .update method.

The first rule proposed seem to be already followed by pyright - dict is compatible with Mapping.
Second rule forbidding the local narrowing is not followed, but if it doesn’t cause safety issues, I don’t see why it should be mandated.

mikeshardmind · December 1, 2024, 4:46pm

seems so, a full refresh matches what you are seeing.

Nevertheless, I’d like this to be something specified in a reliable manner that works for interoperability and I don’t think the right call is to codify the specific behavior pyright is doing here, but a generalized and correct rule that can be consistent when considered in other contexts. I’d really not like to try and figure out under what specific cases the narrowing might be safe when in this case, we have a reason to know this should be invariant. It would be better to leave this to developer intent.

The only reason the second example is somewhat safe currently is an implementation detail of how imports work in CPython that isn’t guaranteed, and it can still be subverted, but doing so would be complicated and further distract from the core issue.

Eneg · December 1, 2024, 4:53pm

Seems like I was wrong. The Final annotation does matter: playground
(My actual use of this didn’t use SCREAMING_CASE but did use Final)

So this observation made by me is wrong, it does not treat them equivalent

mikeshardmind · December 1, 2024, 5:06pm

Focusing less on pyright here, I think the second rule is important for multiple reasons.

Developer intent should matter.
It should be obvious what is happening from a user perspective.
If inference was intended, the annotation could have been elided entirely.
This is something we can generalize here in this way where it applies to a public symbol and have a predictable consistent meaning in general.

The thread it was split off of had examples where the inference model for two type checkers diverged here, and the one with the correct to developer intent was treating the LHS as only the annotation, the other typechecker was inferring something that created a variance mismatch. I’d rather leave this resolvable by users in their own annotations than have a type checker guess at intent when having to consider the entire ecosystem and not just any one type checker.

Liz · December 1, 2024, 5:22pm

It’s not just typecheckers that would benefit from this. The ability to generate optimized modules using type information gets a little better if this were something that needed to be treated consitently, as then other tools could see that annotation and replace the dict with a more efficient structure in a read only context since it is declared as only needing to be a Mapping. I think the original example is a clear case where the developer has stated an incorrect type, they require more than the interface they have declared.

erictraut · December 1, 2024, 5:27pm

It’s not clear to me what interoperability issue exists here. If you’re able to, please provide a concrete example where pyright and mypy differ in their treatment of a symbol whose type is declared in one module and imported by a second module.

By “inline annotations”, I presume you mean an assignment statement that includes a type declaration? If that’s what you’re proposing, then I’m strongly opposed to applying any special meaning to an “inline annotation”. In fact, there’s broad consensus that a type annotation for a variable should be treated consistently regardless of whether it’s combined with an assignment. Mypy’s current behavior doesn’t conform with this, but mypy maintainers have been planning to change this for some time. This change will be somewhat disruptive for mypy users, so they’re saving it for the next major release (2.0).

x: Sequence[int]; x=[]
y: Sequence[int] = []

# These should reveal the same type
reveal_type(x)
reveal_type(y)

Type checkers perform local type narrowing within an execution context. Here, “execution context” means a body of code that has an independent code flow graph. In Python, execution contexts include modules, functions, and lambdas. The locally-narrowed type is displayed when using reveal_type. Don’t confuse this with a symbol’s declared type.

When accessing a symbol outside of an execution scope, pyright (consistent with other type checkers) uses the symbol’s declared type. For example, if you declare a symbol within one module and another module imports this symbol, its type is based on the declared type. The developer intent is clear from this declared type, and you should see consistent behavior across type checkers.

mikeshardmind · December 1, 2024, 6:03pm

I’ve been working on a combination type-checker and module pre-processor that can in some cases rewrite code. I’ve linked the precursor to it before. The diverging behavior here means I have to have a type checker that has to conflict with the type checker a lot of users already use because it’s packed into their IDE. It’s not safe to replace certain expressions without generating what would be a conflicting error message in the cases I’d need to reject that use the developer’s intent.

The problem with (pyright I guess, but in general) narrowing at site of annotations is precisely that this violates developer intent and doesn’t honor execution context.

foo: X = Y()

This is a single statement of assignment. At the same time as the assignment of an instance of Y, the developer is stating this is to be treated as X.

If you want to get down to the full order of operations, The actual order of operations here is that Y() is evaluated, then assigned to foo, where the developer has said use of foo should be consistent with X.

The developer has no means to indicate that it should not be narrowed by inference if inference is allowed. By contrast, if inference was intended and the user only cares about inferred consistency, the annotation could have been left out.

The developer did not do anything that should narrow this to Y after assignment at the time of saying “treat this as X”

The fact that this results in different behavior shows that this creates hoops a developer needs to jump through to have their intent honored:

def wrap_y_creation_as_x() -> X:
    return Y()

foo: X = wrap_y_creation_as_x()
foo.ThingFromY  # user intends this to error

(Note: this is required, cast requires turning off typechecking, and a type ignore both does that, and doesn’t work to prevent narrowing.)

Meanwhile, if this rule is adopted, the obvious fix, if the user meant Y, is:

foo: Y = Y()
foo.ThingFromY

The typing specification has this to say about function parameter annotations:

A type checker is expected to check the body of a checked function for consistency with the given annotations.

While it says nothing about inline annotations, ignoring developer intent with the temptation to guess, rather than honoring checking consistency specified means that developer intent is only respected in functions that are called, and only for parameters.

I agree with this, but I disagree with your conclusion here. I would argue that both pyright and mypy are failing this, and the rule I provided generalizes further and better. Assignment should not be treated as a narrowing action.

class A:  ...
class B(A):  ...

def foo(x: A = B()):
    # use x here

vs

x: A = B()
# use x here

In both cases, the developer has explicitly said x is a wider type than just B, enforcing B due to inference rather than enforcing A in either case is guessing that the developer is wrong about the annotation they provided, and instead of erroring, are silently guessing, leading to different issues.

The case in the other thread for this that wasn’t about public symbols was that function callers can reduce refactoring when declaring the minimum interface they require be returned if this is enforced, as it prevents accidental reliance on more than the developer specified a reliance on. This is an extremely useful property when using libraries that are slowly adopting typing to only rely on parts that you need to avoid future changes to more accurately describe something break your specific use. I view this one in particular as an interoperability concern, but from the user side, not the typechecker side.

mikeshardmind · December 1, 2024, 6:31pm

Here’s an example, though technically all the breakage is on one side, if you want me to extend it further, I can break both sides with similar means.

from collections.abc import Sequence
from mod2 import breaker

x: Sequence[int | str] = [1, 2, 3]
breaker(x)

y = [i+1 for i in x]

# mod2.py
from collections.abc import Sequence, MutableSequence

def breaker(x: Sequence[int | str]):
    if isinstance(x, MutableSequence):
        x.append("x")

If you want a simpler rule than the above, or to extend it with a 3rd rule, “assignment should not be treated as a narrowing action, but should be checked for consistency” would be the one with the least issues.

erictraut · December 1, 2024, 7:42pm

The example you’ve provided is not an “interop issue” as we normally use that term. If mod2.py were part of a library, the type of the symbol it exports (breaker) is precisely defined by the typing spec, and pyright and mypy are consistent in how they treat this type. Your sample demonstrates different local type narrowing behavior, but that’s not an “interop issue”.

I think there is an issue here, but it’s obscured by how you’ve framed the problem and the rules you’ve proposed. You’re focusing on the difference between “inlined annotations” versus non-inlined. I think that’s a red herring. As I stated above, I feel strongly that there should be no difference between the two, and there’s broad consensus on this point.

The real issue here is related to bidirectional type inference. In particular, how should the expression [1, 2, 3] be evaluated if the inference context is Sequence[int | str]? Should its inferred type be list[int] or list[int | str]?

def func(y: list[int]):
    x: Sequence[int | str]

    # Assignment-based narrowing:
    x = y
    reveal_type(x) # Pyright and mypy agree: list[int]

    # Bidirectional type inference:
    x = [1, 2, 3]
    reveal_type(x) # Pyright: list[int], mypy: list[int | str]

I’m not convicted that pyright’s bidirectional type inference behavior is correct here, and I’m open to changing it if there’s a good type-theory-based argument.

That said, I don’t think we’re anywhere near the point where we would want to define or mandate bidirectional type inference behaviors in the typing spec.

mikeshardmind · December 1, 2024, 7:46pm

I can provide a more detailed example that breaks the other side of the import statement, but it’s going to get complicated, as it relies on import loaders that are designed to defer execution until needed (hence me pointing out that the order of execution is only a CPython implementation detail, not a guarantee)

I was trying to avoid this by pointing out the theoretical unsoundness first, I’ll work up an example of this later.

oscarbenjamin · December 1, 2024, 7:49pm

The issue with MutableSequence doesn’t require any module-level variables:

from collections.abc import Sequence, MutableSequence

def func1() -> None:
    x: Sequence[int | str] = [1, 2, 3]
    func2(x)
    y = [i + 1 for i in x]
    print(y)

def func2(x: Sequence[int | str]):
    if isinstance(x, MutableSequence):
        x.append("x")

func1()

This is rejected by mypy but accepted by pyright. At runtime it fails.

If I change the annotation then both mypy and pyright will accept it and it still fails at runtime:

from collections.abc import Sequence, MutableSequence

def func1() -> None:
    x: Sequence[int] = [1, 2, 3]
    func2(x)
    y = [i + 1 for i in x]
    print(y)

def func2(x: Sequence[int | str]):
    if isinstance(x, MutableSequence):
        x.append("x")

func1()

mikeshardmind · December 1, 2024, 7:52pm

I’d also rather avoid specifying this in its entirety right now, but disallowing assignment to be narrowing is a step that will cover the currently visible places where typecheckers differ and bring them toward consistency internally across python typecheckers. This makes outcomes and intent clearer for end users of python the language, and reduces barriers to various things like contributing to projects that use a different type checker than you are used to. It also makes it behavior that is consistent with the meaning of type declarations in other languages^[1]. I’m 100% sure the rules above lead to consistent and sound behavior. (Some of them may require users relying on the narrowing to change an annotation)

If we later have rules for when it is safe to narrow at assignment without explicit checking (I don’t think it ever is, but I would need to seriously think much harder about this) we could adopt more complex rules.

Also in the other thread, it was pointed out that java and other typed languages refure to narrow for assignment. ↩︎

Liz · December 1, 2024, 8:21pm

If I may, I’d like to rephrase the rules at the beginning that @mikeshardmind came up with to eliminate the red herring, and borrow some of the consensus from the previous discussions about the matter, both recently and not.

Rewording this to generalize to the entire problem space requires more explicit rules, but it doesn’t require demanding anything about how type checkers implement it or what additional rules they might have to support developers with “likely error heuristics” (not not a type error) based on type information.

An annotation annotates an expectation of consistency with the symbol it annotates for all places the symbol is reachable from after the annotation.
In the case of multiple annotations, each annotation must be consistent given the execution context.
Re-annotation is allowed.
Widening to match an annotation is required for both function parameters being passed in matching the parameter annotation, and for the return type of functions to widen to match the current execution scope’s requested enforcement.
The above widening is a lossy operation, explicit renarrowing may be required to required to renarrow in some cases.
Where multiple code paths merge, the known type information should merge via a union as well.

Examples of this are a little tricker:

# reannotation
class A:  ...
class B(A):
    def foo(self) -> None:  ...
class C:  ...

x: A = B()
x.foo()  # expected error
x: B = B()  # reannotation allowed, it's consistent with prior stated intent.
x.foo()  # this is allowed
x: C = C()  # reannotation disallowed, it's inconsistent with prior stated intent.
x: A = A()  # reannotation disallowed, it's inconsistent with prior stated intent.
x: B = A()  # assignment dissallowed, value is inconsistent with both prior and current intent.

# merging code paths
from elsewhere import some_func  # () -> bool

class Base:
    def foo(self) -> None:  ...

class A(Base):
    a: int
class B(Base):
    b: int

if some_func():
    x: A = A()
else:
    x: B = B()  # prior annotation has no meaning here, we didn't come from there.

x.foo()  # allowed, both A and B have .foo()
x.a  # error expected
x.b  # error expected
x: A = x  # error expected
if isinstance(x, A):
    x.a  # allowed

mikeshardmind · December 1, 2024, 9:44pm

Oscar Benjamin:

If I change the annotation then both mypy and pyright will accept it and it still fails at runtime:

from collections.abc import Sequence, MutableSequence

def func1() -> None:
    x: Sequence[int] = [1, 2, 3]
    func2(x)
    y = [i + 1 for i in x]
    print(y)

def func2(x: Sequence[int | str]):
    if isinstance(x, MutableSequence):
        x.append("x")

func1()

I was avoiding presenting this version, because this version shows a different problem that exists with structural typing and current encoded information. Sequence and MutableSequence run into an unsafe internal generic type narrowing issue of a different kind because they aren’t considered to be mutually exclusive or retain variance information when intersected.

def foo(x: CovariantThing[_Co_T]):
    if isinstance(x, InvariantThing):
        # unsafe to do anything that isn't safe without this check.

While it looks like the first form is exhibiting this, it isn’t, it’s just what was a convenient application where everything goes wrong that was easy to demo, (It’s the weekend, I really didn’t want to write an ast based loader that steps through module execution as needed to show how something not guaranteed currently is what is relied upon here, and have postponed that to “later”)

It’s equivalent to this problem:

from collections.abc import Sequence
import random
opts: list[int | str] = [1, "1", 2, "2", 3, "3"] 

async def mutate(key: str, data: dict[str, Sequence[int | str]]):
    new_vals: list[int | str] = list(random.choices(opts, k=3))
    data.update({key: new_vals})

async def x():
    val: Sequence[int | str] = [1, 2, 3]
    data: dict[str, Sequence[int | str]] = {'0': val}
    await mutate("0", data)
    [i + 1 for i in data["0"]]

edit: example fixed, this is the problem of toy examples…

Which pyright does error for but doesn’t recognize the equivalence for modules

(the last line was previously)

[i + 1 for i in val]

with the imperfect analog of what is happening here.

It’s also possible to do the same with threading, and so on.

The only reason displaying this is so hard for the module case is the fact that at play for the module case, you have the dict being mutated being the module namespace dict, and lazy asynchronous execution of modules isn’t built into cpython, but the facilities to write it are and are under active research by multiple people for even how to include them into cpython for lazy imports. There are existing libraries that get partially to this already in the scientific python community.

mikeshardmind · December 1, 2024, 9:54pm

Thanks for rewriting these. I want to give it more thought, I’m not entirely sure your rules for reannotation are the minimum required and therefore what should be said if we want to improve consistency without blocking the ability for a correct, but opinionated type checker to interpret this in another way that is safe.

ImogenBits · December 1, 2024, 10:06pm

Is this not just another instance of the old topic of retaining narrowed type information while other operations might be changing the underlying object? We’ve seen many examples of that with narrowed info of a particular element in a collection, a variable across function calls, shared data in various parallel execution settings, etc. I’m not seeing what is new here that is specific to it being at the module level or related to the type annotation being on the same line as an assignment?

mikeshardmind · December 1, 2024, 10:13pm

Issue of it at a module level: internally modules are mutable-container-like. This isn’t specified for type checkers, this being specified independently of the annotation part should be clear and unambiguously good as it matches specified parts of python’s runtime (see link about module type in the first post)

The inline annotations isn’t just an issue inline, but in general. I would normally in a forum context edit the topic to reflect ongoing discussion updating to be more “covering it accurately”, but people have complained about this for mailing list use, the conversation above does get into where it’s more than an inline annotation issue, but that just makes it an issue with narrowing at the site of assignment, rather than narrowing at an annotation, not “not an issue”

Examples that don’t use mutable containers are difficult to use here, as stated above, demonstrating the issue means modifying import behavior without violating what is allowed by python for import behavior to defer execution. This goes just past what current import hooks do already into things actively being researched for use. Additionally, the issue itself here is the mutable namespace dict.