Inference on Literal Types

Dr-Irv · November 30, 2024, 4:22pm

Oscar Benjamin:

I generally want to have this separation between types as API contracts and inferred types for local code analysis and mostly I like the way that pyright handles this. A situation where pyright regularly frustrates me is actually the failure to separate these things for function parameters:
def func(numbers: list[int]):
    numbers = set(numbers)
    # set[int] not assignable to list[int]
Here I want the annotation list[int] to be a statement about the types that the function should accept as arguments i.e. I want it to be an exported type constraint. For local code analysis though I want to be able to rebind the local variable to another type. With mypy and pyright there isn’t any way that I can distinguish between an external annotation for the function parameter and an internal annotation for the local variable or any way to have a local annotation that does not apply to the whole function. It is straight-forward to work around this by having a new variable numbers_set = set(numbers) but adding annotations to a million line codebase is made much more laborious by this sort of unnecessary churn that doesn’t actually improve anything, doesn’t reduce bugs etc.

I disagree with you here. I actually find the pyright behavior here to be good. Changing the type of a variable via assignment is not a good coding practice, IMHO. In the case of my team, pyright has caught a few bugs when we did this by mistake.

I come to this with experience in working with large code bases written in strongly typed languages (C++, Java). If some form of those checks that exist in those languages are available in python, then code quality will improve for my team and decrease bugs by detecting them when writing new code.

I agree that annotating an existing large code base that has no annotations can be frustrating with all type checkers, because as an untyped language, python allows you to do things that are not “safe” with respect to strict typing. But if you are writing new code, any help the type checker can give you is a bonus.

Which brings me back to the example at the top of this long discussion that initiated the discussion… That was similar to new code a member of my team wrote, and it’s a case where the pyright behavior was non-intuitive because a type alias declared as a Literal was not treated the same as a type alias corresponding to a class.

oscarbenjamin · November 30, 2024, 7:30pm

I knew that some people would disagree with me. You say your background is C++ and Java in which such things are just impossible. I’m very comfortable with rebinding a variable to a new type because that is a natural thing to do in Python, and it often it just doesn’t make sense to use a new name. There is nothing weakly typed or unsafe about rebinding a name to a new type as long as it is done in all branches.

It seems that pyright has an option reportAssignmentType = false that can be used to prevent it reporting an error but then it gets the inference wrong:

def f(n: list[int]):
    n = set(n)
    reveal_type(n) # list[int]
    n2 = [1, 2, 3]
    n2 = set(n2)
    reveal_type(n2) # set[int]

Eneg · November 30, 2024, 7:41pm

Pyright honors SCREAMING_CASE = constant, without the need for Final. Assignment to it at any later point will be reported as an error.

mikeshardmind · November 30, 2024, 7:54pm

This is a standards problem because it is a public symbol exported by the module. Other type checkers can’t safely interact with that without defining it.

mikeshardmind · November 30, 2024, 8:12pm

I disagree with both of these, but I think there’s more to it here.

It shouldn’t matter what branches it happens in as long as it is type consistent with respect to the directed graph of code.
It should be possible but should require re-annotation (something some type checkers currently reject) if rebinding with a new type to ensure it isn’t an accidental conflict with a previously stated developer intent for the type of the value that symbol references.

Dr-Irv · December 1, 2024, 2:39am

Well, it may be natural to you and other Python programmers, but I would argue that it is not good programming practice. When I taught introductory computer programming at a university many years ago using an untyped language, students who did that would get docked a few points.

I don’t want get into a religious debate about programming style, but I’ve been writing code for 45 years, including development of commercial software, and have developed a set of guidelines and best practices that I use along with my team that has helped us produce more reliable and robust code. Changing the type of a variable is, IMHO, not a good practice, and makes code harder to understand.

ImogenBits · December 1, 2024, 4:31am

In the example you used for this, there is only a single scope. Type annotations applying to every occurrence of a variable in a scope is pretty standard behaviour afaik. Why would that be not be a reasonable assumption?

I can’t get either mypy or pyright to error on this without the ignore. What is the issue making it needed? In any case, wouldn’t the easiest (and best imo) fix be to just add an explicit annotation that supertypes both functions? That is what the developer actually wants here since that’s the interface you can use later.

I don’t think a proposal to just throw out all of the work done so far and all of the behaviour people are already relying on is realistic. Especially if the issue at hand is whether the spec should specify particular inference rules in fairly niche situations.

mikeshardmind · December 1, 2024, 1:17pm

The if branch is mutually exclusive to the else branch. You viewing this as normal is you internalizing rules that exist in type checkers that don’t match how python works.

No, This should naturally just be the union of the two branches. Inference should be fine here when used in a way that is valid with both branches.

If this was a one off occurrence, I would agree with you. This has been an ongoing issue where things that really should be specified aren’t and don’t get specified. There’s all sorts of real issues of specification that end up stalling out. An examples that’s been known problematic for years is the lack of a well-defined definition for Any as a base class. This was explicitly allowed in 3.11, and still does not have a well-defined meaning.

Another is type checkers not handling things like staticmethods properly matching the language.

This also isn’t just about inference, please see the examples above where typecheckers using this behavior also leaks to publically exported symbols without defined behavior creating unsafe narrowing.

oscarbenjamin · December 1, 2024, 1:38pm

Maybe there is not an agreement on what the word “scope” is intended to mean here. In usual non-typing Python terminology scopes are local, nonlocal, global etc. A function has one scope in which there is a namespace of local variables. If an annotation applies to the name of a local variable then it is natural that the scope of the annotation be the whole function because that is the scope of the name itself.

In typing, inference rules can be different in different branches of if/else as well as before or after asserts or other statements. That defines a different “scope” in some sense but I don’t know what is the right name for it.

mikeshardmind · December 1, 2024, 1:48pm

Perhaps we have non-useful pedanticness happening since the example and what the issue with the example were clearly expressed in a way that the specific murky word choice of scope shouldn’t have been unclear.

Here’s an unambiguous phrasing:

Type checkers take annotations and apply it poorly to code that is unreachable from the point of the annotation because they aren’t required to understand code flow in python.

ImogenBits · December 1, 2024, 3:07pm

I don’t think that is what is happening here at all. Pyright clearly does understand code flow, it heavily uses it in other cases. There was a conscious choice about what a type annotation means, namely that it defines what types can be assigned to a symbol and that this does not change throughout the lifetime (i.e. scope) of that symbol. If you want type checkers to make a different choice here, that’s totally fine but it doesn’t have anything to do with the spec specifying things about what code is reachable.

This also isn’t needless pedantry. If we want to talk about typing rules in a more useful way than “This should do whatever I personally think it should do in precisely this example” then we need to make clear rules what every component of the system means. There are several completely seperate issues being discussed in this thread and people are throwing them together because it’s all vaguely inference related. That just isn’t a productive way to work on things. The spec clarifying precisely what a type annotation expresses is completely unrelated to narrowing of expression types due to assignments, handling of literal types in collection literals, safety of assuming unassignability because a module-scoped var has an ALL_CAPS name, inference of unannotated symbol types, and probably 50 other things that have been discussed here that I’m forgetting now. If you want to talk about one of these things it’s probably best to make a new thread dedicated to that individual thing and concretely discuss only it. The general idea of code flow analysis just is too broad of a concept to make any concrete rules with.

mikeshardmind · December 1, 2024, 3:32pm

It both does have to do with the (lack) of specification around this, and that hits upon the general issue that the spec isn’t serving users.

It definitely is needless pedantry to intentionally pick a definition that would seem to make the example not work and instead of ask for a more specific definition, say the example has no issue when an issue of ergonomics was given. It should have been clear that the definition you picked couldn’t have been what was meant, and you chose to use that to say the example was fine rather than to ask for clarity. If you want better clarity and see an issue of it, ask for it, don’t intentionally use a definition that suits your needs but not the concrete example given.

mikeshardmind · December 1, 2024, 4:03pm

I feel quite the opposite. Without a holistic view of the problems users are facing, it is easy to write rules that later conflict with each other when people try to keep the rules too narrowly scoped. This is one of the issues I have with the way the specification was stitched together. When discussing the current state and issues that users are currently facing, getting information about related problems, even if they need to be addressed separately when it comes to specification language, is useful and important for creating a consistent system.

I have at this point split off the module-level inline annotation issue into its own thread as I feel I have enough working information and a demonstration of the current lack of specification at a point of interoperability to do so.