PEP 724: Stricter Type Guards

rchiodo · September 20, 2023, 4:16pm

One of the assumptions of this PEP was that TypeGuard functions like the one below were rare.

And that changing the TypeGuard definition to the expected definition would outweigh the downsides of breaking these examples.

Introducing an alternative form, say StrictTypeGuard, was seen as worse because:

TypeGuard would still behave in unexpected ways (like the str | int case I mentioned above)
People would have to learn this new ‘StrictTypeGuard’ and when to use it, making the type system even more complex

ntessore · September 20, 2023, 4:23pm

The examples I had in mind are not different from the one already given here. It is code that accepts type A | B where there is a fast code path for specific instances of A:

data: Array | Stream
if is_small_array(data):
    # array in memory and small enough to write synchronously
    write_array(data)  # expects Array
    return
else:
    # large or unknown size; pass to asynchronous writer
    return write_async(data)  # expects Array | Stream

NeilGirdhar · September 20, 2023, 4:46pm

I guess under PEP 724, you could write this:

data: Array | Stream
if is_array(data) and is_small(data):
    # array in memory and small enough to write synchronously
    write_array(data)  # expects Array
    return
else:
    # large or unknown size; pass to asynchronous writer
    return write_async(data)  # expects Array | Stream

where is_array is a type guard or instance check, and is_small is an ordinary function.

To me, that seems more logical since the type logic is separated from the non-type logic.

If you’re worried about duplicating work in some cases, I think you can do all the work in one place and feed it to both functions:

stats = collect_stats(data)
if some_type_guard(data, stats) and some_ordinary_function(data, stats): ...

mdrissi · September 20, 2023, 5:17pm

On backwards compatibility, I think it is weird to have a stronger requirement for a PEP here, given that type checkers and typeshed currently make similar backwards incompatible changes regularly and generally rely on mypy primer/similar to estimate impact and then decide.

Often backwards incompatible changes do not affect most codebases and tend to hit trickier/advanced cases. My own codebases use enough of the less common parts of the type system that updating the type checker version pretty regularly leads to type errors changing. If you visit the release notes for mypy/pyright you will see that many times per year an intentional behavior change is made. These changes generally do not involve the PEP process and are usually treated as a library-specific decision. For a packaging equivalent, I would compare this to how many incompatible changes in setuptools/pip do not involve a PEP and are treated as an internal issue/decision of that library.

The motivation for a PEP is that this changes a previously stated PEP behavior. But from the user experience, the change here feels similar to many behavior changes made today that are backwards incompatible in some cases and are treated as a type checker internal choice. One related issue is that a number of backwards incompatibilities, in the sense of which type errors are shown, are often debatable bugs/trade-offs in correctness vs pragmatism. Type checkers sometimes allow code that in theory is unsafe and can lead to errors but usually works in practice. These deviations often are not documented as part of a typing standard and are handled as type checker internal. The other area is that this PEP focuses on inference/narrowing behavior. A lot of that behavior today is not defined by any PEP. What are the rules for match/isinstance and type checking? Type guards are meant to allow custom isinstance-like narrowing, but I do not think the isinstance rules are currently defined anywhere, and each type checker may vary. For most common cases they agree, but in advanced cases you will see differences.

Edit: One example very similar to this PEP is how TypeGuard and TypeVars should interact with each other. If you have a function that uses TypeGuard and narrows a variable that is annotated as a type variable, does the genericness of the variable get preserved? I have a few places in code that change type checking behavior depending on this, but I think the answer here is currently uncertain and a place where a type checker may choose either view across versions.

effigies · September 20, 2023, 5:27pm

If you’re using TypeGuards, you’re already pretty far down the typing rabbit hole, IMO. I would approach it as follows:

Here’s StrictTypeGuard. The True and False conditions both narrow the type. You will almost always want to use this one.
In some specific cases, such as when several conditions allow you to follow an optimized code path, you may want to use the weaker TypeGuard (so named for historical reasons). In this case, only the True condition narrows the type.

The point is that:

if can_fast_track(some_var):
    ...  # fast track
else:
    ...  # slow track

is not uncommon. It seems silly to lose the ability to type this when can_fast_track() implies a type narrowing, but not can_fast_track() doesn’t. Sure you can refactor, but I think progression of the typing sublanguage should decrease the amount of code I need to refactor to satisfy mypy, not increase it. I thought that was the overall goal of building out the type system.

sirosen · September 20, 2023, 7:54pm

Thanks for the correction – the stdlib docs make this sound like it’s unspecified today. And I think there was a sentence in the original PEP which looked like this too, but now I can’t find it.

This PEP says that mypy_primer was run on 25 code bases. That seems small? Can it be done on the whole list of mypy_primer projects?
Like you hinted, there are other code bases out there which we can’t reasonably include. I’m not sure if it’s possible to refine this strategy enough to build very high confidence.

I don’t have a better approach. In the presence of closed source projects, knowing the impact a priori is always a guess.

Gradual and backwards compatible change is much easier to reason about with new names. If gradual-ness is a goal, StrictTypeGuard is worth reconsidering.

Symbol names and runtime implementations are the parts which the stdlib controls. So they seem to me like the most powerful tools for keeping type checkers in sync. Plus, they’re the only thing which a library author can control – I can’t determine which type checker or what settings will be used by my users.

Typing symbols can also be deprecated and removed, without necessarily being moved or replaced. I’m not sure if it’s wise to pursue this thought, but if StrictTypeGuard appears in 3.13 and TypeGuard disappears in 3.15, it’s very easy as a user to reason about the change.

Given that StrictTypeGuard was rejected, it feels like gradual change and maximum backwards compatibility is not a goal in this case. (Which, as a typing user, is probably fine, IMO, but I wish there were a better way to assess that…)
But I think the Rejected Ideas section should acknowledge that it has the advantage of being gentler and more gradual as a change. And that the rejection is based on weighing that against the increase in complexity and subtlety and finding that the balance favors changing TypeGuard “in place”.

pf_moore · September 20, 2023, 10:32pm

The difference I see is that this is a change to behaviour that is specified in the CPython documentation, for a type that’s in the CPython stdlib. I’d view that as equivalent to a change to behaviour specified in a packaging interoperability standard, which is subject to strict backward compatibility requirements.

It’s already been pointed out that this breaks real-world existing code (where a type guard is used to trigger a fast path that only applies for a particular type). If I were the writer of such code, I wouldn’t be happy with viewing the change as “type checker internal choice” - I’d be complaining to the checker that it had broken my code. And I would not be happy to be told I should refactor perfectly good, working code, just to get the previous behaviour. Honestly, I’d probably react by saying “stuff this, then” and just revert to using a bool return type - which really isn’t helpful in any way for improving the type safety of my code.

My first reaction on seeing TypeGuard (as a result of reviewing this PEP!) was to think “that’s neat, it would be useful to be able to say that this check confirms we have a particular type”. I didn’t think of it as particularly “far down the rabbit hole”, but rather as a way of avoiding casts and asserts (which I consider ugly, and avoid at all costs) to tell the type system things that from my perspective it “should know”.

So I think you might find more people trying out type guards than you might expect… And they may well not be experts - another disadvantage of popularity.

Note that my instinctive reaction was probably to expect the existing PEP 647 semantics. Although, to be honest, I don’t think I even thought about the “return False” case, so maybe it’s better to say that I had no expectations beyond PEP 647.

mdrissi · September 20, 2023, 10:58pm

The issue is changes of this nature happen very frequently today. PRs multiple times a week will change type checker/typeshed behavior on real code that leads to errors appearing/disappearing. If you complain about this kind of change then you should expect to regularly complain about changes that create incompatibilities of this severity on mypy primer.

In practice most of the errors that change are considered rare enough to be acceptable. Or the new behavior is decided by maintainers to be an improvement without any PEP. If you look at the PR history of mypy/typeshed/pyright, they all have a CI check that measures behavior changes on real code for some codebases. Some of those changes are bug fixes, but clear backward compatibility breaks are a common experience for “advanced” enough typing. The word “advanced” is one very unclear part here. I don’t mean advanced as positive/negative; I am just referencing less common features or areas where exact behavior is undefined today and common cases work out, but details vary by release/type checker.

As a concrete example, pyright (Releases · microsoft/pyright · GitHub) has weekly releases. Behavior changes are documented in the weekly notes, and there are backwards compatibility breaks where some code that created type errors before, or passed type checking, changed. Each week it is common today for “advanced” behaviors to be changed in backwards incompatible ways. Yesterday’s release notes even have a change that affects more code than this PEP based on primer checks, and it was a behavior change considered an internal type checker choice. My experience is that when upgrading the type checker on a large enough codebase (~30K lines of code) with high enough usage of typing features, the errors change most of the time. I expect to add a few type ignores/adjust a couple of lines most times I upgrade the pyright/mypy version and am happily surprised if a new version doesn’t change some behavior. Sometimes a new bug is discovered in my code, sometimes a type checker becomes more strict in a way that’s harmless for my code, sometimes I disagree but understand the rule change. There was one recent change that led to many errors appearing in my codebase for a behavior I’d consider much more common than this PEP: how passing **kwargs should be type checked.

Lastly, I’m not against a more stable specification. But the type specification is very incomplete today for behaviors like this, so if you use features like typeguard/overload/generics/paramspecs heavily then instability is the current norm. It doesn’t seem like this specific PEP should have a higher backwards compatibility standard when many other changes made at the same time in GitHub issues/discussions will not go through the PEP process and commonly (multiple times per month) lead to larger backward incompatibilities.

effigies · September 20, 2023, 11:58pm

I agree with this, and didn’t mean that you need to become a typing expert to see the value. (I definitely don’t consider myself one…) My point was just that by the time the problem these solve became clear to me, I think I was prepared to understand the distinction between TypeGuard and StrictTypeGuard semantics. I think we can have both and teach both, and that the reasons for rejecting StrictTypeGuard do not ring true for me as a non-expert.

pf_moore · September 21, 2023, 8:43am

All I can say to this is that I consider it a pretty terrible situation, if we’re trying to present type checking as mainstream. I’ve not encountered this myself, so maybe I don’t do “advanced” stuff, but I wouldn’t be happy if it happened.

I did say that my points were general, not specific to this PEP. But with wider community participation - partly due to discussions now being on Discourse, and partly just increased popularity - I think there will be a demand for tighter control. And yes, that may mean “stop making breaking changes without a PEP”.

To be clear, I’m only talking about changes to behaviour documented in a PEP or the stdlib docs. The problem with this PEP is that it adds behaviour that the original PEP explicitly prohibited:

User-defined type guards apply narrowing only in the positive case (the if clause). The type is not narrowed in the negative case.

So to that extent, this PEP should be held to higher standards than one that simply gave meaning to a previously undefined behaviour.

apparebit · September 21, 2023, 4:52pm

I think @NeilGirdhar’s point is important: If you wanted to correctly annotate is_even_int(), then the type argument to TypeGuard shouldn’t be int but some qualified type. After all, the predicate is testing for something stronger, which I don’t think Python’s type annotations can currently express. So the better alternative for this case is to separate the type and value tests into two predicates.

NeilGirdhar · September 21, 2023, 5:11pm

You’re probably right that the situation can be significantly improved. But I think we have to recognize the benefits of the situation as well: The typing people have built a masterpiece in a few short years. And they’ve done that in part because they are free to innovate without PEPs and free to change their minds without years of deprecation periods.

Consider an issue like this: PEP 698 was updated after a five minute discussion rather than a whole other PEP. I agree with you wholeheartedly that the result of the discussion should be documented with typing.overload.

The code works exactly as it used to, but the type checking may produce different errors. That’s a much less significant problem, and I think it’s unfair to characterize it as “breaking”.

If a project wants to rely on type errors not changing, they can pin their type checker version. Most users don’t pin and are instead thrilled when type errors change. Usually they disappear, and so the only thing you need to do is remove ignore directives. Sometimes, they reveal problems you didn’t know about, which means adding ignore directives or refactoring. The code seems better after these changes. I encourage you to actually use type checking for a while to get a feel for this experience.

Funnily enough, there appeared a proposal for a new typing governance process. However, I personally think that the control should be unlike the PEP process, and not have the backwards compatibility guarantees and deprecation periods that are being suggested - at least not for most changes and at least not yet. In my opinion, (we) typing users benefit more from typing being flexible enough to change without such heavy guarantees than we would gain from “tight control” and “strict guarantees”.

This is a very good point. Maybe typing PEPs will be replaced by a different process (as per the new governance process). And although typing is documented in the standard library documentation, there may be a way to allow typing to keep changing quickly, either by editing old docs or by documenting typing somewhere else.

We don’t even have one good example yet of a non-strict type guard, so why would you want to “have both”? In the long run, it seems that we only want the strict type guard?

Library users don’t see the type guards that are used in library code. They only see the interface. This change to type guards would not change the meaning of library interfaces.

sirosen · September 21, 2023, 5:11pm

I also consider it a pretty bad situation. Every time there’s a change in behavior and stuff breaks, it erodes the trust of people who added type hints without expecting this kind of ongoing maintenance burden. e.g. Look at the experience and tickets in the pallets projects, where type hints are a response to user requests, not the maintainers having an active interest in type hints.

To @mdrissi’s point, I don’t know that TypeGuard is a particularly important place to make this case, but sooner or later these kinds of backwards incompatible changes need to stop. Even if we set some basic time horizon like “by the end of 2025 we expect there to be a stable spec to which there will be no single-step backwards incompatible changes”, that would be better than iterating on this same pain-point in discussions endlessly. I feel like I’ve had this same conversation many times on this forum.

Yes, it’s the status-quo, but there needs to be some vision for getting away from it or at least reducing the level of churn to something more acceptable.

I think that proactively pursuing that state even at a micro-level, with additions like StrictTypeGuard, does a great deal of good.

One other thing I want to call out is that I don’t like qualifiers like “advanced” or “niche” for typing usages. If they describe me, then almost everyone using typing and running pyright or mypy on their code is “advanced”.

I write Python code for the runtime and “explain that code” to the type checker. That jibes with any libraries which had well-defined interfaces dating back to a pre-typing era of Python, but which now want to provide type hints.

I will use whatever tools typing makes available to me to accomplish that goal. I’ll use a Protocol with __call__ instead of a Callable to express keyword arg types, because Callable can’t do it. Does that make me “advanced”? I have some object which accepts a callback, and the callback signature has kwargs. Sounds pretty ordinary. If we want to call expressing that case “advanced” then what realistic large programs don’t have advanced types?
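For anyone unfamiliar with the pattern, a minimal sketch follows. `Handler`, `Dispatcher` and `log_handler` are hypothetical names; the point is that `Callable[[str], str]` has no way to name the keyword-only parameters, while a `Protocol` with `__call__` spells out the full signature.

```python
from typing import Protocol


class Handler(Protocol):
    """Hypothetical callback type with keyword-only arguments."""

    def __call__(self, message: str, *, retries: int, verbose: bool = False) -> str:
        ...


class Dispatcher:
    """Hypothetical object that accepts and later invokes a callback."""

    def __init__(self, callback: Handler) -> None:
        self._callback = callback

    def fire(self, message: str) -> str:
        # The keyword names are part of the declared callback type.
        return self._callback(message, retries=3, verbose=True)


def log_handler(message: str, *, retries: int, verbose: bool = False) -> str:
    # A plain function whose signature matches Handler.__call__.
    return f"{message} (retries={retries}, verbose={verbose})"
```

How strictly a given type checker compares `log_handler` against `Handler` is left to that checker's documentation.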

I read the typing docs the same way I read the logging and re docs. I have a job to accomplish, and I read through the descriptions of my available tools until I find the right one. Yes, I have a good sense of which tool to reach for from experience, but no, I don’t know offhand how to write a non-capturing lookahead regex or a type var tuple. That’s what the manual is for.

sirosen · September 21, 2023, 5:21pm

I meant this more as a general point regarding backwards compatibility and changing the meaning of a symbol in typing, but let’s stick with how it applies to TypeGuard.

Suppose I have a library called my_guards which provides TypeGuard-typed interfaces. Taking an example from this thread:

from my_guards import is_small_array

As the author of my_guards, I’m using mypy, but a user, UserFoo, is using pyright.
pyright releases the change to implement new TypeGuard semantics, but mypy has not released yet. UserFoo sees a “broken library”, but I see all of my wonderful typing and runtime tests for is_small_array working fine. UserFoo files a ticket and I’m potentially confused because everything looks fine to me.

The maintainer situation here is not great. Without a mypy behavior for this, it’s harder for me to ship a fix for UserFoo because I can’t test it, and that’s even once I confirm that UserFoo is seeing a real error and isn’t just misusing the library.

These situations are real already with multiple type checkers. I’ve had threads where I learn that I’m using mypy and a maintainer is using pyright, and we get different types (“wrong” for me, “right” for the maintainer). Even with everyone ready to adapt to one another’s usage, and savvy about what’s happening, it’s unclear what to do about the conflicts.

Run through my example scenario again with the introduction of StrictTypeGuard. Note how pyright would release support for StrictTypeGuard ahead of mypy, but my my_guards library would remain “correct” at all times under both type checkers.

NeilGirdhar · September 21, 2023, 5:33pm

This is a really good point. Your desire for conformance to a standard may be shared by many people in typing, but as Jelle points out, diverging behavior may take a while to be resolved.

Yes, but in this case there are very few people using type guards in a non-strict way. Additionally, those people should probably not be using type guards in a non-strict way in the first place.

effigies · September 21, 2023, 6:20pm

I don’t think it’s the job of the typing community to decide whether I’m writing my Python code in a way where strict or non-strict type guards are a better model. I think its job should be to provide constructs that permit the way people actually write Python to be captured by the type system.

There are situations where adapting to the type checkers has made sense, because the failure reflected an actual ambiguity in the code – namely, a function can return many types, callers should check what they get back or use a more targeted function, or else they risk a failure with unexpected input.

This case is not at all the same. This is telling a developer who has written a function that is well-described by a weak type guard that, no, they should be writing functions that are well-described by strict type guards because most people’s use cases for type guards call for strict ones. Using a PEP to micromanage the decision of how to write if/else statements seems wildly inappropriate, no matter how small the number of refactors you will be forcing is perceived to be.

NeilGirdhar · September 21, 2023, 6:28pm

What do you mean by “Python code”? We’re only talking about typing code, right? And surely it’s the job of the typing community to decide about typing constructs.

You can write whatever condition you want in Python before and after PEP 724. After 724, you would only be able to use type guards in a strict way. I don’t really understand the pushback against that decision?

That hypothetical developer can simply change his return type to bool or refactor his code? That’s a small price to pay compared with the alternative, which is to have both strict and non-strict type guards for a few years while non-strict type guards are deprecated, and then finally removed. Why go through all of this trouble for a vanishingly small amount of code that must use non-strict type guards?

The if/else statements haven’t changed though. The code runs exactly as it used to.

effigies · September 21, 2023, 6:46pm

Several people in this thread are arguing that they should not be removed.

sirosen · September 21, 2023, 7:01pm

This is precisely the issue, right? Some of us want more stability even at the expense of adding more constructs and complexity.

The typing maintainer community generally wants the freedom to make these changes because they see them as better for the long-term health of the typing components of the language. But that’s not aligned with what a segment of the user community wants, which is for typing semantics to prioritize stability more highly.

If we’re going to talk about only the practical side of the matter, TypeGuard is probably fine to change in-place, as PEP 724 proposes, and it’s no worse than other changes happening today. It wouldn’t impact much code as far as we can see from a very limited scan of open source projects. But there isn’t a very good rule right now for deciding what is okay to change and what isn’t, and this is part of a pattern of behavior which used to be fine but which I don’t think is long-term sustainable.

I’d like to see the attitude of typing shift, more towards stability at the expense of “cleanliness”, which is more like the stdlib. Try proposing a behavioral change to a stdlib function and you’ll probably be told “no” – even if your proposal would be an improvement for most users, the stability requirement for the stdlib is very high.
Changes are still made, but not without really good justification.

mdrissi · September 21, 2023, 7:29pm

I’ll start by noting that I think the past several posts on my side/others are more about the typing system as a whole and would fit better if split into a separate topic (Typing Stability/Documentation) vs this PEP. They could be moved to the new Typing discussion area. They are good discussion and I think the stability of typing and the expectations of typing PEPs are important, but they apply to all typing PEPs and not really this one in particular.

The main issue with this is that many of these “advanced” behaviors were not worked out enough. The PEPs/documentation we have today are designed for the core ideas/goals of a feature. Details are often incomplete, and a true complete spec would look closer to a research paper and require a lot more formality. Typing has historically valued pragmatism: it’s better to enable safer code/easier typing features in spite of the full specification being incomplete and unknown. If PEPs for typing feel like they are too detailed today, they are still too light on rules/details for the goal of consistency/stability.

I’ll use one of your examples too as it fits well.

I will use whatever tools typing makes available to me to accomplish that goal. I’ll use a Protocol with __call__ instead of a Callable to express keyword arg types, because Callable can’t do it. Does that make me “advanced”? I have some object which accepts a callback, and the callback signature has kwargs. Sounds pretty ordinary. If we want to call expressing that case “advanced” then what realistic large programs don’t have advanced types?

The usage of a protocol for a callable is known as a callback protocol. This usage is undefined by PEPs/standard library documentation. The rules for callback protocols have evolved in backwards incompatible ways over the past year or two. Expecting stability here when you are relying on a feature that was never specified in a PEP/the standard library seems difficult. I think mypy has documentation on this, while pyright does not and has discussion over GitHub issues on the details. For many features that users use, like callback protocols, you can reasonably infer their typical usage, but the details of their usage are undefined behavior/inconsistent.

This example is also nice as mypy’s behavior on Callable types and kwargs changed in the past couple of days: it’s about to start allowing TypedDicts to be unpacked there, motivated by a user request. That behavior change is safe in the sense that it adds a new feature and existing code with no errors will still have no errors. It does mean code with errors before will no longer have errors, for a type feature not defined/decided by any PEP process, and it is another inconsistency on Callables.

I think today typing works in practice because having rough rules and adjusting details as user reports appear has been successful. I’ll often report “bugs” where the issue ends up becoming a discussion over what the right behavior is here/whether the PEP was even clear.

Edit: I also view type checking as closer to pylint/ruff checking. Runtime behavior of your code should not change in backwards incompatible ways. The inference rules/type errors you see from mypy should be similar to pylint/ruff in expectations. If pylint changes its rules to be smarter in some way, that is not expected to go through any PEP process. Code linters are allowed to make changes as the maintainers find reasonable. And if a user runs pylint, a library change can influence pylint analysis results, similar to a type checker. The main difference is that pylint is generally laxer than mypy/pyright as it has less type inference knowledge.