A more useful and less divisive future for typing?

The specification (at the language level, currently just a collection of PEPs) describes the type system; the various type checkers then choose what to do with it. There’s a bit more on how the standards haven’t actually spelled out everything that’s important or relevant to a type system, and there are good reasons not to have diverging behavior here, at least not largely diverging.

No, anything I wrote here wouldn’t necessarily translate into the codebases of existing type checkers.

Well, the changes I’d need are small and non-breaking, and they only describe the type system, not runtime behavior. No change to CPython is needed, only to the specification of typing behavior. These changes only make one behavior better defined and specifically allow more from the type system.

  1. Define subtyping for Python using the set-theoretic model. This model is 1:1 compatible with how type checkers currently treat subtyping but also covers cases the type system does not yet consider, such as Intersections. (Non-breaking, but gives a stronger foundation for future work)

  2. State that type checkers are allowed (not required) to make complex type inferences based on consistency of usage even in untyped code and narrow Any based on observed use.

  3. Type checkers may (not must) detect incompatible use of Any arising from inconsistent use (see the sketch below).
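
As a hypothetical sketch of what points 2 and 3 would permit on untyped code (the example and names here are mine, purely illustrative; no checker would be required to flag this):

def first_upper(items):
    # With no annotations, items starts as Any.  A checker permitted to
    # infer from usage could narrow it: the line below implies something
    # indexable whose elements have an .upper() method.
    return items[0].upper()

first_upper(["spam", "eggs"])  # consistent use: fine
first_upper([1, 2, 3])         # inconsistent use: int has no .upper(),
                               # which point 3 would let a checker flag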

For people not using a type checker, nothing changes. For people using one, more errors may be detected if their type checker can suddenly spot more issues. It would be on type checkers to continue finding the balance between useful errors and false positives spoken about above :smile:

Those are the only changes needed here, and I think that having these two specified is a good thing even if this never fully materialized. It’s one possible benefit of actually having definitions that are stronger and enable more.

Having an agreed-upon definition of subtyping is pretty important for a type system (we don’t currently have one).

The second opens the door for type checkers to do more with untyped code on users’ behalf without violating, if not the letter of the specification, then its original spirit, which did not seem to envision a good case for doing so.

I think people overestimate the changes required for this, and it’s not even my primary goal in this discussion. That goal is to point out the potential benefits we can gain over time from having better definitions and consistency in those definitions, and to show that stronger definitions do not inherently conflict with the other end of the spectrum of typing usage.

2 Likes

I’m sorry, I was on vacation last week and have had to skip some discussions. In this thread there’s a lot of reference to a “set-theoretic” type system for Python (IIUC it is Michael H’s proposal?) but from the one example given (about a function that uses x = x+1; print(x)) I cannot understand the proposal (other than that it might be going back to duck typing, and proposes a technique that AFAIK is used by pytype). Maybe someone can link me to a thread or blog post where this is discussed in more detail?

The example was mine, and it was meant to counter the idea that making the type system more “set theoretic” would mean people would have less need to deal with complex types. My intention was precisely to make the point that type inference isn’t going to be able to handle duck typing, so there will always be a need for people to explicitly write types, and therefore being able to express “reasonable”[1] type constraints concisely and understandably is important.

Sorry - my example wasn’t particularly good, and my follow-up didn’t help much as I rapidly got out of my depth with the talk of set-theoretic type inference…


  1. whatever that means! ↩︎

I would say: Go for it. Why not? Put up a description of the system or, better, an alternative, improved implementation of the typing module (fixing some of the inconsistencies you have pointed to on GitHub). Then everyone can determine whether this actually works in practice and improves the current system.

The current discussion seems too abstract (too futuristic?) to me. (It’s getting close to the level of abstraction in David Beazley’s PyCon talk - 2021 - “No, not typing. Types.” Subtitled “a tone poem” - You can find it on YouTube. Scroll to about 37:20 for the TL;DR “What was that?!” indeed :slight_smile: It’s a very interesting and amusing talk, definitely relevant to “typing”, I think, though I cannot really say I know what it was about, even after watching the whole thing.)
(Beazley also makes some interesting comments about exceptions when asked about a comparison with exception handling in Rust: there is this strange thing that exceptions in Python seem to fall completely outside the typing stuff – at the very end of the talk, 46:40.)

1 Like

I’d like to go back to the very title of this post and dig into something that was said in an earlier exchange. Paul called out the distinction between app developers and library developers – and he was under the impression that library development had been omitted as an important use case. It seems that was a misunderstanding, and I’m sure it can be cleared up further by diving deeper at a technical level. But that exchange was born out of a very frequent situation, in which advocates of a feature or change in typing speak primarily from the perspective of application developers and ignore or do not prioritize the needs of library developers.

I recall this in particular happening with Unpack[TypedDict] for kwargs, and again happening in a small way with StrictTypeGuard/TypeNarrower.

In pursuit of a “less divisive” future, I would like to see PEPs more directly address library author use cases. That will also have the beneficial effect of socializing the norm that “library code is also relevant”.
That could be as simple as a statement that a feature does not have obvious utility for libraries, but we are hopeful for future usages, etc. Discussing the usages for typing behaviors in these two different contexts – application vs library – is, IMO, sure to be beneficial in almost all cases.

2 Likes

I was talking specifically about @mikeshardmind’s “set-theoretic” suggestions (as I understood them).

In terms of general typing, I think library development is covered fine. Sometimes I think people forget that library developers might want to define a permissive (in a typing sense) API, and doing so involves fairly “advanced” typing features. I say this because I’ve heard people state that “you shouldn’t need complex typing features in typical code” (or words to that effect). But the intention to cover library usage is clearly there, and I don’t want to complain just because people have a different perspective than I do.

In terms of @mikeshardmind’s proposal, he seemed to be suggesting that type systems should be able to infer most things without the developer’s intervention. In saying that, I feel he had missed the library use case, and my “duck typing” example was an attempt to point that out. As a better example (I hope!), I don’t see how it’s even theoretically possible for a type checker to infer a useful type for

def low_bit(x):
    return x % 2

The library author simply has to state their intent if the function is to be assigned any sort of usable type. After all, def low_bit(x: str) -> str is technically valid, but clearly not the author’s intent here… And that brings us back to the type system having to make it easy for developers to express their intended types.

2 Likes

That seems definitely possible to me in isolation - and @mikeshardmind also suggested a way to do so - kind of reminiscent of the way Traits work in Rust (though Traits are not inferred). You would not infer one given native type, but a set of conformant types or traits, which would itself be part of the type algebra. But… if the type inferencer has to do this in the context of a complete program, where the input x also needs to be traced back, wouldn’t the general problem quickly lead to an explosion of exponentially many possible code paths? So, could we have a program that is both correct and fast enough? I doubt that very much. But that’s a challenge to Mike :slight_smile: (As far as I understood it, the “intersection” problem would come into play here, but I’ve a gut feeling that it maps to 3SAT or some other NP-complete problem…)

Despite this - having a type inferencer of some sort, even if not complete or fast, might make tools like mypy more useful. But to see that, someone needs to write a prototype…

1 Like

To be clear, I said “useful”. I would intend low_bit("%d") to fail to type check, not to infer a return type of str. Making that distinction is what I consider “not even theoretically possible” as there’s no way a type checker could know that I don’t want to include string-formatting meanings of the % operator.

Maybe I’m not being clear enough. My point is that the developer has to be the one to state distinctions like this, and to do so without over-constraining requires complex types. The type checker can’t infer stuff like this. Maybe intersection types would make expressing things like this easier, but I’m not yet convinced - I want to express “integer-like type with a % operator which will take an actual int on the RHS”, not “Any minus str”…

As to why I want to express this: the fundamental use case I’m thinking of is low_bit(x: int) -> int, but I don’t want to prevent people from using my library in ways I hadn’t expected. So I don’t want to over-constrain, because I don’t know what my users might want to do.

We’re very close to duck typing here, and honestly I’d be fine just leaving an API like this untyped so that duck typing can do its job. But people then ask for type annotations, so we come full circle. And at this point I’m very much repeating myself, so if you’re still not clear what my point is, we’ll probably just have to agree to differ.

1 Like

It would not need to do so. The type checker would see that the modulo operator is applied, so the inferred “type” for this is any-type-that-supports-modulo (in Rust this would be a non-inferred Trait). The type checker then tries to resolve those “types” in the context of the current program (however “program” is defined). If, for instance, somewhere else the program calls low_bit(x) where x is a list, and the checker knows that lists are not part of that any-type-that-supports-modulo, it can decide there is a conflict.
If lists supported a modulo operator, they would not be excluded.
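
To make that concrete, here is a sketch of the call-site conflict being described (the comments show what such an inferencer could conclude; no current checker does this for unannotated code):

def low_bit(x):
    return x % 2    # inferred constraint: x must support % with an int

low_bit(7)          # fine: int satisfies the constraint
low_bit("%d")       # also fine by that constraint alone: str supports %
low_bit([1, 2, 3])  # conflict: list does not support %, so this call
                    # could be flagged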

The thing is also: if you don’t want to restrict usage, then you just don’t add type hints. No one is arguing for no longer supporting duck typing. But you cannot have it both ways here (you can of course in Python, I’m just being emphatic). It doesn’t really make sense to me to say “I’ll restrict this to ints” and also “I don’t want to exclude strings”. (Already if you do this now, you’ll run into trouble with mypy.)

So, I regard it as an unfortunate effect of typing that a Python programmer can give type hints for one particular type, yet want to give people the freedom to ignore those hints. I know this works, due to the dichotomy between static typing and runtime, and I know it’s baked into the current system. But when I see that a big external library (pandas, numpy) uses particular types, I do see them as a contract I have to follow and can ignore only at my own peril…

But I have never seen a concrete example of a widely used library, like the ones I mentioned, where the addition of type hints was not intended to constrain the supported types of function arguments.

I’m not sure, but the phrasing of your response suggests you may be missing the fact that strings do support a modulo operator. "%d" % 2 is valid and evaluates to the string '2'.

How could I forget that :scream: I guess the heat of discussion… :slight_smile:
(Also the operators are actually different, of course, it’s just that they share the same representation.)
No matter, certain other examples could be given… I’ve updated my post to use lists.

I agree with this, and it’s kind of where I was heading with my earlier comments. @mikeshardmind said:

If that is the case, I don’t see why anything needs to be changed to support this new type theory. It could be implemented now. And if it is, and it’s more useful than the existing type checkers, then great. But if it’s not, the changes to the standard will just be retroactively perceived as misguided. Either way, I don’t see how adding it to the standard before seeing how it’s useful would help matters.

I think part of people asking for type hinting here would also be solved in a future where this is inferred, because on the user end their tools would no longer yell at them when they are using it in a way that works according to the detected duck typing. (Ideally, tools also wouldn’t show a generated protocol directly; they’d show “objects that support: %, +, /” (etc.). I think this would be reasonably possible for tooling to do for protocols that only describe a builtin dunder, or intersections of them, in the future.)

I think this has been an interesting line of discussion and I need to type up some more thoughts into an ideas post, but it highlights how, while typing has mechanisms to support duck typing, they aren’t ergonomic and require more of users.

In your case of not actually wanting to support the string-formatting case, but overall wanting duck typing, there isn’t currently even an answer to this; you’d need Intersections and type negation to do it properly:

SupportsMod & ~str, where SupportsMod is a protocol for % support

This is a solution that composes naturally from what you described wanting, but isn’t expressible currently, nor is it necessarily obvious to those who haven’t been in the depths of typing already.
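
The protocol half of it is expressible today; the negation is not. A sketch (this SupportsMod is hand-written for illustration, not something that exists in typing or typeshed):

from typing import Protocol

class SupportsMod(Protocol):
    def __mod__(self, other: int) -> "SupportsMod": ...

# The full constraint described above would be the hypothetical
# SupportsMod & ~str: anything with % support, except str.
def low_bit(x: SupportsMod) -> SupportsMod:
    return x % 2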

Quite frankly, nobody should be doing that much work when there’s a chance that a different definition gets adopted while they’re working on it, undermining all of that effort. There are benefits to having good definitions outside of such an endeavor, and it is to everyone’s benefit if we build things up in order. That means the foundational definitions should be addressed first.

1 Like

If you set up a plan for that, or break the task down, I imagine there might be people here who’d volunteer to take on parts of it - even though it’s not coding :slight_smile: Trying to come up with just a formal description, doing so within an established larger theoretical framework, and considering both the internal consistency and how this might be used, already seems like a pretty big undertaking to me.

It’s being discussed in various places already, and seems to be something which won’t happen until this, or will happen as part of it. The lack of formal definitions has been a pain point for other things already.

This was intended to point out how better definitions can enable tooling to take on more for users, and don’t necessarily conflict with making things easier for users - in keeping with the general sentiment of the original post: that we should be working towards a future for typing that enables and considers all audiences, and figure out how we can best do that.

The tangential bit on smarter inference based on results from set-theoretic typing has a known way of working already (it’s a solved problem in academia), with a reasonable runtime for any non-infinitely-recursive types or program structures. I imagine an implementation would, on reaching a certain recursion depth, tell the user why it gave up on detecting the type of that variable and which variable it needs more information about. Outside of the actual correct type of json/toml etc., I can’t think of much that’s actually infinitely recursive in that manner; each of these is built up from a few base types, providing those would be enough, and it would be detectable what the type checker needs more info about (unless we also add higher-kinded types, at which point we’re in unproven territory again).
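
For concreteness, the json case mentioned is recursive in principle yet built up from just a few base types, e.g. (3.10+ union syntax; recent type checkers support recursive aliases like this):

JSON = None | bool | int | float | str | list["JSON"] | dict[str, "JSON"]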

I’ll break that tangent out into another post though; even though it could help with typing becoming more accessible and less intrusive while getting more accurate (hence bringing it up here), it’s a “big idea”, even if the reason it would be useful is conceptually simple.

2 Likes

I’d be interested in reading up on that. Just throwing out some links to the main literature (in a separate thread) would already be welcome (to me at least).

1 Like

Well, okay, but now I have even less of an understanding of what “a set-theoretic type system” is supposed to be. Apart from you nobody has even responded to me (even though there have been 15 or so new messages in the thread) – is there really nobody who can catch me up? @mikeshardmind could you help me out?

1 Like

@guido Sorry, I missed that in things to reply to. Set-theoretic type theory is a view of typing where most types, and specifications of compatible types, are treated as sets of types. This makes subtyping as simple as checking whether one type or set of types is contained in another set of types (subset comparisons). There’s been a lot of academic research and published proofs in this domain showing that it’s an exceptionally good model for gradual typing, supporting everything Python’s type system currently does and more.

Because of the “and more” part, it would also give us a good theory to consult when considering new features of the type system and how they can or cannot be safely composed. To that end, the point was that by being more formal, we gain the ability to do things with the type system that are grounded in what is already proven, and we could more easily shift more of the work to tooling. (pytype, as you pointed out, does already do more inference in this direction, but there’s much it cannot do safely.)
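
A tiny illustration of subtyping-as-subset using today’s syntax (checkers already accept this, which is the 1:1 compatibility mentioned above):

x: bool | str = True  # bool | str denotes a subset of the values of int | str
y: int | str = x      # (every bool is an int), so this assignment is safe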

Good writeups of how the set-theoretic view of types translates to Python came up in some of the discussion on intersections, as well as in a discussion of whether Never is actually consistent with other types or only the absence of a type.

The main question that would come up if we were adopting it is in the linked bit on consistency, but to summarize:

Never is equivalent to the empty set of types in this model, and many operations reduce to simple set operations. This leaves a small question of whether it should be considered a subtype, or whether the broader definition of subtyping in set-theoretic typing is only useful for type narrowing in this case (narrowing to no possible value, and therefore indicating some sort of error), as the bottom type is uninhabited by values that can exist at runtime.
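
A sketch of that narrowing-to-no-possible-value case (assert_never is in typing as of Python 3.11, and in typing_extensions before that):

from typing import assert_never

def handle(value: int | str) -> int:
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return len(value)
    # value is narrowed to Never here -- the empty set of values --
    # so a checker can verify this branch is unreachable.
    assert_never(value)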

edit: Every other existing decision is clearly compatible with it, even if some need a clarifying detail in their definitions. An example of such a clarification is subclassing Any. Any in that context cannot be the top type, but it can have a clearly specified meaning which matches the current meaning.
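
For reference, the subclassing-Any case looks like this (allowed at runtime since Python 3.11):

from typing import Any

class Proxy(Any):
    # Checkers treat unknown attribute access on Proxy instances as Any,
    # which is the useful meaning for mock/proxy classes; it does not
    # make Proxy the top type.
    pass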

2 Likes

This actually sounds like exactly what we used as the foundation for subtyping in PEP 483. That PEP even considers intersection types (as an optional future extension), and it defines Any properly. It doesn’t define Never or Not, but I think those do fit in the framework laid out there. (Though there are practical issues with Not.)

At this point I am most interested in understanding which pressing questions about the current type system you think can be answered by using the set-theoretic framework that otherwise seem problematic (especially as surfaced through differences between type checkers). Second, whether it can provide guidance towards resolving the discussion around some proposed developments like Intersection.

Is the set-theoretic perspective affected by generics? Does it matter that we have some nominal types (regular subclassing) and some structural types (TypedDict, Protocol)? Does it help guide us through the minefield of higher-kinded types?