A broader name -> type concept

(This is a Python-specific follow-up to "Names can be so much more | En kodare", which I wrote on my blog a while back.)

The name → type mapping in Python is very tightly scoped to the variable or the function where that mapping is done. I would propose that it would be good if you could have bigger such scopes: per module, file, and project.

It’s quite common to see the exact same type definitions repeated ad nauseam in a code base. For Django, `user: User = request.user`, for example. I would propose that it would be much better if you could tell MyPy/PyCharm/Pylance/whatever that `user` means exactly `User`, always, across the entire project.

This goes quite well with the Domain Driven Design ethos of having one name always mean exactly one type for a specific domain, and that domain can/should be mapped to a Python module or project.


Module and file scope are the same thing, and a name can already have such a scope, in which case any associated annotation (which is not a proper “type” for a “variable”) has the same scope.

If you want names to share an annotation when they have separate scopes simply because they are spelled the same… I can’t think of any other programming language that does anything like that.

Sorry, that was unclear. I meant like “module foo and everything it contains” as “module”. Meaning a rule would be applied to foo.*.

A variable can have such a scope, but the name → type mapping can’t apply to more than one logical scope. Clearly I was unclear so let’s take a concrete example:

magic rule here: request always means HttpRequest

def foo(request):
    pass # in here `request` is of type HttpRequest

def bar(request):
    pass # also here!

But I am suggesting these rules go in separate configuration files, not inline like this. Hopefully I get the point across though.

Well yea, I do know it’s a novel idea. I’ve been trying to get people to just wrap their head around it for several years! I don’t think we should reject ideas that are newer than ~1975 though :stuck_out_tongue:

One of the big problems with adding typing to big code bases is that it makes all the code worse, because you write `foo: Foo` everywhere, which doesn’t help. It in fact makes the code less readable, and makes those who don’t care about the types really hate typing.

I think we can make something that is much better, where we can have our cake and eat it too: elegant Python code with minimal explicit typing, yet strictly typed. AND it would also somewhat enforce naming standards, by making the typing system scream if you call your string variable `request`.


Now that you have an actual example, this sounds a lot better (which is often what happens when realistic examples are used - I strongly recommend them!).

One way to define this would be to allow a module-scope set of default annotations, which are applied automatically to any variable of that name, if and only if it doesn’t have an annotation of its own. So, for instance, you could do something like this:

class __DEFAULTS__:
    cur: psycopg2.extensions.cursor

def fetch_data():
    cur = conn.cursor() # uses the default annotation

def deltas():
    prev: int = get_value()
    while more_values():
        cur: int = get_value() # is typed as an integer
        yield cur - prev
        prev = cur

(I’m not saying that this is good code, but I will admit that use of both “cursor” and “current” in the same file HAS happened.)

If you’re consistent, or even mostly-consistent, with your naming, this could help somewhat. You could even decide that your parameters will always and only be annotated in this default way, thus guaranteeing consistency (while allowing non-parameter variables to be locally annotated).

The advanced concept of “this is a plural, it should be a collection” is a bit more problematic, though. For example, a person might be a dictionary with keys like name, profile pic, and a list of messages, but people might be a dictionary mapping user names to their full details. So both singular and plural are, at a concrete level, the same data type. (Yes, you could use a dataclass for a person, but maybe you got your data from JSON or something.) Still, simpler examples should show up in code review, so you’d get a chance to catch it.


Typically, we don’t reject ideas just because they are new. We do tend to reject them if no-one else does them, though. Python has a huge user base, and it’s not the language you should be looking towards if you want innovative ideas and unique approaches. In fact, even before Python got as popular as it now is, it followed the same principle - integrating tried and tested approaches rather than adopting “experimental” ideas.

There are exceptions, and there is always a chance for a good idea to get accepted. But it’s a lot harder if there’s no “prior art”.

Regarding this specific idea, I don’t really like it. I understand the principle, and I see how it could be useful in an environment with relatively strict naming conventions, but in general I don’t think that it’s a typical sort of Python approach.


In fact, even before Python got as popular as it now is, it followed the same principle - integrating tried and tested approaches rather than adopting “experimental” ideas.

I mean… it’s not like Python hasn’t done a lot of innovative stuff. That would cast Python in a bad light I think. Just the syntax of significant whitespace is pretty radical compared to most other languages. Still to this day!

but in general I don’t think that it’s a typical sort of Python approach.

What part of it is unpythonic? Can you be more specific?

I find it very pythonic. It is certainly DRY. That part is objectively true at least. One can argue that it violates “explicit is better than implicit”, I guess? Although I’d argue that point :stuck_out_tongue:

I would argue that this is a proposal for “readability counts”, a part of the Zen that typing-everywhere is at odds with.

Oh, I almost forgot, I was gifted this little tool to explore the correlation between names and types (credit to asottile):

import ast
import collections
import sys


class V(ast.NodeVisitor):
    def __init__(self):
        self.name_to_type = collections.Counter()

    def visit_FunctionDef(self, node):
        for arg in node.args.posonlyargs + node.args.args + node.args.kwonlyargs:
            if arg.annotation:
                self.name_to_type[(arg.arg, ast.unparse(arg.annotation))] += 1
        if node.args.vararg and node.args.vararg.annotation:
            self.name_to_type[(f'*{node.args.vararg.arg}', ast.unparse(node.args.vararg.annotation))] += 1
        if node.args.kwarg and node.args.kwarg.annotation:
            self.name_to_type[(f'**{node.args.kwarg.arg}', ast.unparse(node.args.kwarg.annotation))] += 1
        self.generic_visit(node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node):
        if isinstance(node.target, ast.Name):
            name = node.target.id
        elif isinstance(node.target, ast.Attribute):
            name = node.target.attr
        else:
            raise NotImplementedError(node.target)
        self.name_to_type[(name, ast.unparse(node.annotation))] += 1
        self.generic_visit(node)


def main() -> int:
    v = V()
    for filename in sys.argv[1:]:
        with open(filename, 'rb') as f:
            contents = f.read()
        v.visit(ast.parse(contents, filename=filename))

    import pprint; pprint.pprint(v.name_to_type.most_common(20))
    return 0  # main() is annotated -> int, so return an exit status


if __name__ == '__main__':
    raise SystemExit(main())
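To get a quick feel for what the script reports, here is a trimmed-down, self-contained variant of the same idea that only counts annotated assignments (the input snippet is made up):

```python
import ast
import collections

# Made-up source with the same name annotated the same way twice.
source = """
user: User = request.user
user: User = get_user()
count: int = 0
"""

counter = collections.Counter()
for node in ast.walk(ast.parse(source)):
    # Count (name, annotation-source) pairs for annotated assignments.
    if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
        counter[(node.target.id, ast.unparse(node.annotation))] += 1

print(counter.most_common())
# ('user', 'User') shows up twice: the same name -> type pair repeated
```

Run the full script over a real code base and the most-common list tends to be dominated by exactly these repeated pairs.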

You can also just visually scan over stub files to get a feel for this. For example, django-stubs. We can take one file like django-stubs/django-stubs/template/response.pyi at master · typeddjango/django-stubs · GitHub and see that quite a lot of the mappings in that file are duplicated, with a big overlap with django-stubs/django-stubs/views/defaults.pyi at master · typeddjango/django-stubs · GitHub, for example.

I also believe this kind of mapping becomes more and more useful the further out from the standard library you get. So a web app will benefit more than Django itself, which benefits more than the csv module, which benefits more than re, etc.

Another nice thing about this idea is that it doesn’t really require big changes to the standard library, nor really a PEP, nor… well… anything really. One can potentially add it as an experimental feature in mypy, pylance, or some other static type checker, and try it out there.

I did try looking at this once but I got lost in the mypy code base and when asking for help I just got “why would you do that?” and no one understood what I was saying heh.

Just some help to find where I could hook into mypy to get a prototype going would be great!

To answer the question of how you could prototype this in mypy, I’d recommend starting here. That’s where mypy parses a file into an AST. You can then write an AST transformer, similar to the examples here, to find all the places a specific name appears (assignment statements and function arguments being the big two) and add a type annotation to them there if the name has a default type defined. You could hardcode the mapping initially and then, if you wanted it to be a bit more usable, update the config logic. Mypy’s AST output from parsing is not the same as Python’s ast module output (as it holds type info too), and the equivalent of NodeTransformer can be found here.

Mypy also supports plugins, and I’d review its plugin interface first to see whether this feature could be implemented as a plugin. That would make it much easier to maintain and wouldn’t require it to go into mypy core.


Some kind-of prior art:

  • Fortran 77, in which the first letter of a variable name indicates the type.
    The way to indicate the initial-letter → type mapping is an IMPLICIT statement (which, given “explicit is better than implicit”, is an ironic coincidence).

  • Many early (and even recent) dialects of BASIC, which had special-character suffixes to indicate the type.
    IIRC $ indicates string, % integer, and no suffix float.

Both are much wider in scope than what is proposed here, and have mostly been abandoned in newer versions of those languages. So this is likely mostly a historical footnote to the discussion.

There’s also Hungarian notation, which in my opinion was just terrible, as it wasn’t enforced and added noise (also the case for the two things you mention).

Thanks for this! I’ll have to take a look. The plugin hooks seem unusable for this purpose, from what I can tell. They all deal with classes and function signatures, which could partially do what I’m talking about, but they wouldn’t be able to handle this simple global variable:

name = 1

which should be a type error, as a `name` should always be a str.
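For what it’s worth, the core check itself is tiny when done outside of any type checker. Here is a standalone sketch (the mapping and all names are hypothetical, and it only infers literals; a real implementation would hook into a type checker’s inference) that flags exactly this case:

```python
import ast

# Hypothetical project-wide name -> type rules.
NAME_TO_TYPE = {"name": "str", "user_id": "int"}

def literal_type(node):
    # Only infer types of literal constants; everything else is unknown.
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    return None

def check(source):
    errors = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            expected = NAME_TO_TYPE.get(node.targets[0].id)
            actual = literal_type(node.value)
            if expected and actual and actual != expected:
                errors.append(
                    f"line {node.lineno}: {node.targets[0].id} should be "
                    f"{expected}, got {actual}"
                )
    return errors

print(check("name = 1"))    # flags the mismatch
print(check("name = 'x'"))  # empty: matches the rule
```

This obviously isn’t a type checker, but it shows the shape of the feature: a dict lookup per assignment, nothing more exotic.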

I think that’s the key point here - all the prior art suggests this is a pretty bad idea…


Many things are bad ideas when done slightly wrong. Like a car with no seatbelts :stuck_out_tongue:

I will again point out that all the supposed prior art isn’t actually prior art; it is significantly different from what I am talking about.

One sort-of related thread. The goal there is to infer/guess the type based on clues/context, with the parameter/variable name being one of them. It’s not so much for a type checker, though, but more for a tool that fills in rough guessed types for you, which you then manually fix up, to help speed up adding types to an untyped codebase. A mixture of which imports are in the file, the variable name, and the function name would probably get you a lot of common type patterns. It will probably work better for codebases that already have some type hints, so you could collect some basic counts/stats from other files.

Yea I think that discussion you linked to also misses the forest for the trees. Advanced ML models, type inference statistics, complex mathematical systems… while a dict with name → type will give you 90% of the benefit, AND create new benefits.

It is pretty nice to be able to specify types in a single place rather than everywhere when there’s a very clear, explicit way to do so. For example, I often see things like this, where the sub-class simply inherits types:

from abc import ABC
from typing import ClassVar

class A(ABC):
    x: ClassVar[int]
    y: ClassVar[float]
    z: ClassVar[str]

class B(A):
    x = 1
    y = 1.0
    z = "a"

Being able to do this more broadly would help a lot with readability and coding efficiently. Loosely similar to how ClassVar modifies the scope of a variable, something like ModuleDefault could indicate the default type:

user: ModuleDefault[User]

def get_user_info(user):
    pass

Maybe the name could even double as a type var? (Maybe not though - this is a separate matter)

def get_related_user(user) -> "user":
    pass

def get_user(user_id: "user" | int) -> "user":
    pass

A few other thoughts:

  • As I see it, this suggestion is much more in line with the existing type system than with efforts to use heuristics to infer types. The idea here seems to be to enable explicit declaration of types.
  • Allowing users to give explicit meaning to names has been around for a while - pytest fixtures, typer/fastapi, etc.
  • A way to disable this for a particular block of code would be useful.

Overall, my opinion would heavily depend on how this is implemented, but I think there’s a way to do it well and the goal is pretty compelling.


I like the idea of having these definitions in pyproject.toml or something like that, as it’d be easier to define global rules for your entire project, which I think of as the main and more powerful use case.
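For instance, such a section might look something like this (a purely hypothetical format and table name; no tool reads this today):

```toml
# Hypothetical: project-wide name -> type rules for a type checker to consume.
[tool.name-types]
request = "django.http.HttpRequest"
user = "django.contrib.auth.models.User"

# Hypothetical escape hatch for code where the rules shouldn't apply.
[tool.name-types.exclude]
paths = ["migrations/*"]
```

Putting it next to the rest of the project configuration would also make the rules visible to every tool in the chain, not just one checker.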

This is a slight resurrection, but I want to point out that there is prior art: Nim implements this feature, but limited to function arguments: Nim Manual

IMO this does help a lot with keeping Nim code readable. Because of how procedure-focused Nim is (it doesn’t even really have classes built in), applying it only to arguments makes sense there. It isn’t a big leap to generalize that to all variables (though probably not attributes) in Python.


Ah. Nice! My suggestion would be to have bigger scopes than files for this feature though, but already on a file level it would be quite useful.

I made a very hacky implementation of this concept that you can find at GitHub - boxed/ivrit: Generate type stubs for your project based on name->type mapping configuration

It’s much more limited than the basic idea, due to limitations of what pyi files can do, but it’s already surprisingly effective.


Personally I think this approach incurs too much risk as it seems like it would mask errors during refactoring by relying on explicit types too heavily.

In the code bases I work on, we use helper functions to provide type narrowing of unknown types like this, e.g.:

def get_user() -> User:
    user = request.user
    if isinstance(user, User):
        return user
    raise TypeError(f"Unexpected non-user object received {user=}")

We then use this throughout the application, ensuring we always use the correct type and detecting errors due to refactoring mistakes.
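That helper pattern also generalizes: a generic sketch (the `expect_type` name is my invention, not from the post above) avoids writing one helper per type while keeping the same fail-loudly behavior, and type checkers narrow the return type via the TypeVar:

```python
from typing import TypeVar

T = TypeVar("T")

def expect_type(value: object, expected: type[T]) -> T:
    # Narrow an unknown value to the expected type, failing loudly otherwise.
    if isinstance(value, expected):
        return value
    raise TypeError(
        f"Expected {expected.__name__}, got {type(value).__name__}: {value!r}"
    )

print(expect_type("alice", str))  # passes the value through unchanged
```

With this, `user = expect_type(request.user, User)` gives both the runtime check and the static type in one line.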