I don't know whether my dumb code is readable or idiomatic

Hi,

Assume there is data from an external API (meaning I cannot change its shape).

# data from 3rd party API call.
data = {
    "AUTHOR": "guido",
    "TITLE": "python",
    "PUBLISHED": 1991,
    "QRCODE": "...",
}

I have a Book dict for internal use (within the application boundary).

from typing import TypedDict

class Book(TypedDict):
    author: str
    title: str

To fit the external data into a Book for internal use, I wrote the following:

# available approach 1.
my_favorite_book: Book
my_favorite_book = {
    k.lower(): v for k, v in data.items() if k.lower() in Book.__required_keys__
}

# available approach 2.
my_favorite_book = Book(
    author=data["AUTHOR"],
    title=data["TITLE"],
)

The Book has only 2 keys (author, title) in this example. However, there are cases where the number of keys is between 10 and 30. I chose approach 1, but I am not sure it’s readable or idiomatic… Is it better to just write out every key mapping manually (approach 2)? Or are there any other suggestions?

The situation I am in is that I have to convert a dict with many uppercase keys into one that keeps only some of the fields, with lowercase keys. For example:
{PRICE: ..., RISK: ..., RATING: ...} -> {price: ..., rating: ...}

Thanks for reading!!

Things like checking that your data structure is given the right keys, rejecting or ignoring extra keys, and lowercasing certain keys are all runtime functionality.

TypedDict, and therefore your Book, is for static type analysis. You declare my_favorite_book: Book, and then some tool other than Python warns you if the code ever assigns to my_favorite_book a dict that does not agree with the type. For example, here is Pylance complaining.

But the code runs fine in Python.

[image: screenshot of Pylance flagging the assignment]
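
A minimal sketch of that point, assuming Python 3.8+ and reusing the Book TypedDict from the question. The annotation is only read by the type checker, so a mismatched dict is accepted at runtime and stays a plain dict. The `# type: ignore` comment is only there to keep a checker quiet in this demo, and the dict contents are made up.

```python
from typing import TypedDict


class Book(TypedDict):
    author: str
    title: str


# A static checker would flag this assignment (wrong keys, wrong value type),
# but Python itself performs no validation and runs it without complaint.
my_favorite_book: Book = {"AUTHOR": "guido", "PUBLISHED": 1991}  # type: ignore

print(type(my_favorite_book) is dict)  # True: a TypedDict value is a plain dict
print(sorted(my_favorite_book))        # ['AUTHOR', 'PUBLISHED']
```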

I think you would be better off with a dedicated class that encapsulates the functionality you want, for example a dataclass.

from dataclasses import dataclass

@dataclass
class Book:
  author: str
  title: str

This only accepts author and title:

>>> my_favorite_book = Book(**data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Book.__init__() got an unexpected keyword argument 'AUTHOR'

You can add the functionality to accept data and do all the processing that you need, like turning keys to lowercase. For example,

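import inspect  # at the top of the module

# inside the dataclass body:
@classmethod
def from_dict(cls, env):
  # Lowercase each key and keep only those that are constructor parameters.
  return cls(**{
    k.lower(): v for k, v in env.items()
    if k.lower() in inspect.signature(cls).parameters
  })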
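
Put together, a sketch of the whole class could look like the following. It assumes the dataclass is named Book, as in the traceback above, and that a missing field should simply surface as the TypeError raised by the generated `__init__`.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Book:
    author: str
    title: str

    @classmethod
    def from_dict(cls, env):
        # Constructor parameter names of the dataclass: author, title.
        params = inspect.signature(cls).parameters
        # Lowercase every key and silently drop the ones Book does not know.
        return cls(**{k.lower(): v for k, v in env.items() if k.lower() in params})


data = {"AUTHOR": "guido", "TITLE": "python", "PUBLISHED": 1991, "QRCODE": "..."}
book = Book.from_dict(data)
print(book)  # Book(author='guido', title='python')
```

Extra keys such as PUBLISHED and QRCODE are ignored, while a dict without TITLE would fail inside `cls(...)` with a TypeError about the missing argument.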

Now, taking your question at face value, the first approach makes an assignment that does not agree with the declared type, while the second approach's assignment does agree. At least, Pylance complains about the first and not the second.

The error message from Pylance is:

Expression of type "dict[str, Unknown]" cannot be assigned to declared type "Book"

If you do a lot of this kind of processing, you might be interested in third-party libraries such as Pydantic. Otherwise I tend to agree with @franklinvp.

Hi, Franklinvp.
Thanks a lot for your detailed and kind explanations. They help a lot.

Yeah… that’s the good part of mypy for me.
It’s just a static type checker.

It’s totally my fault if I assign/update arbitrary keys.
But that’s okay as long as I correctly understand what mypy does for me…

(Although a dataclass helps at the initialization step by constraining inputs, something like book.flavor = "sweet" is still perfectly allowed after that.)

To be honest, I am not good at determining which type of container is better for a specific problem:
NamedTuple, TypedDict, dataclass (or attrs, cattrs, msgspec, pydantic and many others).

I gave “just use a dictionary” a shot until I found myself feeling kind of dumb…

from typing import TypedDict, Any

class Book(TypedDict):
    """Book data."""

    author: str
    title: str


def create_book(
    raw: dict[str, Any],
) -> Book:
    """Creates a book given external arbitrary data."""
    required_fields = Book.__required_keys__
    lowercased_fields_in_raw = {k.lower() for k in raw}

    if not lowercased_fields_in_raw.issuperset(required_fields):
        raise KeyError("missing required keys.")

    return Book(
        **{
            k.lower(): v
            for k, v in raw.items()
            if k.lower() in Book.__required_keys__ | Book.__optional_keys__
        }
    )


external_data1 = {
    "AUTHOR": "guido",
    "QRCODE": "1234",
}

book1 = create_book(external_data1)  # raises KeyError: TITLE is missing

external_data2 = {
    "AUTHOR": "guido",
    "TITLE": "python",
    "QRCODE": "1234",
}

book2 = create_book(external_data2)  # a Book, which is what I want

Anyway, I think I need to get more experience with Python…
Thanks again.

Um… I was wrong… :cry:

error: Unsupported type "dict[str, Any]" for ** expansion in TypedDict  [typeddict-item]

Would it change anything if you replaced the return Book(**{...}) expression with this:

tmp_dict = {
    k.lower(): v
    for k, v in raw.items()
    if k.lower() in Book.__required_keys__ | Book.__optional_keys__
}

return Book(**tmp_dict)

:question:

It emits the same error message :cry:
I think I have to take a different approach (if I want to stick to the dumb principle).
Thank you for the reply!
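
One possible different approach, sketched under assumptions: lowercase the keys once, check the required ones, then name each field explicitly so the checker can see every key instead of a `**` expansion. Since the values are typed as Any, mypy should accept this, though that has not been checked here against every mypy version. The error message text is made up for the sketch.

```python
from typing import Any, TypedDict


class Book(TypedDict):
    author: str
    title: str


def create_book(raw: dict[str, Any]) -> Book:
    """Build a Book from external data whose keys may be uppercase."""
    lowered = {k.lower(): v for k, v in raw.items()}
    missing = Book.__required_keys__ - set(lowered)
    if missing:
        raise KeyError(f"missing required keys: {sorted(missing)}")
    # Explicit keyword arguments instead of ** expansion.
    return Book(author=lowered["author"], title=lowered["title"])


print(create_book({"AUTHOR": "guido", "TITLE": "python", "QRCODE": "1234"}))
# {'author': 'guido', 'title': 'python'}
```

The cost is one line per field, which gets long at 10 to 30 keys, but every key stays visible to the type checker.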

Quick question: does the error come from Python or from a static type checker?

It comes from mypy (not Python).

Well, then there is always the cast(...) function:

import typing

return typing.cast(Book, tmp_dict)
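
In context, a sketch of the whole function with the cast applied might look like this. Book and create_book are taken from the earlier post; the runtime key check is kept because cast itself validates nothing.

```python
import typing
from typing import Any, TypedDict


class Book(TypedDict):
    author: str
    title: str


def create_book(raw: dict[str, Any]) -> Book:
    """Creates a book given external arbitrary data."""
    allowed = Book.__required_keys__ | Book.__optional_keys__
    tmp_dict = {k.lower(): v for k, v in raw.items() if k.lower() in allowed}

    # cast() performs no runtime validation, so check the keys ourselves.
    if not Book.__required_keys__.issubset(tmp_dict):
        raise KeyError("missing required keys.")

    return typing.cast(Book, tmp_dict)


book = create_book({"AUTHOR": "guido", "TITLE": "python", "QRCODE": "1234"})
print(book)  # {'author': 'guido', 'title': 'python'}
```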

Oh my… it passes mypy now.

I didn’t know cast existed at all…

The docs say:

When using a cast, the type checker should blindly believe the programmer.

https://typing.readthedocs.io/en/latest/spec/directives.html#cast
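
To make "blindly believe" concrete, here is a small sketch (the bad_data value is invented for the demo). At runtime cast() returns its argument unchanged, so nothing stops wrongly shaped data from being labelled a Book.

```python
from typing import TypedDict, cast


class Book(TypedDict):
    author: str
    title: str


bad_data = {"AUTHOR": 123}   # wrong key, wrong value type
book = cast(Book, bad_data)  # the type checker now treats this as a Book

print(book is bad_data)  # True: cast() returns the very same object
print(book)              # {'AUTHOR': 123}
```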

I think I should reconsider the way I write code before just making it pass mypy…

Thank you! :smile:
