Using pattern matching to check a dict, am I doing it right?

lmaurelli · March 8, 2024, 7:02am

I am coding the setter of a class member variable which should hold important key-value pairs used to control the flow of my script. I would like to use pattern matching to ensure that the keys exist and that their values are as expected (some runtime type checking). I tried the following, and it seems to work, but I wonder if I am missing something, since I only read the introductory tutorials on pattern matching yesterday. I’m using Python 3.11.7.

  @extra.setter
  def extra(self, value):
      processed_extra = {}
      self.remaining_extra = {}
      for key, val in value.items():
          match key, val:
              case "A", str(v) if v in Utility.ACTIVE_A:
                  processed_extra[key] = val
              case "A", str(v) if v in Utility.DEPRECATED_A:
                  logger.warning(f"The value [{v}] is deprecated!")
              case "first_bool" | "second_bool", bool():
                  processed_extra[key] = val
              case "first_list" | "second_list", []:
                  logger.warning(f"The value of [{key}] is an empty list, either fill it or don't add the key!")
              case "first_list" | "second_list", None:
                  processed_extra[key] = val
              case "first_list" | "second_list", [int(), *other] if all(isinstance(x, int) for x in other):
                  processed_extra[key] = val
              case "mypath", str(path) if pathlib.Path(path).is_dir():
                  processed_extra[key] = val
              case "database", "A" | "B" | "C":
                  processed_extra[key] = val
              case "environment", "A" | "B" | "C":
                  processed_extra[key] = val
              case "start_time", Timestamp():
                  processed_extra[key] = val
              case _:
                  # if not matched, add the item (key, val) to the dict
                  self.remaining_extra[key] = val
      
      NEEDED_KEYS = [
          "first_bool",
          "A",
          "second_bool",
          "database",
          "environment",
          "mypath",
          "first_list",
          "second_list",
          "start_time"
      ]
      if all(k in processed_extra for k in NEEDED_KEYS):
          self._extra = value
      else:
          raise ValueError(f"Not all needed keys were correctly parsed, check {self.remaining_extra}")

onePythonUser · March 8, 2024, 3:56pm

Hi,

I referred to PEP 636 – Structural Pattern Matching: Tutorial. It looks like you’re implementing what its section “Adding conditions to patterns” describes.

For example, in the following line of code:

case "A", str(v) if v in Utility.ACTIVE_A:

Should it include brackets as in:

case ["A", str(v)] if v in Utility.ACTIVE_A:

This would apply to the other case statements as well, to be consistent with the PEP 636 guidelines?

To be even more consistent with the tutorial, should it be (?):

case ["A", str(v)] if str(v) in Utility.ACTIVE_A:

jamestwebber · March 8, 2024, 4:16pm

I don’t think matching on str(v) is doing what you expect/want it to do. It’s not converting v to a string for later; v remains whatever type it was. "A", str(v) will only match key, val if str(val) == val.

If Utility is a StrEnum this could work but in other cases it won’t.

I don’t think the PEP recommends anything in particular here, it’s up to the author and readability concerns.

onePythonUser · March 8, 2024, 4:22pm

Oh, ok. Thank you for making sense of it.

Appreciated.

jamestwebber · March 8, 2024, 4:39pm

Oh I actually thought I discarded that post! After re-reading the OP’s example it seemed like the type-checking aspect was desired.

onePythonUser · March 8, 2024, 4:51pm

Just out of curiosity, I have seen the term OP bounced around. What does that stand for?

onePythonUser · March 8, 2024, 5:02pm

So this is correct?

hansgeunsmeyer · March 8, 2024, 5:03pm

Hi @lmaurelli - I don’t have a direct answer to your question, but … do you believe this kind of code is maintainable? To me this seems like unmaintainable code – too complicated for its own good. Too complicated means that you can very easily overlook little errors, forget things, and have bugs creep in.

If you need to set class members based on a dictionary (or based on a serialized JSON or yaml string loaded as dictionary), there seem to be better/more robust/simpler ways to do so. A straightforward way would be something like this (which can be expanded with setters too, if needed):

class MyData:
    def __init__(self, first_bool: bool, second_bool: bool, first_string: str):
        assert isinstance(first_bool, bool)
        assert isinstance(second_bool, bool)
        assert isinstance(first_string, str)
        self.first_bool = first_bool
        self.second_bool = second_bool
        self.first_string = first_string

    @staticmethod
    def from_dict(obj: dict) -> "MyData":
        return MyData(**obj)

Here the from_dict method will raise an exception when there are missing keys, extra keys, or when the values don’t match.
The ‘yaml’ module (pyyaml) also has support for directly mapping dictionaries to objects; look for “YAMLObject” on https://pyyaml.org/wiki/PyYAMLDocumentation
And if you really want to dig into this, then there is also pydantic.
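
The three failure modes mentioned above can be checked with a short sketch. The class is repeated so the snippet runs on its own; the helper `try_load` and the sample dict `good` are made up for illustration.

```python
class MyData:
    def __init__(self, first_bool: bool, second_bool: bool, first_string: str):
        assert isinstance(first_bool, bool)
        assert isinstance(second_bool, bool)
        assert isinstance(first_string, str)
        self.first_bool = first_bool
        self.second_bool = second_bool
        self.first_string = first_string

    @staticmethod
    def from_dict(obj: dict) -> "MyData":
        return MyData(**obj)


def try_load(obj: dict) -> str:
    """Hypothetical helper: return 'ok' or the name of the exception raised."""
    try:
        MyData.from_dict(obj)
    except (TypeError, AssertionError) as exc:
        return type(exc).__name__
    return "ok"


good = {"first_bool": True, "second_bool": False, "first_string": "hi"}

print(try_load(good))                          # all keys present, types fine
print(try_load({"first_bool": True}))          # missing keys: rejected by the call itself
print(try_load({**good, "unexpected": 1}))     # extra key: rejected by the call itself
print(try_load({**good, "first_string": 42}))  # wrong type: rejected by the assert
```

Missing and extra keys are caught by Python’s normal argument handling (a TypeError), while wrong types are caught by the asserts. One caveat: assert statements are skipped when Python runs with the -O flag, so explicit raise statements are safer if the checks must always run.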

jamestwebber · March 8, 2024, 5:03pm

It stands for “original post” or “original poster” depending on context. The first post in the thread, or its author.

Old internet forum jargon

lmaurelli · March 11, 2024, 8:40am

As far as I understood, the line:
case ["A", str(v)] if v in Utility.ACTIVE_A:
has the same logic as:
case ["A", str() as v] if v in Utility.ACTIVE_A:
i.e. the second element in the pattern should be an instance of the built-in class str, and it is bound to the variable name v. In other words, in practice this should implement a run-time type check, shouldn’t it?
On the other hand:
case ["A", str() as v] if str(v) in Utility.ACTIVE_A:
is like the above, but you cast the variable v in the condition. That is not needed, since the logic should be: first check whether the pattern matches; if it matches (i.e. the element is of type string), then evaluate the condition. If the condition fails, the side effect is that the name bindings are kept; I don’t know whether the pattern then counts as “matched”, in the sense that the next case is not evaluated.

lmaurelli · March 11, 2024, 8:45am

Thanks @hansgeunsmeyer. I was refactoring the code in order to have a single entry point/interface for the data validation of this dict, as it was spread across many places in the code, and I prioritize this aspect and readability. I was enthusiastic about pattern matching as it seemed the right choice, but you make me question my pick. To date, the dict is created from parsed CLI arguments or config files (dicts in *.py files), but in the future I would like to read from environment variables as well.

hansgeunsmeyer · March 11, 2024, 1:50pm

I’ve had to design similar tools in the past, where some app would need to be configured with either a config file (or a few of those), plus potential overrides from the command line, plus possible default settings from the environment. This kind of code can become pretty messy very quickly. To keep all this tractable, I used dataclasses for my actual “object model”. So every configuration would be implemented as a dataclass. Internally, the main thing, imo, is that you do not want to use just a dictionary (not even a double-checked one) to represent your configuration state. Dictionaries can be changed too easily, the keys are too arbitrary. It’s hard to know just by code inspection what their contents are at any moment.
Dataclasses are ideal for representing configs.
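
As a rough sketch of that idea, using only the standard library: the class name `ExtraConfig` is hypothetical, its field names are borrowed from the first post, and the validation rules are assumptions about what that post intends rather than anything taken from a real project.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExtraConfig:
    """Hypothetical config object; field names are borrowed from the first post."""

    first_bool: bool
    second_bool: bool
    database: str
    first_list: tuple[int, ...] = ()

    def __post_init__(self):
        # Dataclasses do not enforce annotations at run time, so check by hand.
        for name in ("first_bool", "second_bool"):
            if not isinstance(getattr(self, name), bool):
                raise TypeError(f"{name} must be a bool")
        if self.database not in ("A", "B", "C"):
            raise ValueError(f"unknown database: {self.database!r}")
        if not all(isinstance(x, int) for x in self.first_list):
            raise TypeError("first_list must contain only ints")

    @classmethod
    def from_dict(cls, raw: dict) -> "ExtraConfig":
        # Reject unknown keys explicitly; missing ones make cls(**raw) raise TypeError.
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise KeyError(f"unexpected keys: {sorted(unknown)}")
        return cls(**raw)


cfg = ExtraConfig.from_dict({"first_bool": True, "second_bool": False, "database": "A"})
print(cfg)
```

Because the dataclass is frozen, assigning to `cfg.database` afterwards raises an error, which addresses the concern that a plain dictionary can be changed too easily. The set of valid fields is also visible in one place, the class body.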

But then the next thing is to have a reliable serialization/deserialization. There I sometimes used yaml objects or a method similar to the one in my earlier post. In order to hook this up to a command line parser, I sometimes used the HuggingFace argument parser. In the HuggingFace code they have exactly the same problem of needing to support complex configurations - from config files, env variables and command line. Their argument parser is a very nice example – which integrates well with argparse – of how to implement this. See in particular: hf_argparser#parse_args_into_dataclasses (Docs).

Basic usage is very simple (and more important: very easy to extend/customize):

from dataclasses import dataclass, field
from transformers import HfArgumentParser

@dataclass
class MyConfig:
    a: int = field(default=0, metadata={"help": "Some int"})
    b: str = field(default="?", metadata={"help": "Some string"})

parser = HfArgumentParser(MyConfig)  # contains multiple useful methods
args = parser.parse_args()  # gives you a Namespace like argparse does; the Namespace will in this case have `a` and `b` in it

HfArgumentParser(MyConfig).parse_args(["--help"])  # will print out the help, similar to argparse, but help is automatically generated from the MyConfig metadata

I don’t know if I should feel sorry or feel flattered that I made you doubt your original approach (match statements are nice!), but I guess it’s good to know your options.

lmaurelli · March 16, 2024, 1:09pm

I was just busy with work tasks, but I feel lucky to get to discuss design choices with other people. I appreciate your response very much!