I am coding the setter of a class member variable which should hold important key, value pairs used to control the flow of my script. My wish is to use pattern matching to ensure that the keys exist and that their values are as expected (some runtime type checking). I tried the following, and it seems to work, but I wonder if I am missing something: I only read the introductory tutorials on pattern matching yesterday. I’m using Python 3.11.7.
@extra.setter
def extra(self, value):
    processed_extra = {}
    self.remaining_extra = {}
    for key, val in value.items():
        match key, val:
            case "A", str(v) if v in Utility.ACTIVE_A:
                processed_extra[key] = val
            case "A", str(v) if v in Utility.DEPRECATED_A:
                logger.warning(f"The value [{v}] is deprecated!")
            case "first_bool" | "second_bool", bool():
                processed_extra[key] = val
            case "first_list" | "second_list", []:
                logger.warning(f"The value of [{key}] is an empty list, either fill it or don't add the key!")
            case "first_list" | "second_list", None:
                processed_extra[key] = val
            case "first_list" | "second_list", [int(), *other] if all(isinstance(x, int) for x in other):
                processed_extra[key] = val
            case "mypath", str(path) if pathlib.Path(path).is_dir():
                processed_extra[key] = val
            case "database", "A" | "B" | "C":
                processed_extra[key] = val
            case "environment", "A" | "B" | "C":
                processed_extra[key] = val
            case "start_time", Timestamp():
                processed_extra[key] = val
            case _:
                # if not matched, add the item (key, val) to the dict
                self.remaining_extra[key] = val
    NEEDED_KEYS = [
        "first_bool",
        "A",
        "second_bool",
        "database",
        "environment",
        "mypath",
        "first_list",
        "second_list",
        "start_time",
    ]
    if all(k in processed_extra for k in NEEDED_KEYS):
        # store the validated items; a bare `extra` would be undefined here
        self._extra = processed_extra
    else:
        # the exception has to be raised, not just constructed
        raise ValueError(f"Not all needed keys were correctly parsed, check {self.remaining_extra}")
I don’t think matching on str(v) is doing what you expect/want it to do. It does not convert v to a string for later use; v remains whatever type it was. "A", str(v) will only match key, val if str(val) == val, i.e. if val is already a string.
If Utility is a StrEnum this could work but in other cases it won’t.
I don’t think the PEP recommends anything in particular here, it’s up to the author and readability concerns.
Hi @lmaurelli - I don’t have a direct answer to your question, but do you believe this kind of code is maintainable? To me it seems unmaintainable: too complicated for its own good. Too complicated implies that you can very easily overlook little errors, forget things, and have bugs creep in.
If you need to set class members based on a dictionary (or based on a serialized JSON or yaml string loaded as dictionary), there seem to be better/more robust/simpler ways to do so. A straightforward way would be something like this (which can be expanded with setters too, if needed):
Here the from_dict method will raise an exception when there are missing keys, extra keys, or when the values don’t match.
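A from_dict along these lines could be sketched as follows. This is only an illustration under assumptions: the ExtraConfig class and its three fields are hypothetical (loosely based on the keys in the question), and the type check only works for plain classes such as bool, str or list, not for generics like list[int].

```python
from dataclasses import dataclass, fields


@dataclass
class ExtraConfig:
    # Hypothetical fields, loosely based on the keys in the question.
    first_bool: bool
    database: str
    first_list: list

    @classmethod
    def from_dict(cls, data: dict) -> "ExtraConfig":
        expected = {f.name: f.type for f in fields(cls)}
        missing = expected.keys() - data.keys()
        unexpected = data.keys() - expected.keys()
        if missing or unexpected:
            raise KeyError(
                f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
            )
        for name, typ in expected.items():
            # Plain isinstance check; assumes annotations are real classes.
            if not isinstance(data[name], typ):
                raise TypeError(f"{name!r} should be {typ.__name__}")
        return cls(**data)


config = ExtraConfig.from_dict(
    {"first_bool": True, "database": "A", "first_list": [1, 2]}
)
print(config)
```

The advantage over a validated dict is that the set of keys is fixed by the class definition, so it can be read off the code.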
The ‘yaml’ module (pyyaml) also has support for directly mapping dictionaries to objects; look for “YAMLObject” on https://pyyaml.org/wiki/PyYAMLDocumentation
And if you really want to dig into this, then there is also pydantic.
As far as I understood, the line: case ["A", str(v)] if v in Utility.ACTIVE_A:
has the same logic as: case ["A", str() as v] if v in Utility.ACTIVE_A:
i.e. the second element in the pattern should be an instance of the built-in class str, and it is bound to the variable name v. In other words, in practice this should implement a run-time type check, shouldn’t it?
On the other hand: case ["A", str() as v] if str(v) in Utility.ACTIVE_A:
is like the above, but you cast the variable v in the condition. That should not be needed, since the logic should be: first check whether the pattern matches (i.e. the element is of type string), and only then evaluate the condition. If the condition fails, a side effect is that the name bindings are kept; I don’t know whether the pattern then counts as “matched”, in the sense of whether the next case is still evaluated or not.
Thanks @hansgeunsmeyer. I was refactoring the code to get a single entry point/interface for validating this dict, since the validation was spread over many places in the code, and I prioritize this aspect and readability. I was enthusiastic about pattern matching as it seemed the right choice, but you make me question my pick. To date the dict is created from parsed CLI arguments or config files (dicts in *.py files), but in the future I would like to read from environment variables as well.
I’ve had to design similar tools in the past, where some app would need to be configured with either a config file (or a few of those), plus potential overrides from the command line, plus possible default settings from the environment. This kind of code can become pretty messy very quickly. To keep all this tractable, I used dataclasses for my actual “object model”. So every configuration would be implemented as a dataclass. Internally, the main thing, imo, is that you do not want to use just a dictionary (not even a double-checked one) to represent your configuration state. Dictionaries can be changed too easily, the keys are too arbitrary. It’s hard to know just by code inspection what their contents are at any moment.
Dataclasses are ideal for representing configs.
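As a rough sketch of what that can look like: the AppConfig class, the APP_* variable names and the precedence order (defaults, then environment, then config file, then command line) are all assumptions made for the example, not something prescribed above.

```python
import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: the config cannot be changed by accident
class AppConfig:
    # Hypothetical settings with their defaults.
    database: str = "A"
    environment: str = "A"
    verbose: bool = False


# Hypothetical mapping from field name to environment variable name.
ENV_VARS = {"database": "APP_DATABASE", "environment": "APP_ENVIRONMENT"}


def build_config(file_values: dict, env: dict, cli_values: dict) -> AppConfig:
    cfg = AppConfig()
    env_values = {name: env[var] for name, var in ENV_VARS.items() if var in env}
    # Later layers override earlier ones; None means "not given".
    for layer in (env_values, file_values, cli_values):
        given = {k: v for k, v in layer.items() if v is not None}
        cfg = replace(cfg, **given)  # unknown keys raise TypeError
    return cfg


print(build_config({"database": "B"}, os.environ, {"verbose": True}))
```

Because every layer goes through replace(), a misspelled key fails loudly instead of silently ending up in the config.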
But then the next thing is to have reliable serialization/deserialization. There I sometimes used yaml objects or a method similar to the one in my earlier post. In order to hook this up to a command line parser, I sometimes used the HuggingFace argument parser. In the HuggingFace code they have exactly the same problem of needing to support complex configurations from config files, env variables and the command line. Their argument parser, which integrates well with argparse, is a very nice example of how to implement this. See in particular: hf_argparser#parse_args_into_dataclasses (Docs).
Basic usage is very simple (and more important: very easy to extend/customize):
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class MyConfig:
    a: int = field(default=0, metadata={"help": "Some int"})
    b: str = field(default="?", metadata={"help": "Some string"})


parser = HfArgumentParser(MyConfig)  # contains multiple useful methods
args = parser.parse_args()  # gives you a Namespace like argparse does; the Namespace will in this case have `a` and `b` in it
HfArgumentParser(MyConfig).parse_args(["--help"])  # will print out the help, similar to argparse, but help is automatically generated from the MyConfig metadata
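To illustrate the underlying idea without depending on transformers, here is a stdlib-only sketch. It is not how HfArgumentParser is implemented, and the helper name parse_into_dataclass is made up. It only handles fields whose annotation is a simple callable type such as int or str, and which have a default.

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class MyConfig:
    a: int = field(default=0, metadata={"help": "Some int"})
    b: str = field(default="?", metadata={"help": "Some string"})


def parse_into_dataclass(cls, argv):
    """Build an argparse parser from the dataclass fields (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        parser.add_argument(
            f"--{f.name}",
            type=f.type,  # assumes the annotation is a callable like int or str
            default=f.default,
            help=f.metadata.get("help"),
        )
    namespace = parser.parse_args(argv)
    return cls(**vars(namespace))


print(parse_into_dataclass(MyConfig, ["--a", "3", "--b", "hi"]))
```

The point is that the dataclass stays the single source of truth: option names, types, defaults and help texts all come from the field definitions.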
I don’t know if I should feel sorry or flattered that I made you doubt your original approach (match statements are nice!), but I guess it’s good to know your options.