Dealing with (very) complex dictionary structures

I have a dictionary containing downloaded information about the performance of my household electrical systems. To be clear, the API I am using returns JSON, which requests converts into the dictionary. Its contents, displayed by print(), look like:

{'energyDetails': {'timeUnit': 'DAY', 'unit': 'Wh', 'meters': [{'type': 'Production', 'values': [{'date': '2024-12-19 00:00:00', 'value': 9426.24}, {'date': '2024-12-20 00:00:00', 'value': 902.317}, {'date': '2024-12-21 00:00:00', 'value': 1082.8201}, {'date': '2024-12-22 00:00:00', 'value': 2830.143}, …]}, {'type': 'Consumption', 'values': [{'date': '2024-12-19 00:00:00', 'value': 45110.242}, {'date': '2024-12-20 00:00:00', 'value': 46978.316}, {'date': '2024-12-21 00:00:00', 'value': 55941.82}, {'date': '2024-12-22 00:00:00', 'value': 63911.145}, …]}]}}

In reality, there are substantially more date/value pairs in each meter type block than I have copied here. I need to iterate over the date/value pairs separately for each meter type to get cumulative totals, and I am having a lot of trouble figuring out how to reference the sub-dictionaries so that I can do this. The documentation I can find only seems to consider straightforward tree structures, and the structure I have here is significantly more complex than that. Any pointers will be appreciated!

A dictionary is a collection of key/value pairs. The value can itself be a dictionary.

If you know how to access a value given its key, then you already know how to access a value of a subdictionary.

d = {...} # Your dict.
print(d['energyDetails']) # A subdict.
print(d['energyDetails']['timeUnit']) # A value from that subdict.

…and that is what all the documentation that I have read says! However, if you look carefully at the deeper structure of the dictionary I am dealing with, it doesn’t continue to follow the simple branching model implied by your post. At the next level, you have: 'meters': [{'type': 'Production', 'values': [{'date': datestring, 'value': numericvalue}, …]}, {'type': 'Consumption', 'values': [{…

I need to iterate through the date/value pairs within the dictionaries corresponding to the different meter types.

So, the values of a dictionary are sometimes lists, and the members of those lists can be dictionaries.
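Applied to the structure in the question, that means alternating between a key lookup for each dictionary and an index (or a loop) for each list. A minimal sketch, using a shortened copy of the posted data; the variable name d and the cut to two readings per meter are assumptions made for illustration:

```python
# Shortened copy of the structure from the question (two readings per meter).
d = {
    "energyDetails": {
        "timeUnit": "DAY",
        "unit": "Wh",
        "meters": [
            {"type": "Production",
             "values": [{"date": "2024-12-19 00:00:00", "value": 9426.24},
                        {"date": "2024-12-20 00:00:00", "value": 902.317}]},
            {"type": "Consumption",
             "values": [{"date": "2024-12-19 00:00:00", "value": 45110.242},
                        {"date": "2024-12-20 00:00:00", "value": 46978.316}]},
        ],
    }
}

meters = d["energyDetails"]["meters"]      # a list of dicts, one per meter
first_meter = meters[0]                    # a dict describing the first meter
first_reading = first_meter["values"][0]   # a dict holding one date/value pair
print(first_meter["type"], first_reading["date"], first_reading["value"])

# Looping over the lists instead of indexing them visits every pair.
for meter in meters:
    for reading in meter["values"]:
        print(meter["type"], reading["date"], reading["value"])
```

Each square bracket in the printed output marks a list (use an index or a for loop), and each curly brace marks a dictionary (use a key).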

What you have here is, by analogy, like states having counties, counties having multiple cities, cities having multiple neighborhoods, neighborhoods having multiple streets, and streets having multiple houses. In Python jargon, this is a nested “X”, where “X” can be a dictionary, list, tuple, etc. Your data mixes two of these: dictionaries and lists. What you have to do is methodically work through the levels one at a time (a bit tedious, given how deeply the structure is nested) until you reach the value that you want.

Here is a simple example.

some_values = {'a': 1, 'b': [2,3,4,5], 'c': {'d': [66, 77], 'e': (88, 99, [100, 200, 300])}}

print(some_values['a'])              # 1
print(some_values['b'][2])           # 4
print(some_values['c']['d'][1])      # 77
print(some_values['c']['e'][1])      # 99
print(some_values['c']['e'][2][2])   # 300

Study this simple example; the same fundamentals hold for the “complex” dictionary in your script. If it helps, use a piece of scratch paper to keep track of the nested data structures as you’re traversing the dictionary.

You can also edit your dictionary to add spaces between key:value pairs to enhance readability.

Edit:
When you come across an opening {, [, or (, immediately look to see where the closing pair is located.
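One way to make the matching pairs easy to see is to let Python lay the structure out for you. This is only a sketch using the standard json module; the variable name data and the shortened sample are assumptions, and json.dumps works here because the data came from JSON in the first place:

```python
import json

# Shortened stand-in for the downloaded dictionary (one reading per meter).
data = {"energyDetails": {"timeUnit": "DAY", "unit": "Wh", "meters": [
    {"type": "Production",
     "values": [{"date": "2024-12-19 00:00:00", "value": 9426.24}]},
    {"type": "Consumption",
     "values": [{"date": "2024-12-19 00:00:00", "value": 45110.242}]}]}}

# indent=2 starts a new, further-indented line at every nesting level,
# so each closing bracket lines up under the line that opened it.
pretty = json.dumps(data, indent=2)
print(pretty)
```

The standard pprint module (pprint.pprint(data)) is an alternative that also handles objects json cannot serialize.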

The structure is somewhat unclear to me, but based on my understanding, something like the following can be written:

data = {"energyDetails":
           {"timeUnit": "DAY",
            "unit": "Wh",
            "meters": [
                        {"type": "Production",
                         "values": [
                            {"date": "2024-12-19 00:00:00", "value": 9426.24},
                            {"date": "2024-12-20 00:00:00", "value": 902.317},
                            {"date": "2024-12-21 00:00:00", "value": 1082.8201},
                            {"date": "2024-12-22 00:00:00", "value": 2830.143}
                                   ]
                        },
                        {"type": "Consumption",
                         "values": [
                             {"date": "2024-12-19 00:00:00", "value": 45110.242},
                             {"date": "2024-12-20 00:00:00", "value": 46978.316},
                             {"date": "2024-12-21 00:00:00", "value": 55941.82},
                             {"date": "2024-12-22 00:00:00", "value": 63911.145}
                                   ]
                        },
                        {"type": "Production",
                         "values": [
                            {"date": "2024-12-19 00:00:00", "value": 100000},
                            {"date": "2024-12-20 00:00:00", "value": 100000},
                            {"date": "2024-12-21 00:00:00", "value": 100000},
                            {"date": "2024-12-22 00:00:00", "value": 100000}
                                   ]
                        },
                      ]
             }
          }

def summarize(data, meter_type):
    total = 0
    for types in data["energyDetails"]["meters"]:
        if types["type"] == meter_type:
            total += sum(values["value"] for values in types["values"])
    return total


print(f'Production: {summarize(data, "Production")}, consumption: {summarize(data, "Consumption")}')

# Production: 414241.5201, consumption: 211941.523

Thanks! That looks like exactly what I need. I will play with your script and get back with any further questions. One thing I will have to handle is the fact that ‘value’ will sometimes be missing, which I think would cause your suggested script to throw an exception, but that is a problem with a number of solutions.
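One of those solutions, sketched here rather than offered as the definitive fix: dict.get() returns None instead of raising KeyError when a key is absent, so `entry.get("value") or 0` counts a missing reading as 0. The same expression also covers the case where the key is present but holds None. The function name summarize_tolerant and the sample data are hypothetical, and whether the API omits the key or sends a null is an assumption to check against the real download:

```python
def summarize_tolerant(data, meter_type):
    """Sum the readings for one meter type, counting missing or None readings as 0."""
    total = 0
    for meter in data["energyDetails"]["meters"]:
        if meter["type"] == meter_type:
            total += sum(entry.get("value") or 0 for entry in meter["values"])
    return total


# Hypothetical sample with one missing key and one explicit None.
sample = {"energyDetails": {"meters": [
    {"type": "Production",
     "values": [{"date": "2024-12-19 00:00:00", "value": 10.0},
                {"date": "2024-12-20 00:00:00"},                 # no 'value' key
                {"date": "2024-12-21 00:00:00", "value": None},  # explicit null
                {"date": "2024-12-22 00:00:00", "value": 5.5}]}]}}

print(summarize_tolerant(sample, "Production"))  # 15.5
```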

I have tweaked your summarize() script a bit to handle entries where 'value' is missing and to allow a range of dates to be specified for the accumulation. The result is:

def summarize3(data, meter_type, from_date, to_date):
    # from_date and to_date should be in the same text format as is used in the data
    total = 0
    for types in data["energyDetails"]["meters"]:
        if types["type"] == meter_type:
            for values in types["values"]:
                if 'value' in values:
                    if (values["date"] >= from_date) and (values["date"] < to_date):
                        total += values["value"]
    return total

Thanks, again!
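A note on the date test in summarize3: comparing the strings directly works because the timestamps are zero-padded and ordered year, month, day, so alphabetical order matches chronological order. If the format ever varies, parsing into datetime objects is a safer basis for the same half-open range test. A minimal sketch, assuming the 'YYYY-MM-DD HH:MM:SS' format shown in the question; the names DATE_FORMAT and summarize_range and the sample data are hypothetical:

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the dates shown in the question


def summarize_range(data, meter_type, from_date, to_date):
    """Sum readings with from_date <= date < to_date (strings in DATE_FORMAT)."""
    start = datetime.strptime(from_date, DATE_FORMAT)
    end = datetime.strptime(to_date, DATE_FORMAT)
    total = 0
    for meter in data["energyDetails"]["meters"]:
        if meter["type"] != meter_type:
            continue
        for entry in meter["values"]:
            if "value" not in entry:
                continue  # skip entries that have no reading
            when = datetime.strptime(entry["date"], DATE_FORMAT)
            if start <= when < end:
                total += entry["value"]
    return total


sample = {"energyDetails": {"meters": [
    {"type": "Production",
     "values": [{"date": "2024-12-19 00:00:00", "value": 1.0},
                {"date": "2024-12-20 00:00:00", "value": 2.0},
                {"date": "2024-12-21 00:00:00"},
                {"date": "2024-12-22 00:00:00", "value": 4.0}]}]}}

print(summarize_range(sample, "Production",
                      "2024-12-19 00:00:00", "2024-12-22 00:00:00"))  # 3.0
```

A malformed date string raises ValueError here instead of silently comparing in the wrong order, which is the main reason to prefer parsing.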