Serialize to JSON skipping some fields depending on value

franklinvp · November 15, 2024, 12:28pm

Question: How can I json.dumps(d) while skipping any field whose value is of type A? I would like to avoid creating a temporary copy of the dictionary d with those fields removed.

Suppose I have

class A:
  __slots__ = ('value',)
  def __init__(self, value):
    self.value = value
  def __repr__(self):
    return f'A(value={self.value})'
  def __str__(self):
    return self.value


d = {
  'id': 'aaaa',
  'count': 222,
  'child': {
    'id': 'bbb',
    'child': A('foreign object')
  },
  # Large dictionary with moderately deep nesting
}

One possible way that I see is:

Subclass JSONEncoder and override its encode(...) method so that it does not pass _one_shot=True to its iterencode(...) call; that way the C implementation is not used.
Redefine json.encoder._make_iterencode(...) (a module-level function, not a method of the encoder), specifically its inner function _iterencode_dict(...), which is where I need to check the type of the value before the key and the ": " separator are yielded.
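A minimal sketch of just the encode(...) override, assuming CPython's json module, where JSONEncoder.iterencode only builds the C encoder when _one_shot=True (and indent is None). The class name PurePythonEncoder is hypothetical. On its own this changes which implementation runs, not the output; actually skipping fields would still need a modified copy of the private json.encoder._make_iterencode.

```python
import json


class PurePythonEncoder(json.JSONEncoder):
    def encode(self, o):
        # Call iterencode() without _one_shot=True, so it falls back to the
        # pure-Python json.encoder._make_iterencode instead of the C encoder
        return ''.join(self.iterencode(o))


sample = {'id': 'aaaa', 'count': 222, 'child': {'id': 'bbb'}}
print(json.dumps(sample, cls=PurePythonEncoder))
```

The output is identical to a plain json.dumps(sample); only the code path differs.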

Is there a simpler alternative?

brass75 · November 15, 2024, 2:49pm

import json

def dict_skipped_value_types(input: dict, skipped_value_types: list):
    # Build a filtered copy of input, recursing into nested dicts
    # (dicts inside lists are not filtered)
    rc = {}
    for k, v in input.items():
        if any(isinstance(v, skipped_type) for skipped_type in skipped_value_types):
            continue  # drop this key entirely
        if isinstance(v, dict):
            rc[k] = dict_skipped_value_types(v, skipped_value_types)
            continue
        rc[k] = v
    return rc

def json_with_skipped_value_types(input: dict, skipped_value_types: list, *args, **kwargs):
    # Filter first, then serialize the filtered copy
    return json.dumps(dict_skipped_value_types(input, skipped_value_types), *args, **kwargs)

That should do it. (I wrote this here and haven't tested it, so YMMV, but the idea is sound.)

franklinvp · November 15, 2024, 3:04pm

Yes, this is the solution that uses a temporary, rc in this case. Imagine that the variable you called input holds a really large dictionary that already occupies a sizable chunk of memory. If only a few fields are of one of the skipped_value_types, then rc will also be large. The values themselves might not get duplicated, but rc is still a large dictionary. We would also be doing two passes over the data, one to exclude and one to serialize. What I want is for the act of serializing to skip the required fields. That way there is no temporary and only a single pass over the data.
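The memory point can be checked directly. A small sketch with a hypothetical filtered_copy helper (a dict-comprehension variant of the filtering function posted above): the leaf values in the temporary are the very same objects as in the original, but every dict level is a newly built object.

```python
class A:
    pass


def filtered_copy(obj, skip):
    # Rebuild every dict level without the skipped values (the "temporary")
    if isinstance(obj, dict):
        return {k: filtered_copy(v, skip) for k, v in obj.items()
                if not isinstance(v, skip)}
    return obj  # non-dict values are returned as-is, not copied


payload = 'x' * 1_000_000
d = {'payload': payload, 'child': {'skip': A(), 'n': 1}}
tmp = filtered_copy(d, A)

print(tmp['payload'] is payload)   # True: the large string is shared
print(tmp['child'] is d['child'])  # False: each dict is a new object
```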

brass75 · November 15, 2024, 3:28pm

That would require implementing your own serializer, as you described. You could bring this over to the Ideas section as a proposal, but you now have the general idea for a solution.
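For comparison, a hand-written serializer need not be large if it only has to handle plain JSON types. The sketch below (iter_json is a hypothetical name, not a stdlib function) walks the data once, yields text chunks, and drops a dict entry whose value is an instance of the skipped types before its key or separator is emitted. It assumes str or int keys, does not filter list elements, has no circular-reference check, and is pure Python, so it will be slower than the C encoder.

```python
import json
from typing import Any, Iterator


class A:
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value


def iter_json(obj: Any, skip: tuple = (A,)) -> Iterator[str]:
    """Yield JSON text in chunks, omitting dict entries whose value is a `skip` instance."""
    if isinstance(obj, dict):
        yield '{'
        first = True
        for key, value in obj.items():
            if isinstance(value, skip):
                continue  # skip key, separator and value together
            if not first:
                yield ', '
            first = False
            # Assumes str or int keys; str() matches json's output for both
            yield json.dumps(str(key))
            yield ': '
            yield from iter_json(value, skip)
        yield '}'
    elif isinstance(obj, (list, tuple)):
        yield '['
        for i, item in enumerate(obj):
            if i:
                yield ', '
            yield from iter_json(item, skip)
        yield ']'
    else:
        # Scalars (str, int, float, bool, None) reuse the stdlib encoder
        yield json.dumps(obj)


d = {'id': 'aaaa', 'count': 222, 'child': {'id': 'bbb', 'child': A('foreign object')}}
print(''.join(iter_json(d)))
```

Because it is a generator, the output can also be streamed, e.g. f.writelines(iter_json(d)) on an open text file, so the full JSON string never has to be held in memory either.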

Bolle · November 18, 2024, 1:44pm

Can’t you build a small class that wraps your dict and implements items() so that it filters out the unwanted types? At its core, the JSON encoder is a loop like this:

for key, value in obj.items():
    ...  # produce the serialized representation of key and value
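One way to check that the encoder really goes through items() is a dict subclass that counts the calls. CountingDict is a hypothetical helper; the behaviour shown is what current CPython does: the pure-Python encoder calls dct.items(), and the C encoder calls the items() method of dict subclasses, using its internal fast path only for exact dict instances.

```python
import json


class CountingDict(dict):
    # Hypothetical helper: counts how often the encoder asks for items()
    calls = 0

    def items(self):
        CountingDict.calls += 1
        return super().items()


json.dumps(CountingDict(a=1))            # default, C-accelerated path
json.dumps(CountingDict(a=1), indent=2)  # indent selects the pure-Python path
print(CountingDict.calls)                # 2: items() was called on both paths
```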

franklinvp · November 18, 2024, 2:28pm

That should work.

One annoyance is that the deserializer produces plain dicts, but well...
I could replace them once and keep the structure in memory.
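A sketch of that one-time replacement, assuming a filtering subclass like the D posted further down in this thread. json.loads accepts an object_hook that is called with every decoded JSON object (a plain dict) and whose return value is used in its place, so passing the class itself turns every nesting level into a D during decoding, with no separate conversion pass.

```python
import json


class A:
    pass


class D(dict):
    # Filtering wrapper: items() yields only pairs whose value is not an A
    def items(self):
        for k, v in super().items():
            if not isinstance(v, A):
                yield k, v


# object_hook receives each decoded JSON object as a plain dict;
# returning D(that_dict) makes every nesting level a D
loaded = json.loads('{"id": "aaaa", "child": {"id": "bbb"}}', object_hook=D)
loaded['child']['child'] = A()  # attach a value that should be skipped
print(json.dumps(loaded))
```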

Bolle · November 19, 2024, 7:02am

Replying to myself here: it is actually not that simple. What items() returns is a dict_items object, a view that is iterable (it implements __iter__), so you can loop over the keys and values. I experimented a little with the __iter__ method of dict_items (and the __next__ method of the iterator it returns) to see if I could replace them with a custom function that drops the unwanted types, but no luck so far. I will keep trying and see if I can come up with something that works.
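A quick check of why patching dict_items goes nowhere: it is a view that is iterable but not itself an iterator, and because it is a built-in type, its methods cannot be reassigned. Overriding items() in a dict subclass sidesteps both problems.

```python
d = {'a': 1, 'b': 2}
items = d.items()

print(hasattr(items, '__next__'))  # False: a view, not an iterator
print(next(iter(items)))           # ('a', 1): iter() returns a separate iterator

try:
    # Built-in types are immutable, so this assignment is rejected
    type(items).__iter__ = lambda self: iter(())
    patched = True
except TypeError as exc:
    patched = False
    print('cannot patch:', exc)
```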

franklinvp · November 19, 2024, 4:17pm

It is fine. The wrapper class's items() can just yield the pairs.

import json

class A:
  pass

class D(dict):
  # The encoder iterates over items(), so filter the pairs here
  def items(self):
    for k in self.keys():
      v = self[k]
      if not isinstance(v, A):  # the values that we want to exclude
        yield k, v

d = D({1: 111, 2: A(), 3: 333})

json.dumps(d)  # returns '{"1": 111, "3": 333}'
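A sketch checking two details of this approach, using the same D: it works on both encoder paths (passing indent makes the stdlib use the pure-Python one), and the filtering applies only at levels that are D instances, so nested plain dicts have to be wrapped as well.

```python
import json


class A:
    pass


class D(dict):
    def items(self):
        for k in self.keys():
            v = self[k]
            if not isinstance(v, A):
                yield k, v


nested = D({'id': 'aaaa', 'child': D({'id': 'bbb', 'child': A()})})
print(json.dumps(nested))            # default (C-accelerated) path
print(json.dumps(nested, indent=2))  # indent forces the pure-Python path

# A plain dict inside a D is not filtered, so the A reaches the encoder
plain_inner = D({'child': {'skip': A()}})
try:
    json.dumps(plain_inner)
except TypeError as exc:
    print('not filtered:', exc)
```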

Bolle · November 19, 2024, 4:45pm

Ah, cool! Great that you got it working. I was overcomplicating things.