Customizable repr()

15r10nk · June 13, 2024, 3:25pm

Hi, I’m the author of inline-snapshot, which uses repr() to generate the source code representation of the value you want to snapshot.

But I have a problem with the way repr() works:

>>> from enum import Enum
>>> E=Enum("E",["a","b"])
>>> repr(E.a)
'<E.a: 1>'
>>> repr([E.a])
'[<E.a: 1>]'
>>> repr(int)
"<class 'int'>"

Some types do not return a valid Python representation, even though they could.
I know that CPython can't be changed just for my use case, but I'm looking for a way to customize this behavior.

My current solution looks like this:

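from enum import Enum
from functools import singledispatch
from unittest import mock

real_repr = repr  # keep a reference to the original builtin

@singledispatch
def code_repr_dispatch(v):
    # default: fall back to the normal repr
    return real_repr(v)

def code_repr(obj):
    # while formatting, route every Python-level repr() call through code_repr
    with mock.patch("builtins.repr", code_repr):
        return code_repr_dispatch(obj)

@code_repr_dispatch.register
def _(v: Enum):
    return f"{type(v).__qualname__}.{v.name}"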

The problem is that I have to re-implement every possible container type to make it work recursively, like this:

@code_repr_dispatch.register
def _(v: list):
    # repr() is patched inside code_repr, so this actually calls code_repr
    return "[" + ", ".join(map(repr, v)) + "]"

This is because code_repr only works recursively when nested objects are formatted by calling repr(), not when they are formatted with f"{value!r}" or with PyObject_Repr in C (which is what built-in containers such as list use).
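The difference can be seen in a small experiment. The sketch below repeats the code_repr setup from above so it runs on its own; ByName and ByFString are hypothetical wrapper classes that differ only in how their __repr__ formats the wrapped value.

```python
from enum import Enum
from functools import singledispatch
from unittest import mock

real_repr = repr

@singledispatch
def code_repr_dispatch(v):
    return real_repr(v)

def code_repr(obj):
    with mock.patch("builtins.repr", code_repr):
        return code_repr_dispatch(obj)

@code_repr_dispatch.register
def _(v: Enum):
    return f"{type(v).__qualname__}.{v.name}"

E = Enum("E", ["a", "b"])

class ByName:
    # hypothetical wrapper: calls repr() by name, so the patch applies
    def __init__(self, item):
        self.item = item

    def __repr__(self):
        return f"ByName({repr(self.item)})"

class ByFString:
    # hypothetical wrapper: !r goes through PyObject_Repr, bypassing the patch
    def __init__(self, item):
        self.item = item

    def __repr__(self):
        return f"ByFString({self.item!r})"

print(code_repr(E.a))             # E.a
print(code_repr(ByName(E.a)))     # ByName(E.a)
print(code_repr(ByFString(E.a)))  # ByFString(<E.a: 1>)
print(code_repr([E.a]))           # [<E.a: 1>]  (list.__repr__ is C code)
```

Only the wrapper that looks up the name repr at call time picks up the patched builtin; both the !r conversion and the C-level list repr go straight to Enum.__repr__.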

My question now is: Does anyone know a way to make this work for the other two cases?

One of my ideas is to change Python and add a second optional argument to repr: repr(obj, handler). The handler would be called before obj.__repr__ and could be used to override the default behavior for an object. The implementation would be similar to my code_repr approach, but at the PyObject_Repr level. It would require a global thread-local variable to store the handler and to make recursive repr(obj) calls work. I don’t know if this idea would work out or lead to other issues.

A solution would be useful not only for inline-snapshot but also for reprlib, which has the same problem with recursive calls to custom types.
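To illustrate the reprlib side: reprlib.Repr routes the elements of the containers it knows about through its repr1 method, so a subclass can intercept Enum members there. Custom types, however, fall back to the builtin repr and never call back into repr1. CodeRepr and Point below are hypothetical names for this sketch; note that reprlib also truncates large values by default (maxlist, maxlevel, maxother and friends).

```python
import reprlib
from dataclasses import dataclass
from enum import Enum

class CodeRepr(reprlib.Repr):
    """Hypothetical reprlib.Repr subclass that renders Enum members as code."""

    def repr1(self, x, level):
        # repr1 is called for every element of list, dict, tuple, ... in reprlib
        if isinstance(x, Enum):
            return f"{type(x).__qualname__}.{x.name}"
        return super().repr1(x, level)

E = Enum("E", ["a", "b"])

@dataclass
class Point:
    e: E

r = CodeRepr()
print(r.repr([E.a, {"k": E.b}]))  # [E.a, {'k': E.b}]
print(r.repr([Point(E.a)]))       # [Point(e=<E.a: 1>)]
```

The built-in containers recurse through the override, but the dataclass's generated __repr__ uses !r internally, so the enum inside it keeps its original repr.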

My hope is to find a way to customize only the types which need customization and not all the container types. Maybe someone has an idea.

blhsing · June 14, 2024, 3:11am

To achieve what you want without making changes to CPython, I would take an entirely different approach: build a dedicated parser that parses the representations of all known types (i.e. a parser that understands '<...>' reprs, with a grammar slightly modified from Python’s) into an AST, and then generate a representation that Python can evaluate by applying custom repr logic to the types with bad reprs.
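As a much simplified stand-in for such a parser, the sketch below rewrites two '<...>' forms with regular expressions instead of a real grammar and AST. The patterns are assumptions about the repr formats, and a regex cannot tell a repr from a string value that happens to contain the same text, which is exactly why a proper parser is suggested.

```python
import re
from enum import Enum

# Enum-style reprs such as "<E.a: 1>" (assumed format)
ENUM_REPR = re.compile(r"<(?P<cls>[A-Za-z_][\w.]*)\.(?P<member>[A-Za-z_]\w*): [^<>]*>")
# Class reprs such as "<class 'int'>"
CLASS_REPR = re.compile(r"<class '(?P<name>[\w.]+)'>")

def rewrite_reprs(text: str) -> str:
    # Replace each matched '<...>' repr with an expression Python can evaluate
    text = ENUM_REPR.sub(lambda m: f"{m['cls']}.{m['member']}", text)
    return CLASS_REPR.sub(lambda m: m["name"], text)

E = Enum("E", ["a", "b"])
print(rewrite_reprs(repr([E.a, E.b, int])))  # [E.a, E.b, int]
```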

skirpichev · June 14, 2024, 6:21am

According to the docs, a '<...>' repr is just “some useful description”, not something that gives you precise instructions for building the object.

blhsing · June 14, 2024, 6:37am

With the repr string alone, certainly not, but the OP is trying to make an improved repr function here, and the reprs of known types all give us enough details to identify the objects being represented at the time of the repr calls. These details include either class names or object ids, with which we can obtain the relevant objects by looking up names in the local/closure/global namespaces, by evaluating the names, or by converting the ids to objects with ctypes.cast. We can then extract from these objects the further details needed to produce new reprs with enough information to rebuild the objects.
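The ctypes.cast step can be sketched as follows. It relies on two CPython details: the default object repr ends with " at 0x<address>>", and id(obj) is that memory address. Foo is a hypothetical class. This is only safe while the object is still alive; casting a stale address can crash the interpreter.

```python
import ctypes
import re

class Foo:
    pass

foo = Foo()
text = repr(foo)  # e.g. "<__main__.Foo object at 0x7f...>"

# Extract the address from the default repr (CPython-specific format)
match = re.search(r" at (0x[0-9a-fA-F]+)>$", text)
address = int(match.group(1), 16)

# Turn the address back into the object it points to
recovered = ctypes.cast(address, ctypes.py_object).value
print(recovered is foo)  # True
```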

blhsing · June 14, 2024, 9:07am

Another entirely different approach that should work well for the purpose of your project:

Instead of trying to produce an evaluable representation of an object so that it can be used as the expected value of an assertion in your project, take the repr of the object as-is as a string and compare it with the repr of the target value:

from enum import Enum

class Snapshot:
    def __init__(self, snapshot):
        self.snapshot = snapshot

    def __eq__(self, value):
        return repr(value) == self.snapshot

E = Enum("E", ["a", "b"])

assert E.a == Snapshot('<E.a: 1>')
assert [E.a] == Snapshot('[<E.a: 1>]')
assert int == Snapshot("<class 'int'>")

You can add wildcard logic to allow matching reprs that contain object ids.
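One possible form of that wildcard logic, assuming object ids appear in reprs as 0x-prefixed hex addresses: normalize every address on both sides to a placeholder before comparing. This would also mask hex-looking text inside string values, so it is a trade-off rather than a complete solution.

```python
import re

# Assumption: addresses in reprs look like "0x7f3a2c..."
_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")

def _normalize(text: str) -> str:
    # Treat any address as a wildcard by replacing it with a fixed placeholder
    return _ADDRESS.sub("0x...", text)

class Snapshot:
    def __init__(self, snapshot):
        self.snapshot = snapshot

    def __eq__(self, value):
        return _normalize(repr(value)) == _normalize(self.snapshot)

marker = object()
assert marker == Snapshot("<object object at 0x0>")
assert [1, 2] == Snapshot("[1, 2]")
```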

If every detail of an object is important, consider using a proper serializer such as pickle.dumps instead.

15r10nk · June 15, 2024, 6:23pm

Thank you for your tips, Ben.

In my current feature branch, as a kind of fallback, I convert objects whose representations can't be parsed into HasRepr, e.g. assert [E.a] == snapshot([HasRepr("<E.a: 1>")]).
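A hypothetical minimal version of such a fallback class is sketched below; the real HasRepr in inline-snapshot may well differ. Its own repr is valid Python source, so it can be written back into the test file, and it compares equal to any object whose repr() is the stored string.

```python
from enum import Enum

class HasRepr:
    """Hypothetical fallback: stands for any object whose repr() equals str_repr."""

    def __init__(self, str_repr: str):
        self._str_repr = str_repr

    def __repr__(self) -> str:
        # Valid Python source that recreates this placeholder
        return f"HasRepr({self._str_repr!r})"

    def __eq__(self, other: object) -> bool:
        if isinstance(other, HasRepr):
            return self._str_repr == other._str_repr
        return repr(other) == self._str_repr

E = Enum("E", ["a", "b"])
expected = [HasRepr("<E.a: 1>")]
print([E.a] == expected)  # True
print(repr(expected))     # [HasRepr('<E.a: 1>')]
```

Because equality only looks at the string, any object whose repr is the same string also compares equal.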

Doing this for the whole expression is not possible, because inline-snapshot also supports <=, in and snapshot[key], and I need the Python objects in these cases.

This solution was also proposed in the original issue, but the problem is that two types might produce repr strings that are indistinguishable from each other but need to be converted to different code.
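A simple way to hit this: two distinct enum classes that share a name and member layout (in practice they would typically live in different modules) produce identical reprs, yet their members are not equal and would need different source code to reconstruct.

```python
from enum import Enum

# Two distinct enum classes that happen to share a name and members
E1 = Enum("E", ["a", "b"])
E2 = Enum("E", ["a", "b"])

print(repr(E1.a), repr(E2.a))  # <E.a: 1> <E.a: 1>
print(E1.a == E2.a)            # False
```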

Things like pandas DataFrames have completely different reprs that don't use <...> at all.

Going down this path could lead to a lot of problems later.

kknechtel · June 16, 2024, 2:02am

I think the underlying point here is really that the algorithm for traversing an object graph (and detecting cycles) doesn’t seem to be factored out and exposed in the standard library. The built-in repr uses it, and a custom repr could basically use the same algorithm with a different method called on each node. As far as I’m aware, copy.deepcopy fundamentally needs the same algorithm too.
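Since no such traversal is exposed, here is a hand-rolled sketch of what it might look like: a generic walker with cycle detection that asks a per-node handler first and otherwise formats a few built-in containers itself. graph_repr and enum_handler are hypothetical names. Like the C-level Py_ReprEnter/Py_ReprLeave used by built-in reprs, it tracks only the containers currently being formatted, so true cycles print as [...] while shared, non-cyclic objects are printed wherever they appear. The explicit container list is exactly the part a factored-out standard-library traversal would hide.

```python
from enum import Enum
from typing import Callable, Optional

# A handler returns a replacement repr for nodes it cares about, or None.
Handler = Callable[[object], Optional[str]]

# Exact types only, so subclasses such as namedtuples keep their own __repr__.
_BRACKETS = {list: ("[", "]"), tuple: ("(", ")"), dict: ("{", "}"), set: ("{", "}")}

def graph_repr(obj: object, handler: Handler, _active: Optional[set] = None) -> str:
    if _active is None:
        _active = set()  # ids of containers currently being formatted
    custom = handler(obj)
    if custom is not None:
        return custom
    kind = type(obj)
    if kind not in _BRACKETS:
        return repr(obj)  # leaf node: fall back to the normal repr
    open_, close = _BRACKETS[kind]
    if id(obj) in _active:
        return f"{open_}...{close}"  # cycle, like the builtin "[...]"
    _active.add(id(obj))
    try:
        if kind is dict:
            parts = [
                f"{graph_repr(k, handler, _active)}: {graph_repr(v, handler, _active)}"
                for k, v in obj.items()
            ]
        else:
            parts = [graph_repr(item, handler, _active) for item in obj]
    finally:
        _active.discard(id(obj))
    if kind is set and not parts:
        return "set()"
    if kind is tuple and len(parts) == 1:
        return f"({parts[0]},)"
    return open_ + ", ".join(parts) + close

def enum_handler(obj: object) -> Optional[str]:
    # Only Enum members get a custom representation
    if isinstance(obj, Enum):
        return f"{type(obj).__qualname__}.{obj.name}"
    return None

E = Enum("E", ["a", "b"])
cyclic = [E.a]
cyclic.append(cyclic)
print(graph_repr([E.a, (E.b,), {"k": {E.a}}], enum_handler))  # [E.a, (E.b,), {'k': {E.a}}]
print(graph_repr(cyclic, enum_handler))                       # [E.a, [...]]
```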