Recursive structures in marshal

The marshal format was initially designed for serializing pre-parsed Python modules. It supports code objects (the module initialization code and the code for classes and functions) and since code object contains references to constants, it supports also simple data types which can be used in constants: integers, strings, tuples, frozensets, etc. For some reason the support of mutable collections (lists, dicts, sets) was also added. So now marshal can be used as a simple but fast data serialization format. Strong backward compatibility is supported, except for code objects, so it was not safe to load arbitrary marshal data evem if you do not execute it. Since 3.13 you can forbid marshalling and unmarshalling code objects and use it as data-only format (it is still not completely secure).

Initially, marshal only supported a tree data structure – serialized parent contains serialized children. But support of reference was added in 3., which allowed to save memory for repeated references to the same integer, string or tuple, but also allowed limited support of recursive structures. It was not so limited for initially supported types, but adding support for new types exposed the limitation of such support, For example, you can serialize a list containg a slice object referring to the parent list and successfully deserialize it, but when you try this when a slize object is the top object, you will get an error during deserialization (see Crash: marshal.loads SIGSEGV on self-referencing TYPE_TUPLE with FLAG_REF · Issue #148653 · python/cpython · GitHub). And you get other issue with frozendicts – the nested frozendict is deserialized as a dict (see Crash: marshal.loads SIGSEGV on self-referencing TYPE_TUPLE with FLAG_REF · Issue #148653 · python/cpython · GitHub). The marshal design does not allow supporting such kind of recursion, the only way to fix the deserialization bugs is to forbid serialization of them. They cannot occur in the pyc file and we did not get reports, so we can suppose that such kind of recusive structures are not used on practice. It was fixed in gh-148653: Fix some marshal errors related to recursive immutable objects by serhiy-storchaka · Pull Request #148698 · python/cpython · GitHub.

But there the same kind of recursion works in most cases with tuples. Both a list containg tuple containing the original list and a tuple containg list containing the original tuple work. Although the implementation creates partially initialized tuple and uses it before completing the initialization. This works for lists and like when “using” only means taking and storing the reference. But when such incomplete tuple occurs as a part of dict key or set element, “using” means calculation of the hash and comparison with other objects. It will crash. And even if not crash, the hash and the comparison result are invalid. The simplest way to solve this is to forbid any use of incompletely initialized tuple during deserialization, like for any other type. Yet one argument for forbidding any use, even storing a reference, is that currently it allows to create self-referencoing tuples which cannot be collected by the garbage collector, this lead to memory leaks.

Such a long preamble was needed to ask one question: Does anyone have any real world examples of the code that would be broken as a result of such a change? It will forbid recurion like “tuple → something (list, dict, etc) → original tuple”, but “something (list, dict, etc) → tuple → original something” will still be supported. I think this is very very unlikely, but I need to ask.