I was poking around in marshal.c the other day and noticed something funny. marshal.c knows how to handle scalar types like strings, ints, and floats, collection types like lists, dicts, and sets, and singleton values like None, True, False, and StopIteration.
… wait, what? StopIteration? Why does marshal handle StopIteration? There are no comments in marshal.c explaining why. It just goes ahead and marshals / unmarshals StopIteration like it was a perfectly sensible thing to do.
Can someone explain why marshal.c might have to serialize / unserialize StopIteration?
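For anyone who wants to see the behavior being asked about, here is a minimal sketch run against the standard `marshal` module. It assumes a CPython build in which `StopIteration` still has its own type code; the variable names are only illustrative.

```python
import marshal

# The singleton values mentioned above, each written as a bare type-code byte.
singletons = [None, True, False, StopIteration]
encoded = {repr(obj): marshal.dumps(obj) for obj in singletons}
for name, data in encoded.items():
    print(name, data)

# Loading the data back yields the very same object, not a copy.
roundtrip = marshal.loads(marshal.dumps(StopIteration))
print(roundtrip is StopIteration)
```

Note that this marshals the exception class itself, not an instance of it.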
Perfectly valid! But ValueError is also a perfectly valid value to be using in code, and yet it’s not special-cased in marshal.c like StopIteration is.
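The asymmetry is easy to check. The sketch below uses a hypothetical helper, `can_marshal`, which just reports whether `marshal.dumps` accepts an object. On the CPython versions I am aware of, the other exception classes are rejected as unmarshallable.

```python
import marshal

def can_marshal(obj):
    """Return True if marshal.dumps accepts obj, False if it rejects it."""
    try:
        marshal.dumps(obj)
    except ValueError:  # marshal raises ValueError for unsupported objects
        return False
    return True

classes = (StopIteration, ValueError, AssertionError)
results = {cls.__name__: can_marshal(cls) for cls in classes}
print(results)
```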
Also, I’m not sure how marshal.c would encounter the value StopIteration. I’d expect it to encounter the string 'StopIteration', as an entry in a co_names table, for use by the LOAD_GLOBAL opcode, for example.
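To illustrate that expectation, here is a small sketch that inspects the code object of a function raising `StopIteration`. The function `stop` is made up for the example. With current compilers the name shows up in `co_names`, and the class object itself is not among the constants.

```python
def stop():
    raise StopIteration

code = stop.__code__

# The compiler records the *name*, to be looked up at run time.
name_in_co_names = "StopIteration" in code.co_names

# The class object itself is not stored as a constant.
object_in_co_consts = any(c is StopIteration for c in code.co_consts)

print(name_in_co_names, object_in_co_consts)
```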
Generators, of course, have special handling for StopIteration. My guess is that, at the time, the reason for using LOAD_CONST rather than LOAD_GLOBAL was to avoid funky behavior if the name StopIteration was shadowed. Not sure, though; maybe @tim.one remembers.
We have a similar case today where we want assert to work correctly even if someone shadows the name AssertionError. Today we handle that via the dedicated LOAD_ASSERTION_ERROR opcode, but if we wanted to, we could just as well handle it by using LOAD_CONST and supporting AssertionError in marshal.
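A rough way to observe this, assuming Python 3.9 or later (where `assert` no longer looks the name up in globals): run an `assert` in a namespace where `AssertionError` is deliberately shadowed, and check which exception type comes out. The namespace and source string are only for illustration.

```python
import builtins

source = "assert False, 'boom'"
# Shadow the name in the globals the code will run with.
namespace = {"AssertionError": KeyError}

# optimize=0 keeps assert statements even if the interpreter runs with -O.
compiled = compile(source, "<shadow-demo>", "exec", optimize=0)

try:
    exec(compiled, namespace)
    raised = None
except Exception as exc:
    raised = type(exc)

print(raised is builtins.AssertionError)
```

On older versions that resolved the name through LOAD_GLOBAL, I would expect the shadowing `KeyError` to be raised instead.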
I presume the question is why StopIteration is apparently unique in being given an ASCII code ('S'). I would assume that it has something to do with its somewhat unique role in iterators. To find out more, I would start with git blame on those specific lines of code.
Please do not do this. Even if it is no longer used in pyc files, the marshal format is a general data format and is used for data serialization (taking its limitations into account). There are marshal files created by previous versions of Python. Serialization of code objects is not stable, and every Python version is incompatible with the others, but apart from code objects, the rest of the marshal format has supported backward compatibility from the beginning.
There is a precedent. TYPE_INT64 was added to support 64-bit ints on 64-bit platforms. This made pyc files platform-dependent. After int and long were unified, support for TYPE_INT64 was removed, and ints outside the 32-bit range were saved as long. But this caused compatibility problems. Support for TYPE_INT64 was partially restored (reading only).
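A sketch of what that looks like from Python, assuming the type codes in current marshal.c (`'i'` for 32-bit ints, `'l'` for arbitrary-size ints, `'I'` for the legacy TYPE_INT64). Format version 2 is passed explicitly so that no reference flag is mixed into the first byte, and the `legacy` bytes are hand-built for the example rather than taken from a real old file.

```python
import marshal

# Writing: a small int uses the 32-bit code, a large one the "long" code.
small = marshal.dumps(1, 2)
big = marshal.dumps(2**40, 2)
print(small[:1], big[:1])

# Reading: a hand-built TYPE_INT64 record ('I' + 8 little-endian bytes),
# as an older 64-bit Python might have written it.
legacy = b"I" + (2**40).to_bytes(8, "little", signed=True)
loaded = marshal.loads(legacy)
print(loaded == 2**40)
```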