Why does marshal handle StopIteration?

I was poking around in marshal.c the other day and noticed something funny. marshal.c knows how to handle scalar types like strings, ints, and floats, collection types like lists, dicts, and sets, and singleton values like None, True, False, and StopIteration.

… wait, what? StopIteration? Why does marshal handle StopIteration? There are no comments in marshal.c explaining why. It just goes ahead and marshals / unmarshals StopIteration like it was a perfectly sensible thing to do.

Can someone explain why marshal.c might have to serialize / unserialize StopIteration?
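For anyone who wants to see the behavior from Python rather than from the C source, here is a minimal sketch using only the public `marshal.dumps` / `marshal.loads` functions. The variable name `tags` is just for illustration.

```python
import marshal

# The singleton values mentioned above. Each one marshals to a single
# type-code byte and unmarshals back to the very same object.
singletons = [None, True, False, StopIteration]

tags = {}
for obj in singletons:
    payload = marshal.dumps(obj)
    tags[repr(obj)] = payload
    # Identity, not just equality, is preserved on the way back.
    assert marshal.loads(payload) is obj

for name, payload in tags.items():
    print(f"{name:>26} -> {payload!r}")
```

The StopIteration entry comes out as the single byte `b'S'`, which is the dedicated type code in marshal.c.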

Isn’t marshal’s primary use .pyc files, and isn’t StopIteration a perfectly valid value to use in code?

I suggest linking to the specific bit of code.

It is!

Perfectly valid! But ValueError is also a perfectly valid value to be using in code, and yet it’s not special-cased in marshal.c like StopIteration is.

Also, I’m not sure how marshal.c would encounter the value StopIteration. I’d expect it to encounter the string 'StopIteration', as an entry in a co_names table, for use by the LOAD_GLOBAL opcode, for example.
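A quick sketch of both points, assuming a reasonably recent CPython. The source string and the variable names are made up for the example.

```python
import marshal

# Other exception classes have no type code in marshal, so dumping one fails.
try:
    marshal.dumps(ValueError)
except ValueError as exc:
    value_error_result = f"rejected: {exc}"
else:
    value_error_result = "accepted"

# Code that mentions StopIteration refers to it by name. The compiler stores
# the string in co_names, not the class object in co_consts.
code = compile("raise StopIteration", "<example>", "exec")
name_in_co_names = "StopIteration" in code.co_names
class_in_co_consts = any(const is StopIteration for const in code.co_consts)

print(value_error_result)
print("name in co_names:", name_in_co_names)
print("class in co_consts:", class_in_co_consts)
```

So when this code object is marshalled, marshal only ever sees the string, never the class itself.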

This behavior was added when generators were added to the language in 2001, in the commit “Merging the gen-branch into the main line, at Guido's direction. Yay!” (python/cpython@5ca576e).

Generators, of course, have special handling for StopIteration. I’m guessing at the time the reason for using LOAD_CONST rather than LOAD_GLOBAL was to avoid funky behavior in case the name StopIteration was shadowed? Not sure though, maybe @tim.one remembers :slight_smile:

We have a similar case today where we want assert to work correctly even if someone shadows the name AssertionError. Today we handle that via the dedicated LOAD_ASSERTION_ERROR opcode, but if we wanted to, we could just as well handle it by using LOAD_CONST and supporting AssertionError in marshal.
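To illustrate the shadowing case, here is a small sketch. It assumes an interpreter new enough to load the assertion class without a global name lookup (CPython 3.9 or later, as far as I know) and that Python is not running with `-O`, which strips asserts. The `check` function and the shadowing class are hypothetical.

```python
import builtins

# A module-like namespace that shadows the name AssertionError.
source = '''
class AssertionError(Exception):
    """A shadowing class that has nothing to do with the builtin."""

def check():
    assert False, "boom"
'''

namespace = {}
exec(source, namespace)

caught = None
try:
    namespace["check"]()
except Exception as exc:
    # Record which class the assert statement actually raised.
    caught = type(exc)

print(caught is builtins.AssertionError)
print(caught is namespace["AssertionError"])
```

On such interpreters the assert raises the builtin class, not the shadowing one, because the compiler does not go through the global name.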

I presume the question is why StopIteration is apparently unique among exceptions in being given an ASCII type code ('S'). I would assume that it has something to do with its special role in iterators. To find out more, I would start with git blame on those specific lines of code.

Try removing it and see if anything breaks.

It looks like the need for the compiler to load StopIteration as a constant was removed not very long after, in the commit “Change the semantics of "return" in generators, as discussed on the …” (python/cpython@ad1a18b). I think ever since then the support in marshal has been dead code. I expect you could remove it without consequence (except maybe to some third-party code doing something very odd and unsupported with marshal).

Or, as I like to think of it, Implementation Jenga.

It’s worth noting that this behavior is documented explicitly, so I suppose removing it requires going through the usual deprecation hoops.

If you’re referring to the documentation on the marshal library module, it’s actually helpfully vague on this point. Emphasis mine:

Please do not do this. Even if it is no longer used in .pyc files, the marshal format is a general data format and is used for data serialization (taking its limitations into account). There are marshal files created by previous versions of Python. Serialization of code objects is not stable, and every Python version is incompatible with the others in that respect, but apart from code objects, the rest of the marshal format has maintained backward compatibility from the beginning.
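As a rough illustration of marshal used as a plain data format, here is a sketch that writes the same value with an older format version and with the current default, then reads both back. The `record` contents are made up. This only exercises one interpreter, so it hints at the compatibility point but does not prove it across Python versions.

```python
import marshal

# Plain data only: no code objects involved.
record = {"name": "example", "values": [1, 2.5, None], "pair": (True, b"raw")}

# The optional second argument selects the format version to write.
old_payload = marshal.dumps(record, 2)
new_payload = marshal.dumps(record)  # uses marshal.version by default

# The reader does not need to be told which version wrote the data.
restored_old = marshal.loads(old_payload)
restored_new = marshal.loads(new_payload)

print(restored_old == record, restored_new == record)
```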

There is a precedent. TYPE_INT64 was added to support 64-bit ints on 64-bit platforms. This made .pyc files platform-dependent. After int and long were unified, support for TYPE_INT64 was removed, and ints outside the 32-bit range were saved as long. But this caused compatibility problems, so support for TYPE_INT64 was partially restored (reading only).

I didn’t say ‘and commit’.

A bit off topic, but it’s quite annoying that marshal is the encoding format for pstats. It makes analysing performance data on a different platform from where it was captured fiddly.
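For reference, a minimal sketch showing that a profile dump really is just a marshalled dict. It uses `cProfile.Profile.dump_stats` and `marshal.load`. The `busy_work` function and the temporary file are only there for the example.

```python
import cProfile
import marshal
import os
import tempfile


def busy_work():
    """Hypothetical workload, just so the profiler has something to record."""
    return sum(i * i for i in range(1000))


profiler = cProfile.Profile()
profiler.runcall(busy_work)

fd, path = tempfile.mkstemp(suffix=".prof")
os.close(fd)
try:
    profiler.dump_stats(path)
    # The stats file can be read back directly with marshal.
    with open(path, "rb") as f:
        raw_stats = marshal.load(f)
finally:
    os.remove(path)

# Keys are (filename, line number, function name) tuples.
function_names = {key[2] for key in raw_stats}
print(type(raw_stats).__name__, "busy_work" in function_names)
```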