My research so far has told me:
- Up to and including Python 3.2 (and all 2.x branches), Python (at least CPython) bytecode (`.pyc`) files consisted of: a 4-byte magic number (2 bytes identifying the bytecode version, plus a constant `0d 0a` used to detect corruption caused by reading or writing in text mode); a 4-byte timestamp; and a marshalled code object.
- Python 3.3 added a 4-byte value after the timestamp representing the length in bytes of the corresponding source code file.
- Python 3.7 implemented PEP 552, thereby adding a 4-byte bit field before the timestamp field (when its lowest bit is set, the timestamp and source-length fields are instead interpreted together as an 8-byte SipHash of the source; incidentally, the corresponding link in the PEP is broken).

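To make sure I have the three layouts straight, here is a rough sketch of a header parser as I understand them. The name `parse_pyc_header` and the idea of passing the Python version in explicitly (rather than deriving it from the magic number) are my own simplifications, and I am assuming the fields are little-endian 32-bit integers and that the flag bits mean what PEP 552 describes.

```python
import struct


def parse_pyc_header(data: bytes, version: tuple) -> dict:
    """Split a .pyc header into fields (hypothetical helper, not a stdlib API).

    `version` is the (major, minor) of the Python that wrote the file; the
    caller supplies it because this sketch does not map magic numbers to
    versions.
    """
    magic = data[:4]
    # The last two magic bytes are the constant 0d 0a ("\r\n").
    if magic[2:] != b"\r\n":
        raise ValueError("bad magic number: missing 0d 0a")
    header = {"magic_version": struct.unpack("<H", magic[:2])[0]}

    if version >= (3, 7):
        # PEP 552 layout: magic, bit field, then either mtime+size or a hash.
        (flags,) = struct.unpack("<I", data[4:8])
        header["flags"] = flags
        if flags & 0x1:
            # Hash-based pyc: the next 8 bytes are a hash of the source.
            header["source_hash"] = data[8:16]
            header["check_source"] = bool(flags & 0x2)
        else:
            header["mtime"], header["source_size"] = struct.unpack("<II", data[8:16])
        header["code_offset"] = 16
    elif version >= (3, 3):
        # 3.3 to 3.6 layout: magic, mtime, source size.
        header["mtime"], header["source_size"] = struct.unpack("<II", data[4:12])
        header["code_offset"] = 12
    else:
        # 3.2 and earlier: magic, mtime.
        (header["mtime"],) = struct.unpack("<I", data[4:8])
        header["code_offset"] = 8
    return header
```

Feeding it the first 16 bytes of a file from `__pycache__` together with `sys.version_info[:2]` should give back the fields for the running interpreter; `code_offset` is where the marshalled code object would start.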
What I don’t understand is why the source-length field was added. The “What’s New in 3.3” doc tells me:
> `importlib.abc.SourceLoader.path_mtime()` is now deprecated in favour of `importlib.abc.SourceLoader.path_stats()` as bytecode files now store both the modification time and size of the source file the bytecode file was compiled from.

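For reference, this is a small sketch of what `path_stats()` reports, using the concrete `importlib.machinery.SourceFileLoader`. The module name `demo_module`, the helper `source_stats`, and the temporary file are made up for illustration.

```python
import os
import tempfile
from importlib.machinery import SourceFileLoader


def source_stats(path: str) -> dict:
    """Return the loader's view of the source file's metadata (hypothetical helper)."""
    loader = SourceFileLoader("demo_module", path)
    return loader.path_stats(path)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_module.py")
    # Write bytes so the size is the same on every platform.
    with open(src, "wb") as f:
        f.write(b"x = 1\n")
    stats = source_stats(src)

print(sorted(stats))  # ['mtime', 'size']
```

So the method does hand back both values at once, which matches the quoted sentence but still does not tell me why the size needed to be recorded in the bytecode file.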
While there are many other mentions of changes to the import process and bytecode format, this was the only thing I could find in the document that concretely describes a purpose for, or use of, the new field. I also don’t see a PEP describing the addition of this field.
Surely it wasn’t added just so that `importlib.abc.SourceLoader.path_stats()` could exist and get both pieces of information in the same place?
In what contexts does this information actually matter?