Is this the right forum for concerns about the standard library?
gzip compression, using class GzipFile from gzip.py, by default
inserts a timestamp into the compressed stream. If the optional
argument mtime is absent or None, then the current time is used [1].
This makes outputs non-deterministic, which can badly confuse
unsuspecting users: If you run “diff” over two outputs to see
whether they are unaffected by changes in your application,
then you would not expect that the *.gz binaries differ just
because they were created at different times.
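A minimal sketch of the effect, for illustration only. The helper name gzip_bytes is made up here, and the two explicit mtime values merely stand in for "compressed at different times" so that the result does not depend on the clock:

```python
import gzip
import io


def gzip_bytes(payload: bytes, mtime) -> bytes:
    """Compress payload in memory with GzipFile (hypothetical helper)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(payload)
    return buf.getvalue()


payload = b"same application output\n"

# Two explicit timestamps one minute apart, simulating two separate runs
# that would otherwise each pick up the current time (mtime=None).
first = gzip_bytes(payload, mtime=1_600_000_000)
second = gzip_bytes(payload, mtime=1_600_000_060)

print(first == second)                                    # False: the binaries differ
print(gzip.decompress(first) == gzip.decompress(second))  # True: the content is identical
```

So a byte-wise comparison of the *.gz files reports a difference even though the decompressed data is the same.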
I’d propose introducing a new constant NO_TIMESTAMP as a
possible value of mtime.
Furthermore, if policy about API changes allows, I’d suggest
that NO_TIMESTAMP become the new default value for mtime.
How should I proceed from here? Is this the kind of proposal that
has to go through a PEP?
Could most of the benefit not be achieved by simply adding an explanation to the documentation, suggesting that if you need deterministic output you should pass mtime=0?
I have no opinion on the question of changing the default, as I’ve never needed gzip output to be deterministic.
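For anyone who wants to try it, here is a small sketch of the mtime=0 approach. The function name deterministic_gzip is only an illustrative choice, and the sketch assumes the data is compressed in memory, so no file name ends up in the header:

```python
import gzip
import io


def deterministic_gzip(payload: bytes) -> bytes:
    """Compress payload with a fixed timestamp of 0 (hypothetical helper)."""
    buf = io.BytesIO()
    # mtime=0 replaces the default "current time" with a constant value.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as f:
        f.write(payload)
    return buf.getvalue()


payload = b"same application output\n"
a = deterministic_gzip(payload)
b = deterministic_gzip(payload)

print(a == b)  # True: identical input gives identical bytes
```

With the timestamp fixed, identical input should produce identical bytes regardless of when the compression runs, provided the Python and zlib versions and the compression level stay the same.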
Thanks for mtime=0. I wasn’t aware of that possibility. Does it prevent the timestamp from being written, or does it expand to a constant (incorrect, but almost always inconsequential) date, such as the Unix epoch (1970)?
According to the gzip module docs, “All gzip compressed streams are required to contain this timestamp field”. So it inserts a timestamp of 0, because you can’t omit the timestamp altogether.
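You can check this by reading the field back out of the header. The sketch below assumes the header layout from RFC 1952, where the four bytes after ID1, ID2, CM and FLG hold MTIME as a little-endian unsigned integer; the helper names are made up for this example. RFC 1952 also defines an MTIME of 0 as meaning that no timestamp is available, so a reader need not interpret it as 1970.

```python
import gzip
import io
import struct


def gzip_bytes(payload: bytes, mtime) -> bytes:
    """Compress payload in memory with GzipFile (hypothetical helper)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(payload)
    return buf.getvalue()


def gzip_mtime(blob: bytes) -> int:
    """Return the MTIME header field of a gzip stream (hypothetical helper)."""
    # Bytes 0-3 are ID1, ID2, CM, FLG; bytes 4-7 are MTIME, little-endian.
    (mtime,) = struct.unpack("<I", blob[4:8])
    return mtime


zeroed = gzip_bytes(b"hello\n", mtime=0)
stamped = gzip_bytes(b"hello\n", mtime=1_600_000_000)

print(zeroed[:2] == b"\x1f\x8b")  # True: gzip magic number
print(gzip_mtime(zeroed))         # 0: the field is present but zero
print(gzip_mtime(stamped))        # 1600000000
```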