Hi, I had a few conversations at PyCon UK, and it came up that the MT (multithreaded) stream encoder/decoder has capabilities that aren't available through the default implementation currently exposed. While drafting a proposed implementation I also asked @hugovk whether this (threads) is a good idea, and he suggested I open a discussion thread here as well as a PR. And it looks like a PR needs an issue, which needs a discussion anyway?
So you ask, why would one want to do this?
- It speeds things up (though you have to ask for it explicitly).
- It is the encoder the xz binary actually defaults to.
- You get control over things like block size, and the MT stream encoder makes different decisions about how to encode things. That matters to some people: see Friday's PyCon lightning talks for an interested user who wants to recreate an existing archive byte for byte (https://youtu.be/CouUftzuQVQ?t=2327).
- The MT encoder even sets different flags in the header and always encodes the original/compressed size, … (again, for those who care about generating files indistinguishable from those produced by xz et al.).
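To make the header point a bit more concrete, here is a rough sketch that peeks at the block flags of the first block in an .xz stream. The helper name `first_block_flags` is made up for this post, and the byte offsets and bit masks are my reading of the .xz file format spec, so treat it as an illustration rather than a proper parser.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # 6-byte magic at the start of every .xz stream


def first_block_flags(xz_bytes):
    """Decode the block flags byte of the first block (hypothetical helper).

    Assumed layout: a 12-byte stream header (magic, stream flags, CRC32),
    then the block header, whose second byte holds the block flags.
    """
    if xz_bytes[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    flags = xz_bytes[13]
    return {
        "filters": (flags & 0x03) + 1,  # bits 0-1: filter count minus one
        "compressed_size_present": bool(flags & 0x40),
        "uncompressed_size_present": bool(flags & 0x80),
    }


# Compress with the encoder the lzma module exposes today.
sample = lzma.compress(b"Mind if I squeeze in?")
info = first_block_flags(sample)
print(info)
```

With the encoder the module exposes today I would expect both size fields to be reported as absent, whereas a file written by the MT encoder should report both as present.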
from compression import zstd

# Ask zstd to write a content checksum at the end of each frame.
options = {
    zstd.CompressionParameter.checksum_flag: 1
}
with zstd.open("file.zst", "w", options=options) as f:
    f.write(b"Mind if I squeeze in?")
FWIW, I think a dictionary-based approach is more consistent with the way users currently supply filters in lzma. While I like the CompressionParameter design, I think it is more important to be consistent within the module.
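For reference, this is roughly how filters are supplied to lzma today: a list of plain dicts, each with an "id" key plus filter-specific options. The particular filter chain and sample data are only illustrative.

```python
import lzma

# A custom filter chain: each filter is a dict with an "id" and its options.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6 | lzma.PRESET_EXTREME},
]

data = b"Mind if I squeeze in?" * 100
compressed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

# The filter chain is recorded in the .xz container, so decompression
# needs no extra arguments.
restored = lzma.decompress(compressed)
```

Any MT-specific settings expressed in the same dict style would sit naturally next to this.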