Every algorithm proposed for inclusion in the compression namespace handles files, except zlib. If we exclude gzip (which does handle files), what's the actual rule? Or is this just inconsistent? Should the authors have used the "zlib" name for gzip too?
This argument applies equally well to every module under discussion.
The problem here is that gzip should have been called gzfile, to match tarfile and zipfile. But that's historical now. (Personally, I'd put them all under compression anyway, without removing any existing modules from the top level.)
I think it would be useful to look at how other languages refer to GZip. In the table below, links that put GZip under a zlib package are counted as "Compression", since zlib is a compression library.
| Language | Compression or Archive | Link |
|---|---|---|
| Golang | Compression | https://pkg.go.dev/compress/gzip |
| Ruby | Compression | class Zlib::GzipFile - Documentation for Ruby 3.5 |
| Rust | Compression | GitHub - rust-lang/flate2-rs: DEFLATE, gzip, and zlib bindings for Rust |
| Haskell | Compression | zlib: Compression and decompression in the gzip and zlib formats |
| C# | Compression | GZipStream Class (System.IO.Compression) | Microsoft Learn |
| Java | Archive | java.util.zip (Java Platform SE 8 ) |
| NodeJS | Compression? | Zlib | Node.js v23.11.0 Documentation |
| Web APIs | Compression | Compression Streams API - Web APIs | MDN |
| PHP | Compression | PHP: gzcompress - Manual |
| Perl | Compression | IO::Compress::Gzip - Write RFC 1952 files/buffers - Perldoc Browser |
Based on the above I think it is very common to refer to GZip as a compression format, so I think it should go under compression.
Of the above, Perl and C# refer to ZIP as part of the compress namespace, but the rest do not include it. So I think I will maintain that zipfile and tarfile probably shouldn’t go under compression.
Agreed. FWIW, the current Wikipedia entry reads:

> gzip is a file format and a software application used for file compression and decompression.
Based on the above table, I made a PR to update the PEP to include gzip under compression and expand the rejected ideas section.
I hope to submit the PEP to the Steering Council tomorrow, barring any further concerns. This would allow a few weeks before the beta cutoff for 3.14 to merge the implementation. In the meantime, I will be working on polishing up my branch so that I can break it up into mergeable MRs.
I would like to take this opportunity to mention that I have a few open issues on my CPython fork discussing API design choices for the new compression.zstd module, mostly weighing keeping the API the same as the existing pyzstd API versus making it more consistent with the existing compression modules in the standard library.
Thank you again to everyone for the feedback and discussions!
Agreed, and perhaps we should have done that in Python 3.0?
I would hope that the nesting be kept to two levels, i.e. std.zstd, std.zlib, std.futures, etc. Namespaces should only be created to avoid conflicts, not for taxonomic purity.
Well, are we concretely planning to move zlib and lzma there? Otherwise, a partial taxonomy only makes things more confusing and awkward for the user.
Yep! The PEP enumerates the modules going under compression: PEP 784 – Adding Zstandard to the standard library | peps.python.org
> New import names
>
> `compression.lzma`, `compression.bz2`, `compression.gzip` and `compression.zlib` will be introduced in Python 3.14, re-exporting the contents of the existing `lzma`, `bz2`, `gzip` and `zlib` modules respectively.
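Mechanically, each of these aliases can be a thin module that re-exports the original (in CPython the alias would essentially be a file doing `from gzip import *`). A rough illustrative sketch, building an alias for `gzip` at runtime — the `gzip_alias_demo` name here is invented for the demo and is not the actual CPython implementation:

```python
import gzip
import sys
import types

# Create an alias module that re-exports gzip's public names.
alias = types.ModuleType("gzip_alias_demo")
for name in dir(gzip):
    if not name.startswith("_"):
        setattr(alias, name, getattr(gzip, name))
sys.modules["gzip_alias_demo"] = alias

import gzip_alias_demo

# The alias exposes the very same objects as the original module,
# so code using either name is interchangeable.
assert gzip_alias_demo.compress is gzip.compress
assert gzip.decompress(gzip_alias_demo.compress(b"data")) == b"data"
```

Because the alias shares the original module's objects rather than copying behavior, there is no divergence between the old and new import names.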
I think the question was whether they’d be removed from the top level. But this is also in the PEP, in the next section, with an overall process that takes ten years.
I believe this PEP should also address the potential future namespacing of the entire standard library, perhaps under std. In such a scenario, would modules like compression.zstd reside at std.compression.zstd or be moved to std.zstd?
Why should this PEP do that? There’s no point in putting nonbinding future plans in here, is there?
Exactly. We really have no idea what a stdlib reorg would look like.
I agree with Barry and James on this. Since we don’t know what a std namespace might look like, anything said in this PEP wouldn’t be much more than an educated guess based on a design sketch. I don’t think that’s a good basis to design a specification.
Timing wise, I don’t think there’s a need to specify it anyway. A future PEP introducing std can always make whatever proposal it wishes. That seems like a much better time to look at the location of these modules as presumably there will be a fully fleshed out design in hand.
The rejected ideas section says:
> a future PEP introducing a `std` namespace could always define that the `compression` sub-modules be flattened into the `std` namespace.
I think this allows for a future PEP to argue for flattening the namespace into e.g. std.zstd while not making that a requirement if we end up liking the compression package.
I missed this, but I’m happy with this. Thank you.
I may as well add my thoughts on the color of this bikeshed:
- Python should immediately reserve `std`, if it hasn't already.
- This PEP, which is titled "Adding Zstandard to the standard library", should do what it says on the tin. It should not be concerned with introducing a `compression` module or making changes to any existing compression module that is unrelated to Zstandard. The module that implements Zstandard should be named `std.zstd` (or `std.zst` if people prefer that).
- Moving forward, all new Python stdlib modules should be named `std.$whatever`. Python should adopt a policy to never introduce any more stdlib modules outside of `std`.
  - I agree with the view from @pitrou that "the nesting be kept to two levels, i.e. `std.zstd`, `std.zlib`, `std.futures`, etc. Namespaces should only be created to avoid conflicts, not for taxonomic purity".
- There should not be any rush to migrate existing stdlib modules into `std`. That should be thought about separately and proposed as a future PEP.
- There should certainly not be a rushed decision to migrate existing stdlib modules into `compression`. If this happened, and then later it was decided to migrate the stdlib to `std.*`, then Python will have had two large-scale name migrations in a short space of time, which isn't useful for anyone.
There isn’t. This is part of a 10-year plan. If it doesn’t turn out to be successful, we can always take a different path.
It’s similar to my initial concern, but honestly, it’s not something we need to worry about right now—maybe in 10 years… or maybe not.
It probably is, since no project exists with that name, but that is outside the purview of this PEP. Someone will need to start a separate conversation for that.
Once again, off-topic and out of scope for this PEP as specified by the PEP authors. Please either help @malemburg write a PEP as he suggested he might, or write one yourself making the proposal.
The Steering Council officially accepts PEP 784, with the following changes requested:
- Please move the included `pyzstd` license into the `Doc/license.rst` file [1], under the section titled "Licenses and Acknowledgements for Incorporated Software".
- Please remove the details outlining the deprecation schedule for the existing modules slated to be aliased under the `compression` package name. If you want to keep the section, we suggest language such as "there are currently no plans to remove the old module names, and any deprecations are left to the future". To make sure we get the language just right, please @-mention the SC on any PEP update PRs.
Thanks and congratulations to @emmatyping!
1. and remove it from the top-level `LICENSE` file, which should only contain the stacked CPython licenses ↩︎
I’m thrilled about zstd in the standard library!
Can you speak to how compression.zstd handles setting the decompressed-size header in the zstd data? A few years ago, tests with python-zstandard showed this header was important for limiting memory usage on decompression at higher levels.
In python-zstandard:

```python
>>> cctx = zstandard.ZstdCompressor()
>>> with cctx.stream_writer(fh, size=data_len) as writer:
...     writer.write(data)
```

and in pyzstd, there's "ZstdCompressor has an undocumented method to set the size; see help(ZstdCompressor._set_pledged_input_size) for the usage."
It seems like compression.zstd may set the header when using the one-shot APIs or certain .flush() settings, but to me this is not immediately clear. Or do the updated libraries allocate buffers less aggressively on decompression, mooting the issue?
Thanks,
My understanding is that as of now, the library does nothing to provide the size nor expose an endpoint to the end user to provide it.
As a result, this falls under "Note 3" about `ZSTD_CCtx_setPledgedSrcSize` usage in the manual:

> Whenever all input data is provided and consumed in a single round,
> for example with ZSTD_compress2(),
> or invoking immediately ZSTD_compressStream2(,ZSTD_e_end),
> this value is automatically overridden by srcSize instead.
After trying it out, I can confirm that the value is set automatically in the header when:

- the option `CompressionParameter.content_size_flag` is used with value `1`,
- and the data is sent in one Python call:
  - either using `compress`,
  - or using `ZstdCompressor`'s `compress` method with `mode=ZstdCompressor.FLUSH_FRAME`, assuming no data was previously sent for compression to that instance since the last call with `mode=ZstdCompressor.FLUSH_FRAME`.
Said simply, the following will add the size into the compressed data:

```python
from compression import zstd

zstd.compress(data, options={zstd.CompressionParameter.content_size_flag: 1})
```
Adding an option to provide the size manually to a ZstdCompressor instance could be an evolution of the module, but it has a clear drawback: it will probably break decompression of the output data if the size passed by the end user is incorrect. This is why it was previously undocumented in pyzstd.
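For comparison with a format already in the stdlib: gzip also records the uncompressed size (modulo 2**32), but in its 8-byte trailer rather than in the frame header, so it is less useful for preallocating decompression buffers up front. A small sketch reading that trailer field, per RFC 1952, using today's `gzip` module:

```python
import gzip
import struct
import zlib

data = b"hello world" * 100
blob = gzip.compress(data)

# Per RFC 1952, a gzip stream ends with an 8-byte trailer:
# CRC32 of the uncompressed data, then ISIZE (the uncompressed
# size modulo 2**32), both little-endian.
crc, isize = struct.unpack("<II", blob[-8:])
assert crc == zlib.crc32(data)
assert isize == len(data) % 2**32
```

Since the trailer arrives only after all compressed data, a streaming gzip decompressor cannot use it to size buffers in advance, which is part of why zstd's header-based content size is attractive here.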