Every algorithm proposed for inclusion in the compression namespace handles files, except zlib. If we exclude gzip (which does handle files), what's the actual rule? Or is this just inconsistent? Should the authors have used the "zlib" name for gzip too?
This argument applies equally well to every module under discussion.
The problem here is that gzip should have been called gzfile, to match tarfile and zipfile. But that's historical now. (Personally, I'd put them all under compression anyway, without removing any existing modules from the top level.)
I think it would be useful to look at how other languages refer to GZip. In the table below, links that put GZip under a zlib package are counted as "Compression", since zlib is a compression library.
| Language | Compression or Archive | Link |
|---|---|---|
| Golang | Compression | https://pkg.go.dev/compress/gzip |
| Ruby | Compression | class Zlib::GzipFile - Documentation for Ruby 3.5 |
| Rust | Compression | GitHub - rust-lang/flate2-rs: DEFLATE, gzip, and zlib bindings for Rust |
| Haskell | Compression | zlib: Compression and decompression in the gzip and zlib formats |
| C# | Compression | GZipStream Class (System.IO.Compression) | Microsoft Learn |
| Java | Archive | java.util.zip (Java Platform SE 8 ) |
| NodeJS | Compression? | Zlib | Node.js v23.11.0 Documentation |
| Web APIs | Compression | Compression Streams API - Web APIs | MDN |
| PHP | Compression | PHP: gzcompress - Manual |
| Perl | Compression | IO::Compress::Gzip - Write RFC 1952 files/buffers - Perldoc Browser |
Based on the above I think it is very common to refer to GZip as a compression format, so I think it should go under compression.
Of the above, Perl and C# refer to ZIP as part of the compress namespace, but the rest do not include it. So I think I will maintain that zipfile and tarfile probably shouldn’t go under compression.
Agreed. FWIW, the current Wikipedia entry reads:

> gzip is a file format and a software application used for file compression and decompression.
Based on the above table, I made a PR to update the PEP to include gzip under compression and expand the rejected ideas section.
I hope to submit the PEP to the Steering Council tomorrow, barring any further concerns. This would allow a few weeks before the beta cutoff for 3.14 to merge the implementation. In the meantime, I will be working on polishing up my branch so that I can break it up into mergeable MRs.
I would like to take this opportunity to mention that I have a few open issues on my CPython fork discussing API design choices for the new compression.zstd module, mostly weighing keeping the API the same as the existing pyzstd API versus making it more consistent with the existing compression modules in the standard library.
Thank you again to everyone for the feedback and discussions!
Agreed, and perhaps we should have done that in Python 3.0?
I would hope that the nesting be kept to two levels, i.e. std.zstd, std.zlib, std.futures, etc. Namespaces should only be created to avoid conflicts, not for taxonomic purity.
Well, are we concretely planning to move zlib and lzma there? Otherwise, a partial taxonomy only makes things more confusing and awkward for the user.
Yep! The PEP enumerates the modules going under compression: PEP 784 – Adding Zstandard to the standard library | peps.python.org
> New import names
>
> `compression.lzma`, `compression.bz2`, `compression.gzip` and `compression.zlib` will be introduced in Python 3.14, re-exporting the contents of the existing `lzma`, `bz2`, `gzip` and `zlib` modules respectively.
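Mechanically, each of these aliases can be a thin module that re-exports the original (in CPython the alias would essentially be a file doing `from gzip import *`). A rough illustrative sketch, building an alias for `gzip` at runtime — the `gzip_alias_demo` name here is invented for the demo and is not the actual CPython implementation:

```python
import gzip
import sys
import types

# Create an alias module that re-exports gzip's public names.
alias = types.ModuleType("gzip_alias_demo")
for name in dir(gzip):
    if not name.startswith("_"):
        setattr(alias, name, getattr(gzip, name))
sys.modules["gzip_alias_demo"] = alias

import gzip_alias_demo

# The alias exposes the very same objects as the original module,
# so code using either name is interchangeable.
assert gzip_alias_demo.compress is gzip.compress
assert gzip.decompress(gzip_alias_demo.compress(b"data")) == b"data"
```

Because the alias shares the original module's objects rather than copying behavior, there is no divergence between the old and new import names.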
I think the question was whether they’d be removed from the top level. But this is also in the PEP, in the next section, with an overall process that takes ten years.
I believe this PEP should also address the potential future namespacing of the entire standard library, perhaps under std. In such a scenario, would modules like compression.zstd reside at std.compression.zstd or be moved to std.zstd?
Why should this PEP do that? There’s no point in putting nonbinding future plans in here, is there?
Exactly. We really have no idea what a stdlib reorg would look like.
I agree with Barry and James on this. Since we don’t know what a std namespace might look like, anything said in this PEP wouldn’t be much more than an educated guess based on a design sketch. I don’t think that’s a good basis to design a specification.
Timing wise, I don’t think there’s a need to specify it anyway. A future PEP introducing std can always make whatever proposal it wishes. That seems like a much better time to look at the location of these modules as presumably there will be a fully fleshed out design in hand.
The rejected ideas section says:
> a future PEP introducing a `std` namespace could always define that the `compression` sub-modules be flattened into the `std` namespace.
I think this allows for a future PEP to argue for flattening the namespace into e.g. std.zstd while not making that a requirement if we end up liking the compression package.
I missed this, but I’m happy with this. Thank you.
I may as well add my thoughts on the color of this bikeshed:
- Python should immediately reserve `std`, if it hasn't already.
- This PEP, which is titled "Adding Zstandard to the standard library", should do what it says on the tin. It should not be concerned with introducing a `compression` module or making changes to any existing compression module that is unrelated to Zstandard. The module that implements Zstandard should be named `std.zstd` (or `std.zst` if people prefer that).
- Moving forward, all new Python stdlib modules should be named `std.$whatever`. Python should adopt a policy to never introduce any more stdlib modules outside of `std`.
  - I agree with the view from @pitrou that "the nesting be kept to two levels, i.e. `std.zstd`, `std.zlib`, `std.futures`, etc. Namespaces should only be created to avoid conflicts, not for taxonomic purity".
- There should not be any rush to migrate existing stdlib modules into `std`. That should be thought about separately and proposed as a future PEP.
- There should certainly not be a rushed decision to migrate existing stdlib modules into `compression`. If this happened, and then later it was decided to migrate the stdlib to `std.*`, then Python will have had two large-scale name migrations in a short space of time, which isn't useful for anyone.
There isn’t. This is part of a 10-year plan. If it doesn’t turn out to be successful, we can always take a different path.
It’s similar to my initial concern, but honestly, it’s not something we need to worry about right now—maybe in 10 years… or maybe not.
It probably is, since no project exists with that name, but that is outside the purview of this PEP. Someone will need to start a separate conversation for that.
Once again, off-topic and out of scope for this PEP as specified by the PEP authors. Please either help @malemburg write a PEP as he suggested he might, or write one yourself making the proposal.
The Steering Council officially accepts PEP 784, with the following changes requested:
- Please move the included `pyzstd` license into the `Doc/license.rst` file [1], under the section titled "Licenses and Acknowledgements for Incorporated Software".
- Please remove the details outlining the deprecation schedule for the existing modules slated to be aliased under the `compression` package name. If you want to keep the section, we suggest language such as "there are currently no plans to remove the old module names, and any deprecations are left to the future". To make sure we get the language just right, please @-mention the SC on any PEP update PRs.
Thanks and congratulations to @emmatyping!
1. and remove it from the top-level `LICENSE` file, which should only contain the stacked CPython licenses ↩︎
I’m thrilled about zstd in the standard library!
Can you speak to how compression.zstd handles setting the decompressed-size header in the zstd data? A few years ago, tests with python-zstandard showed this header was important for limiting memory usage on decompression at higher levels.
In python-zstandard:

```python
>>> cctx = zstandard.ZstdCompressor()
>>> with cctx.stream_writer(fh, size=data_len) as writer:
...     writer.write(data)
```

and in pyzstd, there's "ZstdCompressor has an undocumented method to set the size; see help(ZstdCompressor._set_pledged_input_size) for the usage."
It seems like compression.zstd may set the header when using the one-shot APIs or certain .flush() settings, but to me this is not immediately clear. Or do the updated libraries allocate buffers less aggressively on decompression, mooting the issue?
Thanks,
My understanding is that as of now, the library does nothing to provide the size nor expose an endpoint to the end user to provide it.
As a result, this falls under "Note 3" about `ZSTD_CCtx_setPledgedSrcSize` usage in the manual:

> Whenever all input data is provided and consumed in a single round,
> for example with ZSTD_compress2(),
> or invoking immediately ZSTD_compressStream2(,ZSTD_e_end),
> this value is automatically overridden by srcSize instead.
After trying it out, I can confirm that the value is set automatically in the header when:

- the option `CompressionParameter.content_size_flag` is used with value `1`,
- and the data is sent in one Python call:
  - either using `compress`,
  - or using `ZstdCompressor`'s `compress` method with `mode=ZstdCompressor.FLUSH_FRAME`, assuming no data was previously sent for compression to that instance since the last call with `mode=ZstdCompressor.FLUSH_FRAME`.
Said simply, the following will add the size into the compressed data:

```python
from compression import zstd

zstd.compress(data, options={zstd.CompressionParameter.content_size_flag: 1})
```
Adding an option to provide the size manually to a ZstdCompressor instance could be an evolution of the module, but it has a clear drawback: it will probably break decompression of the output data if the size passed by the end user is incorrect. This is why it was previously undocumented in pyzstd.
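For comparison with a format already in the stdlib: gzip also records the uncompressed size (modulo 2**32), but in its 8-byte trailer rather than in the frame header, so it is less useful for preallocating decompression buffers up front. A small sketch reading that trailer field, per RFC 1952, using today's `gzip` module:

```python
import gzip
import struct
import zlib

data = b"hello world" * 100
blob = gzip.compress(data)

# Per RFC 1952, a gzip stream ends with an 8-byte trailer:
# CRC32 of the uncompressed data, then ISIZE (the uncompressed
# size modulo 2**32), both little-endian.
crc, isize = struct.unpack("<II", blob[-8:])
assert crc == zlib.crc32(data)
assert isize == len(data) % 2**32
```

Since the trailer arrives only after all compressed data, a streaming gzip decompressor cannot use it to size buffers in advance, which is part of why zstd's header-based content size is attractive here.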