PEP 784: Adding Zstandard to the standard library

jamestwebber · April 11, 2025, 8:35pm

I do think these goals are independent though. The value of getting it into 3.14 is that it accelerates packaging improvements by a year, which is nothing to sneeze at. That improves pypi and a lot of user experiences.

The experience from other wheel related proposals is that the oldest supported version is a blocker. Better to get something in now and move it later.

mikeshardmind · April 11, 2025, 8:50pm

Is the length of the name actually that much of a problem? This might be a blind spot for me in terms of “bother”, as I’ve always just done import concurrent.futures as cf. Any place where I’m importing it, people working on the code understand that as easily as they would np for numpy, as a convention.

In terms of the namespace being a failure (only having one member), I don’t think the past is an indicator of the future here. I think we already have ample evidence that, long term, it would be an ideal grouping for existing and future compression support. Unlike concurrency, where there are function color problems, the compression APIs are conceptually related in a way that lends itself to “just swap for the one you want”.
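A minimal sketch of the “just swap for the one you want” point, using only modules already in the stdlib. The CODECS table and the roundtrip helper are names made up for this illustration; the only real APIs relied on are the one-shot compress() and decompress() functions that zlib, bz2 and lzma each provide.

```python
import bz2
import lzma
import zlib

# Hypothetical lookup table for this sketch: each of these stdlib modules
# exposes one-shot compress(data) and decompress(data) functions.
CODECS = {"zlib": zlib, "bz2": bz2, "lzma": lzma}


def roundtrip(codec_name: str, data: bytes) -> bytes:
    """Compress and then decompress data with the codec chosen by name."""
    codec = CODECS[codec_name]
    return codec.decompress(codec.compress(data))


payload = b"hello " * 100
for name in CODECS:
    # The calling code is identical no matter which codec is selected.
    assert roundtrip(name, payload) == payload
```

Whether a future zstd module would slot into the same table depends on the API the PEP ends up with, so that part is left out here.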

jamestwebber · April 11, 2025, 10:19pm

<cries in-xml>

More seriously: I thought of that due to the above discussion. xml is an example where the submodules aren’t interchangeable but it is still a useful grouping of related functionality.

There are others though… asyncio is a big one

methane · April 11, 2025, 11:22pm

I don’t think so. The compression package will remain forever; it is not temporary.

compression.zstd (or compress.zstd) doesn’t feel dirty.
On the other hand, adding ~lib or lib~ looks more dirty to me.

The name zstd already conflicts with an existing package. We cannot add zstd without considering that conflict.

thomas · April 12, 2025, 12:58am

It’s not whether we can live with what we have. It’s whether we are doing ourselves and the users a disservice by continuing to expand the big, messy, flat namespace for both the standard library and any third-party library. We have the opportunity here to provide some sensible organisation in one small part of the stdlib, because of the most sensible top-level name being unused. On top of that, the sensible names for the module being added are not unused, and not using the sensible top-level package leaves us with more inscrutable mess.

It’s all well and good thinking we can get something better, but so far we haven’t had any proposals for this “better” organisation actually pan out. The choice isn’t “compression.zstd” or “a better organised stdlib”. It’s “compression.zstd” or “confusing name for zstd and not a better organised anything”.

emmatyping · April 12, 2025, 1:47am

For what it’s worth, to evaluate how damaging introducing the name compression is, I did a search of GitHub for files containing import compression. This is, as far as I am aware, the only import pattern which would break by introducing compression at the top level. The total number of matching files found was 82 (page may take a while to execute the search): Code search results · GitHub. For reference, there are almost 90 million Python files on GitHub.

So I think that reinforces my belief that putting compression at the toplevel will not break a large number of users.
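For anyone who wants to run a similar check on their own code rather than on GitHub, here is a rough sketch that parses a source string and reports absolute imports of a given top-level name. The function name imports_top_level is made up for this example. Note that it also flags from compression import ..., which a plain text search for import compression would not match, and whether any flagged import would really break depends on how sys.path is ordered in that project.

```python
import ast


def imports_top_level(source: str, name: str) -> bool:
    """Return True if source contains an absolute import of top-level `name`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Covers "import compression" and "import compression.foo as c".
            if any(alias.name.split(".")[0] == name for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import; relative imports such as
            # "from . import compression" are skipped.
            if node.level == 0 and node.module and node.module.split(".")[0] == name:
                return True
    return False


print(imports_top_level("import compression", "compression"))
print(imports_top_level("from . import compression", "compression"))
```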

emmatyping · April 12, 2025, 2:10am

Could you explain the issue with concurrent a bit more? I’m not sure I understand what the concern is and how that relates to compression.

ngoldbaum · April 12, 2025, 3:14am

Also it fixes a concrete issue on the free-threaded build: no zstandard support.

We’ve been trying to get a version of python-zstandard shipped with free-threaded support but have had to wait for code review feedback. If Python 3.14 ships zstandard support in the standard library then that provides an obvious migration path for tools that currently depend on python-zstandard.

The biggest concrete thing this would fix is making pip install hatch work on the free-threaded build, assuming hatch switches to using the standard library zstandard bindings in 3.14.

nas · April 12, 2025, 4:25am

Here’s another idea. Rather than compression.zstd, how about std.zstd? I think the idea of adding a top-level package to contain standard library modules has broad support. It’s the details about it that we haven’t yet figured out. For the 3.14 release, maybe the only thing in that package would be zstd. Given more time we can come up with a well thought out plan to allow other standard library packages to be imported from there.

Doing this only requires a few decisions. What should the top-level package be called? I think std is fine. What would be our preferred name for this particular package? I think zstd and zstandard are the obvious choices.

BrenBarn · April 12, 2025, 5:47am

You replied to my comment with this and the tone of your comment seems to suggest you’re disagreeing with me, but the intention of my comment was agreement with basically everything you said here. I agree that the flat stdlib namespace is becoming a burden. I think having everything under a stdlib.* namespace would be an improvement.

malemburg · April 12, 2025, 8:44am

No, the discussion was about separating the addition of the module from adding a new top-level compression package.

The current issues we have with finding a non-conflicting name for the top-level Zstandard module name will go away once we have a top-level stdlib package. I also believe that people will greatly prefer e.g. std.zstd over having to write std.compression.zstd.

As such, the current choice for the Zstandard module name is only a temporary measure, until we have the stdlib package in place.

That’s a great idea

Do note that the top-level stdlib package name is not clear yet…

My favorite would be py, but std mentioned by @methane would work as well, as long as it’s short and somehow related to the Python stdlib.

py.zstd would be very close to the current name of the module (leaving aside the changes Emma mentioned). But it’s more likely going to be std, since that name seems to be reserved for internal use on PyPI already, whereas py was in use for a long time by the pytest maintainers.

Regardless, even if we end up with a py stdlib package, we could always link std.zlib back to py.zlib to remain backwards compatible.

So yes, that idea would resolve the naming conflict and will likely be forward compatible at the same time.

concurrent was added to the stdlib as a package with the aim of having more sub-modules for concurrent processing related things in that package.

This hasn’t happened. threading, multiprocessing, subprocess, asyncio, etc. are still top-level modules (also see Concurrent Execution — Python 3.13.3 documentation and asyncio — Asynchronous I/O — Python 3.13.3 documentation).

Zheaoli · April 12, 2025, 9:37am

If the compression package is intended to exist permanently, then I acknowledge its necessity. However, we should clearly articulate in the PEP that the core value of introducing a top-level compression package isn’t merely to avoid namespace conflicts, but rather to unify the various compression implementations currently existing in CPython under a single namespace, similar to what we’ve done with hashlib. This provides organizational clarity and consistency across the standard library.
To many people, this might seem like just semantics, but I believe it’s important to explicitly state the purpose of introducing the top-level compression package in the PEP. Having clear documentation of intent will help guide future decisions about the standard library’s organization.
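As a small illustration of the hashlib comparison: hashlib groups several algorithms behind one module, and an algorithm can be picked either by attribute or by name. This only shows the existing hashlib behaviour; how a compression package would expose its members is up to the PEP.

```python
import hashlib

data = b"example payload"

# Two equivalent ways to reach the same algorithm through one namespace.
by_attribute = hashlib.sha256(data).hexdigest()
by_name = hashlib.new("sha256", data).hexdigest()

assert by_attribute == by_name
```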

morotti · April 12, 2025, 12:36pm

Joining the bike shedding

I am also very strongly opposed to making a “compression.*” subpackage and moving libraries there.
I’d like to see that removed from the PEP asap, as a PEP about “adding compression library xxx to the standard library” shouldn’t be hiding the refactoring of every compression library on the second line of the description.

One issue I haven’t seen mentioned is the packaging tools and importing machinery relying on zlib to import and install modules, as packages are zip files and modules can be zip files. There is potential for some very nasty bugs if compression libraries (notably zlib) start moving into a namespace subpackage.
I have enough import issues on my work day that I’d very much prefer not moving compression libraries and certainly not into namespace packages.

If we want to future proof, I’d rather we consider adding new modules into a stdlib directory.

On the name:

existing zstandard package has 2 000 000 daily downloads (PyPI Download Stats)
existing zstd package has 50 000 daily downloads (PyPI Download Stats)
existing zstdlib package has 10 daily downloads (only 1 download yesterday); it might as well not exist

I think zstdlib is good if you want to avoid the conflict with packages on pypi.

There will probably be more cases of adding xxxlib for new standard modules, as there will always be existing packages on pypi covering new libraries faster than the interpreter.
I wouldn’t take tomli/tomllib as a reason to avoid doing that. It was just an unfortunate case where the name already ended with -li

For reference, the next compression library may be zlibng (or zlibnglib or whatever), as there is work ongoing to replace zlib with zlib-ng, maybe both imports will become a thing. (though we’re trying to replace zlib in place as zlib-ng can be compiled in compatibility mode to provide the same API).

jamestwebber · April 12, 2025, 3:06pm

I’ll just note that there are less than four weeks before the beta freeze for 3.14, and I’m not sure what the SC schedule is in the meantime^[1]. Actually adding the module should be a quick process but it’s not instant.

I would like to re-up my earlier suggestion–punt on the naming discussion^[2] and add _zstd as an experimental module in Python 3.14.

[1] is that posted somewhere?
[2] which doesn’t seem like it’s wrapping up soon

gerardw · April 12, 2025, 3:59pm

As an end user, given the choice between a stdlib package with a scary single underscore, which PEP 8 says means “weak ‘internal use’ indicator”, and pypi.zstd with a long history and 175 github stars, I’d use the pypi package.

Nineteendo · April 12, 2025, 4:02pm

You still need to check that _zstd doesn’t get installed in the root namespace by the existing zstandard libraries:

zstd uses zstd
zstandard uses zstandard.backend_c
pyzstd uses pyzstd.c._zstd (a child package)

Seems OK.
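A quick way to repeat this check in a given environment, as a sketch: importlib.util.find_spec reports whether a top-level name resolves to anything importable without importing it. The helper name top_level_name_taken is made up here, and the result depends entirely on which interpreter and which packages are installed.

```python
import importlib.util


def top_level_name_taken(name: str) -> bool:
    """Return True if `name` resolves to an importable top-level module."""
    return importlib.util.find_spec(name) is not None


# Names discussed in this thread; output varies by environment.
for candidate in ("_zstd", "zstd", "zstandard", "pyzstd"):
    print(candidate, top_level_name_taken(candidate))
```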

jamestwebber · April 12, 2025, 4:07pm

That’s totally fine, though. The primary benefit of having _zstd in 3.14 is not for end users at all. It’s so that some future PEP like PEP 777: How to Re-invent the Wheel can point to this PEP and say “zstd has been available since 3.14 so we can now introduce it into the wheel format” and installers like pip and uv can write a shim to use the experimental version if they need to.
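The kind of shim meant here could look roughly like the following. This is only a sketch: first_importable is a made-up helper, the candidate names are taken from this thread, and a real installer would still need an adapter layer because the candidates do not necessarily share an API.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def first_importable(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in `candidates` that imports, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the experimental stdlib module, then fall back to a PyPI package.
backend = first_importable(["_zstd", "zstandard"])
print(backend.__name__ if backend is not None else "no zstd backend available")
```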

gerardw · April 12, 2025, 4:12pm

+1 for std.zstd in 3.14

Mature reasonable developers will get that a longstanding language will have some inconsistencies creep in. It’s not really a big deal. Just stick a one-liner in the Documentation:

While most standard library functions traditionally used a top level module, due to the widespread use of the pypi.org repository, new standard library modules are often added as std.module_name to avoid breaking existing code.

Newer, inexperienced pedantic developers will never be happy. e.g. complaining about the pre PEP-8 camel case logging functions.

emmatyping · April 12, 2025, 4:31pm

If we get a top-level stdlib package. It’s really hard for me to argue that we should use compression against an amorphous hypothetical. It’d be a huge change, and would break some user code regardless of what prefix is chosen (e.g. std is used as a module name 92 times on GitHub).

Also if we’re ok with std.zstd, and we aren’t sure if std as a whole is going to be approved (which we aren’t!), couldn’t we add compression, and in the future PEP that defines std, specify that sub-modules of compression would be moved under std directly? There’s a 5 year window before we even deprecate the existing module names. There’s plenty of time to introduce std and end up in the state that you are proposing.

The problem with this proposal is that I would be asking the Steering Council to pre-approve a PEP not laid out in this PEP, whose design has yet to be fleshed out. If I were in their shoes, I wouldn’t want to approve that.

I think compression is altogether quite different. Instead of making a namespace to have modules moved into in the future, this PEP proposes a namespace and a set of modules that should exist in it from the start. There will be compression.[zlib,lzma,bz2,zstd] in 3.14 (or at this rate, more likely 3.15). So there’s no reason that it would be stuck to a single module like concurrent.

Jelle · April 12, 2025, 6:14pm

This PEP proposes adding the new module to Python 3.14, which comes out in October. It seems rather pessimistic to think that nobody will be able to provide free-threaded zstd support on PyPI before then (either by improving an existing package or adding a new one).