PEP 784: Adding Zstandard to the standard library

This conflicts with zstandard, which together with pyzstd is one of the main implementations currently in use, as far as I know (e.g. by urllib3).

I still have yet to see a proposed alternative name that doesn’t feel hacky. And as Steve said

Some issues with the names you proposed:

  • zstd1 - What happens when major version 2 is released? Also seems too easy to typo to zstd, potentially giving you the wrong module if zstd is installed in your site packages.
  • zstdcomp, zstdc - comp and c do not clearly mean compression: comp could stand for computation, compilation, composition, etc., and c has even more meanings other than compression. I’m not convinced these suffixes provide much clarity.
  • zst - Too easy to typo to zstd, and many people don’t know that zst is a suffix used for Zstandard compressed data. I think this would hurt discoverability.
  • libzstd - Of all of the proposals, this one is probably the most plausible replacement for compression.zstd I have seen. However, as a Windows user, it is quite weird to interact with libfoo projects when there is no convention to prefix libraries with lib. Also, I would rather have lib as a suffix (zstdlib) to reduce autocomplete collisions. It also just feels hacky to append lib to any library that gets added to the stdlib…
  • pyzstd - The current reference implementation for adding Zstandard removes a number of performance-focused alternatives from the pyzstd API so that the API better matches existing compression library APIs. I think it would be confusing for existing users of pyzstd to work out what exactly was removed.
  • zstd, zstandard - These names conflict with projects on PyPI that provide significantly different APIs, which would greatly complicate migration for their existing users.

If the introduction of a new top-level name is a problem, could io.zstd work, or is compressed file IO too large a leap in scope for the io module?

zstdlib seems the best option to me. There’s plenty of precedent for the ‘lib’ suffix, and whilst it’s not the best name, it seems that choosing ‘zstd’ or ‘zstandard’ would be far more difficult in terms of coordination and migration with users of affected PyPI packages. I agree with Hugo that this PEP should just focus on adding Zstandard support, as once you find an acceptable name, it seems there is general consensus on adding the module, and the namespace discussion can be had in a later PEP.

There’s an unrelated project on PyPI (a “set of utilities”) unfortunately. Having said that, it might still be a viable option if the author is willing to rename their project.

I appreciate that finding a good name in the top-level namespace is hard, but I agree with Hugo that the move to a compression namespace is probably too controversial for 3.14, whereas if there’s a possibility of getting zstd in 3.14, that extra year would be very helpful for packaging.

Renaming a stdlib module is going to be a bear, though.

I wonder, would it be reasonable and worthwhile to add zstd as an experimental module for 3.14 to allow a longer discussion for placement? e.g. make it available as _zstd or something, with the understanding that the module name is not subject to the standard deprecation timeline. This would allow future packaging tools to support 3.14 without committing to a location.
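
To illustrate what that could look like for a packaging tool, here is a minimal sketch of an import fallback chain. The candidate order is an assumption, not something from the PEP: compression.zstd is the name the PEP proposes, _zstd is the hypothetical experimental name floated above, and zstandard and pyzstd are the PyPI packages mentioned earlier in the thread. The helper find_zstd is made up for illustration.

```python
import importlib

# Candidate module names, in order of preference (assumed order; "_zstd" is
# the hypothetical experimental name from the post above).
CANDIDATES = ("compression.zstd", "_zstd", "zstandard", "pyzstd")


def find_zstd(candidates=CANDIDATES):
    """Return (name, module) for the first importable candidate, else (None, None)."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well as a build without zstd support.
            continue
    return None, None


name, module = find_zstd()
print(f"zstd backend: {name or 'none available'}")
```

Since these backends expose different APIs, real code would still need a thin adapter per backend; the sketch only shows that a tool can probe for a location without committing to one.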

Technically speaking, it could just be aliased under compression in the longer term rather than renamed.
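
A rough sketch of what such an alias amounts to, using stand-in names: zstdlib_demo and compression_demo are hypothetical and exist only in this snippet, and the compress function is a placeholder rather than real compression. Registering the same module object under a second dotted name in sys.modules is the mechanism the os module uses to expose os.path.

```python
import importlib
import sys
import types

# Hypothetical top-level module standing in for the new stdlib module.
top_level = types.ModuleType("zstdlib_demo")
top_level.compress = lambda data: b"demo:" + data  # placeholder, not real compression
sys.modules["zstdlib_demo"] = top_level

# Hypothetical package that exposes the same module object as a submodule.
package = types.ModuleType("compression_demo")
package.__path__ = []  # an (empty) __path__ makes the import system treat it as a package
package.zstd = top_level
sys.modules["compression_demo"] = package
sys.modules["compression_demo.zstd"] = top_level

# Both spellings now resolve to the very same module object.
aliased = importlib.import_module("compression_demo.zstd")
same_object = aliased is importlib.import_module("zstdlib_demo")
print(same_object)
```

Because both names point at one object, nothing is duplicated; the open question in the thread is only which of the two names users are told to import.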

True, but using the namespace is a little less helpful if it stays in the top level as well. It could be deprecated and removed over a few releases, though.

So one question I have, given there are vocal opponents to introducing compression: is anyone opposed to naming the module libzstd? I don’t love that name, but I also am not completely opposed to it.

I prefer adding a compression package.

This is bike-shedding, but considering the symmetry with hashlib, the name complib might also be good.

Yes, unless we’ve got definitive statements from every potential naming conflict that they’re unwilling to see the name added to the stdlib. If all the maintainers are threatening to rage quit the community or abuse us for taking a name that’s consistent with the rest of our packages, I’d consider this one, but otherwise I would strongly prefer something that looks vaguely like Python (and not like a Linux distro).

I don’t see the analogy with hashlib: it’s a module containing multiple functions (directly implemented or via openssl), not a package. The analog we have in the stdlib is concurrent, which has not been developed further since the addition of futures.

I submit zstdmod instead of zstdlib! (libzstd is too close to a C library name; our precedent is lib2to3, which I think was picked because we could not start with a number, but we also have idlelib and others.)

How about zstandardlib?
A bit verbose, but there is no package with that name on PyPI, and it would follow the same pattern as hashlib, pathlib and the upcoming annotationlib and templatelib.

I do like the idea of a package name to hold this new module, and of providing some additional organizational benefit to the existing stdlib modules. I have a personal preference, but I’ll abstain from suggesting one here.

Something to consider perhaps is the documentation effects of adding a package. The Data Compression and Archiving section of the stdlib docs conflates both compression and archiving libraries under the same section. How would adding a new top level package name, and possibly aliasing the old names to new names under this package, improve the organization of this section?

I took the liberty to ask the author of the zstdlib package, Zachary Wimer (@zwimer), about using the name for the standard library.

He replied to say that he would consent to donating the name to the stdlib:

I’m happy to donate the name if it is desired for the python stdlib. If this name is chosen feel free to follow up here and let me know how you’d like to proceed / what is needed of me.

For me, officially supporting ZSTD looks great. But I think compression is a good way to fix the package naming conflict. The compression namespace is just a temporary patch. If we have more conflicts in the future, a patch for every package will make the codebase dirty.

So I think there are two different problems we may need to solve:

  1. Should we support ZSTD? Of course, yes.
  2. How do we solve the naming conflict between PyPI and the stdlib? We may need two different PEPs.

As @hugovk said, I prefer to “focus this PEP only on adding Zstandard to the stdlib”. I think this is the core problem we need to solve in this discussion.

To add to the bikeshed pile, I suggest the_facebook_compression_format_that_has_standard_in_the_name_but_is_not_a_standard. Nothing like that should exist in PyPI.

On a more serious note, I think the name should just be zstd or zstandard, even if the PyPI package authors don’t like it. At the same time, the sys.path order should be fixed to put the stdlib in front. That will also help the many newbies who, while experimenting with, say, JSON parsing, name their script json.py and then have issues using the stdlib json module.
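
To make the shadowing problem concrete, here is a small sketch that asks the import machinery which file it would pick for json under each search order, without actually importing anything. The resolve helper is made up for this example, the temporary directory stands in for a beginner’s script directory, and it assumes a regular CPython install where the stdlib lives in a normal directory.

```python
import importlib.machinery
import sysconfig
import tempfile
from pathlib import Path


def resolve(name, search_path):
    """Return the file that `import name` would load for this search order."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


stdlib_dir = sysconfig.get_paths()["stdlib"]

with tempfile.TemporaryDirectory() as script_dir:
    # A beginner's experiment saved under the same name as a stdlib module.
    shadow = Path(script_dir) / "json.py"
    shadow.write_text("print('my JSON experiments')\n")

    # Script directory first: how sys.path is ordered when running a script.
    script_first = resolve("json", [script_dir, stdlib_dir])
    # Stdlib first: the ordering suggested in this post.
    stdlib_first = resolve("json", [stdlib_dir, script_dir])

print("script dir first ->", script_first)
print("stdlib first     ->", stdlib_first)
```

With the script directory first, the lookup lands on the local json.py; with the stdlib first, it lands on the stdlib’s json package. The same would apply to a third-party zstd in site-packages versus a stdlib module of the same name.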

This is a totally separate proposal, it doesn’t belong in this PEP.

We’ve lived without it for this long. I don’t really see the urgency to rush it in if we can instead use the extra time to improve the organization of the stdlib.

Three cheers for this idea.

I’m really happy with any name, as long as it’s not something called compression.zstd. The long package name approach taken with concurrent has already failed on us. Let’s not make that same mistake again.

AFAIK, all other “packages” in the stdlib are really just modules from the user perspective, except perhaps one: os.path, but that’s an alias name to a platform specific module and not really a package in itself.

I’ll get to work on a new PEP to suggest migrating the Python stdlib to a package in a backwards compatible way to eventually avoid naming collisions. When Python started, we did not have thousands of packages on a package index. I believe the time has come to realize that the issues with naming collisions are starting to hinder development. This thread is another good example of the issues involved.

Thanks for hanging in there, while the bike shedding continues 🙂
