PEP 784: Adding Zstandard to the standard library

Every algorithm proposed for inclusion in the compression namespace handles files, except zlib. If we exclude gzip (which does handle files), what’s the actual rule? Or is this just inconsistent? Should the authors have used the “zlib” name for gzip too?
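To make the "handles files" distinction concrete, here is a minimal sketch using only the existing top-level modules. The file name `payload.gz` and the variable names are illustrative choices, not anything taken from the PEP.

```python
import bz2
import gzip
import lzma
import os
import tempfile
import zlib

payload = b"hello " * 100

# gzip, bz2 and lzma each expose a file-level open(); zlib does not.
has_open = {mod.__name__: hasattr(mod, "open") for mod in (gzip, bz2, lzma, zlib)}

# Round-trip through an actual file using gzip's file interface.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.gz")  # illustrative file name
    with gzip.open(path, "wb") as f:
        f.write(payload)
    with gzip.open(path, "rb") as f:
        restored = f.read()

# zlib works on in-memory bytes (or streaming objects) only.
zlib_roundtrip = zlib.decompress(zlib.compress(payload))
```

Here `has_open` records which modules offer a file-level `open()`, which is the asymmetry the question is about.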

This argument applies equally well to every module under discussion.

The problem here is gzip should’ve been called gzfile to match tarfile and zipfile. But that’s historical now. (Personally I’d put them all under compression anyway, without removing any existing modules from the top level.)

I think it would be useful to look at how other languages refer to GZip. In the table below, links that put GZip under a zlib package are counted as “Compression”, since zlib is a compression library.

Based on the above I think it is very common to refer to GZip as a compression format, so I think it should go under compression.

Of the above, Perl and C# refer to ZIP as part of the compress namespace, but the rest do not include it. So I think I will maintain that zipfile and tarfile probably shouldn’t go under compression.
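As a side note on why GZip so often ends up next to (or inside) a zlib package: the gzip format wraps a DEFLATE stream in its own header and trailer, and Python's existing `zlib` module can read it directly. A small sketch, with illustrative variable names:

```python
import gzip
import zlib

payload = b"example payload " * 50
gz_bytes = gzip.compress(payload)

# gzip members start with the magic bytes 0x1f 0x8b.
is_gzip = gz_bytes[:2] == b"\x1f\x8b"

# wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer
# instead of the default zlib wrapper.
via_zlib = zlib.decompress(gz_bytes, wbits=zlib.MAX_WBITS | 16)
```

This only shows that the two formats share the same compressed stream; it does not settle which namespace the `gzip` module belongs in.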

Agreed. FWIW, the current Wikipedia entry says:
“gzip is a file format and a software application used for file compression and decompression.”

Based on the above table, I made a PR to update the PEP to include gzip under compression and expand the rejected ideas section.

I hope to submit the PEP to the Steering Council tomorrow, barring any further concerns. This would allow a few weeks before the beta cutoff for 3.14 to merge the implementation. In the meantime, I will be working on polishing up my branch so that I can break it up into mergeable MRs.

I would like to take this opportunity to mention that I have a few open issues on my CPython fork discussing API design choices for the new compression.zstd module. They mostly weigh keeping the module identical to the existing pyzstd API against making it match the existing compression modules in the standard library more closely.
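For context on the second option, the existing standard library compression modules already share a one-shot `compress()` / `decompress()` pair at module level. The sketch below only exercises those existing modules; whether compression.zstd ends up with the same shape is exactly the open question, so nothing here should be read as its final API.

```python
import bz2
import gzip
import lzma
import zlib

payload = b"The quick brown fox jumps over the lazy dog. " * 40

# Each existing module offers the same one-shot function pair.
sizes = {}
for mod in (zlib, gzip, bz2, lzma):
    packed = mod.compress(payload)
    if mod.decompress(packed) != payload:
        raise RuntimeError(f"{mod.__name__} round trip failed")
    sizes[mod.__name__] = len(packed)
```

After the loop, `sizes` maps each module name to its compressed size for this input.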

Thank you again to everyone for the feedback and discussions!

Agreed, and perhaps we should have done that in Python 3.0?

I would hope that the nesting be kept to two levels, i.e. std.zstd, std.zlib, std.futures, etc. Namespaces should only be created to avoid conflicts, not for taxonomic purity.

Well, are we concretely planning to move zlib and lzma there? Otherwise, a partial taxonomy only makes things more confusing and awkward for the user.

Yep! The PEP enumerates the modules going under compression: PEP 784 – Adding Zstandard to the standard library | peps.python.org

New import names compression.lzma, compression.bz2, compression.gzip and compression.zlib will be introduced in Python 3.14 re-exporting the contents of the existing lzma, bz2, gzip and zlib modules respectively.
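For code that has to run on both older and newer interpreters, the quoted passage suggests an import pattern along these lines. This is a sketch assuming only what the PEP text above says (the new names re-export the existing modules); the variable names are illustrative.

```python
# Prefer the new namespaced import where it exists (Python 3.14+ per the PEP),
# and fall back to the long-standing top-level module elsewhere.
try:
    from compression import zlib
except ImportError:
    import zlib

data = b"namespaced or not, the bytes are the same " * 20
packed = zlib.compress(data)
roundtrip = zlib.decompress(packed)
```

Since the top-level names are not going away any time soon, the plain `import zlib` keeps working too; the fallback only matters for code that wants to adopt the new spelling early.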

I think the question was whether they’d be removed from the top level. But this is also in the PEP, in the next section, with an overall process that takes ten years.

I believe this PEP should also address the potential future namespacing of the entire standard library, perhaps under std. In such a scenario, would modules like compression.zstd reside at std.compression.zstd or be moved to std.zstd?

Why should this PEP do that? There’s no point in putting nonbinding future plans in here, is there?

Exactly. We really have no idea what a stdlib reorg would look like.

I agree with Barry and James on this. Since we don’t know what a std namespace might look like, anything said in this PEP wouldn’t be much more than an educated guess based on a design sketch. I don’t think that’s a good basis to design a specification.

Timing-wise, I don’t think there’s a need to specify it anyway. A future PEP introducing std can always make whatever proposal it wishes. That seems like a much better time to look at the location of these modules, as presumably there will be a fully fleshed-out design in hand.

The rejected ideas section says:

a future PEP introducing a std namespace could always define that the compression sub-modules be flattened into the std namespace.

I think this allows for a future PEP to argue for flattening the namespace into e.g. std.zstd while not making that a requirement if we end up liking the compression package.

I missed this, but I’m happy with this. Thank you.

I may as well add my thoughts on the color of this bikeshed:

  • Python should immediately reserve std, if it hasn’t already.
  • This PEP, which is titled “Adding Zstandard to the standard library”, should do what it says on the tin. It should not be concerned with introducing a compression module or making changes to any existing compression module that is unrelated to Zstandard. The module that implements Zstandard should be named std.zstd (or std.zst if people prefer that).
  • Moving forward, all new Python stdlib modules should be named std.$whatever. Python should adopt a policy to never introduce any more stdlib modules outside of std.
    • I agree with the view from @pitrou that “the nesting be kept to two levels, i.e. std.zstd, std.zlib, std.futures, etc. Namespaces should only be created to avoid conflicts, not for taxonomic purity”.
  • There should not be any rush to migrate existing stdlib modules into std. That should be thought about separately and proposed as a future PEP.
  • There should certainly not be a rushed decision to migrate existing stdlib modules into compression. If that happened, and it was later decided to migrate the stdlib to std.*, Python would have had two large-scale name migrations in a short space of time, which isn’t useful for anyone.

There isn’t. This is part of a 10-year plan. If it doesn’t turn out to be successful, we can always take a different path.
It’s similar to my initial concern, but honestly, it’s not something we need to worry about right now—maybe in 10 years… or maybe not.
