I have a script that uses the Zstandard support that is new in the Python 3.14 standard library:
from pathlib import Path
import tarfile
# List of NDJSON files you want to compress
fileToCompress = [r"E:\Personal Projects\tmp\tarFiles\chunk_0.ndjson",
r"E:\Personal Projects\tmp\tarFiles\chunk_0.ndjson",
r"E:\Personal Projects\tmp\tarFiles\chunk_0.ndjson"]
fileToCompress = [Path(p) for p in fileToCompress]
# Output archive name
outputArchive = r"E:\Personal Projects\tmp\tarFiles\dataset.tar.zst"
outputArchive = Path(outputArchive)
# Create a .tar.zst archive without including folder paths
with tarfile.open(outputArchive, "w:zst") as tar:
    for file in fileToCompress:
        # Use only the filename (no directories) inside the archive
        tar.add(file, arcname=file.name)
print(f"Created archive: {outputArchive}")
A big benefit of Zstandard is its built-in parallel (multithreaded) compression. There are examples showing how to set the number of threads when writing with compression.zstd.open(zstdDir, 'wb') as g.
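For reference, this is roughly what the compression.zstd.open variant looks like. It is a minimal sketch, not a definitive implementation: the helper name write_zst, the default of 4 workers, and the file name are made up for illustration, and it assumes the options mapping accepts CompressionParameter.nb_workers as described in the compression.zstd documentation. The requested worker count is clamped to what the local libzstd build reports through bounds().

```python
import tempfile
from pathlib import Path

try:
    from compression import zstd  # standard library, Python 3.14+
except ImportError:
    zstd = None  # older interpreter: no built-in Zstandard support


def write_zst(path, data, workers=4):
    """Write bytes to a .zst file, asking libzstd for `workers` threads.

    `write_zst` and the default of 4 workers are illustrative choices.
    """
    if zstd is None:
        raise RuntimeError("compression.zstd requires Python 3.14 or newer")
    # bounds() reports the range this libzstd build accepts; the upper
    # bound is 0 when the library was built without multithreading.
    _, max_workers = zstd.CompressionParameter.nb_workers.bounds()
    options = {zstd.CompressionParameter.nb_workers: min(workers, max_workers)}
    with zstd.open(path, "wb", options=options) as g:
        g.write(data)


if zstd is not None:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "chunk_0.ndjson.zst"  # hypothetical file name
        payload = b'{"id": 1}\n' * 1000
        write_zst(target, payload)
        # Round-trip check: decompressing must give back the original bytes.
        with zstd.open(target, "rb") as f:
            assert f.read() == payload
```

The thread count is passed through options and not through a dedicated argument, so the open question is whether tarfile exposes the same mapping.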
Can the number of threads also be configured when using with tarfile.open(outputArchive, "w:zst") as tar?