Python 3.14: how to generate multiframe ZSTD file?

I would like to compress multiple ndjson files into a single ZSTD file. To allow random seeking later on, I would like each file to be compressed independently as its own frame.

  1. With the Python binding python-zstandard, I can call its copy_stream() method once per file:
import zstandard as zstd
from pathlib import Path

file_to_compress = [r"E:\Personal Projects\tmp\chunk_0.ndjson",
                    r"E:\Personal Projects\tmp\chunk_0.ndjson"]
file_to_compress = [Path(p) for p in file_to_compress]

output_file = r"E:\Personal Projects\tmp\dataset.zst"
output_file = Path(output_file)

cctx = zstd.ZstdCompressor(write_content_size=True,
                           write_checksum=True,
                           threads=5)

with open(output_file, "wb") as f_out:
    for src in file_to_compress:
        file_size = src.stat().st_size
        with open(src, "rb") as fin:
            # Each copy_stream() call writes one complete, independent frame
            cctx.copy_stream(fin, f_out, size=file_size)
  2. With the standard library module compression.zstd (new in Python 3.14), I can use ZstdCompressor with FLUSH_FRAME:
from compression import zstd
from pathlib import Path

file_to_compress = [r"E:\Personal Projects\tmp\chunk_0.ndjson",
                    r"E:\Personal Projects\tmp\chunk_0.ndjson"]
file_to_compress = [Path(p) for p in file_to_compress]

output_file = r"E:\Personal Projects\tmp\dataset.zst"
output_file = Path(output_file)

options = {
    zstd.CompressionParameter.nb_workers: 10,
    zstd.CompressionParameter.content_size_flag: True,
    zstd.CompressionParameter.checksum_flag: True,
}

cctx = zstd.ZstdCompressor(options=options)

with open(output_file, "wb") as f_out:
    for src in file_to_compress:
        file_size = src.stat().st_size
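        data = src.read_bytes()  # the whole file is loaded into memory here
        cctx.set_pledged_input_size(file_size)
        # FLUSH_FRAME ends the current frame, so each file gets its own frame
        compressed_data = cctx.compress(data, mode=zstd.ZstdCompressor.FLUSH_FRAME)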
        f_out.write(compressed_data)

One problem remains: the second snippet, which uses compression.zstd, reads each file into memory in full with read_bytes() before compressing it, which can cause out-of-memory (OOM) errors for large files.

How can we avoid the OOM problem while still writing one frame per file?

Hi @emmatyping, could you have a look at this question? Thank you for your help.