I compressed two ndjson files into a multi-frame .zst file, with each ndjson file compressed into its own frame. I have the following metadata for the .zst file, stored as a list in meta_data:
```python
import zstandard as zstd
from pathlib import Path

input_file = r"E:\Personal projects\tmp\test.zst"
input_file = Path(input_file)

meta_data = [{'name': 'chunk_0.ndjson',
              'uncompressed_size': 2147473321,
              'compressed_offset': 0,
              'uncompressed_offset': 0,
              'compressed_size': 175631248},
             {'name': 'chunk_1.ndjson',
              'uncompressed_size': 2147473321,
              'compressed_offset': 175631248,
              'uncompressed_offset': 2147473321,
              'compressed_size': 175631248}]
```
In Python, how can we use the meta_data above to seek directly to chunk_1.ndjson, start decompressing there, and stream it line by line? That way we don't need to:
- decompress chunk_0.ndjson, or
- load the whole compressed chunk_1.ndjson into memory.
Thank you for your help.