Corrupt input data when using lzma to decompress a file

Dear group, I am trying to use the built-in lzma module to decompress a .lzma file, but it fails with the following error:

_lzma.LZMAError: Corrupt input data

The .lzma file was compressed using the public-domain C library written by Igor Pavlov, see

Before compression, the binary buffer was 1966104 bytes long; after compression, the file mat.lzma (can be downloaded from this link) is 1536957 bytes long.

Running file mat.lzma prints:

mat.lzma: LZMA compressed data, non-streamed, size 1966104
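
For reference, the size that file reports comes from the 13-byte header of the legacy .lzma container: one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size (all bits set means the size is unknown). A minimal sketch for reading that header from Python; the helper name parse_lzma_alone_header is made up for illustration and is not part of the lzma module:

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # header value meaning "size not stored"


def parse_lzma_alone_header(header: bytes) -> dict:
    """Decode the 13-byte header of a legacy .lzma (LZMA_Alone) file."""
    if len(header) < 13:
        raise ValueError("need at least 13 header bytes")
    # 1 byte of properties, 4-byte dictionary size, 8-byte uncompressed size
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    if props > 224:
        raise ValueError("invalid LZMA properties byte")
    # props encodes (pb * 5 + lp) * 9 + lc
    pb, rest = divmod(props, 45)
    lp, lc = divmod(rest, 9)
    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }
```

It could be called as parse_lzma_alone_header(open('mat.lzma', 'rb').read(13)) to check whether the header stores a known size or the "unknown" marker.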

I was able to decompress this file using either the C library mentioned above or the Node.js script below (with either the lzma-purejs or the lzma npm module):

const fs = require('fs')
const lzma = require('lzma-purejs')

async function main() {
  var data=lzma.decompressFile(fs.readFileSync('mat.lzma'));
  console.log(data.length)
}
main().then(() => console.log('Done'))

The above script correctly decoded the buffer:

$ node testlzma.js 
1966104
Done

However, with the Python script below, I got an error:

import lzma

filename='mat.lzma'
buf=lzma.open(filename,  format=lzma.FORMAT_ALONE).read();

print(len(buf))

error message:

$ python3 testlzma.py
Traceback (most recent call last):
  File "testlzma.py", line 4, in <module>
    buf=lzma.open(filename,  format=lzma.FORMAT_ALONE).read();
  File "/usr/lib/python3.6/lzma.py", line 200, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
_lzma.LZMAError: Corrupt input data

Igor Pavlov’s C library implements the original LZMA algorithm, so I believe the FORMAT_ALONE flag is the correct one to use.

Can someone take a look at this and let me know whether I am using the lzma module incorrectly or whether my file has a problem?
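
To narrow down where decoding fails, one option is to feed the file to lzma.LZMADecompressor in chunks and see how much output is produced before LZMAError is raised. A minimal sketch; decompress_until_error is a hypothetical helper, not part of the lzma module:

```python
import lzma


def decompress_until_error(raw: bytes, chunk_size: int = 65536):
    """Feed raw .lzma bytes to the decoder chunk by chunk.

    Returns (bytes_decoded_so_far, error), where error is None
    if no LZMAError was raised.
    """
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    out = bytearray()
    for start in range(0, len(raw), chunk_size):
        if decomp.eof:
            break  # end of stream reached; the rest is trailing data
        try:
            out += decomp.decompress(raw[start:start + chunk_size])
        except lzma.LZMAError as exc:
            return bytes(out), exc
    return bytes(out), None
```

If the number of bytes returned is close to the expected 1966104, the compressed payload itself is probably fine and the decoder is objecting to something near the end of the stream.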

thanks

Interestingly, the test file mat.lzma decompresses correctly with lzma -d on Ubuntu 20.04, but fails with an error on Ubuntu 18.04.

The two lzma commands are actually different:

fangq@ubuntu20_04:~$ lzma --version
LZMA command line tool 9.22
LZMA SDK 9.22

fangq@ubuntu20_04:~$ lzma -v -d mat.lzma
mat.lzma:	 21.83% -- replaced with mat
fangq@ubuntu18_04:~$ lzma --version
xz (XZ Utils) 5.2.2
liblzma 5.2.2

fangq@ubuntu18_04$ lzma -v -d mat.lzma
mat.lzma (1/1)
 99.9 %   1,500.9 KiB / 1,920.0 KiB = 0.782                                    
lzma: mat.lzma: Compressed data is corrupt
 99.9 %   1,500.9 KiB / 1,920.0 KiB = 0.782     

I’m not sure if this will help, but I have some notes about lzma which include this sample code:

#!/usr/bin/python3

import lzma

data_in=b'This is only a test'


file=lzma.LZMAFile("test.lzma", mode="wb")
file.write(data_in)
file.close()


file=lzma.LZMAFile("test.lzma", mode="rb")
data_out=file.read()
print(data_out)
file.close()

That seems to work just fine. However, I get the same error as you when I try it with your file, so maybe the issue is with how your file was compressed?
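
One thing to keep in mind with the sample above: lzma.LZMAFile writes the .xz container by default (format=lzma.FORMAT_XZ) whatever the file name is, so it does not exercise the legacy .lzma code path. A minimal in-memory sketch of the difference, requesting FORMAT_ALONE explicitly:

```python
import lzma

data_in = b"This is only a test"

# Default container is .xz, recognizable by its magic bytes.
xz_blob = lzma.compress(data_in)
# Legacy .lzma container, the one FORMAT_ALONE refers to.
alone_blob = lzma.compress(data_in, format=lzma.FORMAT_ALONE)

print(xz_blob[:6])      # .xz magic: b'\xfd7zXZ\x00'
print(alone_blob[:13])  # .lzma header: properties, dict size, size field

# Both round-trip when the matching format is used for decoding.
assert lzma.decompress(xz_blob, format=lzma.FORMAT_XZ) == data_in
assert lzma.decompress(alone_blob, format=lzma.FORMAT_ALONE) == data_in
```

A .lzma file written this way by Python round-trips, so it only shows that the FORMAT_ALONE decoder works on liblzma's own output, not on files produced by other encoders.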

If you can decompress your test file with other lzma tools, but not with Python, you should report it to the Python bug tracker.

Thanks, I submitted a bug report: Corrupt input data when using lzma to decompress a file (Issue #92018, python/cpython on GitHub).

I am not sure whether the lzma module in Python is a binding for liblzma, but I also submitted a bug report to XZ Utils, the project that provides liblzma.