I have PDF files that I can’t open. I’ve tried numerous libraries without success. I have been able to open other PDF files before, but not these. I believe these files contain an e-signature, which may be why they cannot be read.
Please assist me in opening these files.
I understand that there are many different varieties of PDF file. It is not surprising to find a set of PDF files that the techniques which worked on earlier files cannot handle.
As far as I know, the Python standard library does not have any modules for operating on PDF files as data. Thus you will have to use some third-party module which is able to operate on your PDF files. You can search PyPI for “PDF” and look at the various packages which it suggests.
Does anyone reading this thread know of a Python library for operating on PDF files which can handle advanced topics like protected and signed PDF files?
Are you able to open these PDF files with non-Python PDF applications, such as Adobe Acrobat or Ghostscript? That might help you understand which PDF features you need support for in the Python module which you choose.
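If no other PDF application is at hand, a rough first check can be done with the standard library alone by looking for well-known keys in the raw bytes of the file. The sketch below is only a heuristic, not a parser: the function names are made up for illustration, and a key stored inside a compressed object stream will be missed, so a `False` result does not prove that the feature is absent.

```python
from pathlib import Path

# Keys that usually appear in a PDF that uses the feature.
# Heuristic only: anything inside a compressed object stream is not seen.
MARKERS = {
    "encrypted": b"/Encrypt",    # encryption dictionary referenced from the trailer
    "signature": b"/ByteRange",  # present in digital signature dictionaries
    "xfa_form": b"/XFA",         # XFA forms, which many libraries cannot read
}


def scan_pdf_bytes(data: bytes) -> dict:
    """Report which feature markers occur in the raw bytes of a PDF."""
    # Readers generally accept the header anywhere in the first 1024 bytes.
    report = {"is_pdf": b"%PDF-" in data[:1024]}
    for name, marker in MARKERS.items():
        report[name] = marker in data
    return report


def scan_pdf_file(path: str) -> dict:
    """Convenience wrapper (hypothetical helper) that reads the file first."""
    return scan_pdf_bytes(Path(path).read_bytes())
```

Calling `scan_pdf_file("some_file.pdf")` on one working file and one failing file, then comparing the two reports, may show which feature the failing files have in common.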
I’m sorry I can’t give you a direct answer. Hopefully this might help move your investigation forward a step or two.
Do you want to open these PDF files using Python code? Why is it important that you open them with Python? There are PDF tools that do not involve Python.
Can you use Acrobat to open these files, and save them without digital signatures or protection? Can you do this to one or two files, to figure out a method? Can you do this to all of the files, to remove the parts of the PDF file which your Python module cannot work with?
Because I want to extract the data and save it to an Excel file.
I am using PyPDF2 to read the file, but when I try to print the text, nothing appears. I have the same problem in both Colab and VS Code.
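For reference, a minimal sketch of that workflow, assuming PyPDF2 2.0 or later (where `PdfReader`, `is_encrypted`, `decrypt` and `extract_text` are available). The helper names are hypothetical. The `diagnose` function only reports how many pages came back empty; it cannot tell for certain why. The last helper writes CSV text rather than a real `.xlsx` workbook, which Excel can open; a true workbook would need an extra package such as openpyxl.

```python
import csv
import io


def extract_page_texts(path, password=None):
    """Return the extracted text of each page (empty string where none)."""
    # Imported here so the helpers below work without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    if reader.is_encrypted:
        # An empty password often works when only editing is restricted.
        reader.decrypt(password or "")
    return [page.extract_text() or "" for page in reader.pages]


def diagnose(page_texts):
    """Summarise how much text was actually extracted."""
    if not page_texts:
        return "no pages found"
    empty = sum(1 for text in page_texts if not text.strip())
    if empty == len(page_texts):
        return "no text on any page; the pages may be scanned images or an XFA form"
    if empty:
        return f"{empty} of {len(page_texts)} pages have no extractable text"
    return "text extracted from every page"


def page_texts_to_csv(page_texts):
    """Build CSV text with one row per non-blank line: page, line, text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "line", "text"])
    for page_number, text in enumerate(page_texts, start=1):
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        for line_number, line in enumerate(lines, start=1):
            writer.writerow([page_number, line_number, line])
    return buffer.getvalue()
```

If `diagnose(extract_page_texts("some_file.pdf"))` reports no text on any page, the library is reading the file but finding no text layer, which points toward scanned pages (needing OCR) or a form format the library does not support, rather than toward the signature itself.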