PDF File reader

rob42 · July 7, 2023, 1:13am

I’d upload it to https://anonfiles.com/ and then send me the link.

PM it if you’d rather not share it with all comers.

cadrenw · July 7, 2023, 1:16am

anonfiles is not available

rob42 · July 7, 2023, 1:17am

Google drive? Dropbox?

rob42 · July 7, 2023, 1:23am

Thank you.

As I suspected, that PDF does not contain any text; it’s an image of the text.

You may be able to use some OCR software on it, but that’s not something that I can help you with.

cadrenw · July 7, 2023, 1:24am

ok thanks anyway !

cadrenw · July 7, 2023, 2:28am

import pytesseract
from pdf2image import convert_from_path

def extract_text_from_scanned_pdf(pdf_path):
    # Convert each page of the PDF to images
    images = convert_from_path(pdf_path)

    extracted_text = ''
    for image in images:
        # Perform OCR on each image
        text = pytesseract.image_to_string(image, lang='eng')
        extracted_text += text

    return extracted_text

def search_mix_design_number(text, desired_mix_design_number):
    # Return True if the desired mix design number appears in the extracted text
    return desired_mix_design_number in text

# Example usage
pdf_path = 'C:/Users/Operator/Onedrive/Desktop/Batch Reports/New Folder/1.pdf'  # Replace with the path to your PDF file
desired_mix_design_number = 'Mix Design Number 50'  # Replace with the desired mix design number

# Extract text from the scanned PDF
text = extract_text_from_scanned_pdf(pdf_path)

# Search for the mix design number
mix_design_found = search_mix_design_number(text, desired_mix_design_number)

# Print the result
if mix_design_found:
    print("Mix Design Number found in the extracted text")
else:
    print("Mix Design Number not found in the extracted text")

this works
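
One caveat with the exact substring check above: OCR output often has irregular spacing, line breaks in the middle of a phrase, or different capitalisation, so a phrase that is visibly on the page can still be missed. Below is a minimal sketch of a more forgiving search that normalises whitespace and case first and also reports which pages matched. The helper names (normalize, contains_phrase, pages_containing) and the sample page texts are made up for illustration; in the script above, each page string would come from pytesseract.image_to_string.

```python
import re

def normalize(text):
    """Lowercase the text and collapse every run of whitespace into one space."""
    return re.sub(r"\s+", " ", text).strip().lower()

def contains_phrase(ocr_text, phrase):
    """Return True if phrase occurs in ocr_text, ignoring case and whitespace layout."""
    return normalize(phrase) in normalize(ocr_text)

def pages_containing(page_texts, phrase):
    """Return the 1-based numbers of the pages whose text contains phrase."""
    return [
        number
        for number, page in enumerate(page_texts, start=1)
        if contains_phrase(page, phrase)
    ]

# Hypothetical page texts standing in for real OCR output
sample_pages = [
    "Batch Report\nPlant 3",
    "Mix  Design\nNumber 50\nSlump 4 in",
]
print(pages_containing(sample_pages, "Mix Design Number 50"))
```

This is still a plain substring match after normalisation, so searching for "Mix Design Number 50" would also match a page that contains "Mix Design Number 500". If that matters for your reports, a regular expression with a word boundary after the number would be one way to tighten it.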

rob42 · July 8, 2023, 2:53pm

Thank you for sharing the script.

I’ve not tried it myself, but it’s good to know that it’s here if I need it.
