Struggle with Python Code

Dear Python Forum,

I hope this message finds you well. My name is Sateesh, and I am a rheumatologist by profession. In our private practice, we are facing a challenge with our Electronic Medical Record (EMR) system, which is not integrating properly with radiology labs for e-request forms.

Currently, I receive patient demographics as PDF files, and I have written Python code to extract this information. The raw text extraction does include the patient's address, but I am having difficulty pulling out the individual fields (Name, Date of Birth (DOB), Address, and Medicare Number) with regular expressions, and the address in particular.

I am using PyPDF2 and pdfminer.six for this task and would greatly appreciate any help in resolving the address extraction issue.

Thank you in advance for your support.

code:
import re

from pdfminer.high_level import extract_text

pdf_path = r"/Users/xxx/Desktop/xxx.pdf"

def extract_patient_info(pdf_path):
    try:
        # Pull all text out of the PDF as a single string
        text = extract_text(pdf_path)

        # Compile regex patterns
        name_pattern = re.compile(r'Name:\s*(.*)')
        dob_pattern = re.compile(r'DOB:\s*(\d{2}/\d{2}/\d{4})')
        address_pattern = re.compile(r'Address:\s*([^]+(?:[ ]+)*)')
        medicare_pattern = re.compile(r'Medicare Number:\s*(\d{10})')

        # Search for patterns
        name_match = name_pattern.search(text)
        dob_match = dob_pattern.search(text)
        address_match = address_pattern.search(text)
        medicare_match = medicare_pattern.search(text)

        # Extract data
        name = name_match.group(1).strip() if name_match else None
        dob = dob_match.group(1).strip() if dob_match else None
        address = address_match.group(1).strip() if address_match else None
        medicare_number = medicare_match.group(1).strip() if medicare_match else None

        return {
            'Name': name,
            'DOB': dob,
            'Address': address,
            'Medicare Number': medicare_number
        }
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Example usage
patient_info = extract_patient_info(pdf_path)
print(patient_info)

Warm regards,

Sateesh

The address regular expression is ill-formed: the ‘^’ directly after the opening bracket in ‘[…]’ negates the character class. I don’t know exactly what you are trying to match, but this isn’t it.
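
To see the problem in isolation, here is a small check that compiles the address pattern exactly as posted, outside the function. As far as I can tell, Python's re module reads `[^]+(?:[ ]` as one negated character class (a `]` placed first inside a class is taken literally), which leaves the final `)` without a partner. The variable names are only for illustration.

```python
import re

# The address pattern exactly as it appears in the question.
posted_pattern = r'Address:\s*([^]+(?:[ ]+)*)'

try:
    re.compile(posted_pattern)
    compile_error = None
except re.error as exc:
    # The pattern is rejected at compile time, before any PDF text is searched.
    compile_error = exc

print(f"re.compile failed: {compile_error}")
```

Because the pattern never compiles, the function in the question returns None for every PDF, regardless of its contents.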

You may have missed this because the catch-all except hides the error.
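
One possible way forward is sketched below. It is not a drop-in fix, because the real layout of your PDFs is unknown to me. It assumes the address starts on the "Address:" line and may wrap onto following lines until a blank line or the next "Label:" style line. The sample text, the `extract_address` helper and the `LABEL_PATTERN` name are all made up for illustration. The patterns are compiled at module level, outside any try block, so a malformed pattern fails loudly instead of being swallowed.

```python
import re

# Compiled outside any try/except so a bad pattern raises immediately.
ADDRESS_START = re.compile(r'Address:[ \t]*(.*)')
# Assumption: a line starting with letters/spaces followed by ':' is a new field.
LABEL_PATTERN = re.compile(r'^[A-Za-z][A-Za-z ]*:')


def extract_address(text):
    """Return the address as one string, or None if no 'Address:' label exists."""
    match = ADDRESS_START.search(text)
    if match is None:
        return None

    parts = [match.group(1).strip()]
    # Everything after the 'Address:' line; the first split item is the
    # (empty) remainder of that line, so skip it.
    for line in text[match.end():].splitlines()[1:]:
        stripped = line.strip()
        # Stop at a blank line or at the next labelled field.
        if not stripped or LABEL_PATTERN.match(stripped):
            break
        parts.append(stripped)

    return ', '.join(part for part in parts if part)


# Hypothetical text, standing in for what extract_text() might return.
sample_text = (
    "Name: Jane Citizen\n"
    "DOB: 01/02/1980\n"
    "Address: 12 Example Street\n"
    "Sampletown NSW 2000\n"
    "Medicare Number: 1234567890\n"
)

print(extract_address(sample_text))  # 12 Example Street, Sampletown NSW 2000
```

It is worth printing the raw output of extract_text for one of your PDFs first. pdfminer.six may place labels and values on separate lines or insert blank lines between them, in which case the stopping rule above would need adjusting.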

Thanks heaps
