Dear Python Forum,
I hope this message finds you well. My name is Sateesh, and I am a rheumatologist by profession. In our private practice, we are facing a challenge with our Electronic Medical Record (EMR) system, which is not integrating properly with radiology labs for e-request forms.
Currently, I can obtain patient demographics from PDF files, and I have written Python code to extract this information. The script reads the text from the PDF and uses regular expressions to pull out the Name, Date of Birth (DOB), Address, and Medicare Number, but I am having difficulty with the address in particular.
I am using PyPDF2 and pdfminer.six for this task and would greatly appreciate any help resolving the address extraction issue.
Thank you in advance for your support.
Code:
import re

from pdfminer.high_level import extract_text

pdf_path = r"/Users/xxx/Desktop/xxx.pdf"

def extract_patient_info(pdf_path):
    try:
        # Read all text from the PDF with pdfminer.six
        text = extract_text(pdf_path)

        # Compile regex patterns
        name_pattern = re.compile(r'Name:\s*(.*)')
        dob_pattern = re.compile(r'DOB:\s*(\d{2}/\d{2}/\d{4})')
        # The address can span several lines, so capture everything
        # (newlines included, via DOTALL) up to the next label or the end of the text
        address_pattern = re.compile(
            r'Address:\s*(.*?)\s*(?=Medicare Number:|DOB:|Name:|\Z)', re.DOTALL
        )
        medicare_pattern = re.compile(r'Medicare Number:\s*(\d{10})')

        # Search for patterns
        name_match = name_pattern.search(text)
        dob_match = dob_pattern.search(text)
        address_match = address_pattern.search(text)
        medicare_match = medicare_pattern.search(text)

        # Extract data
        name = name_match.group(1).strip() if name_match else None
        dob = dob_match.group(1).strip() if dob_match else None
        # Join the address lines into one line and collapse repeated spaces
        address = ' '.join(address_match.group(1).split()) if address_match else None
        medicare_number = medicare_match.group(1).strip() if medicare_match else None

        return {
            'Name': name,
            'DOB': dob,
            'Address': address,
            'Medicare Number': medicare_number
        }
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Example usage
patient_info = extract_patient_info(pdf_path)
print(patient_info)
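In case it helps anyone answering, here is a label-to-label approach I have also sketched for the multi-line address. It is untested on real referrals and assumes every field starts with one of the labels in `LABELS` followed by a colon; `split_fields` and `LABELS` are my own hypothetical names, not part of pdfminer.six or PyPDF2. Any other label printed on the form (a phone number field, for example) would need to be added to `LABELS`, otherwise its text would be swallowed into the preceding field.

```python
import re

# Hypothetical list of field labels on the referral form; every other label
# on the form should be added too, so each value stops at the next label.
LABELS = ["Name", "DOB", "Address", "Medicare Number"]


def split_fields(text, labels=LABELS):
    """Map each label to the text between it and the next known label."""
    # Try longer labels first so a longer label wins over a shorter prefix of it.
    label_re = "|".join(re.escape(lab) for lab in sorted(labels, key=len, reverse=True))
    # A label and colon, then everything (newlines included, via DOTALL)
    # lazily up to the next label or the very end of the text.
    pattern = re.compile(
        rf"\b({label_re}):\s*(.*?)(?=\b(?:{label_re}):|\Z)", re.DOTALL
    )
    found = {}
    for label, value in pattern.findall(text):
        # Collapse line breaks and extra spaces left by the PDF layout;
        # keep only the first occurrence of each label.
        found.setdefault(label, " ".join(value.split()))
    # Labels that never appear in the text map to None.
    return {label: found.get(label) for label in labels}


# Made-up sample text, laid out the way a PDF extractor might return it.
sample = (
    "Name: Jane Citizen\n"
    "DOB: 01/02/1960\n"
    "Address: 12 Example Street\n"
    "   Sampletown NSW 2000\n"
    "Medicare Number: 1234567890\n"
)
print(split_fields(sample))
# {'Name': 'Jane Citizen', 'DOB': '01/02/1960', 'Address': '12 Example Street Sampletown NSW 2000', 'Medicare Number': '1234567890'}
```

With a real PDF this would be called as `split_fields(extract_text(pdf_path))`. The trade-off is that it relies on the label list being complete for the form, whereas the single-pattern version above only needs to know which label follows the address.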
Warm regards,
Sateesh