Hi there,
This is my first time here, and I am completely new to Python. I use the ICD-10 codes from the WHO website to analyze national data that has a variable containing ICD-10 diagnosis codes. One of the first verifications we do is to check for invalid codes. To do so, we need the updated codes from the website (updates occur yearly). I spent hours trying but gave up, and I hope someone can help me with this. The website is set up as a browser and is dynamic, which makes it difficult to scrape the codes from each chapter and sub-chapter. This is what I came up with, but it stopped advancing: it keeps looping over the same section and codes and does not move on to the next one.
`from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
# Set up WebDriver
driver = webdriver.Chrome()
driver.implicitly_wait(10)  # Adding an implicit wait
driver.get("ICD-10 Version:2019")
# Wait for the main chapters to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "ygtvitem"))
)
# Find all chapters under the given class and id
chapters = driver.find_elements(By.CSS_SELECTOR, "#ygtv1 .ygtvitem")
for i in range(len(chapters)):
    try:
        # Re-find chapters
        chapters = driver.find_elements(By.CSS_SELECTOR, "#ygtv1 .ygtvitem")
        chapter = chapters[i]
        # Click on each chapter to expand it
        chapter.click()
        time.sleep(5)  # Increasing the wait time to ensure the sub-items load
        print(f"Clicked on chapter: {chapter.text}")
        # Find all sub-items
        sub_items = chapter.find_elements(By.CSS_SELECTOR, ".ygtvitem a")
        for j in range(len(sub_items)):
            try:
                # Re-find sub-items
                sub_items = chapter.find_elements(By.CSS_SELECTOR, ".ygtvitem a")
                sub_item = sub_items[j]
                sub_item.click()
                time.sleep(5)  # Increasing the wait time to ensure the content loads
                print(f"Clicked on sub-item: {sub_item.text}")
                # Extract text under the class "code"
                codes = driver.find_elements(By.CLASS_NAME, "code")
                for code in codes:
                    print(f"Code text: {code.text}")  # Print or save the extracted text
                # Go back to the main page
                driver.back()
                WebDriverWait(driver, 10).until(
                    EC.presence_of_element_located((By.CLASS_NAME, "ygtvitem"))
                )
                time.sleep(2)  # Adding a short wait to ensure the main page loads
            except Exception as e:
                print(f"Error with sub-item: {e}")
    except Exception as e:
        print(f"Error with chapter: {e}")
# Close the driver
driver.quit()
`
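
For context, the verification the scraped list feeds into could look roughly like the sketch below. It is only an illustration, not a finished solution. The function names (`normalize`, `build_reference`, `find_invalid`), the sample values, and the regular expression for the shape of a code (one letter, two digits, optional decimal part) are all assumptions. Texts that do not look like a single code, such as chapter titles or block ranges, are dropped, and a set keeps each code only once even if the same section gets scraped repeatedly.

```python
import re

# Assumed shape of a single ICD-10 code: one letter, two digits, and an
# optional decimal part (e.g. "A00" or "A00.1"). This pattern is an
# assumption for illustration, not an official definition.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{2}(\.[0-9]{1,2})?")


def normalize(code):
    """Strip surrounding whitespace and upper-case a raw code string."""
    return code.strip().upper()


def build_reference(scraped_texts):
    """Keep only texts that look like codes; the set removes duplicates."""
    cleaned = (normalize(text) for text in scraped_texts)
    return {code for code in cleaned if CODE_PATTERN.fullmatch(code)}


def find_invalid(data_codes, reference):
    """Return the distinct codes from the data that are not in the reference."""
    return sorted({normalize(code) for code in data_codes} - reference)


# Hypothetical sample values standing in for the scraped text and for the
# diagnosis variable in the national data.
scraped = ["A00", "A00.0", "A00.0", "Cholera", "A00-A09", " a01 "]
reference = build_reference(scraped)
invalid = find_invalid(["A00.0", "a01", "Z99.9", "XYZ"], reference)
print(invalid)
```

With a reference set like this, the yearly update only means rebuilding `reference` from the new list; the check itself stays the same.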