Need help with Selenium web scraping

Om29 · October 28, 2023, 12:13pm

(Getting a thread error in a script built to scrape company names from LinkedIn profiles)

In this code I'm using an XPath copied from the Inspect option on a LinkedIn profile. The XPath points to the element for the company's location in the Experience section of the person's profile.
Right now, to extract the company name from a profile, I have to copy the exact full XPath from Inspect and paste it into the code.
The goal is to create a common XPath that works for every profile, so I don't have to change it manually every time.
The current code works when I paste the XPath in manually, but I need to automate it with one common XPath for every profile.

Here’s the code and the error:

#Code:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def auto_login(username, password, profile_url, company_xpath):
    driver = webdriver.Chrome()
    try:
        # Profiles are only visible when logged in, so log in first
        driver.get("https://www.linkedin.com/login")

        username_input = driver.find_element(By.ID, "username")
        password_input = driver.find_element(By.ID, "password")

        username_input.send_keys(username)
        password_input.send_keys(password)

        driver.find_element(By.CSS_SELECTOR, 'button[type="submit"]').click()

        # Wait up to 80 s for the redirect to the feed after a successful login
        wait = WebDriverWait(driver, 80)
        wait.until(EC.url_contains("https://www.linkedin.com/feed/"))

        driver.get(profile_url)

        # until() returns the located element, so no second lookup is needed
        company_element = wait.until(
            EC.visibility_of_element_located((By.XPATH, company_xpath))
        )
        return company_element.text.strip()
    except Exception as e:
        print("An error occurred:", e)
        return None
    finally:
        # Close the browser whether or not the scrape succeeded
        driver.quit()


username = "mail"
password = "pass"

profile_url = "Jayram Waghmare - Data Analyst - EarlySalary- Instant Salary Advance for Employees | LinkedIn"

company_xpath = "/html/body/div[5]/div[3]/div/div/div[2]/div/div/main/section[4]/div[3]/ul/li[1]/div/div[2]/div/div[1]/span[1]/span[1]"

company_name = auto_login(username, password, profile_url, company_xpath)

if company_name is not None:
    print("Company Name:", company_name)
```

#Error:
(This error occurs when I try to use a common XPath.)

```
An error occurred: Message:
Stacktrace:
GetHandleVerifier [0x00007FF764C97D12+55474]
(No symbol) [0x00007FF764C077C2]
(No symbol) [0x00007FF764ABE0EB]
(No symbol) [0x00007FF764AFEBAC]
(No symbol) [0x00007FF764AFED2C]
(No symbol) [0x00007FF764B39F77]
(No symbol) [0x00007FF764B1F19F]
(No symbol) [0x00007FF764B37EF2]
(No symbol) [0x00007FF764B1EF33]
(No symbol) [0x00007FF764AF3D41]
(No symbol) [0x00007FF764AF4F84]
GetHandleVerifier [0x00007FF764FFB762+3609346]
GetHandleVerifier [0x00007FF765051A80+3962400]
GetHandleVerifier [0x00007FF765049F0F+3930799]
GetHandleVerifier [0x00007FF764D33CA6+694342]
(No symbol) [0x00007FF764C12218]
(No symbol) [0x00007FF764C0E484]
(No symbol) [0x00007FF764C0E5B2]
(No symbol) [0x00007FF764BFEE13]
BaseThreadInitThunk [0x00007FFD0D587344+20]
RtlUserThreadStart [0x00007FFD0DA826B1+33]

Process finished with exit code 0
```

kyle · October 28, 2023, 1:21pm

Hey, are you sure you can automate checking every profile? The last time I checked, you had to be logged in before you could view a profile, which means you will have to be logged in for every profile you want to view. Have you managed to automate the login process, though?
Also check this No.13 link.

nmstoker · October 28, 2023, 3:01pm

Have you been able to confirm that you can definitely scrape content with your Selenium setup from something on a simple site?

That would help you work out whether your setup is fundamentally broken or whether the problem is specific to what you're trying on LinkedIn.

Om29 · October 30, 2023, 2:55pm

Yes, I have managed to automate the login process by putting the credentials in the code itself.

Om29 · October 30, 2023, 2:58pm

Yes, we can scrape from LinkedIn. Before extracting the company name, we extracted some general details such as name, headline, connections, and location. We were successful with those, but we are having trouble extracting the company name.

nmstoker · October 30, 2023, 4:58pm

Seems like it would help to know which statement it’s going wrong at.

Have you got any ideas how you could work that out?
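A hedged reading of the output above: Selenium's `WebDriverWait.until` raises `TimeoutException` with an empty message unless you pass one, so `print("An error occurred:", e)` shows only `Message:` followed by the driver's native stack, which matches the error posted. That suggests (but does not prove) one of the `wait.until(...)` calls timed out. One way to pin down the exact statement is to print the exception's class name and the innermost traceback frame rather than only its message. This sketch runs without a browser: `TimeoutException` below is a local stand-in class and `wait_for_company` is a hypothetical function that simulates the failing wait; neither is Selenium's own code.

```python
import traceback


class TimeoutException(Exception):
    """Local stand-in for selenium.common.exceptions.TimeoutException,
    defined here only so the sketch runs without a browser."""


def wait_for_company(xpath):
    # Hypothetical: simulates WebDriverWait.until giving up, which raises
    # TimeoutException with an empty message unless one is passed in
    raise TimeoutException()


def describe(exc):
    """Name the exception class and the innermost line that raised it."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return f"{type(exc).__name__} in {frame.name}(): {frame.line}"


def auto_scrape(xpath):
    try:
        return wait_for_company(xpath)
    except Exception as e:
        print("An error occurred:", e)  # prints just "An error occurred: " here
        print(describe(e))              # names the failing function and line
        return None


auto_scrape("//section//span")
```

In the real script, calling `traceback.print_exc()` inside the `except` block prints the full Python traceback, including the failing line. Giving each wait its own label, e.g. `wait.until(EC.url_contains("https://www.linkedin.com/feed/"), message="login redirect timed out")`, also makes a timeout identify itself.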

Om29 · November 1, 2023, 10:42am

The problem is that we have to get a different XPath for every profile.
Our aim is to find a common XPath, or define a common element, that always points to the company's location so we can locate the company name.

nmstoker · November 1, 2023, 11:00am

I still think you should figure out which precise statement is crashing: your traceback isn't that useful.

After that, it sounds like you need to work on less specific XPaths (Google them or read up on how they work; that's not really a Python thing).
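To illustrate why a full `/html/body/div[5]/...` path breaks across profiles, here is a minimal sketch on two hypothetical, heavily simplified pages (not LinkedIn's real markup). The absolute path counts element positions from the root, so one extra section earlier in the page makes it miss. The relative path anchors on an attribute instead (`id="experience"` and `class="company"` are assumptions for this toy markup) and survives the shift. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so it runs without a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; real profile pages are far more complex
PAGE_A = """<html><body><main>
  <section id="about"><p>About</p></section>
  <section id="experience"><ul>
    <li><span class="title">Data Analyst</span><span class="company">Company A</span></li>
  </ul></section>
</main></body></html>"""

# Same layout, except one extra section appears before "experience"
PAGE_B = """<html><body><main>
  <section id="banner"><p>Open to work</p></section>
  <section id="about"><p>About</p></section>
  <section id="experience"><ul>
    <li><span class="title">Engineer</span><span class="company">Company B</span></li>
  </ul></section>
</main></body></html>"""

# Absolute, position-based path: breaks as soon as the layout shifts
ABSOLUTE = "./body/main/section[2]/ul/li[1]/span[2]"
# Relative path anchored on stable attributes instead of positions
ANCHORED = ".//section[@id='experience']/ul/li[1]/span[@class='company']"


def company(page: str, path: str):
    """Return the text of the first element matching path, or None."""
    element = ET.fromstring(page).find(path)
    return element.text if element is not None else None


for name, page in [("A", PAGE_A), ("B", PAGE_B)]:
    print(name, "absolute:", company(page, ABSOLUTE), "| anchored:", company(page, ANCHORED))
```

In Selenium the same idea would look something like `driver.find_element(By.XPATH, "//section[@id='experience']//span[@class='company']")`, where the attribute names are placeholders. Which attributes are actually stable on LinkedIn's pages has to be checked with Inspect, and they may change over time.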

Good luck

Om29 · November 1, 2023, 1:06pm

Ok sure, appreciate the feedback!