from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
import time
import pandas as pd
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import csv
driver = webdriver.Chrome()
driver.get('https://ae.godaddy.com/domain-value-appraisal/appraisal/?domainToCheck=carsdriver')
driver.implicitly_wait(10)
driver.maximize_window()
r = 1
templist = []
while(1):
    try:
        domain1 = driver.find_element(By.XPATH, '/html/body/div[3]/div/div/div[4]/div[1]/div/div/div/div[2]/div/div/ul/div[1]/li/table/tbody/tr[1]').text()
        domain2 = driver.find_element(By.XPATH, '/html/body/div[3]/div/div/div[4]/div[1]/div/div/div/div[2]/div/div/ul/div[1]/li/table/tbody/tr[2]').text()
        domain3 = driver.find_element(By.XPATH, '/html/body/div[3]/div/div/div[4]/div[1]/div/div/div/div[2]/div/div/ul/div[1]/li/table/tbody/tr[3]').text()
        Table_dict = {'domain1': domain1,
                      'domain1': domain2,
                      'domain1': domain3}
        templist.append(Table_dict)
        df = pd.DataFrame(templist)
        r = 1
    # if there are no more table data to scrape
    except NoSuchElementException:
        break
# saving the dataframe to a csv
df.to_csv('table.csv')
driver.close()
The result: NameError: name ‘df’ is not defined
I’m confused about where this error is coming from.
Thanks,
… because Table_dict is only ever going to have one key-value pair, 'domain1': domain3 – all three keys are the same, so each assignment overwrites the previous one.
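You can see this with a minimal standalone sketch (the domain values here are made up):

```python
# In a Python dict literal, a repeated key silently keeps only the last value.
Table_dict = {'domain1': 'first.com',
              'domain1': 'second.com',
              'domain1': 'third.com'}
print(Table_dict)       # only the last entry survives: {'domain1': 'third.com'}
print(len(Table_dict))  # 1
```

Using distinct keys ('domain1', 'domain2', 'domain3') keeps all three values.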
That’s not the only thing I see as odd about your code, but it may put you back on track.
Maybe initialize the DataFrame up front with df = pd.DataFrame({'nd': ['No data']}); then even if the try block fails before df is ever assigned, you at least won’t get a NameError.
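As a sketch of that idea, with the scraping step replaced by a stand-in exception (the 'nd' column name is just a placeholder):

```python
import pandas as pd

# Give df a fallback value before the loop, so that later code using df
# can never raise a NameError, even if the try block fails on the very
# first pass before df is reassigned.
df = pd.DataFrame({'nd': ['No data']})

try:
    # stand-in for the scraping step that raises NoSuchElementException
    raise LookupError("element not found")
except LookupError:
    pass

print(df)  # df is still defined; df.to_csv('table.csv') would succeed here
```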
Yes, because you were not shown code to solve the problem; you were shown code to test and understand the problem. Notice in the output: it shows you the a result, but not the b result. Instead the exception is thrown. That means that somewhere between the a print and the b print, something must have gone wrong to cause the exception.
What do you suppose could go wrong here? (Hint: What if that element is not in the page - what should the .text() result be? Where would it come from?)
If that happens, do you understand why df does not get defined? (Hint: where in the code does df get defined? Did it happen yet?)
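The NameError can be reproduced without Selenium at all; this hypothetical sketch mirrors the control flow of the original loop, with find_element standing in for driver.find_element when the XPath matches nothing:

```python
def find_element():
    # stand-in for driver.find_element() when the element is not in the page
    raise LookupError("no such element")

templist = []
while True:
    try:
        domain1 = find_element()            # raises on the first iteration...
        templist.append({'domain1': domain1})
        df = None                           # ...so this line never runs
    except LookupError:
        break                               # jump straight out of the loop

try:
    df.to_csv('table.csv')
except NameError as e:
    print(e)  # name 'df' is not defined
```

Because the exception fires before the line that assigns df, the name simply never comes into existence.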
I’ve not tried to do what you’re trying to do, but my feeling is that sites such as the one you’re trying to ‘scrape’ will have some kind of defence, so it won’t be as easy as it would be for a site with good old HTML tables, for which the likes of pd.read_html() could be deployed.
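For a page that really is plain HTML tables, that usually needs nothing more than the sketch below (it requires an HTML parser such as lxml to be installed; the table content here is made up for illustration):

```python
from io import StringIO
import pandas as pd

html = """
<table>
  <tr><th>domain</th><th>value</th></tr>
  <tr><td>carsdriver.com</td><td>1234</td></tr>
  <tr><td>carsdriver.net</td><td>567</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found in the input
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```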
Maybe someone with better knowledge about the topic can help.