from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
import time
import pandas as pd
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import csv
driver = webdriver.Chrome()
driver.get('https://ae.godaddy.com/domain-value-appraisal/appraisal/?domainToCheck=carsdriver')
driver.implicitly_wait(10)
driver.maximize_window()
r = 1
templist = []
while(1):
    try:
        domain1 = driver.find_element(By.XPATH, '/html/body/div[3]/div/div/div[4]/div[1]/div/div/div/div[2]/div/div/ul/div[1]/li/table/tbody/tr[1]').text()
        domain2 = driver.find_element(By.XPATH, '/html/body/div[3]/div/div/div[4]/div[1]/div/div/div/div[2]/div/div/ul/div[1]/li/table/tbody/tr[2]').text()
        domain3 = driver.find_element(By.XPATH, '/html/body/div[3]/div/div/div[4]/div[1]/div/div/div/div[2]/div/div/ul/div[1]/li/table/tbody/tr[3]').text()
        Table_dict = {'domain1': domain1,
                      'domain1': domain2,
                      'domain1': domain3}
        templist.append(Table_dict)
        df = pd.DataFrame(templist)
        r = 1
    # if there are no more table data to scrape
    except NoSuchElementException:
        break
# saving the dataframe to a csv
df.to_csv('table.csv')
driver.close()
The result: NameError: name ‘df’ is not defined
I’m confused about where this error is coming from.
Thanks,
… because Table_dict is only ever going to have one key-value pair, 'domain1': domain3 – all three keys are the same, so each assignment overwrites the previous one.
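You can see this with a minimal standalone sketch (the domain values here are made up):

```python
# In a Python dict literal, a repeated key silently keeps only the last value.
Table_dict = {'domain1': 'first.com',
              'domain1': 'second.com',
              'domain1': 'third.com'}
print(Table_dict)       # only the last entry survives: {'domain1': 'third.com'}
print(len(Table_dict))  # 1
```

Using distinct keys ('domain1', 'domain2', 'domain3') keeps all three values.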
That’s not the only thing I see as odd about your code, but it may put you back on track.
Maybe initialize the DataFrame up front with df = pd.DataFrame({'nd': ['No data']}); then even if the try block fails before df is ever assigned, you at least won’t get a NameError.
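As a sketch of that idea, with the scraping step replaced by a stand-in exception (the 'nd' column name is just a placeholder):

```python
import pandas as pd

# Give df a fallback value before the loop, so that later code using df
# can never raise a NameError, even if the try block fails on the very
# first pass before df is reassigned.
df = pd.DataFrame({'nd': ['No data']})

try:
    # stand-in for the scraping step that raises NoSuchElementException
    raise LookupError("element not found")
except LookupError:
    pass

print(df)  # df is still defined; df.to_csv('table.csv') would succeed here
```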
Yes, because you were not shown code to solve the problem; you were shown code to test and understand the problem. Notice in the output: it shows you the a result, but not the b result. Instead the exception is thrown. That means that somewhere between the a print and the b print, something must have gone wrong to cause the exception.
What do you suppose could go wrong here? (Hint: What if that element is not in the page - what should the .text() result be? Where would it come from?)
If that happens, do you understand why df does not get defined? (Hint: where in the code does df get defined? Did it happen yet?)
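The NameError can be reproduced without Selenium at all; this hypothetical sketch mirrors the control flow of the original loop, with find_element standing in for driver.find_element when the XPath matches nothing:

```python
def find_element():
    # stand-in for driver.find_element() when the element is not in the page
    raise LookupError("no such element")

templist = []
while True:
    try:
        domain1 = find_element()            # raises on the first iteration...
        templist.append({'domain1': domain1})
        df = None                           # ...so this line never runs
    except LookupError:
        break                               # jump straight out of the loop

try:
    df.to_csv('table.csv')
except NameError as e:
    print(e)  # name 'df' is not defined
```

Because the exception fires before the line that assigns df, the name simply never comes into existence.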
I’ve not tried to do what you’re trying to do, but my feeling is that sites such as the one you’re trying to ‘scrape’ will have some kind of defence, so it won’t be as easy as it would be for a site with good old HTML tables, for which the likes of pd.read_html() could be deployed.
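For a page that really is plain HTML tables, that usually needs nothing more than the sketch below (it requires an HTML parser such as lxml to be installed; the table content here is made up for illustration):

```python
from io import StringIO
import pandas as pd

html = """
<table>
  <tr><th>domain</th><th>value</th></tr>
  <tr><td>carsdriver.com</td><td>1234</td></tr>
  <tr><td>carsdriver.net</td><td>567</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found in the input
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```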
Maybe someone with better knowledge about the topic can help.