Trouble importing libraries

You are likely not configuring logging, so its output goes nowhere.
Just use print for now.
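
If you do want to keep logging later, the usual missing piece is a basicConfig() call. A minimal sketch:

import logging

# Without this, the root logger only shows WARNING and above.
logging.basicConfig(level=logging.DEBUG)
logging.debug("this now appears on stderr")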

Also, main does not do anything with restaurants; I guess that is because you are still developing the code.

I don’t know how to move on. Are there any hints you could offer?

Delete the calls to logging and use print.

If your code does not print anything, then it must be stopping early.

Add a print at every step of the function to find where it breaks.


I have added a print at every step of the function, but the only output I get is from the final prints:

The script is running.
[]

This is the modified script with the prints:

import requests
import bs4
import sys

def get_restaurants(query):
    """Gets a list of restaurants from Google Maps.

    Args:
        query: The query to search for.

    Returns:
        A list of restaurant objects. Each restaurant object has the following properties:
            name: The name of the restaurant.
            address: The address of the restaurant.
            phone_number: The phone number of the restaurant.
            website: The website of the restaurant.
            rating: The rating of the restaurant.
            reviews: The number of reviews for the restaurant.
        """

    **url = "https://www.google.com/maps/search/mexican+restaurants+in+athens,+greece/@37.971871,23.7168781,12z/data=!3m1!4b1?entry=ttu".format(query)**
**    response = requests.get(url)**
**    soup = bs4.BeautifulSoup(response.content, "html.parser")**

**    restaurants = []**
**    for restaurant in soup.find_all("div", class_="section-result"):**
**        print("Found a restaurant")**
**        try:**
**            name = restaurant.find("div", class_="section-result__title").text.strip()**
**        except AttributeError:**
**            name = None**
**        try:**
**            address = restaurant.find("div", class_="section-result__address").text.strip()**
**        except AttributeError:**
**            address = None**
**        try:**
**            phone_number = restaurant.find("div", class_="section-result__phone").text.strip()**
**        except AttributeError:**
**            phone_number = None**
**        try:**
**            website = restaurant.find("div", class_="section-result__website").text.strip()**
**        except AttributeError:**
**            website = None**
**        try:**
**            rating = restaurant.find("div", class_="section-result__rating").text.strip()**
**        except AttributeError:**
**            rating = None**
**        try:**
**            reviews = restaurant.find("div", class_="section-result__reviews").text.strip()**
**        except AttributeError:**
**            reviews = None**

**        restaurant_info = {**
**            "name": name,**
**            "address": address,**
**            "phone_number": phone_number,**
**            "website": website,**
**            "rating": rating,**
**            "reviews": reviews,**
**        }**

**        print(restaurant_info)**

**        restaurants.append(restaurant_info)**

**    return restaurants**

**def main():**
**    """This is the main action of the script.**

**    It calls the `get_restaurants()` function to get a list of restaurants from Google Maps.**
**    It then prints the list of restaurants.**
**    """**

**    query = sys.argv[1]**
**    restaurants = get_restaurants(query)**

    print("The script is running.")
    **print(restaurants)**

if __name__ == "__main__":
    main()

I think that everything marked in the script is not running. Am I correct?

You are saying that find_all finds no elements on the HTML page?

What is query? I don’t see it being substituted into the URL.
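
If the intent is to substitute it, the URL needs a {} placeholder, and the query text should be URL-encoded. A minimal sketch of one way to do it:

from urllib.parse import quote_plus

query = "mexican"
url = "https://www.google.com/maps/search/{}+restaurants+in+athens,+greece".format(quote_plus(query))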

Have you printed out the HTML that is returned from the query?
I get a 302 redirect; are you following the redirect?
When you do, and you look at the HTML, you will see that there are no divs with class “section-result”.
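
A quick way to check (requests follows redirects by default, and any hops end up in response.history):

import requests

url = "https://www.google.com/maps/search/mexican+restaurants+in+athens,+greece"  # as built in your script
response = requests.get(url)
print(response.status_code)    # final status after any redirects
print(response.history)        # e.g. [<Response [302]>] if a redirect happened
print(response.text[:1000])    # eyeball the HTML that actually came back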

Unless the JavaScript for the page is run, you will have no data to work with.

In which case you will need to consider using Selenium to automate running a browser instance.

I would remove all the try:/except AttributeError: blocks and replace them with checks on the results of the find and find_all functions. Check for None being returned, etc.
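
Something along these lines, sketched for one field; the same pattern applies to the rest:

# inside your for-restaurant loop:
name_div = restaurant.find("div", class_="section-result__title")
if name_div is None:
    print("no title div in this result")  # the assumption failed; investigate
    name = None
else:
    name = name_div.text.strip()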

OK, I am using Selenium and I am trying to export the output to a CSV file. I have managed to open the browser on the Google Maps search, and it then closes, as it should. The file is created with the header row, as stated, but I am not managing to get the restaurants into the file, probably because I am not managing to scrape any output in the first place. I am not getting any error messages, and I have debugged the script with none showing. Here is the modified script:

import bs4
from selenium import webdriver
import csv
import pathlib
from collections import namedtuple

# Simple record type for the scraped fields.
Restaurant = namedtuple(
    "Restaurant",
    ["name", "address", "phone_number", "website", "rating", "reviews"],
)

def get_restaurants(query):
    """Gets a list of restaurants from Google Maps.

    Args:
        query: The query to search for.

    Returns:
        A list of restaurant objects. Each restaurant object has the following properties:
            name: The name of the restaurant.
            address: The address of the restaurant.
            phone_number: The phone number of the restaurant.
            website: The website of the restaurant.
            rating: The rating of the restaurant.
            reviews: The number of reviews for the restaurant.
        """

    browser = webdriver.Firefox()
    url = "https://www.google.com/maps/search/{}+restaurants+in+athens,+greece/@37.971871,23.7168781,12z/data=!3m1!4b1?entry=ttu".format(query)
    browser.get(url)

    soup = bs4.BeautifulSoup(browser.page_source, "html.parser")

    restaurants = []
    for restaurant in soup.find_all("div", class_="business-listing"):
        name = restaurant.find("div", class_="section-result__title").text.strip()
        address = restaurant.find("div", class_="section-result__address").text.strip()
        phone_number = restaurant.find("div", class_="section-result__phone").text.strip()
        website = restaurant.find("div", class_="section-result__website").text.strip()
        rating = restaurant.find("div", class_="section-result__rating").text.strip()
        reviews = restaurant.find("div", class_="section-result__reviews").text.strip()

        restaurant_info = Restaurant(
            name=name,
            address=address,
            phone_number=phone_number,
            website=website,
            rating=rating,
            reviews=reviews,
        )

        restaurants.append(restaurant_info)

    browser.close()

    return restaurants


def main():
    """This is the main action of the script.

    It calls the `get_restaurants()` function to get a list of restaurants from Google Maps.
    It then exports the list of restaurants to a CSV file.
    """

    query = "Mexican restaurants in Athens, Greece"
    restaurants = get_restaurants(query)

    # Create a CSV file; newline="" is what the csv module expects,
    # and the with-block makes sure the file is closed and flushed.
    path = pathlib.Path("mexican_restaurants.csv")
    with path.open("w", encoding="utf-8", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Name", "Address", "Phone Number", "Website", "Rating", "Reviews"])
        for restaurant in restaurants:
            restaurant_data = [restaurant.name, restaurant.address, restaurant.phone_number, restaurant.website, restaurant.rating, restaurant.reviews]
            writer.writerow(restaurant_data)


if __name__ == "__main__":
    main()

You need to wait for the JavaScript to render the HTML.
Usually I add calls to Selenium to wait for an element to exist on the page.
Then you can grab the page and parse it.
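
A sketch of an explicit wait; the CSS selector here is only a placeholder, pick a real one from the rendered page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get("https://www.google.com/maps/search/mexican+restaurants+in+athens,+greece")

# Block for up to 15 seconds until at least one result element exists.
WebDriverWait(browser, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.business-listing"))
)

html = browser.page_source  # now contains the rendered results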

Debugging code requires that you test the assumptions in your code.
You cannot find the elements on the page, yet you assumed they are present.
Debug that assumption by printing out the page source and checking whether it is true.

See this about waiting: Waiting Strategies | Selenium

I expect that if you save the page into a file with your current implementation, you will see that it is lacking all the info you want to parse out of it.
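
For example:

with open("page_dump.html", "w", encoding="utf-8") as f:
    f.write(browser.page_source)  # open the dump and search for the selectors you rely on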

I often browse interactively to a page I want to process and then inspect it with the browser debug tools to find an element to wait on.

I remembered that before a scraper can get to the actual Google Maps search, there is a button the user has to click to accept the terms. I added the click, and now the window opens and stays open while gathering data; I have checked in the browser’s network activity that it is actually doing so. I am now trying to limit the information so that images are not loaded, and also to save the scraped data to the CSV file as it arrives, in case of premature closure of the browser window. I am working on it at this moment, so I don’t have a finished script to show you yet, but the two pieces I am experimenting with look roughly like the sketch below. I’ll post the full script as soon as I have it. I am very grateful for your help.
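
A rough sketch of both ideas (permissions.default.image is a Firefox about:config key I still need to verify; 2 is supposed to block image loading):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import csv
import pathlib

options = Options()
options.set_preference("permissions.default.image", 2)  # 2 should block image loading
browser = webdriver.Firefox(options=options)

path = pathlib.Path("mexican_restaurants.csv")
with path.open("w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Address", "Phone Number", "Website", "Rating", "Reviews"])
    for restaurant in restaurants:  # ideally this loop runs inside the scraper, row by row
        writer.writerow([restaurant.name, restaurant.address, restaurant.phone_number,
                         restaurant.website, restaurant.rating, restaurant.reviews])
        f.flush()  # push each row to disk immediately, in case the window closes early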