Why does my code not iterate over all the pages of a website (web scraping)?

beginner999 · December 31, 2023, 1:38pm

I want to scrape the “search: pc” results of the website Jumia. I wanted to iterate over all the pages, but unfortunately it didn't work. I don't know why it overwrites the file even though the write is outside the loop. I tried using:

with pd.ExcelWriter("output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    pop.to_excel(writer, sheet_name="sheet1")

instead of:

with open(f"output.xlsx" ,"a") :
    with pd.ExcelWriter("output.xlsx") as writer:
        pop.to_excel(writer,sheet_name="sheet2")

but it results in an error:

File "c:\Users\hp\Desktop\python_projects\test3.py", line 40, in <module>
    find_computers()
  File "c:\Users\hp\Desktop\python_projects\test3.py", line 33, in find_computers
    with pd.ExcelWriter("output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hp\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 61, in __init__
    super().__init__(
  File "C:\Users\hp\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\excel\_base.py", line 1263, in __init__  
    self._handles = get_handle(
                    ^^^^^^^^^^^
  File "C:\Users\hp\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\common.py", line 872, in get_handle
    handle = open(handle, ioargs.mode)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'output.xlsx'

this is my actual code:

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
import openpyxl
import os
from bs4 import Tag


def find_computers():
  n=1
  while n<=50:
    html_text=requests.get(f"https://www.jumia.ma/catalog/?q=pc&page={n}#catalog-listing").text
    soup=BeautifulSoup(html_text,"lxml")
    computers=soup.find_all("a",class_="core")
    df={"price": [],"original price": [],"promo":[]}
    computer_name_list=[]
    for computer in computers:
        computer_name=computer.find("h3",class_="name").text.strip()
        price=computer.find("div",class_="prc").text.strip()
        original_price_element=computer.find("div",class_="old")
        original_price=original_price_element.text.strip() if isinstance(original_price_element, Tag) else "N/A"
        promo_element = computer.find("div", class_="bdg _dsct _sm")
        promo = promo_element.text.strip() if isinstance(promo_element, Tag) else "N/A"
        df["price"].append(price)
        df["original price"].append(original_price)
        df["promo"].append(promo)
        computer_name_list.append(computer_name)
    n+=1
  pop=pd.DataFrame(df,index=computer_name_list)
  pd.set_option('colheader_justify', 'center')
  with pd.ExcelWriter("output.xlsx") as writer:
      pop.to_excel(writer,sheet_name="sheet2")
  


if __name__=="__main__":
  while True:
        find_computers()
        time_s = 10
        time.sleep(6 * time_s)

I will be thankful if someone can guide me.

barry-scott · December 31, 2023, 4:16pm

This error is why the code does not work.

The error is explaining that output.xlsx cannot be found.

If you do have an output.xlsx file, then it is not in the current working directory when the Python program runs, so the program cannot find it. In that case you can cd to the folder that contains the file, or use the full path to the file in your code.
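A quick way to check this is to print where Python is actually looking. This is only a minimal sketch: the helper name describe_target is made up for illustration, and it assumes the relative name "output.xlsx" from the traceback.

```python
from pathlib import Path


def describe_target(filename="output.xlsx"):
    """Return the working directory, the absolute target path, and whether it exists.

    describe_target is a hypothetical helper, not part of any library.
    """
    cwd = Path.cwd()
    # A relative filename is resolved against the current working directory.
    target = (cwd / filename).resolve()
    return cwd, target, target.exists()


cwd, target, exists = describe_target()
print(f"working directory: {cwd}")
print(f"looking for:       {target}")
print(f"file exists:       {exists}")
```

If the printed directory is not the folder you expected, that would explain the FileNotFoundError.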

If the file does not exist, then I assume you need to create it in the right place first: mode="a" appends to an existing workbook, so it fails when there is no file to open.
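One possible way to handle the first run is to choose the writer mode depending on whether the file is already there. The sketch below is an assumption about what you want, not the only approach: the helper names excel_writer_options and save_frame are invented for this example, and it relies on the openpyxl engine you already use. Note that if_sheet_exists is only passed in append mode, since pandas rejects it otherwise.

```python
import os


def excel_writer_options(path):
    """Pick pd.ExcelWriter keyword arguments based on whether `path` exists.

    Hypothetical helper for illustration only.
    """
    if os.path.exists(path):
        # Append mode needs an existing workbook; replace the sheet if it is present.
        return {"engine": "openpyxl", "mode": "a", "if_sheet_exists": "replace"}
    # First run: nothing to append to, so create the workbook in write mode.
    return {"engine": "openpyxl", "mode": "w"}


def save_frame(frame, path="output.xlsx", sheet_name="sheet1"):
    """Write `frame` to `path`, creating the workbook on the first run."""
    import pandas as pd  # imported here so the helper above needs only the stdlib

    with pd.ExcelWriter(path, **excel_writer_options(path)) as writer:
        frame.to_excel(writer, sheet_name=sheet_name)
```

In your script you would call save_frame(pop) in place of the with pd.ExcelWriter(...) block.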
