NOOB needs help iterating spreadsheet

Noob looking for help iterating through an Excel spreadsheet. I get the results I want by manually changing the row number for the cells in columns A, B, and I. Please see my script below. I'm hoping to automate this by having Python check all rows in those columns. Help would be appreciated, and thank you in advance!!

  • Regards Dale
from openpyxl import load_workbook

wb = load_workbook(r'C:\Users\admin\Documents\Spreadshet.xlsx')
ws1 = wb['Sheet1']  # Replace 'Sheet1' with the actual sheet name

# Example: Read values from column A, B, I and assign variables to each in Sheet1
id = ws1['A3'].value
name = ws1['B3'].value
url = ws1['I3'].value

#Get results from variable 'url'
import requests
from bs4 import BeautifulSoup
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
title = soup.title.string
truncated_title = title[:7]  # Truncate to the first 7 characters

print(id, name, truncated_title)

Are you familiar with writing loops in general?

Let’s assume for a moment that there is no built-in openpyxl functionality for this (I don’t use it, so I don’t know off-hand), and think about the problem logically.

So, we want to run a loop. The plan is that each time through the loop should handle one row of the data, correct?

Which rows do we need to handle? I assume you can think of a rule that gives you the row numbers. Do we know them in advance (a for loop), or do we keep trying and checking to see when we run out (a while loop)?

What steps does the code need to take for each row? That part of the code goes inside the loop.

If we calculate an id, name and url value at the start of the loop, is that enough information to do the rest of the work for that row? Or does anything else seem like it would need to change?

And then, we can get to the meat of it - actually calculating those values. When we write the loop, what information is going to change each time through the loop? We’ll have a row number, right? So, an integer, yes? Let’s say we are on row 5 of the loop (an integer value), and we have that number in a variable row. Given that, how would we create the string 'A5'? (The 'A' part of the string is constant.) If we do that in place of the 'A3' already in the code, then we should get the right id value for each row. Do you see why?

Finally, considering the sort of loop we chose - how do we get a row number that changes each time?

If you can answer all these questions, it should be obvious how to put the pieces together to solve the problem. If not, then you know more specifically what to ask about. Calling yourself a “noob” isn’t helpful - analyzing problems by breaking them down into pieces is.
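For anyone stuck on the string-building step, here is a minimal sketch. The helper name cell_ref is made up for this illustration, and the row numbers are arbitrary.

```python
def cell_ref(column, row):
    """Build a cell reference such as 'A5' from a column letter and a row number."""
    # str() is needed because 'A' + 5 raises a TypeError.
    return column + str(row)


# Rows 3, 4 and 5; range() stops before 6.
for row in range(3, 6):
    print(cell_ref('A', row), cell_ref('B', row), cell_ref('I', row))
```

A string built this way can be used wherever the original script has a literal such as 'A3'.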

Karl, I am grateful for your investment of time and your insightful reply - thank you!

Mind giving me a hand with this, Karl? I’m short on time and not sure how to proceed.

Which is the starting row? Is it row 3?

Which is the final row? Is it the last row of the sheet?

The last row of the sheet is given by ws1.max_row.

You can iterate over the rows like this:

for r in range(3, ws1.max_row + 1):
    id = ws1['A' + str(r)].value
    # etc.

Note that Python uses “half-open” ranges: the start value is included but the end value is not. That means range needs to be given an ending value of ws1.max_row + 1 instead of ws1.max_row, otherwise it’ll stop one short and ignore the final row.
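A quick way to see the effect of the + 1 is to print the numbers a range produces. The values below are small stand-ins chosen for the example; last_row plays the role of ws1.max_row.

```python
first_row = 3
last_row = 6  # stand-in for ws1.max_row

# Without + 1 the end value itself is left out.
rows_without_plus_one = list(range(first_row, last_row))
# With + 1 the last row is included.
rows_with_plus_one = list(range(first_row, last_row + 1))

print(rows_without_plus_one)  # [3, 4, 5]
print(rows_with_plus_one)     # [3, 4, 5, 6]
```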

Thanks Matthew. The starting row is 2 and last row is 1877. Would that make the range statement like this?

for r in range(2, ws1.1877 + 1):

No. ws1.1877 isn’t valid Python.

If the last row is 1877 then it would be:

for r in range(2, 1877 + 1):

or:

for r in range(2, 1878):

What is the value of ws1.max_row? If it’s 1877 then you can replace the 1877 with ws1.max_row, which will have the advantage that if, in the future, there’s a different number of rows in the sheet, you won’t have to modify the program to compensate.
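One thing to watch for when looping up to a fixed last row or up to ws1.max_row is blank cells: openpyxl reports an empty cell's value as None, which is not a usable URL. The sketch below shows one possible way to skip such rows. It uses plain tuples as a hypothetical stand-in for the (id, name, url) values read from the sheet, and the function name rows_with_url is made up for the example.

```python
# Hypothetical stand-in for spreadsheet rows: (id, name, url) tuples,
# with None where a cell is empty.
sample_rows = [
    (1, 'Alpha', 'https://example.com/a'),
    (2, 'Beta', None),        # URL cell is blank
    (None, None, None),       # completely blank row
    (3, 'Gamma', 'https://example.com/c'),
]


def rows_with_url(rows):
    """Yield only the rows that have a non-empty URL."""
    for row_id, name, url in rows:
        if url:  # skips None and empty strings
            yield row_id, name, url


usable = list(rows_with_url(sample_rows))
print(usable)
```

In the real script the same check could be an `if not url: continue` at the top of the loop body.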

It works sorta. It only prints the results for row 1653 (rows 1654 to 1877 were blank). I was expecting it to print results for each row checked. Here is the revised code:

from openpyxl import load_workbook

wb = load_workbook(r'C:\Users\admin\Documents\Sheet.xlsx')
ws1 = wb['Sheet1']  # Replace 'Sheet1' with the actual sheet name

# Example: Read values from column A, B, I and assign variables to each in Sheet1
for r in range(3, 1653 + 1):
    id = ws1['A' + str(r)].value
    name = ws1['B' + str(r)].value
    url = ws1['I' + str(r)].value

#Get results from variable 'url'
import requests
from bs4 import BeautifulSoup
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
title = soup.title.string
truncated_title = title[:7]  # Truncate to the first 7 characters

print(id, name, truncated_title)

That’s because of the indentation.

Indentation is an important part of Python’s syntax.

The for loop repeats only the 3 lines that are indented.

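The unindented lines that follow are run only once, after the loop has finished, so they only see the values from the last row. Indent them to the same level to have them run for every row.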
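Here is a small, self-contained illustration of the difference, using made-up sample data in place of the spreadsheet and a list append in place of the request and print.

```python
rows = [(1, 'Alpha'), (2, 'Beta'), (3, 'Gamma')]  # made-up sample data

# Version 1: the work is NOT indented, so it runs once, after the loop.
handled_outside = []
for row_id, name in rows:
    current = (row_id, name)
handled_outside.append(current)  # only sees the last row

# Version 2: the work IS indented, so it runs once per row.
handled_inside = []
for row_id, name in rows:
    current = (row_id, name)
    handled_inside.append(current)

print(handled_outside)  # [(3, 'Gamma')]
print(handled_inside)   # [(1, 'Alpha'), (2, 'Beta'), (3, 'Gamma')]
```

Version 1 matches the behaviour described above, where only the last row's result is printed.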

That fixed it - thank you Matthew!!