Python newbie needs help with a syntax error

Hello,

I am a data analytics student, and we’re looking at web crawlers/spiders. Our instructor showed sample code to crawl Wikipedia, but I get the error shown in the attached image.

Can anyone help me with this?

TIA,
Rich

It’s missing a closing quote.

Hi.

You’ve mixed tabs and spaces for indentation.

I did this here:

I configured my editor to show unprintable characters. Do you notice this?

Check the last line and the one before it (and possibly others too).

Where, exactly, please?

My editor (Jupyter) does not allow that.
Moving my cursor to every space/character, I don’t see any evidence of an unprintable character.
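If the editor won’t display whitespace, Python itself can reveal it. Here is a minimal sketch, assuming you paste the cell’s code into a string (the `source` below is a made-up example, not Rich’s actual code, and `show_whitespace`/`check_syntax` are hypothetical helper names): printing each line with `repr()` makes tabs appear as `\t`, and `compile()` reports any indentation problem without running the code.

```python
# Hypothetical cell contents: one body line is indented with four spaces,
# the next with a tab character, which is the kind of mix described above.
source = "if True:\n    x = 1\n\ty = 2\n"


def show_whitespace(code: str) -> list[str]:
    """Return each line's repr() so tabs become visible as \\t."""
    return [repr(line) for line in code.splitlines()]


def check_syntax(code: str) -> str:
    """Compile the code without running it and report any syntax error."""
    try:
        compile(code, "<cell>", "exec")
    except SyntaxError as err:
        # TabError and IndentationError are both subclasses of SyntaxError.
        return f"{type(err).__name__}: {err.msg} (line {err.lineno})"
    return "OK"


for shown in show_whitespace(source):
    print(shown)
print(check_syntax(source))
```

Once the offending line is found, re-indenting it with spaces only (four per level) clears the error.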

Try:

  1. Erase the ‘;’ in line 11
  2. Erase everything below line 11 (or fix its indentation)

Actually, it’s missing more than just a closing quote.

Look at line 11:

listOfLinks.append("\"{0}\ ->\"{1}\";.format(masterLinkList)

I think that should be:

listOfLinks.append("\"{0}\" -> \"{1}\"").format(masterLinkList))

I think it should be:

listOfLinks.append("\"{0}\" -> \"{1}\"".format(masterLinkList))

Ah, correct!
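One more thing may still bite: the format string has two placeholders, `{0}` and `{1}`, but `.format(masterLinkList)` passes a single argument. If `masterLinkList` (or whatever variable is used there) holds a two-item pair, that call raises `IndexError`, and unpacking the pair with `*` fixes it. A minimal sketch, assuming the crawler collects (source page, linked page) pairs; the page names and the loop variable `pair` are hypothetical.

```python
# Hypothetical (source page, linked page) pairs standing in for the
# crawler's real results.
masterLinkList = [("Python", "Guido van Rossum"), ("Python", "CPython")]

listOfLinks = []
for pair in masterLinkList:
    # The * unpacks the pair, so {0} receives the source and {1} the target.
    listOfLinks.append("\"{0}\" -> \"{1}\"".format(*pair))

for line in listOfLinks:
    print(line)
```

Without the `*`, the whole tuple fills `{0}` and nothing is left for `{1}`.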

Thanks everyone for your suggestions, but so far, nothing is working.

Can anyone explain how indentation works in Python?
Also, does every line that starts an if statement have to end with a colon (“:”)? Is it also required for nested loops?

Thank You!

You should not be crawling Wikipedia; see Wikipedia:Database download.
Quote:

Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can cause a dramatic slow-down of Wikipedia.

You also shouldn’t crawl arbitrary websites unless you respect the site’s policies and its robots.txt file (see, for instance, https://en.wikipedia.org/robots.txt; if you don’t know what that is, see the Wikipedia article on robots.txt).
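Python’s standard library can check robots.txt rules for you with `urllib.robotparser`. A minimal sketch: the rules, the example.org URLs, and the user-agent name below are all made up, and the rules are parsed from a string so the example runs without any network access. A real crawler would instead call `set_url()` and `read()` on the site’s actual robots.txt before fetching anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from a string (no network needed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

agent = "ClassroomCrawler"  # hypothetical user-agent name
for url in ("https://example.org/wiki/Python", "https://example.org/w/index.php"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(url, verdict)

# Seconds to wait between requests, if the rules specify one.
print("crawl delay:", parser.crawl_delay(agent))
```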


I understand what you’re saying, but this is a simple classroom assignment. It isn’t a serious crawling effort; it’s only meant to demonstrate the process to us students.

I suggest you read the Design and History FAQ in the Python 3.13.3 documentation.

If you want to go deeper, see chapter 2, “Lexical analysis”, of the Python 3.13.3 language reference.

Yes, it does, for nested loops too. The ‘:’ tells Python, “a nested block starts here.”
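A small sketch of the rule: each `if`, `for`, `while`, `def`, or `class` header ends with a colon, and the lines indented beneath it (four spaces per level by convention) form its block. A nested loop just repeats the pattern one level deeper. The function name and sample sentences are made up for illustration.

```python
def count_long_words(sentences, min_length=5):
    """Count the words with at least min_length characters in each sentence."""
    counts = []
    # The colon opens the outer loop's block.
    for sentence in sentences:
        total = 0
        # Nested loop: its header also ends with a colon, one level deeper.
        for word in sentence.split():
            # An if header needs a colon too.
            if len(word) >= min_length:
                total += 1
        # Dedenting back to this level ends the inner loop's block.
        counts.append(total)
    return counts


print(count_long_words(["crawl the web", "robots text files"]))
```

Here "crawl" is the only five-letter word in the first sentence, and "robots" and "files" qualify in the second, so the call returns `[1, 2]`.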

Maybe you need to study Python’s fundamentals before tackling something bigger.
Work through the official tutorial.

I’m a firm believer in ethical classroom assignments though. Even if it’s “just an assignment”, it should be something that is legal, ethical, and not violating a site’s terms of service. All it would require is selecting a different site to scrape.


Some sites were built for this purpose: https://www.scrapethissite.com/


I didn’t know about that; it’s perfect. I don’t know why some educators seem to think the only way to teach these techniques is to use a famous site like Wikipedia.
