Invalid syntax in Windows VS

Hi there, I am a python beginner. I wrote a few lines in visual studio by watching YouTube videos. But nothing has worked. Here are the lines of codes:

import snscrape.modules.twitter as sntw
import pandas as pd

tweet_data =

for i, tweet in enumerate (sntw.TwitterSearchScraper.(‘COVID-19 deaths since:2019-01-01 until:2022-08-31’).get_items()):
for i > 1000
break

tweet_data.append([tweets.date, tweets.content, tweets.user.name, tweets.url])

df = pd.dataframe(tweet_data, columns = [“Date”,“Tweets”,“Username”,“URL”])

By Tom Chen via Discussions on Python.org at 15Sep2022 03:30:

Hi there, I am a python beginner. I wrote a few lines in visual studio
by watching YouTube videos. But nothing has worked. Here are the lines
of codes:

A note for the future: enclose code (and programme output and errors) in
triple backticks, like this:

 ```
 your code here
 ```

This preserves the indentation and punctuation.

Anyway, to your code:

 import snscrape.modules.twitter as sntw
 import pandas as pd
 tweet_data = []

 for i, tweet in enumerate (sntw.TwitterSearchScraper.('COVID-19 deaths since:2019-01-01 until:2022-08-31').get_items()):

So far so good. This fetches tweets and enumerates them in pairs with an
index named i, eg (0,tweet), (1,tweet2), etc.

This is likely where your problem occurred:

 for i > 1000
 break

I expect that this should look like:

 if i > 1000:
     break

Compound statements like for and if end with a colon, as in the
for-loop above. You want an if-statement here, and the code inside the
if-statement is what should happen if the condition is true. The
if-statement above becomes true when i>1000, i.e. once the tweets
numbered 0 to 1000 have been handled. When true, it runs break, which
exits the for-loop.
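Here is a minimal sketch of that if/break pattern. It assumes a plain list of made-up strings in place of the live tweet search, so it runs without snscrape; the names fake_tweets and collected are invented for the illustration.

```python
# Stand-in data: five fake "tweets" instead of a live snscrape search.
fake_tweets = ["tweet %d" % n for n in range(5)]

collected = []
for i, tweet in enumerate(fake_tweets):
    if i > 2:                 # the condition line ends with a colon
        break                 # indented: runs only when the condition is true
    collected.append(tweet)   # reached for i = 0, 1 and 2

print(collected)
```

Because enumerate starts counting at 0, the test i > 2 lets three items through. By the same counting, i > 1000 would let 1001 tweets through; i >= 1000 would stop after exactly 1000.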

 tweet_data.append([tweets.date, tweets.content, tweets.user.name, tweets.url])

If the if-statement condition isn’t true, a list of various things
from the tweet is appended to the larger list tweet_data.

 df = pd.dataframe(tweet_data, columns = ["Date","Tweets","Username","URL"])

and this final line creates a DataFrame from the data in tweet_data.

Cheers,
Cameron Simpson cs@cskk.id.au

Hi Cameron,

Thank you for helping me understand the for-loop.
I added the colon after 1000. Still, there returned an error report.
Here are the codes.

import snscrape.modules.twitter as sntw
import pandas as pd

tweet_data = []



for i, tweet in enumerate (sntw.TwitterSearchScraper.("COVID-19 deaths since:2019-01-01 until:2022-08-31").get_items()):
    for i > 1000:
    break

    tweet_data.append([tweets.date, tweets.content, tweets.user.name, tweets.url])

    df = pd.dataframe(tweet_data, columns = ["Date","Tweets","Username","URL"])

And here is the error report from Python.

Traceback (most recent call last):
  File "C:\Users\OX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\OX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "C:\Users\OX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 288, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "C:\Users\OX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 257, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "C:\Users\OX\source\repos\My first dashboard\My first dashboard\My_first_dashboard.py", line 8
    for i, tweet in enumerate (sntw.TwitterSearchScraper.("COVID-19 deaths since:2019-01-01 until:2022-08-31").get_items()):
                                                         ^
SyntaxError: invalid syntax
Press any key to continue . . .

Could you please take another look at it?

Best,
Tom

Full stops in Python are used to denote attributes (data and functions/methods). So what you're writing here is:

  • TwitterSearchScraper is an attribute of the sntw module
  • ("COVID-19 deaths since:2019-01-01 until:2022-08-31") is an attribute of the TwitterSearchScraper
  • get_items() is an attribute of ("COVID-19 deaths since:2019-01-01 until:2022-08-31")

Now ("COVID-19 deaths since:2019-01-01 until:2022-08-31") is not a valid Python name for a data item or a method. That causes the “Invalid syntax”. So I think you want to get rid of the full stop after TwitterSearchScraper. That will make TwitterSearchScraper a function that is called with "COVID-19 deaths since:2019-01-01 until:2022-08-31" as its first and only parameter. Probably the parameter is what you want TwitterSearchScraper to look for.
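To make the difference concrete, here is a small sketch. TwitterSearchScraperDemo is a hypothetical stand-in class written for this example, not the real snscrape class, and its get_items method simply splits the query into words.

```python
class TwitterSearchScraperDemo:
    """Hypothetical stand-in for a scraper class; not the snscrape API."""

    def __init__(self, query):
        self.query = query

    def get_items(self):
        # Pretend that each word of the query is one search result.
        return self.query.split()


# Name followed by parentheses: call the class, then call a method on the result.
items = TwitterSearchScraperDemo("COVID-19 deaths").get_items()
print(items)

# A full stop must be followed by a name, so ".(" cannot be compiled at all.
bad_source = 'TwitterSearchScraperDemo.("COVID-19 deaths").get_items()'
error_message = None
try:
    compile(bad_source, "<demo>", "eval")
except SyntaxError as exc:
    error_message = exc.msg
    print("SyntaxError:", error_message)
```

The second half only compiles the bad line, it never runs it, which is why the error appears before any of the program has executed.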

By Tom Chen via Discussions on Python.org at 18Sep2022 13:38:

I added the colon after 1000. Still, there returned an error report.
Here are the codes.

[.......]
for i, tweet in enumerate (sntw.TwitterSearchScraper.("COVID-19 deaths 
since:2019-01-01 until:2022-08-31").get_items()):
   for i > 1000:
   break

Menno Hölscher has detailed the error at the bottom. But I wanted to
point at the 2 lines above. They cannot be what’s really in your code,
because they are invalid:

  • you mean if, not for; a for-loop has a different syntax and that
    line should raise a SyntaxError
  • the break is not indented; compound statements ending in a colon
    require a suite of code indented below them, and there isn’t one
    because break is not indented

Please make sure you paste the code exactly as it was when you got the
error traceback.

Anyway, Menno seems to have pointed at the main problem.

Cheers,
Cameron Simpson cs@cskk.id.au

Thank you very much, Menno and Cameron! I really appreciate your kindness.
I have been trying to learn quickly how to scrape data from Twitter, but I do not understand how Python works, so I have made some elementary mistakes.
I have changed the code to the following, with no luck yet.

import snscrape.modules.twitter as sntw
import pandas as pd

tweet_data = []


for i, tweets in enumerate (sntw.TwitterSearchScraper ("COVID-19 deaths since:2019-01-01 until:2022-08-31").get_items()):
    if i > 1000:
        break

    tweet_data.append([tweets.date, tweets.content, tweets.url])

df = pd.DataFrame(tweet_data, columns = ["Date","Tweets","URL"]) 

The Python programme simply exits with no results at all.

Best,
Tom

I copied the same codes from Web Scraping with Python – How to Scrape Data from Twitter using Tweepy and Snscrape, and ran them in Visual Studio or Python 3.10. Nothing works. I always get “Syntax Error: invalid syntax”.

>>> import snscrape.modules.twitter as sntwitter
>>> import pandas as pd
>>>
>>> attributes_container = []
>>>
>>> for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
...     if i>150:
...         break
...     attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
...
... tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
  File "<stdin>", line 6
    tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
    ^^^^^^^^^
SyntaxError: invalid syntax

It looks like there could be invisible characters causing the error in the text you pasted into the REPL. The REPL is the interactive mode of the Python interpreter. It runs when you start Python without parameters.

Rather than pasting the code into the REPL, it is better to put it into a text file and execute the file using python3 / python / py (whichever command starts Python on your platform).

To find the problem, use a text editor which can show invisible characters, examine the file using a hexadecimal viewer, or retype the possibly problematic parts manually.
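If no such editor is at hand, a few lines of Python can do a similar job. This is only a sketch: find_suspicious_characters is a hypothetical helper name, and it assumes that anything outside printable ASCII (apart from tabs) is worth a look, which also flags legitimate non-ASCII text inside strings.

```python
import unicodedata


def find_suspicious_characters(text):
    """Return (line, column, code point, name) for each non-ASCII or control character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) > 126 or (ord(char) < 32 and char != "\t"):
                name = unicodedata.name(char, "UNNAMED")
                findings.append((line_no, col_no, "U+%04X" % ord(char), name))
    return findings


# Sample text hiding a no-break space (U+00A0) and a zero-width space (U+200B).
sample = "x = 1\nif x:\u00a0\n    print(x)\u200b\n"
findings = find_suspicious_characters(sample)
for finding in findings:
    print(finding)
```

The same check would also report the curly quotes that word processors and some web pages substitute for plain ' and " characters.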


Good, no errors.

What did you expect to see? Let’s follow your program:

  • From tweet_data = [] to tweet_data.append([tweets.date, tweets.content, tweets.url]) you construct a list of the tweets. This list is a structure in memory; it holds your data only while the program runs.
  • Next you create a DataFrame from it. A DataFrame is a matrix-like structure, again in memory.
  • Your program ends. Python and the OS make the memory you have been using inaccessible.

Nowhere do you show the data or save it to disk. For saving it, the DataFrame has many methods; look at the ones whose names start with to_, like to_excel and to_csv.
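As a sketch of both options, the following uses made-up rows in place of scraped tweets. The file name tweets_demo.csv and its location in the temporary directory are arbitrary choices for the illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up rows standing in for the scraped tweets.
tweet_data = [
    ["2022-08-30", "first example tweet", "https://example.com/1"],
    ["2022-08-31", "second example tweet", "https://example.com/2"],
]
df = pd.DataFrame(tweet_data, columns=["Date", "Tweets", "URL"])

# Show the data on screen.
print(df)

# Save the data to disk; index=False leaves out the row numbers.
csv_path = os.path.join(tempfile.gettempdir(), "tweets_demo.csv")
df.to_csv(csv_path, index=False)

# Read the file back to confirm it holds the same table.
saved = pd.read_csv(csv_path)
print(saved.shape)
```

In a real script you would probably write the file next to the script or into a folder of your choice rather than the temporary directory.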

It will pay off to follow a tutorial or course on Python. The investment will make you very much more able to solve things on your own, making the process a lot faster.


By Tom Chen via Discussions on Python.org at 20Sep2022 08:38:

I have been trying to learn quickly how to scrape data from Twitter,
but I do not understand how Python works, so I have made some
elementary mistakes.
I have changed the code to the following, with no luck yet.

import snscrape.modules.twitter as sntw
import pandas as pd

tweet_data = []


for i, tweets in enumerate (sntw.TwitterSearchScraper ("COVID-19 deaths since:2019-01-01 until:2022-08-31").get_items()):
   if i > 1000:
       break

   tweet_data.append([tweets.date, tweets.content, tweets.url])

df = pd.DataFrame(tweet_data, columns = ["Date","Tweets","URL"])

The Python programme simply exits with no results at all.

That is because you haven’t shown the results anywhere, just placed them
in the “df” variable.

Try adding the line:

 print(df)

at the bottom of the code above.

When you run a Python script as a command, such as:

 python your-script-here.py

it does exactly what’s in the script.

When you use the Python interactive prompt, for example by running the
command python and typing Python code to it, that environment prints
the results of expressions as you go, as a convenience. Example:

 >>> 3 + 5
 8
 >>> x = 3 + 5
 >>> print(x)
 8

The first line computes the value 8 and prints it for you because it
was not assigned to a variable. The second line computes 8 and stores
it in the variable x, and does not print it out for you. The third
line print(x) prints the value stored in x.

The same with your code above: if it runs without error, there should be
results in the variable df. You need to print(df) to see those
results.
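The difference between the two environments can be reproduced from inside a script. The sketch below relies on the built-in compile() function, whose "single" mode behaves like the interactive prompt and whose "exec" mode behaves like a script file; run_and_capture is a hypothetical helper name, and the sketch assumes the default sys.displayhook is in use.

```python
import contextlib
import io


def run_and_capture(source, mode):
    """Compile source in the given mode, run it, and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<demo>", mode), {})
    return buffer.getvalue()


# "single" mode, as at the interactive prompt: a bare expression is echoed.
interactive_output = run_and_capture("3 + 5", "single")

# "exec" mode, as in a script file: the value is computed and then discarded.
script_output = run_and_capture("3 + 5", "exec")

# An explicit print() is what makes the value visible in a script.
printed_output = run_and_capture("x = 3 + 5\nprint(x)", "exec")

print(repr(interactive_output), repr(script_output), repr(printed_output))
```

Only the first and third runs produce any output, which matches what happens to df when the script ends without a print.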

Cheers,
Cameron Simpson cs@cskk.id.au


By Tom Chen via Discussions on Python.org at 20Sep2022 09:08:

I copied the same codes from
Web Scraping with Python – How to Scrape Data from Twitter using Tweepy and Snscrape, and
ran them in Visual Studio or Python 3.10. Nothing works. I always get
“Syntax Error: invalid syntax”.

There is a clue at your Python prompt. Observe:

 >>> import snscrape.modules.twitter as sntwitter
 >>> import pandas as pd
 >>>
 >>> attributes_container = []
 >>>
 >>> for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
 ...     if i>150:
 ...         break
 ...     attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])
 ...
 ... tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
   File "<stdin>", line 6
     tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
     ^^^^^^^^^
 SyntaxError: invalid syntax

When you go to type tweets_df=......, the prompt is still ..., which
indicates that the prompt still thinks your for-loop is incomplete.
Because of that, it expects the tweets_df=.... to be part of the loop
body, and so it should be indented like the if-statement.

However, it isn’t intended as part of the loop body.

My belief is that you used copy/paste to give this code to the
interactive prompt, and that the copied text had some whitespace
(probably just spaces) in the blank line. Here’s me trying to reproduce
this:

 >>> for x in 1, 2, 3:
 ...   print(x)
 ...
 ... foo
   File "<stdin>", line 4
     foo
     ^
 SyntaxError: invalid syntax

It isn’t visible, but I deliberately typed a couple of spaces on the
blank line.

The interactive prompt does not behave exactly like a normal Python
script. Because it tries to gather up things as you type them so that
it can run them immediately, it uses some little cues to decide whether
you’ve typed something which needs one line or several lines.

If you type some spaces on a blank line, it decides there’s more code to
come, which is probably what caused your SyntaxError.

Try copy/pasting the for-loop but not the following blank line. Then
press Enter/Return to end the loop (the prompt needs that little cue to
decide the loop is complete). Then type or copy the tweets_df=
assignment line.

Most of us write code in files, and run the file. For example we might
put your little programme there in a file called get_tweets.py, and
run it as:

 python get_tweets.py

This avoids the foibles of the interactive prompt, and also means you
get to edit the file to make changes and retry instead of having to
retype the whole thing again from scratch.
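As a rough check that the same text is fine when treated as a file, the sketch below compiles a small made-up program in "exec" mode, which is how a script file is compiled. It has the same shape as the pasted code: a loop, a line holding only spaces, then an unindented statement. The file name get_tweets_demo.py is only a label passed to compile(); no file is read.

```python
# A loop, a whitespace-only line, then an unindented statement.
source = (
    "total = 0\n"
    "for x in (1, 2, 3):\n"
    "    total += x\n"
    "    \n"               # this line contains only spaces
    "result = total\n"
)

namespace = {}
exec(compile(source, "get_tweets_demo.py", "exec"), namespace)
print(namespace["result"])
```

It compiles and runs without complaint, so the stray spaces only matter when the code is fed to the interactive prompt.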

Cheers,
Cameron Simpson cs@cskk.id.au


Sometimes it helps to spread the code out over more lines by using named variables. Either the error will go away or it will be easier to identify where the error is coming from:

import snscrape.modules.twitter as sntwitter
import pandas as pd

attributes_container = []
searchterms = 'sex for grades since:2021-07-05 until:2022-07-06'

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(searchterms).get_items()):
    if i > 150:
        break
    data = [tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content]
    attributes_container.append(data)

columns = ["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
tweets_df = pd.DataFrame(attributes_container, columns=columns)

Code copied and pasted from a website can sometimes include invisible (to a human!) characters which confuse the interpreter. To make things more confusing, passing that code through a different website, or email, can sometimes remove those invisible characters.

It is rare for this to happen, but when it does, it is horribly frustrating to debug.


Oh well spotted!

Shouldn’t that give an IndentationError instead of a SyntaxError? We should raise a ticket for this.

(I don’t have time now, but if nobody beats me to it, I can do it over the next couple of days.)


By Steven D’Aprano via Discussions on Python.org at 20Sep2022 21:53:

Oh well spotted!

Shouldn’t that give an IndentationError instead of a SyntaxError? We should raise a ticket for this.

(I don’t have time now, but if nobody beats me to it, I can do it over
the next couple of days.)

Maybe, but is it an IndentationError? The tweets_df= is correctly
indented for the author’s intent, and legally indented for a standalone
Python file. It doesn’t work in the interactive prompt because of how it
assembles code for execution, because of the spaces on the preceding
line.

Maybe the interactive prompt should have a more evident way of
highlighting the whitespace-only line which makes it think the for-loop
continued?

Cheers,
Cameron Simpson cs@cskk.id.au


Assuming we can’t fix the interactive interpreter, then yes, it’s an IndentationError. The interpreter is expecting an indented line, and not getting one, so it raises SyntaxError.

This works fine:

>>> for i in (1, 2):
...     pass
...     
...     pass
... 
>>> 

but if the second pass isn’t indented, you get an uninformative SyntaxError:

>>> for i in (1, 2):
...     pass
...     
... pass
  File "<stdin>", line 4
    pass
    ^^^^
SyntaxError: invalid syntax

The only difference between the two examples is the lack of indentation, hence, IndentationError.

The status quo is the worst of both worlds:

  • the interactive interpreter can’t handle all legal Python code;
  • and when it fails, you don’t even get a useful error message.

This is backwards compatible too: IndentationError is a subclass of SyntaxError.
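The subclass relationship is easy to check. In the sketch below, the source string is a made-up example of a loop body that is missing its indentation, compiled as a file rather than typed at the prompt.

```python
# IndentationError is a subclass of SyntaxError.
print(issubclass(IndentationError, SyntaxError))

# A loop whose body is not indented, compiled the way a script file would be.
source = "for i in (1, 2):\npass\n"
caught_type = None
try:
    compile(source, "<demo>", "exec")
except SyntaxError as exc:      # this handler also catches IndentationError
    caught_type = type(exc).__name__
    print(caught_type, "-", exc.msg)
```

Code that already catches SyntaxError would therefore keep working if the interactive prompt raised the more specific exception.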

Ideally the exception should come with a hint to either indent the failing line, or to remove whitespace from the previous line.


I am grateful to @cameron, @Mholscher, Václav Brožík and Steven D’Aprano for taking the time to help me. My little programme is now less important than what I have learned from your discussion and generosity.