Help with analyzing VAERS data. I am a new Python student; help appreciated


I am a newbie to Python and coding in general, and I am doing my master's in Data Science. For one of my projects I have decided to analyze deaths in the VAERS 2021 data on COVID-19 vaccines, and I need help. First of all, deaths are recorded as either blank or “Y”, where “Y” means the individual died. I want to get some basic statistics, such as the age range and sex of the individuals who died, and whether there is a trend. The data was obtained from here: VAERS - Data Sets

Here are my columns of interest:

I need help figuring out how to do this. Should I fill in the null data? How can I get graphs and stats out of this?

Thanks very much!



This seems to be more of a data science question than a Python question, FYI. If data is a pandas DataFrame (and if it isn’t, it almost certainly should be), then the pandas docs, starting with the beginner tutorial, should guide you on how to perform basic statistics, plots and other operations. Some tips to get you started:

  • data.describe() will give you some basic stats
  • You can use data.replace() to replace blanks with N, or whatever else you choose (or data.fillna() if the blanks were read in as NaN, which is the default when loading a CSV)
  • data["COLUMN_NAME"].value_counts() will give you the counts of the individual values for that column (e.g. data["DIED"].value_counts() gives the number that did and didn’t die)
  • data.plot() will give you various basic plots of the data, e.g. data.loc[data["DIED"] == "Y", "AGE_YRS"].plot(kind="hist") to get a histogram of deaths by age, for instance
  • Consider Seaborn for more advanced plotting (e.g. relationships and trends between all or some of the variables)
  • You can compute a trend by fitting a simple OLS linear regression model, e.g. with statsmodels or sklearn, or do so manually
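
To make the first few tips concrete, here is a minimal sketch. It assumes the CSV has already been loaded into a DataFrame called data. The rows below are made-up stand-ins so the snippet runs on its own, and SEX is an assumed column name (only AGE_YRS and DIED are mentioned above), so adjust the names to match your file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the VAERS table: made-up rows, NOT real data.
# AGE_YRS and DIED appear in the thread; SEX is an assumed column name.
data = pd.DataFrame({
    "AGE_YRS": [34.0, 71.0, 82.0, 45.0, 90.0, 67.0, np.nan, 58.0],
    "SEX": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "DIED": [np.nan, "Y", "Y", np.nan, "Y", np.nan, np.nan, "Y"],
})

# Blank cells come through as NaN, so mark them explicitly as "N".
data["DIED"] = data["DIED"].fillna("N")

# Number of reports with and without a recorded death.
died_counts = data["DIED"].value_counts()

# Keep only the reports where a death was recorded.
deaths = data[data["DIED"] == "Y"]

# Summary statistics (count, mean, min, max, quartiles) of age at death.
age_stats = deaths["AGE_YRS"].describe()

# Number of recorded deaths for each sex.
deaths_by_sex = deaths.groupby("SEX").size()

print(died_counts)
print(age_stats)
print(deaths_by_sex)
```

Whether to fill the blanks at all is a judgment call; filtering on data["DIED"] == "Y" works either way, and filling mainly makes value_counts() and plots easier to read.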

Best of luck!

Thanks, these were so helpful, much appreciated! And thanks for redirecting me to that tutorial. Here is what I have achieved so far:

Histogram of deaths (Yes or no) by age:

Does anyone know how I could further divide this by gender? Every record has a male/female attribute. So, for example, for DIED = yes there would be two histograms showing the distribution for males vs. females.

You could use seaborn.histplot and code gender as hue, with multiple="dodge"; do a 2D histogram with both an x and a y; or do a grid of some sort. But again, this is really more of a data science or pandas/matplotlib/seaborn question than a core Python one.