Calculate a t-test returning the test value and the p-value

I'm working on a class assignment. Given the instructions below, can you help me understand what is wrong with my code?

I need to calculate a t-test returning the test value and the p-value for two sets of variables. As in the previous challenge, you’ll use a preloaded DataFrame that contains information about medical costs in different regions of the country and across other demographics, including age, sex, BMI (body mass index), number of children, and smoker vs. non-smoker.

For this challenge, you’ll focus on evaluating the sets sex and charges, and sex and bmi.

You’ll also start with preloaded code that creates two separate DataFrames, each containing all the columns and rows for sex equal to male or female, and the resulting DataFrames are named df_male and df_female, respectively. Your solution will use these new DataFrames. Also, you’ll be assigning values to two variables at once, as you did in the previous challenge.

  1. For the t-test between sex and charges, assign the test value to the variable tc and the p-value to pc. Print out both results.
  2. For the t-test between sex and bmi, assign the test value to the variable tb and the p-value to pb. Print out both results.

You’ll notice that only one of them has a significant p-value (less than 0.05), and your solution should confirm that it is the test that evaluates charges, not bmi.

CODE

# Import libraries
import pandas as pd
from scipy import stats
from scipy.stats import ttest_ind

df = pd.read_csv('https://tf-assets-prod.s3.amazonaws.com/tf-curric/data-analytics-bootcamp/medicalcosts.csv')

# Create two separate DataFrames for sex
df_male = df.loc[df['sex'] == 'male']
df_female = df.loc[df['sex'] == 'female']

# Run a t-test comparing the charges column between males and females and print the results
tc, pc = stats.ttest(df_male, df_female)
print(tc)
print (pc)

# Run a t-test comparing the bmi column between males and females and print the results
tb, pb = stats.ttest(df_male, df_female)
print(tb)
print (pb)

Here is the error I get when running it:

Error
ImportError: Failed to import test module: tests.test_solution
Traceback (most recent call last):
File "/usr/local/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/local/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/workspace/datascience/tests/test_solution.py", line 25, in <module>
from solution import *
File "/workspace/datascience/solution.py", line 13, in <module>
tc, pc = stats.ttest(df_male, df_female)
AttributeError: module 'scipy.stats' has no attribute 'ttest'

How exactly does your code fail, @sarahw? The copy-pasted assignment and code don't say, and without the data set nobody else can run it to find out.

Please wrap the code in triple backticks to preserve the formatting, like this:

```
# An example.
if True:
    print('Hello world!')
```

As for your code, you have this line:

df = pd.read_csv

which merely binds (assigns) the function read_csv of pandas to df.

You need to call the function with the path to the data file:

df = pd.read_csv('path/to/file')
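
To make the difference concrete, here is a minimal sketch. The names `reader` and `csv_text` and the inline CSV contents are made up for illustration; an in-memory `io.StringIO` buffer stands in for a real file path so the snippet runs on its own:

```python
import io

import pandas as pd

# Without parentheses, the name is bound to the function itself; nothing is read.
reader = pd.read_csv
print(callable(reader))

# Hypothetical CSV contents, held in memory instead of a file on disk.
csv_text = io.StringIO("sex,bmi\nmale,27.9\nfemale,33.7\n")

# Calling the function is what actually parses the data into a DataFrame.
df = pd.read_csv(csv_text)
print(type(df))
print(df.shape)
```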

thank you @Dutcho ! I added the error I am receiving as well.

Thanks @MRAB - I added the URL address. python.org gave me an error on that when I first posted so I thought I had to delete the reference.

Looking here:

https://docs.scipy.org/doc/scipy/reference/stats.html

it appears that there’s no ttest, but there are ttest_1samp, ttest_ind, ttest_ind_from_stats and ttest_rel.

You’ll need to pick the appropriate one.
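
Since males and females are two independent groups, `ttest_ind` is most likely the one the assignment is after. Note also that it expects the two columns being compared, not the whole DataFrames. A minimal sketch, assuming that reading of the assignment; the small DataFrame below is invented stand-in data, not the course's `medicalcosts.csv`, so the printed numbers mean nothing:

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in data; the real assignment loads medicalcosts.csv instead.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male", "female", "male", "female"],
    "charges": [16884.9, 1725.6, 4449.5, 21984.5, 3866.9, 3756.6, 8240.6, 7281.5],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4, 27.7],
})

# Same split as in the preloaded assignment code.
df_male = df.loc[df["sex"] == "male"]
df_female = df.loc[df["sex"] == "female"]

# Independent two-sample t-test on the charges column of each group.
tc, pc = stats.ttest_ind(df_male["charges"], df_female["charges"])
print(tc)
print(pc)

# Same test on the bmi column of each group.
tb, pb = stats.ttest_ind(df_male["bmi"], df_female["bmi"])
print(tb)
print(pb)
```

The result unpacks into the test statistic first and the p-value second, which matches the `tc, pc` and `tb, pb` assignments the instructions ask for.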