Hi, I am Barath
I am new to Python. I have experience with R and SPSS, and I am trying to automate basic descriptive analysis like Custom Tables in SPSS, where we get a 2 × 2 table (n and %) for categorical variables and can add any number of independent variables as rows, as given below.
My code:
import pandas as pd
import numpy as np
d = {"Gender": [1,1,1,2,2,1,1,2,1,2,1,2,2,1,1], "Death": [2,2,1,1,2,1,2,1,1,2,2,1,1,1,2], "Income": [1,1,3,3,1,2,3,1,1,2,2,3,1,3,2]}
my_df = pd.DataFrame(d)
print(my_df)
########### Final Two-way Custom table
```
def frequency_table(my_df, column):
    freq = my_df[column].value_counts(dropna=True).sort_index()  # to get the frequency
    # to get the percentage
    percent = freq / freq.sum() * 100
    # all values converting to integer data type
    percent = [int(i) for i in percent]
    if 'Death' in my_df:
        for sur in my_df['Death']:
            if (sur == 1):
                frq_sur_0 = my_df['Death'].value_counts(dropna=True).sort_index()
                per_sur_0 = frq_sur_0 / frq_sur_0.sum() * 100
                # all values converting to integer data type
                per_sur_0 = [int(i) for i in per_sur_0]
            if (sur == 2):
                frq_sur_1 = my_df['Death'].value_counts(dropna=True).sort_index()
                per_sur_1 = frq_sur_1 / frq_sur_1.sum() * 100
                # all values converting to integer data type
                per_sur_1 = [int(i) for i in per_sur_1]
    freq_table = pd.DataFrame({
        'n': freq,
        '%': percent,
        'Case(n)': frq_sur_0,
        'Case(%)': per_sur_0,
        'Control(n)': frq_sur_1,
        'Control(%)': per_sur_1
    })
    return freq_table
#freq_table = freq_table.apply(f, axis=0)
cols = ['Gender', 'Income']  #'sym',
for column in cols:
    print(f"Frequency Table for {column}:\n", frequency_table(my_df, column), "\n")
```
When I run the code, I get this error: ValueError: All arrays must be of the same length
- I am not able to get the correct frequency and percentage for the column variable.
I am trying to fix it but am finding it a little difficult. It would be helpful if I could get any help.
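
For reference, here is a minimal sketch of one possible way to build this kind of table with `pd.crosstab`, which counts each row category within each level of the column variable. In the code above, the `Death` counts and percentages always have two entries, while `Income` has three categories, so the columns passed to `pd.DataFrame` differ in length. The helper name `custom_table`, its `labels` argument, and the coding 1 = Case, 2 = Control are assumptions made for illustration and are not confirmed by the data.

```python
import pandas as pd

# Same toy data as in the question.
d = {
    "Gender": [1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1],
    "Death": [2, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2],
    "Income": [1, 1, 3, 3, 1, 2, 3, 1, 1, 2, 2, 3, 1, 3, 2],
}
my_df = pd.DataFrame(d)


def custom_table(df, row_var, col_var="Death", labels=None):
    """Hypothetical helper: n and % of row_var overall and within each col_var level."""
    # Assumed coding of the column variable: 1 = Case, 2 = Control.
    labels = labels or {1: "Case", 2: "Control"}

    # Cross-tabulated counts: one row per row_var category, one column per col_var level.
    counts = pd.crosstab(df[row_var], df[col_var])
    # Column percentages: each col_var level sums to 100.
    col_pct = counts / counts.sum(axis=0) * 100

    # Overall counts and percentages for the row variable.
    total_n = counts.sum(axis=1)
    out = pd.DataFrame({"n": total_n, "%": (total_n / total_n.sum() * 100).round(1)})

    # Add an (n, %) pair of columns for each level of the column variable.
    for level in counts.columns:
        name = labels.get(level, str(level))
        out[f"{name}(n)"] = counts[level]
        out[f"{name}(%)"] = col_pct[level].round(1)
    return out


tables = {col: custom_table(my_df, col) for col in ["Gender", "Income"]}
for col, table in tables.items():
    print(f"Frequency table for {col}:\n{table}\n")
```

Because every column is derived from the same cross-tabulation, all columns share the row variable's index, so the lengths always match regardless of how many categories the row variable has. The percentages here are column percentages (within Case and within Control); if row percentages are wanted instead, divide by `counts.sum(axis=1)` along the rows.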