Performance tuning

Shiv · June 1, 2020, 1:14pm

Hello, I am new to Python and I need a solution, or a hint about what alternative could be used. I am trying to create a new column based on an existing column in a pandas DataFrame, for about 77k records.

I already have working code, but the piece of code below takes 25 minutes to run.
I also tried to apply cProfile, but it didn't work.

Can anyone please help me get this down to 1 or 2 minutes? I do the same step in SAS in 30 seconds; I need it in Python.

def SCHOOL(df_row):
    # AB, AC, AD, AE and blank grades all become 'A';
    # every other grade is copied across unchanged.
    if df_row['STUDENT_GRADE'] in ('AB', ''):
        df_row['AVERAGE_GRADE'] = 'A'
    else:
        df_row['AVERAGE_GRADE'] = df_row['STUDENT_GRADE']
    if df_row['STUDENT_GRADE'] in ('AC', ''):
        df_row['AVERAGE_GRADE'] = 'A'
    if df_row['STUDENT_GRADE'] in ('AD', ''):
        df_row['AVERAGE_GRADE'] = 'A'
    if df_row['STUDENT_GRADE'] in ('AE', ''):
        df_row['AVERAGE_GRADE'] = 'A'
    return df_row

# Apply SCHOOL to every row of the SCHOOL1 DataFrame, then display the result.
GRADE_Details = SCHOOL1.apply(SCHOOL, axis=1)
GRADE_Details

Rhodri · June 1, 2020, 4:44pm

Have you considered turning the SCHOOL function into a table lookup?

Shiv · June 1, 2020, 6:11pm

Hey James, what do you mean by table lookup?

steven.daprano · June 2, 2020, 1:33am

Hi Shiv,

Please read this:

http://www.sscce.org/

You are telling us your code is slow, but only showing us one tiny part
of it. How do you know that the slowness is because of the SCHOOL
function? Maybe it is slow because of something else. Or maybe you are
correct that the problem is in SCHOOL, but how do we know?

This is like you going to the car mechanic and saying “My car is slow,
here’s a picture of the back tyres, what’s wrong with the car?”

Your function does a lot of unnecessary work, but not so much that it
should take 25 minutes. More like 25 milliseconds. So I am pretty sure
that the slowdown is somewhere else. But I could be wrong.
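In the spirit of the SSCCE advice, here is a minimal self-contained sketch for checking how long the row-wise step takes on its own. It assumes the data is a pandas DataFrame with a STUDENT_GRADE column; `make_fake_data` and `time_apply` are hypothetical helpers, and the rows are random grades, not the poster's real data.

```python
import time

import numpy as np
import pandas as pd


def SCHOOL(df_row):
    # Single-lookup form of the mapping discussed in this thread.
    grade = df_row['STUDENT_GRADE']
    if grade in ('AB', 'AC', 'AD', 'AE', ''):
        df_row['AVERAGE_GRADE'] = 'A'
    else:
        df_row['AVERAGE_GRADE'] = grade
    return df_row


def make_fake_data(n_rows=77_000, seed=0):
    # Hypothetical synthetic data: random grades, roughly the size in the question.
    rng = np.random.default_rng(seed)
    grades = rng.choice(['AB', 'AC', 'AD', 'AE', '', 'B', 'C'], size=n_rows)
    return pd.DataFrame({'STUDENT_GRADE': grades})


def time_apply(df):
    # Time only the row-wise apply, nothing else from the real program.
    start = time.perf_counter()
    result = df.apply(SCHOOL, axis=1)
    return result, time.perf_counter() - start
```

Calling `result, seconds = time_apply(make_fake_data())` and printing `seconds` shows whether this step alone comes anywhere near 25 minutes; if it does not, the time is going somewhere else in the program.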

Your SCHOOL function looks up ‘STUDENT_GRADE’ at least four times. So
you can do it only once instead:

def SCHOOL(df_row):
    grade = df_row['STUDENT_GRADE']  # look the grade up only once
    # Blank grades map to 'A' too, matching the original function.
    if grade in ('AB', 'AC', 'AD', 'AE', ''):
        df_row['AVERAGE_GRADE'] = 'A'
    else:
        df_row['AVERAGE_GRADE'] = grade
    return df_row
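Going one step further than the single lookup, the same mapping can be done on the whole column at once instead of row by row. This is a swapped-in vectorized pandas technique (`Series.isin` plus `Series.where`), not something proposed in the thread; `A_GRADES` and `add_average_grade` are hypothetical names, and it assumes the data sits in a DataFrame with a STUDENT_GRADE column.

```python
import pandas as pd

# Grades that collapse to 'A' (blank included, as in the original function).
A_GRADES = ['AB', 'AC', 'AD', 'AE', '']


def add_average_grade(df):
    """Return a copy of df with AVERAGE_GRADE computed column-wise, no apply."""
    out = df.copy()
    grades = out['STUDENT_GRADE']
    # Keep the grade where it is NOT in A_GRADES; replace it with 'A' otherwise.
    out['AVERAGE_GRADE'] = grades.where(~grades.isin(A_GRADES), 'A')
    return out


example = pd.DataFrame({'STUDENT_GRADE': ['AB', 'B', '', 'AE', 'C']})
print(add_average_grade(example))
```

Because no Series object is built per row, this avoids most of the overhead of `apply(..., axis=1)`.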

By the way, you are posting code you have re-written and that contains
errors, code that cannot possibly run because of syntax errors. What
else is different? What are you not showing us?

Rhodri · June 2, 2020, 6:07pm

I mean literally looking up the information you keep doing comparisons on in a dictionary (i.e. a table) instead of repeatedly comparing it in that bizarre way. Actually I was wrong; you can simplify that logic enormously if you just think about what it’s doing for a minute.
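A rough sketch of the table-lookup idea, assuming a pandas DataFrame with a STUDENT_GRADE column: the grades that become 'A' live in a dictionary, and `dict.get` falls back to the grade itself. `GRADE_TABLE` and `average_grade` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical lookup table: every grade that should collapse to 'A'.
GRADE_TABLE = {'AB': 'A', 'AC': 'A', 'AD': 'A', 'AE': 'A', '': 'A'}


def average_grade(grade):
    # dict.get returns the grade unchanged when it is not in the table.
    return GRADE_TABLE.get(grade, grade)


df = pd.DataFrame({'STUDENT_GRADE': ['AB', 'B', '', 'AE', 'C']})
# Series.map calls average_grade once per value; no per-row Series is built.
df['AVERAGE_GRADE'] = df['STUDENT_GRADE'].map(average_grade)
print(df)
```

Adding or changing a mapping then means editing one dictionary entry rather than another chain of `if` comparisons.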

Regardless, Steven is right. Your “SCHOOL” function, inefficient as it is, is highly unlikely to be your bottleneck. Have you tried profiling your code to see what is taking up all the time?
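One way to do that profiling, sketched with the standard-library `cProfile` and `pstats` modules. `load_data`, `school` and `main` are hypothetical stand-ins for the real program; the point is to wrap the whole run so the report shows where the time actually goes.

```python
import cProfile
import io
import pstats

import pandas as pd


def load_data():
    # Hypothetical stand-in for however the real 77k rows are loaded.
    return pd.DataFrame({'STUDENT_GRADE': ['AB', 'B', '', 'AE', 'C'] * 200})


def school(df_row):
    grade = df_row['STUDENT_GRADE']
    df_row['AVERAGE_GRADE'] = 'A' if grade in ('AB', 'AC', 'AD', 'AE', '') else grade
    return df_row


def main():
    df = load_data()
    return df.apply(school, axis=1)


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Write the 10 entries with the largest cumulative time into a string buffer.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats('cumulative')
stats.print_stats(10)
print(buffer.getvalue())
```

The same kind of report is available without editing the script by running `python -m cProfile -s cumulative your_script.py`, where `your_script.py` is a placeholder for the real file.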