Ok. One more question. I see that your current Exel formula for the score does not refer to any columns in your example. Can I assume that D6 and E6 correspond to E and F (in the example dataframe), and that AJ6:AM6 correspond to A:D?
I didn’t see a response, so for now I will assume that the answer to my last question is “yes”. I trust that the code below still illustrates how to solve your problem if the answer is “no”.
I’ll also assume you have Python 3.11 or later and pandas 2.2.1 with the ability to read/write Exel files (on my Mac this required an extra python package install: openpyxl
). My sample code can be used as stand-alone python code. I’ll try to keep the code as simple as possible. It’s definitely possible to optimize it and make it much more efficient, but this may not be worth the trouble for small files (< 1M rows). The current code may also make it easier for you to later modify it. I’ll be using simple letters ‘A’ to ‘H’ as column headers (you may need to replace those by ‘dm lbr’, ‘dm prts’ etc).
The Exel formula:
IF(E="unkn","0",
IF(AND(E="no",F="no"),SWITCH(A,">0",4,-12)+SWITCH(B,">0",4,-4)+SWITCH(C,">0",4,4)+SWITCH(D,">0",4,4),
IF(AND(E="yes",F="yes"),SWITCH(A,">0",4,-4)+SWITCH(B,">0",4,-4)+SWITCH(C,">0",4,0)+SWITCH(D,">0",2,0),
IF(AND(E="yes",F="no"),SWITCH(A,">0",4,-4)+SWITCH(B,">0",4,-4)+SWITCH(C,">0",4,0)+SWITCH(D,">0",2,2),
IF(AND(E="no",F="yes"),SWITCH(A,">0",4,-4)+SWITCH(B,">0",4,-4)+SWITCH(C,">0",4,4)+SWITCH(D,">0",2,2),"Need PM info"
)))))
can be directly translated into a function that operates on the rows of a pandas dataframe:
def confidence_score(row):
match (row['E'], row['F']):
case ("unkn", _):
score == 0
case ("no", "no"):
a = 4 if row['A'] == ">0" else -12
b = 4 if row['B'] == ">0" else -4
c = 4 # equivalent to: c = 4 if row['C'] == ">0" else 4
d = 4 # equivalent to: d = 4 if row['D'] == ">0" else 4 # d = 4
score = a + b + c + d
case ("yes", "yes"):
a = 4 if row['A'] == ">0" else -4
b = 4 if row['B'] == ">0" else -4
c = 4 if row['C'] == ">0" else 0
d = 2 if row['D'] == ">0" else 0
score = a + b + c + d
case ("yes", "no"):
a = 4 if row['A'] == ">0" else -4
b = 4 if row['B'] == ">0" else -4
c = 4 if row['C'] == ">0" else 0
d = 2 if row['D'] == ">0" else 2
score = a + b + c + d
case ("no", "yes"):
a = 4 if row['A'] == ">0" else -4
b = 4 if row['B'] == ">0" else -4
c = 4 # c = 4 if row['C'] == ">0" else 4
d = 2 # d = 2 if row['D'] == ">0" else 2
score = a + b + c + d
case _:
score = None # rather than "Need PM info"
return score
Note that I only replaced the default case (case _
) output by None instead of a string. I did this in order to make sure that the datatype associated with the newly created column is a numerical datatype, rather than a generic ‘object’. From your question I can see you know Exel very well (definitely better than I do), so it should be straightforward to match up your formula with the python code. The statement x = y if condition else z
is equivalent to the Exel SWITCH
statement. The match ...: case ...
block is a nicer way to represent messier if ... elif ... elif ... else ...
blocks.
This function can be used in a script as follows:
import itertools as it
import pandas as pd
# this recreates your example dataframe in pandas
# the only difference is that I'm using simplified column header names
# 'itertools' is only used for creating the artifical sample data
columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
data = list(it.product(['>0', '0'], ['>0', '0'],
['>0', '0'], ['>0', '0'],
['no', 'yes'], ['no', 'yes'],
['no', 'yes'], ['no', 'yes']))
df = pd.DataFrame(data, columns=columns)
# to instead read an actual Exel file, you run:
# df = pd.read_excel("data.xlsx")
# copy-paste the above `confidence_score` function definition here
# create new column with the confidence score
df['I'] = df.apply(confidence_score, axis=1)
# print to screen
# print(df)
# save again as Exel file
df.to_excel("data.xlsx")
To make this easier to use for actual data, I suggest (1) make sure to explicitly only use one particular data type in each column; (2) rather than using strings “no” and “yes”, use either booleans or integers (0 and 1). In the code below I’ve made columns A:D all take integer values.
These modifications create clear and clean(er) code. They also make storage and processing a bit more efficient and make it easier to later modify the code. With those modifications, the complete script could be sth like this:
import itertools as it
import pandas as pd
columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
data = list(it.product([0, 1, 2], [0, 3, 7], [0, 1, 2], [0, 3, 7],
[True, False], [True, False], [True, False], [True, False]))
df = pd.DataFrame(data, columns=columns)
def confidence_score(row):
match (row['E'], row['F']):
case (False, False):
a = 4 if row['A'] > 0 else -12
b = 4 if row['B'] > 0 else -4
c = 4 # c = 4 if row['C'] > 0 else 4
d = 4 # d = 4 if row['D'] > 0 else 4
score = a + b + c + d
case (True, True):
a = 4 if row['A'] > 0 else -4
b = 4 if row['B'] > 0 else -4
c = 4 if row['C'] > 0 else 0
d = 2 if row['D'] > 0 else 0
score = a + b + c + d
case (True, False):
a = 4 if row['A'] > 0 else -4
b = 4 if row['B'] > 0 else -4
c = 4 if row['C'] > 0 else 0
d = 2 if row['D'] > 0 else 2
score = a + b + c + d
case (False, True):
a = 4 if row['A'] > 0 else -4
b = 4 if row['B'] > 0 else -4
c = 4 # c = 4 if row['C'] > 0 else 4
d = 2 # d = 2 if row['D'] > 0 else 2
score = a + b + c + d
case _:
score = None
return score
df['I'] = df.apply(confidence_score, axis=1)
(Note that I made yet one more modification: I subsumed the “unkn” case under the default case. So any “unkn” value in E will also lead to a None (null) value in the score. Not sure if that is wanted, it just seemed a little strange to me to give a confidence score of 0 if E==‘unkn’.)