For Loop Compared to Static Number

I am iterating over every nth column and comparing the year of a datetime to a year (int) in a list. If the datetime's year is >= the year (int) in the list, the result is 3, otherwise 2. So each column should be compared to the current year (x) of the loop over the list. If I compare against a static number, like 2012, it works, but if I use the variable (x) from the for loop, it doesn’t work: the condition is always false, so the result is always 2. I need each column to look at the next number in the list.

list_num = list(range(2000, 2020))

for x in list_num:
    for i in range(15, df.shape[1], 2):
        df.iloc[:, i+1] = np.where((df['ColDatetime'].dt.year >= x), 3, 2)

I think what you’re actually doing is:

df.iloc[:,i+1] = np.where((df['ColDatetime'].dt.year >= 2000),3,2)

which sets some to 3 and others to 2, followed by:

df.iloc[:,i+1] = np.where((df['ColDatetime'].dt.year >= 2001),3,2)

which sets some to 3 and others to 2, overwriting what the previous step did, and eventually doing:

df.iloc[:,i+1] = np.where((df['ColDatetime'].dt.year >= 2019),3,2)

overwriting what all of the previous steps did.
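To make the overwriting concrete, here is a minimal sketch. The three sample dates and the `flag` column are made up for illustration; only `ColDatetime` comes from the question. Because every pass of the loop writes to the same column, only the comparison against the last year (2019) survives, which would explain why every value ends up as 2.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name 'ColDatetime' is from the question.
df = pd.DataFrame(
    {"ColDatetime": pd.to_datetime(["2001-11-30", "2003-05-31", "2005-11-30"])}
)

for x in range(2000, 2020):
    # Each pass writes to the same column, replacing the previous pass's result.
    df["flag"] = np.where(df["ColDatetime"].dt.year >= x, 3, 2)

# The surviving values are the same as a single comparison against 2019.
last_only = np.where(df["ColDatetime"].dt.year >= 2019, 3, 2)
```

None of the sample years reaches 2019, so `flag` is 2 in every row.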

So what is the correct way to do it? Switch the for loops? I’ve done that and it didn’t work either. I need to compare each column to the next item in the list.

Well, df['ColDatetime'].dt.year is independent of both x and i. It always has the same value, and you’re comparing it to 2000, then 2001, and so on.

You say “compare each column to the next item in the list”. What exactly do you mean by that?

Below is the outcome I’m looking for. I put the column headers in a list because I couldn’t figure out how to use the column headers as variables. So it should be: where the datetime year >= the column header year, the result is 3, otherwise 2.

Datetime             2000  2001  2002  2003  2004  2005
2003-05-31 00:00:00     3     3     3     3     2     2
(blank)                 2     2     2     2     2     2
2003-05-31 00:00:00     3     3     3     3     2     2
2003-05-31 00:00:00     3     3     3     3     2     2
(blank)                 2     2     2     2     2     2
2003-05-31 00:00:00     3     3     3     3     2     2
2003-05-31 00:00:00     3     3     3     3     2     2
(blank)                 2     2     2     2     2     2
2003-05-31 00:00:00     3     3     3     3     2     2
(blank)                 2     2     2     2     2     2
2001-11-30 00:00:00     3     3     2     2     2     2
2004-11-30 00:00:00     3     3     3     3     3     2
2002-11-30 00:00:00     3     3     3     2     2     2
2005-11-30 00:00:00     3     3     3     3     3     3
2000-02-01 00:00:00     3     2     2     2     2     2

df['Datetime'] returns the ‘Datetime’ column and df['2000'] returns the ‘2000’ column, for example, so you can do:

for year in range(2000, 2006):
    df[str(year)] = np.where(df['Datetime'].dt.year >= year, 3, 2)

Does that work for you?
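For anyone who wants to try it, here is a self-contained sketch of that loop on a few rows taken from the table above. It assumes the blank rows are missing datetimes (NaT) and that the year columns are labelled with strings such as '2000'; if your headers are integers, use `df[year]` instead.

```python
import numpy as np
import pandas as pd

# A few rows from the desired-outcome table; None stands in for a blank datetime.
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            ["2003-05-31", None, "2001-11-30", "2005-11-30", "2000-02-01"]
        )
    }
)

for year in range(2000, 2006):
    # One column per year: 3 where the row's year is >= the column's year, else 2.
    # A missing datetime compares as False, so those rows get 2 everywhere.
    df[str(year)] = np.where(df["Datetime"].dt.year >= year, 3, 2)

print(df)
```

Each pass writes to a different column, so nothing gets overwritten.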

I think you got your answer, but I can’t help doing a bit of code review:

list_num = list(range(2000, 2020))

There is no reason to wrap that range in a list: a range object is already iterable (i.e. it can be put in a for loop). In fact, a range object is a full immutable sequence, so it can be used in most places a list can, as long as you don’t need to change it.
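A quick sketch of what that means in practice. The name `valid_years` is only a suggestion, not something from the original code.

```python
valid_years = range(2000, 2020)  # no list() needed

count = len(valid_years)           # ranges support len()
first = valid_years[0]             # indexing works
last = valid_years[-1]             # so does negative indexing
has_2012 = 2012 in valid_years     # membership tests work too
first_three = list(valid_years[:3])  # slicing returns another range

# Iterating needs no conversion either.
years_seen = [year for year in valid_years]
```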

for x in list_num:

I highly recommend that you use meaningful variable names. x could be anything, and what the heck is a “list_num”?

Maybe:

for year in valid_years:

or a name that reflects what that range of years is.