I have a data frame called subset_df and I want to group it by a column named “rolling_agg_timestep”. Here is my code:
def get_nodeToIndexMap(self, subset_df): #,rolling_agg_timestep):
    print('subset_df:',subset_df.columns)
    if rolling_agg_timestep not in subset_df.columns:
        raise KeyError(f"Column '{rolling_agg_timestep}' does not exist in the DataFrame.")
    all_mappings = {}
    subset_group = subset_df.groupby("rolling_agg_timestep")
Here is what I see in the console after running the code:
subset_df: Index(['id', 'pos', 'lane', 'rolling_agg_timestep', 'rolling_mean_speed',
'rolling_std_speed', 'rolling_mean_accel', 'rolling_std_accel',
'rolling_std_y', 'rolling_mean_y', 'label'],
dtype='object')
Traceback (most recent call last):
  File "/Users/Documents/conference-code/STGCN.py", line 815, in <module>
    nodeToIndexList = gInfo.get_nodeToIndexMap(subset_df) #,rolling_agg_timestep
  File "/Users/Documents/conference-code/STGCN.py", line 245, in get_nodeToIndexMap
    raise KeyError(f"Column '{rolling_agg_timestep}' does not exist in the DataFrame.")
KeyError: "Column '1' does not exist in the DataFrame."
I would appreciate any help with grouping this data frame by rolling_agg_timestep.
It looks like your dataframe literally has a column named “rolling_agg_timestep”. But your test uses the variable rolling_agg_timestep, which evidently holds the value 1; the f-string interpolates that value into your exception message, giving "Column '1' does not exist in the DataFrame."
Your .groupby() call uses the literal "rolling_agg_timestep", so it would probably work. Your pretest on the variable is incorrect, because the variable does not hold the column name.
Do you know how the variable came to hold the value 1?
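To make the point concrete, here is a minimal sketch of how that message can arise. The words "Column" and the quotes come from the f-string template, so the variable itself most likely holds just 1. The value assigned below is an assumption for illustration (for example a leftover loop counter); the column names are copied from the printed Index, shortened.

```python
# Column names copied (shortened) from the Index printed in the question.
columns = ["id", "pos", "lane", "rolling_agg_timestep", "rolling_mean_speed"]

# Assumption: a module-level variable of this name holds 1,
# e.g. a leftover loop counter. The real origin is unknown.
rolling_agg_timestep = 1

# The membership test uses the variable's VALUE, not its name.
variable_found = rolling_agg_timestep in columns
# The quoted literal is the actual column name.
literal_found = "rolling_agg_timestep" in columns
# The f-string interpolates the value into the template text.
message = f"Column '{rolling_agg_timestep}' does not exist in the DataFrame."

print(variable_found)  # False
print(literal_found)   # True
print(message)         # Column '1' does not exist in the DataFrame.
```

Printing repr(rolling_agg_timestep) just before the check in the real code would show what the variable actually holds.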
If I group by ‘rolling_agg_timestep’, I should get 239 groups. Now, after running the code, the error is:
  File "/Users/Documents/conference-code/STGCN.py", line 838, in <module>
    nodeToIndexList = gInfo.get_nodeToIndexMap(subset_df_raw) #,rolling_agg_timestep
  File "/Users/Documents/conference-code/STGCN.py", line 245, in get_nodeToIndexMap
    raise KeyError(f"Column '{rolling_agg_timestep}' does not exist in the DataFrame.")
KeyError: "Column '239' does not exist in the DataFrame."
Your function still has this test:
    if rolling_agg_timestep not in subset_df.columns:
        raise KeyError(f"Column '{rolling_agg_timestep}' does not exist in the DataFrame.")
which isn’t looking for a column named "rolling_agg_timestep" in the columns but for a column named by whatever value is in the global variable rolling_agg_timestep, which in your latest run is 239 (hence "Column '239'" in the message).
The distinction I’m making here is a bit like this:
column_names = ["foo", "bar", "baz"]
foo = 1  # the variable holds some other value, not its own name
assert "foo" in column_names # this should be ok
assert foo in column_names # this looks up the value 1 and fails
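Putting that together, one possible fix is to pass the column name in as a string parameter and use that same parameter for both the check and the groupby. This is only a sketch under assumptions: it is written as a plain function rather than a method, the names get_node_to_index_map and group_col are made up for illustration, and the per-group mapping built from 'id' is hypothetical, since the original function body is not shown past the groupby call.

```python
import pandas as pd


def get_node_to_index_map(subset_df, group_col="rolling_agg_timestep"):
    """Group subset_df by group_col and build one mapping per group."""
    # Check the column NAME (a string), not some unrelated global value.
    if group_col not in subset_df.columns:
        raise KeyError(f"Column '{group_col}' does not exist in the DataFrame.")

    all_mappings = {}
    for timestep, group in subset_df.groupby(group_col):
        # Hypothetical mapping: index of each unique 'id' within this group.
        all_mappings[timestep] = {
            node: i for i, node in enumerate(group["id"].unique())
        }
    return all_mappings


# Tiny made-up frame with two timesteps, just to exercise the function.
demo = pd.DataFrame(
    {"id": [10, 11, 10, 12], "rolling_agg_timestep": [1, 1, 2, 2]}
)
mappings = get_node_to_index_map(demo)
print(len(mappings))  # 2
```

Because the column name arrives as an argument, no global variable named rolling_agg_timestep is consulted, so a stray counter of that name can no longer change what the check looks for.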