I’m working with two source files within a Python project; one is “base” information, which consists of historical football game results (dates, teams, scores and associated game statistics). This first source is used to train the model.
The second source file is a list of upcoming games, for which the model is generating predictions.
By their very nature, the two datasets consist of different numbers of rows, so when the prediction part of the program runs, it results in an error about the length of each (I’ll include the specifics below).
To work around that, I’m attempting to add rows to the upcoming-games source such that both sets of data have equivalent length. I suspect this isn’t the best way to handle this, but until I find, or can figure out, a better alternative, this is what I’m working with.
So I implemented a loop that attempts to append a program-calculated number of rows to the upcoming games data set. When the code runs, however, this loop doesn’t add any rows as expected. In researching such an issue, I’ve read about working on an original list versus a COPY of the list, so I tried some of those recommended solutions to no avail.
I’ve included what I think is the minimum code necessary, but will happily add whatever else is needed to understand what I’m doing and to assess why it may not be working.
(NOTE: The entire program runs flawlessly if I manually add rows to the upcoming games data set such that both training source and prediction source are equivalent lengths. It’s when I introduce the code below that the problem occurs.)
Applicable code:
# This splits the base source csv file which had previously been imported
X_train, X_test, Y_train_team1, Y_test_team1, Y_train_team2, Y_test_team2 = train_test_split(X, y_team1, y_team2, test_size=0.2, random_state=42)
print('len of X_test:', len(X_test)) # this will vary depending on the test_size factor in the split
print('len of df_upcoming_pre_loop:', len(df_upcoming))
# define rows to add to the upcoming games file so there will be an equal number of rows in both the test and upcoming games model
new_row = {'date': '12/31/25', 'match_id': 'away@home12345', 'home_team_name': 'home', 'away_team_name': 'awway', 'home_team_score': 0, 'away_team_score': 0, 'home_team_feature_1': 0,'away_team_feature_1': 0,'home_team_feature_2': 0,'away_team_feature_2': 0,'home_team_feature_3': 0,'away_team_feature_3': 0,'home_team_feature_4': 0,'away_team_feature_4': 0,'home_team_feature_5': 0,'away_team_feature_5': 0,'home_team_feature_6': 0,'away_team_feature_6': 0,'home_team_feature_7': 0,'away_team_feature_7': 0,'home_team_feature_8': 0,'away_team_feature_8': 0,'home_team_feature_9': 0,'away_team_feature_9': 0,'home_team_feature_10': 0,'away_team_feature_10': 0,'home_team_result': 0,'away_team_result': 0}
# This is where I'm attempting to add rows to the list
rows_to_add = len(X_test)-len(df_upcoming)
print('rows_to_add:', rows_to_add)
for i in range(0, rows_to_add): # for a list
    df_upcoming.append(new_row, ignore_index=True)
print('len of df_upcoming_post_loop:',len(df_upcoming))
And here is the output of the four “print” statements above:
len of X_test: 758
len of df_upcoming_pre_loop: 64
rows_to_add: 694
len of df_upcoming_post_loop: 64
What this is telling me is that the X_test dataset is 758 rows deep (as it should be), the upcoming games dataset is 64 rows deep (as it is), and the difference (the rows I need to add) is 694, but AFTER the loop the upcoming games dataset is still 64 rows deep. That is, the loop doesn’t throw an error, but it doesn’t add any rows either.
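For reference, the behavior can be reproduced in a small self-contained sketch. It assumes pandas is installed and uses `pd.concat` as a stand-in for `DataFrame.append`, which was removed in pandas 2.0. Both return a new DataFrame rather than modifying the original, so the result only sticks if it is assigned to a name. The column names, the 3-row frame, and `rows_to_add = 5` are made up for illustration and are not my real data.

```python
import pandas as pd

# Hypothetical stand-in for the upcoming-games data (3 rows, made-up columns).
df_upcoming = pd.DataFrame({
    "home_team_name": ["a", "b", "c"],
    "home_team_score": [0, 0, 0],
})
new_row = {"home_team_name": "home", "home_team_score": 0}
rows_to_add = 5  # illustrative value, not the 694 from the real run

# The result is discarded here, so df_upcoming keeps its original length,
# which mirrors what happens in the loop above.
pd.concat([df_upcoming, pd.DataFrame([new_row])], ignore_index=True)
len_without_assignment = len(df_upcoming)

# Building all padding rows once and assigning the result gives the longer frame.
padding = pd.DataFrame([new_row] * rows_to_add)
df_padded = pd.concat([df_upcoming, padding], ignore_index=True)
len_with_assignment = len(df_padded)

print(len_without_assignment, len_with_assignment)  # prints: 3 8
```

If this sketch reflects what is going on in my code, the missing piece may simply be assigning the returned DataFrame back to `df_upcoming`, though I have not confirmed that against the full program.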