Append not adding desired elements within a loop

RichInLesta · December 13, 2024, 11:15pm

I’m working with two source files within a Python project; one is “base” information, which consists of historical football game results (dates, teams, scores and associated game statistics). This first source is used to train the model.

The second source file is a list of upcoming games, for which the model is generating predictions.

By their very nature, the two datasets consist of different numbers of rows, so when the prediction part of the program runs, it results in an error about the length of each (I’ll include the specifics below).

To work around that, I’m attempting to add rows to the upcoming-games source such that both sets of data have equivalent length. I suspect this isn’t the best way to handle this, but until I find, or can figure out, a better alternative, this is what I’m working with.

So I implemented a loop that attempts to append a program-calculated number of rows to the upcoming games data set. When the code runs, however, this loop doesn’t add any rows as expected. In researching such an issue, I’ve read about working on an original list versus a COPY of the list, so I tried some of those recommended solutions to no avail.

I’ve included as little code as I think is necessary, but will happily include whatever else may be needed to understand what I’m doing and to assess why it may not be working.

(NOTE: The entire program runs flawlessly if I manually add rows to the upcoming games data set such that both training source and prediction source are equivalent lengths. It’s when I introduce the code below that the problem occurs.)

Applicable code:

# This splits the base source csv file which had previously been imported
X_train, X_test, Y_train_team1, Y_test_team1, Y_train_team2, Y_test_team2 = train_test_split(X, y_team1, y_team2, test_size=0.2, random_state=42)

print('len of X_test:', len(X_test)) # this will vary depending on the test_size factor in the split

print('len of df_upcoming_pre_loop:', len(df_upcoming))

# define rows to add to the upcoming games file so there will be an equal number of rows in both the test and upcoming games model

new_row = {'date': '12/31/25', 'match_id': 'away@home12345', 'home_team_name': 'home', 'away_team_name': 'awway', 'home_team_score': 0, 'away_team_score': 0, 'home_team_feature_1': 0,'away_team_feature_1': 0,'home_team_feature_2': 0,'away_team_feature_2': 0,'home_team_feature_3': 0,'away_team_feature_3': 0,'home_team_feature_4': 0,'away_team_feature_4': 0,'home_team_feature_5': 0,'away_team_feature_5': 0,'home_team_feature_6': 0,'away_team_feature_6': 0,'home_team_feature_7': 0,'away_team_feature_7': 0,'home_team_feature_8': 0,'away_team_feature_8': 0,'home_team_feature_9': 0,'away_team_feature_9': 0,'home_team_feature_10': 0,'away_team_feature_10': 0,'home_team_result': 0,'away_team_result': 0}

# This is where I'm attempting to add rows to the list
rows_to_add = len(X_test)-len(df_upcoming)
print('rows_to_add:', rows_to_add)

for i in range(0, rows_to_add): # for a list
    df_upcoming.append(new_row, ignore_index=True)

print('len of df_upcoming_post_loop:',len(df_upcoming))

And here is the output of the four “print” statements above:
len of X_test: 758
len of df_upcoming_pre_loop: 64
rows_to_add: 694
len of df_upcoming_post_loop: 64

What this is telling me is that the X_test dataset is 758 rows deep (as it should be), the upcoming games dataset is 64 rows deep (as it is), and the difference (rows I need to add) is 694, but AFTER the loop the upcoming games dataset is still 64 rows deep. That is, the loop doesn’t throw an error, but it doesn’t add any rows either.

MRAB · December 13, 2024, 11:39pm

Here’s what the docs say:

DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)

Append rows of other to the end of caller, returning a new object.

Does that help?
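To make "returning a new object" concrete, here is a minimal sketch. The two-column frame and the shortened new_row are hypothetical stand-ins for the real data in the question. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch uses pd.concat, which likewise returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical stand-ins for the thread's df_upcoming and new_row
df_upcoming = pd.DataFrame({"match_id": ["a@b", "c@d"], "home_team_score": [0, 0]})
new_row = {"match_id": "away@home12345", "home_team_score": 0}

# pd.concat builds and returns a NEW DataFrame; df_upcoming itself is not modified
result = pd.concat([df_upcoming, pd.DataFrame([new_row])], ignore_index=True)
length_before_reassign = len(df_upcoming)  # still the original length
print(length_before_reassign, len(result))

# The added row only "sticks" once the returned object is assigned back to the name
df_upcoming = result
print(len(df_upcoming))
```

If the return value is discarded, as it is inside the loop in the question, the length printed afterwards is the original one.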

RichInLesta · December 14, 2024, 12:22am

Thanks for the suggestion, but I’m trying to extend the length of an existing list. In the example you provided, the implication is that there’s a separate list from which it draws rows to append to the “current” list.

Apparently, there’s a difference between appending to a list and appending to a dataframe. So I also tried this approach:

#for i in range(rows_to_add): # for a dataframe

df_upcoming.df_upcoming.append(new_row, ignore_index=True)

Which didn’t work.

I’m suspecting it has something to do with modifying an existing list versus modifying a COPY of that list, but I also tried working with that which produced the same results.

MRAB · December 14, 2024, 12:30am

Appending to a dataframe will return the new dataframe, which means that it’ll be copying each time; it won’t extend the current one in-place.

It would be faster to build a list and then make a dataframe from that.
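A rough sketch of that idea, assuming a cut-down df_upcoming and new_row, and a made-up rows_to_add of 3 (in the question it is computed as len(X_test) - len(df_upcoming)): collect the padding rows in a plain Python list, turn that list into one DataFrame, and join it to the existing frame in a single step.

```python
import pandas as pd

# Hypothetical stand-ins for the data in the question
df_upcoming = pd.DataFrame({"match_id": ["a@b", "c@d"], "home_team_score": [1, 2]})
new_row = {"match_id": "away@home12345", "home_team_score": 0}
rows_to_add = 3  # assumed value for illustration only

# Build the padding rows in an ordinary list; list.append works in place
padding_rows = []
for _ in range(rows_to_add):
    padding_rows.append(dict(new_row))  # copy so each row is an independent dict

# One DataFrame from the whole list, then one concatenation (one copy, not one per row)
df_padding = pd.DataFrame(padding_rows)
df_upcoming = pd.concat([df_upcoming, df_padding], ignore_index=True)

print(len(df_upcoming))
```

The frame is copied once here instead of once per added row, which is where the speed difference comes from.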

RichInLesta · December 14, 2024, 1:49am

I’m having a hard time wrapping my head around your reply. If append simply overwrites a dataframe, what would be the point of appending?

It’s quite possible I’m not asking the right question, so indulge me with sharing some additional detail.

I have two csv files which I import:

df_base = pd.read_csv(r'/Users/Documents/PythonProjects/pyFootball/Source Files/results_thru_w14.csv')

df_upcoming = pd.read_csv(r'/Users/Documents/PythonProjects/Source Files/Upcoming_NFL_Schedule.csv')

The first file is used for training the model. (Train/Test split according to sklearn’s train_test_split function). The results are saved off to a pickle file.

The upcoming games file (second one to be read in) is then called. In order to function correctly, this file needs the same number of rows as the test portion of the previously split “base” file. In order to accomplish that, I need to extend the length of df_upcoming. To do so, I’m trying to iterate through the required number of loops until the lengths of the test data and upcoming games data are equal.

So with that context, you’re suggesting I start with a new list. In my mind, “df_upcoming” IS a new list. And that’s what I’m trying to append to. So I think I’m doing precisely what you recommend, and that doesn’t work.

Am I misunderstanding?

MRAB · December 14, 2024, 2:14am

The append method of a dataframe is not like the append method of a list. The list’s method modifies itself whereas the dataframe’s method returns the new dataframe.
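The difference can be seen with plain Python objects alone. In this small sketch the names games and original are made up, and list concatenation with + is used only as an analogy for a method that returns a new object instead of changing the existing one.

```python
# In-place style: list.append changes the list itself and returns None
games = ["game1", "game2"]
ret = games.append("game3")
print(ret, len(games))

# Returns-new style, imitated here with list concatenation:
# the expression creates a new list and leaves `original` alone
original = ["game1", "game2"]
original + ["game3"]            # result is discarded, nothing changes
length_when_discarded = len(original)

original = original + ["game3"]  # keeping the result is what makes the change visible
print(length_when_discarded, len(original))
```

The loop in the question follows the "result is discarded" pattern, which is why no error is raised and no rows appear.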

Appending one row at a time is, therefore, not efficient because of the copying involved. It’s better to make a list of what you want appended and then append that list in one go.

RichInLesta · December 14, 2024, 2:20am

I just figured out what I was doing wrong. I looked closer at my original importing of the two files (the ones I included in my previous message) and realized that I had been importing the upcoming games file as a dataframe, and not as a list. Therefore, trying to treat it like a list would (obviously, now) never work.

So I made that change and now my append loop is working as expected.
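For anyone reading later, this is roughly what that looks like. It is only a sketch under assumptions: the post doesn’t show how the file was actually read, so csv.DictReader is a guess, and the inline csv_text, the two columns, and rows_to_add = 3 are made up so the example runs without the real file.

```python
import csv
import io

# Hypothetical stand-in for the upcoming-games CSV file
csv_text = "match_id,home_team_score\na@b,0\nc@d,0\n"

# Read the rows into a plain list of dicts instead of a DataFrame
with io.StringIO(csv_text) as f:
    upcoming = list(csv.DictReader(f))

# DictReader yields every value as a string, so the filler row uses strings too
new_row = {"match_id": "away@home12345", "home_team_score": "0"}
rows_to_add = 3  # assumed value for illustration only

# list.append modifies the list in place, so the loop now grows it
for _ in range(rows_to_add):
    upcoming.append(dict(new_row))

print(len(upcoming))
```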

Thanks for your time, explanations, and patience. I’m now just a bit smarter!