Comparison Table for RMSE and MAE Using Three Different Models

I am fitting three different regression models, and to minimize repeated code I have written a single function that fits, evaluates, and plots each one.

This is all fine and dandy, but after I have run the function against the three models (LR, DTR, and RFR), I am trying to capture the results in a DataFrame so that I can print them in a visually appealing comparison table. I only want the testing RMSE and MAE for each model, preferably with a gradient applied.

MY FUNCTION

import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

def evaluate_model(model, X_train, y_train, X_test, y_test):
    # fit the model
    model.fit(X_train, y_train)

    # print the parameters
    print(model.get_params(), end="\n\n")

    # predict the values using training data
    train_pred = model.predict(X_train)

    # evaluate using training data
    train_mae = mean_absolute_error(y_train, train_pred)
    train_rmse = np.sqrt(mean_squared_error(y_train, train_pred))

    # print the results of the training data
    print("Results of the training data\n")
    print("Mean Absolute Error: {:.2f}".format(train_mae))
    print("Root Mean Squared Error: {:.2f}\n".format(train_rmse))

    # visualize training data
    fig_train = px.scatter(x=y_train, y=train_pred,
                            labels={'x': 'Actual Values', 'y': 'Predicted Values'},
                            title='Visualization of Actual Data vs. Prediction of Training Data')
    fig_train.add_scatter(x=y_train, y=y_train, mode='lines', line=dict(color='#e6981c', width=4))
    fig_train.show()

    # predict the values using testing data
    test_pred = model.predict(X_test)

    # evaluate using testing data
    test_mae = mean_absolute_error(y_test, test_pred)
    test_rmse = np.sqrt(mean_squared_error(y_test, test_pred))

    # print the results of the testing data
    print("Results of the testing data\n")
    print("Mean Absolute Error: {:.2f}".format(test_mae))
    print("Root Mean Squared Error: {:.2f}\n".format(test_rmse))

    # visualize testing data
    fig_test = px.scatter(x=y_test, y=test_pred,
                            labels={'x': 'Actual Values', 'y': 'Predicted Values'},
                            title='Visualization of Actual Data vs. Prediction of Testing Data')
    fig_test.add_scatter(x=y_test, y=y_test, mode='lines', line=dict(color='#1ce658', width=4))
    fig_test.show()

Then in the next three blocks I have this…

lr = LinearRegression()
evaluate_model(lr, X_train, y_train, X_test, y_test)

then

dtr = DecisionTreeRegressor(random_state=42)
evaluate_model(dtr, X_train, y_train, X_test, y_test)

… and finally

rf = RandomForestRegressor(random_state=42)
evaluate_model(rf, X_train, y_train, X_test, y_test)

Now, from this point I would like a comparison table showing the testing MAE and RMSE for all three regressions, with a gradient applied… I can't figure out how to do it.

I have found the solution…

The function is exactly the same as above; the only change is a return statement at the end so the testing metrics can be captured:

def evaluate_model(model, X_train, y_train, X_test, y_test):
    # ... exactly the same body as the function above ...

    # return the testing metrics so they can be collected into a DataFrame
    return test_rmse, test_mae

then

lr = LinearRegression()
lr_testing_rmse, lr_testing_mae = evaluate_model(lr, X_train, y_train, X_test, y_test)

then

dtr = DecisionTreeRegressor(random_state=42)
dtr_testing_rmse, dtr_testing_mae = evaluate_model(dtr, X_train, y_train, X_test, y_test)

then

rf = RandomForestRegressor(random_state=42)
rf_testing_rmse, rf_testing_mae = evaluate_model(rf, X_train, y_train, X_test, y_test)

finally…

models = pd.DataFrame({
    'Model': ['Linear Regression', 'Random Forest Regression', 'Decision Tree Regression'],
    'RMSE': [lr_testing_rmse, rf_testing_rmse, dtr_testing_rmse],
    'MAE': [lr_testing_mae, rf_testing_mae, dtr_testing_mae]
})
models.style.background_gradient(cmap='Blues')
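
As a side note, if you would rather not track six separate variables, one possible refactor is to keep the models in a dict and loop over them. This is just a sketch reusing the evaluate_model function and styling above (the dict name and row-building are my own choices, not part of the original post):

regressors = {
    'Linear Regression': LinearRegression(),
    'Decision Tree Regression': DecisionTreeRegressor(random_state=42),
    'Random Forest Regression': RandomForestRegressor(random_state=42),
}

# collect one row of testing metrics per model
rows = []
for name, model in regressors.items():
    # evaluate_model prints/plots and returns (test_rmse, test_mae)
    rmse, mae = evaluate_model(model, X_train, y_train, X_test, y_test)
    rows.append({'Model': name, 'RMSE': rmse, 'MAE': mae})

results = pd.DataFrame(rows).set_index('Model')
results.style.background_gradient(cmap='Blues')

With 'Model' as the index, background_gradient shades only the numeric RMSE and MAE columns, which keeps the gradient meaningful.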

RMSE = root mean square error (fairly standard); MAE = mean absolute error (more obscure, I had to look it up).
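
For reference, the standard definitions (which is what the sklearn calls above compute, with the square root applied by hand for RMSE) are, for $n$ samples with actual values $y_i$ and predictions $\hat{y}_i$:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$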

Glad you found the answer you wanted, because it was unclear what ballpark you were considering and what detail you were missing.