The ML algorithm examines a dataset whose first three columns contain the geometric parameters of a metallic object and whose fourth column is a coefficient related to the object's speed. These four columns form the feature dataframe X; the last column is an efficiency coefficient, which is the response, denoted y.

| param 1 | param 2 | param 3 | speed coeff. | y |
| --- | --- | --- | --- | --- |
| a | b | c | 0 | r1 |
| a | b | c | 0.1 | r2 |
| a | b | c | 0.2 | r3 |
| e | f | g | 0 | s1 |
| e | f | g | 0.1 | s2 |
| e | f | g | 0.2 | s3 |
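For reference, the data above can be assembled as a pandas dataframe like this. The column names and numeric values are placeholders of my own choosing, not the real geometric parameters:

```python
# Minimal sketch of the data layout: three geometric parameters, a speed
# coefficient, and the efficiency coefficient y (all values are placeholders).
import pandas as pd

data = pd.DataFrame(
    {
        "p1":    [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],      # geometric parameter (a / e)
        "p2":    [0.5, 0.5, 0.5, 0.8, 0.8, 0.8],      # geometric parameter (b / f)
        "p3":    [3.0, 3.0, 3.0, 4.0, 4.0, 4.0],      # geometric parameter (c / g)
        "speed": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],      # speed coefficient
        "y":     [0.7, 0.72, 0.75, 0.6, 0.63, 0.66],  # efficiency coefficient (r / s)
    }
)

X = data[["p1", "p2", "p3", "speed"]]  # feature dataframe
y = data["y"]                          # response
```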

The initial dataframe is split with the following Python command:

`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)`

The model is then trained with linear regression:

`model = LinearRegression()`

`model.fit(X_train, y_train)`
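Put together, the split-and-fit steps look like this. The data here is synthetic (a made-up linear relation), just to make the snippet self-contained:

```python
# End-to-end sketch of the split-and-fit steps on synthetic data
# (the real X and y come from the geometry table above).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 4))            # 3 geometric params + speed coefficient
y = X @ np.array([0.4, 0.3, 0.2, 0.1])   # placeholder linear response

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)  # the model must be fitted before predicting
```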

The values of `X_test` are then predicted:

`y_pred = model.predict(X_test)`

Then the model's performance is evaluated with the mean absolute error (MAE) and the coefficient of determination (R²). The results are very good: the error is about 3% and R² is 0.99346.
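The evaluation step, sketched with scikit-learn's metric functions (same synthetic placeholder data as above, so the exact numbers differ from mine):

```python
# Sketch of the evaluation step: MAE and R² on the held-out test split.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 4))            # placeholder features
y = X @ np.array([0.4, 0.3, 0.2, 0.1])   # placeholder linear response

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)  # mean absolute error
r2 = r2_score(y_test, y_pred)              # coefficient of determination
```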

The problem arises when I build a new matrix to predict on, that is, a matrix the algorithm has never seen before, whose values stay within the range of the training parameters:

| param 1 | param 2 | param 3 | speed coeff. |
| --- | --- | --- | --- |
| m | n | p | 0 |
| m | n | p | 0.1 |
| m | n | p | 0.2 |
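The prediction on the unseen matrix is done like this. The new rows are built with the same four columns as the training features; the values (m, n, p) are placeholders inside the training range:

```python
# Sketch of predicting on a matrix the model has never seen, built
# with the same column layout as the training features (placeholder values).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_train = rng.uniform(size=(60, 4))                  # placeholder training features
y_train = X_train @ np.array([0.4, 0.3, 0.2, 0.1])   # placeholder linear response
model = LinearRegression().fit(X_train, y_train)

# One new object (m, n, p) evaluated at the three speed coefficients.
X_new = np.array([
    [0.5, 0.5, 0.5, 0.0],
    [0.5, 0.5, 0.5, 0.1],
    [0.5, 0.5, 0.5, 0.2],
])
y_new_pred = model.predict(X_new)
```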

The prediction error is very high and the coefficient of determination is very low.

I cannot understand what the problem is.