Why does the MissForest algorithm in Python impute negative values?

_Huzaifa · June 18, 2021, 8:03am

I ran the MissForest algorithm and it imputed negative values in some places. Why did that happen?

steven.daprano · June 18, 2021, 9:36am

I can think of at least five reasons:

Because your code is buggy.
Because the MissForest library is buggy.
Because your data is invalid.
Because you used the library wrong.
Because negative values are correct.

Why do you think that negative values are incorrect?

_Huzaifa · June 18, 2021, 9:55am

My data comes from a hospital. The DBP (Diastolic Blood Pressure) column had missing values, and imputing them with MissForest produced negative values. Blood pressure can’t be negative.

_Huzaifa · June 18, 2021, 10:40am

What do you mean?

steven.daprano · June 18, 2021, 11:00am

Okay. That still leaves four alternatives – bugs in the library, bugs
in your code, invalid data, misuse of the library.

We don’t know which MissForest library you are using. We don’t know how
you are using it. We don’t have access to the data, or your code. What
sort of answer do you hope to get from us under those restrictions?

These may help:

http://www.sscce.org/

Have you read the documentation for the library? Does it give any
warnings or hints?

If you cut the data in half, do the negative values go away?

Do all the missing values become negative, or only some?
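
One way to answer that question is to remember which entries were missing
before imputation and then look only at those positions afterwards. The
sketch below uses only the standard library. The function name
`summarize_imputed` and both lists are made up for illustration, and
`dbp_after` merely stands in for whatever your imputer returned.

```python
import math

def summarize_imputed(before, after):
    """Compare one column before and after imputation.

    `before` holds the original values, with NaN marking missing entries;
    `after` holds the same column as returned by the imputer.
    """
    # Entries that were NaN originally are the ones the imputer filled in.
    filled = [a for b, a in zip(before, after) if math.isnan(b)]
    observed = [b for b in before if not math.isnan(b)]
    return {
        "n_imputed": len(filled),
        "n_imputed_negative": sum(1 for v in filled if v < 0),
        "n_observed_negative": sum(1 for v in observed if v < 0),
        "observed_min": min(observed) if observed else None,
    }

# Made-up DBP readings for illustration only; nan marks a missing entry.
dbp_before = [80.0, math.nan, 75.0, -1.0, math.nan, 90.0]
# Made-up imputer output, standing in for the real result.
dbp_after = [80.0, 61.0, 75.0, -1.0, -0.5, 90.0]

report = summarize_imputed(dbp_before, dbp_after)
print(report)
```

If `n_observed_negative` is above zero, the negatives were already in the
data before imputation, which points towards invalid data (for example a
placeholder code for "not recorded") rather than the imputer.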

Can you control what the missing values are initially set to? For
example, if you are using the mean, try changing it to the median.

Can you change the number of imputation iterations and see if that
changes the result?
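
As a rough illustration of why the starting value can matter, here is a
standard-library sketch that fills missing entries with the mean or the
median and counts how many fills come out negative. It is not MissForest,
only the kind of initial guess an iterative imputer might start from. The
function names, the readings, and the -999.0 "not recorded" code are all
assumptions made for the example.

```python
import math
import statistics

def initial_fill(column, strategy):
    """Fill NaN entries with the mean or median of the observed values.

    This is only a starting guess, not MissForest itself.
    """
    observed = [v for v in column if not math.isnan(v)]
    if strategy == "mean":
        guess = statistics.mean(observed)
    else:
        guess = statistics.median(observed)
    return [guess if math.isnan(v) else v for v in column]

def count_negative_fills(column, filled):
    """Count entries that were NaN originally and were filled with a negative value."""
    return sum(1 for b, a in zip(column, filled) if math.isnan(b) and a < 0)

# Made-up readings; -999.0 plays the role of a hypothetical "not recorded" code.
dbp = [80.0, 75.0, -999.0, 90.0, math.nan]

results = {s: count_negative_fills(dbp, initial_fill(dbp, s)) for s in ("mean", "median")}
print(results)
```

In this toy column the single placeholder value drags the mean below zero
while the median stays in a plausible range. Whether anything similar is
happening in your data depends on your library and your settings, which
is why the questions above matter.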