Sample Weights in Random Survival Forest

Claire · January 18, 2024, 10:41am

I am trying to fit a random survival forest with sample weights. However, the model does not appear to use the sample weights I pass in. I have tried fitting it without weights and with sample weights of different values, but I keep getting exactly the same wrong predictions. My code looks something like this:

# Install first, from a shell: pip install scikit-survival
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest

# Per-sample weights, taken from a column of the training frame
trainWeight = X_train["Weight"].to_numpy()
# Structured array with the event indicator first and the time second
y_train = y_train[["Event", "Time"]].to_records(index=False)

# Create and fit the Random Survival Forest model
random_state = 20
rsf = RandomSurvivalForest(
    n_estimators=10,
    min_samples_split=10,
    min_samples_leaf=15,
    n_jobs=-1,
    random_state=random_state,
    max_features="sqrt",
    max_depth=10,
)
rsf.fit(X_train[bestFeatures2], y_train, sample_weight=trainWeight)

# Check whether the predicted survival curves look plausible
survivalTrain = rsf.predict_survival_function(X_train[bestFeatures2], return_array=True)
survivalTrain = pd.DataFrame(data=survivalTrain)

The data looks something like this:
[Image: Screenshot 2024-01-18 113834]

Is there something I am doing wrong, or is there a workaround?

Thanks in advance!
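
One way to make the symptom above reproducible is to fit two identically configured models, one without and one with weights, and compare their predictions. The sketch below is only an illustration: `weights_change_predictions` and `make_model` are hypothetical names, and it assumes the estimator follows the scikit-learn convention of `fit(X, y, sample_weight=...)` returning the fitted model and `predict(X)` returning one score per sample.

```python
import math


def weights_change_predictions(make_model, X, y, sample_weight, abs_tol=1e-12):
    """Return True if passing sample_weight changes the model's predictions.

    make_model is a zero-argument callable that builds a fresh estimator.
    Use a fixed random_state inside it so that randomness alone cannot
    explain a difference between the two fits.
    """
    unweighted = make_model().fit(X, y)
    weighted = make_model().fit(X, y, sample_weight=sample_weight)

    pred_unweighted = unweighted.predict(X)
    pred_weighted = weighted.predict(X)

    # Compare the per-sample scores element by element
    same = all(
        math.isclose(a, b, abs_tol=abs_tol)
        for a, b in zip(pred_unweighted, pred_weighted)
    )
    return not same
```

With the names from the post, the call could look like `weights_change_predictions(lambda: RandomSurvivalForest(n_estimators=10, random_state=20), X_train[bestFeatures2], y_train, trainWeight)`. A result of `False` would confirm that the weights had no effect on the fitted forest's risk scores.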

kyle · January 18, 2024, 2:36pm

Hey Claire, are you using the most recent version of the scikit-survival library? I came across the following in the documentation, so an update might help:

### Bug fixes

* Fix bug where times passed to [`sksurv.metrics.brier_score()`](https://scikit-survival.readthedocs.io/en/stable/api/generated/sksurv.metrics.brier_score.html#sksurv.metrics.brier_score) was downcast, resulting in a loss of precision that may lead to duplicate time points ([#349](https://github.com/sebp/scikit-survival/issues/349)).
* Fix inconsistent behavior of evaluating functions returned by predict_cumulative_hazard_function or predict_survival_function ([#375](https://github.com/sebp/scikit-survival/issues/375)).
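
To see which release is actually installed before comparing against the changelog, something like the following could be used. It relies only on the standard library; `installed_version` is a hypothetical helper name, and `scikit-survival` is assumed to be the distribution name used when installing with pip.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed
print(installed_version("scikit-survival"))
```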

Claire · January 19, 2024, 6:25pm

Hi Kyle, thanks for your reply. I am just not sure this is what I am looking for. I read somewhere that the 2022 version of scikit-survival cannot incorporate sample weights yet. Has anyone encountered the same problem and found a solution?

Thanks!
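
If the weights really are ignored during fitting, one possible workaround is to resample the training rows with probability proportional to their weight and then fit the forest without `sample_weight`. This is a minimal sketch under that assumption, not something taken from the scikit-survival documentation; `weighted_bootstrap_indices` is a hypothetical helper, and resampling only approximates true weighting.

```python
import random


def weighted_bootstrap_indices(weights, n_draws=None, seed=0):
    """Draw row indices with replacement, proportional to the given weights.

    Rows with larger weights appear more often in the result, and rows with
    zero weight never appear.
    """
    weights = [float(w) for w in weights]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    n_rows = len(weights)
    k = n_rows if n_draws is None else n_draws

    # A seeded generator keeps the resampled training set reproducible
    rng = random.Random(seed)
    return rng.choices(range(n_rows), weights=weights, k=k)
```

With the names from the post, `idx = weighted_bootstrap_indices(trainWeight)` would give the rows to keep, and the forest could then be fitted on `X_train[bestFeatures2].iloc[idx]` and `y_train[idx]`. Because heavily weighted rows are duplicated, settings such as `min_samples_leaf` may need revisiting, since a leaf could be filled by copies of a single observation.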
