Why can't I pass the MLPClassifier to cross_val_predict?

henryjhu · August 6, 2021, 10:27pm

Please help! Thank you!
I am building an ensemble learner from a Random Forest and an Artificial Neural Network.
My function seems to pass the Random Forest to cross_val_predict without problems.
However, passing the MLPClassifier to cross_val_predict raises errors.
Why does my function produce these errors?
What do the errors mean?

Code

def cross_val_model(model, X, Y, n_splits_n=25, n_repeats_n=5, random_state_n=42):
    np.random.seed(33) 
    k = RepeatedStratifiedKFold(n_splits=n_splits_n, n_repeats=n_repeats_n, random_state=random_state_n)
    y_preds = cross_val_predict(model, X=X, y=Y, cv=k, n_jobs=-1, method='predict', verbose=1)
    return y_preds

def ensemble_fun(X_train, X_test, Y_train, hidden_layer_sizes_n=50, max_iter_n=150,
                 n_estimators_n=100, cv_n=10, max_depth_n=25,
                 random_state_ann=42, random_state_rf=42):

    # Scale and center the data around the mean of 0
    # scaling=StandardScaler()

    # Initialize the log variable
    class log:
        def_tz = pytz.timezone('America/New_York')

        def info(text):
            print(f'{datetime.now(log.def_tz).replace(microsecond=0)} : {text}')

    # Enumerate list of estimators
    estimator_list = {
        'ann': MLPClassifier(solver='lbfgs', hidden_layer_sizes=hidden_layer_sizes_n,
                             max_iter=max_iter_n, shuffle=True,
                             random_state=random_state_ann, activation='logistic'),
        'rf': RandomForestClassifier(max_depth=max_depth_n, n_estimators=n_estimators_n,
                                     random_state=random_state_rf, n_jobs=-1)
    }

    clf = StackingClassifier(estimators=estimator_list, final_estimator=LogisticRegression(), cv=cv_n)

    stacking_train_dataset = np.zeros([X_train.shape[0], len(estimator_list)])
    stacking_test_dataset = np.zeros([X_train.shape[0], len(estimator_list)])

    for i, base_algorithm in enumerate(estimator_list):
        stacking_train_dataset[:, i] = cross_val_model(base_algorithm, X_train, Y_train, n_splits_n=25, n_repeats_n=5, random_state_n=42)
        stacking_test_dataset[:, i] = base_algorithm.predict(X_test)

    final_predictions = clf.fit(stacking_train_dataset, Y_train).predict(stacking_test_dataset)

    print(f'Accuracy: {metrics.accuracy_score(stacking_test_dataset, final_predictions)}')

    return final_predictions

Calling function ensemble_fun

ensemble_fun(NN_103_score_df_x_train, NN_103_score_df_x_test, NN_103_score_df_y_train, hidden_layer_sizes_n=50, max_iter_n=150, n_estimators_n=100, cv_n=10, max_depth_n=25, random_state_ann=42, random_state_rf=42)

Error Message

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    819             try:
--> 820                 tasks = self._ready_batches.get(block=False)
    821             except queue.Empty:

8 frames
Empty: 

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sklearn/base.py in clone(estimator, safe)
     65                             "it does not seem to be a scikit-learn estimator "
     66                             "as it does not implement a 'get_params' methods."
---> 67                             % (repr(estimator), type(estimator)))
     68     klass = estimator.__class__
     69     new_object_params = estimator.get_params(deep=False)

TypeError: Cannot clone object ''ann'' (type <class 'str'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

steven.daprano · August 7, 2021, 2:13am

Hi Henry, and welcome!

I don’t think that your post has anything to do with core development of
Python. It seems to be related to numpy, and scikit-learn, and perhaps
some other third-party libraries. Where do RepeatedStratifiedKFold,
RandomForestClassifier, etc. come from?

Perhaps you can help us improve the Discuss experience. Is there
something we can do to make it more clear which discussion group is
appropriate here?

https://discuss.python.org/

Unfortunately, knowledge of scikit-learn is a fairly niche area. You
might be lucky to find experts on it here, but it may help you to also
post your question to some specific scikit-learn forums, which are
listed towards the bottom of the scikit-learn project pages:

https://scikit-learn.org/stable/index.html

If you do get an answer elsewhere, please consider posting it in this
thread so that others who experience the same issue can find it.
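
On the error itself: the last line of the traceback looks like the
informative one. clone() was handed the string 'ann' rather than an
estimator object. That matches the loop
"for i, base_algorithm in enumerate(estimator_list)": estimator_list is a
dict, and iterating over a dict yields its keys, so base_algorithm would be
'ann' and then 'rf', not the MLPClassifier or RandomForestClassifier
instances. The earlier Empty exception appears to be incidental to how
joblib dispatches work. The sketch below illustrates the dict behaviour in
plain Python; DummyEstimator is a hypothetical stand-in (not a scikit-learn
class), used only so the snippet runs without scikit-learn installed.

```python
# Hypothetical stand-in for a scikit-learn estimator. The only thing it
# shares with a real one is a get_params method, which is what the
# traceback says clone() looks for.
class DummyEstimator:
    def get_params(self, deep=True):
        return {}


# Same shape as estimator_list in the question: a dict of name -> model.
estimator_list = {
    "ann": DummyEstimator(),
    "rf": DummyEstimator(),
}

# Iterating over a dict (with or without enumerate) yields its KEYS.
keys_seen = [item for _, item in enumerate(estimator_list)]

# Iterating over .values() yields the model objects instead.
models_seen = [model for _, model in enumerate(estimator_list.values())]

# Strings have no get_params method; the model objects do.
keys_have_get_params = [hasattr(k, "get_params") for k in keys_seen]
models_have_get_params = [hasattr(m, "get_params") for m in models_seen]

print(keys_seen)
print(keys_have_get_params)
print(models_have_get_params)
```

If that is the cause, looping over estimator_list.values() (or over
estimator_list.items() when the names are also needed) should pass the
actual models to cross_val_predict. Other problems may remain after that.
For example, StackingClassifier documents its estimators argument as a list
of (name, estimator) tuples rather than a dict, so treat this as a starting
point rather than a complete fix.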