I’m trying to detect anomalies in my data (an n×2 table containing only dates and values). Right now I’m using two methods, ‘IsolationForest’ and ‘KNN’. With both methods, some data points that are very different from their neighbors are not flagged as anomalies (see picture).
As you can see, there is a value in the data, ‘593.xxx’, that is more than 10x bigger than its nearest neighbors.
A ‘0’ in the Predictions column of the dataframe means no anomaly.
Playing around with the parameters gives different results, but it doesn't solve the problem shown in the picture.
The goal is just to detect large deviations from neighboring points in the data. I’m using the code below. Any help with this, or other suggestions, would be great.
from sklearn.ensemble import IsolationForest
from pyod.models.knn import KNN

# `data` and `plot_anomalies` are defined elsewhere (not shown)

def fit_model(model, data, column='NO3_conc'):
    # fit the model on the single value column and predict labels
    df = data.copy()
    data_to_predict = data[column].to_numpy().reshape(-1, 1)
    predictions = model.fit_predict(data_to_predict)
    df['Predictions'] = predictions
    return df  # was missing: without it knn_df and iso_df are None

# pyod KNN: 1 = anomaly, 0 = normal
knn_model = KNN(contamination=0.1, n_neighbors=5, method='median',
                radius=1.0, algorithm='auto', leaf_size=30,
                metric='minkowski', p=2, metric_params=None, n_jobs=1)
knn_df = fit_model(knn_model, data)
plot_anomalies(knn_df, 'KNN model')

# sklearn IsolationForest: -1 = anomaly, 1 = normal
iso_forest = IsolationForest(n_estimators=125, max_samples='auto',
                             contamination=0.05, max_features=1.0,
                             bootstrap=False, n_jobs=None,
                             random_state=None, verbose=0, warm_start=False)
iso_df = fit_model(iso_forest, data)
# map to the same 1 = anomaly / 0 = normal convention as KNN
iso_df['Predictions'] = iso_df['Predictions'].map(lambda x: 1 if x == -1 else 0)
plot_anomalies(iso_df, 'iso forest model')
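One thing I noticed: both models above only see the value column, so each point is judged against the whole series and not against its neighbors in time. Since all I want is "large deviation from nearby points", one alternative I'm considering is a rolling median / MAD check (a Hampel-style filter). Below is a minimal sketch of that idea, not something I have validated on my real data. The function name `flag_local_outliers`, the window of 7 points, the threshold of 3, and the sample values are all my own assumptions for illustration.

```python
import numpy as np
import pandas as pd

def flag_local_outliers(values, window=7, n_sigmas=3.0):
    """Hampel-style check: return 1 where a point is far from the
    median of its neighbors (centered rolling window), else 0.
    `window` and `n_sigmas` are illustrative defaults, not tuned values."""
    s = pd.Series(np.asarray(values, dtype=float))
    roll = s.rolling(window, center=True, min_periods=1)
    # median of the neighbors around each point
    local_median = roll.median()
    # median absolute deviation (MAD) inside the same window
    local_mad = roll.apply(
        lambda w: np.median(np.abs(w - np.median(w))), raw=True
    )
    # 1.4826 rescales the MAD to be comparable to a standard deviation
    # under a normality assumption
    threshold = n_sigmas * 1.4826 * local_mad
    return ((s - local_median).abs() > threshold).astype(int).to_numpy()

# Hypothetical values, with one spike similar to the 593.xxx point
sample = [50, 52, 49, 51, 593, 50, 48, 53, 51, 50]
flags = flag_local_outliers(sample)
print(flags)
```

On my dataframe this would be used as `data['Predictions'] = flag_local_outliers(data['NO3_conc'])`, assuming the rows are already sorted by date. One caveat I'm aware of: if a window is completely flat the MAD is 0, so any tiny change there gets flagged, and a minimum threshold might be needed.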