[Feature] WithoutLiersCV model selection #595
cv = WithoutLiersCV(
    cv=KFold(n_splits=3),
    anomalous_label=1
)
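For readers following the review, here is a minimal sketch of how a label-based wrapper like the one called above might behave. It is not the PR's code: the class name `LabelFilteringCV` is a hypothetical stand-in, and the only behaviour assumed is that samples whose label equals `anomalous_label` are dropped from each training fold, while the wrapped `cv` object does all the splitting.

```python
import numpy as np
from sklearn.model_selection import KFold


class LabelFilteringCV:
    """Hypothetical stand-in for the splitter under review (not the PR's code)."""

    def __init__(self, cv, anomalous_label=1):
        self.cv = cv
        self.anomalous_label = anomalous_label

    def split(self, X, y, groups=None):
        y = np.asarray(y)
        # The wrapped splitter decides the folds; we only filter the train side.
        for train_idx, test_idx in self.cv.split(X, y, groups):
            keep = y[train_idx] != self.anomalous_label
            # Test indices are left untouched, so anomalies are still evaluated.
            yield train_idx[keep], test_idx

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.cv.get_n_splits(X, y, groups)


# Toy data: three samples carry the anomalous label 1.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])

cv = LabelFilteringCV(cv=KFold(n_splits=3), anomalous_label=1)
splits = list(cv.split(X, y))
```

Because the object exposes `split` and `get_n_splits`, it should be usable wherever scikit-learn accepts a CV splitter, although that has not been checked against the PR itself.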
I think I'd want @MBrouns to weigh in on the name 😅 just to make sure.
But I'm also wondering if it would be easier for the end user not to have to pass an anomalous label ... wouldn't it be better to pass in an outlier model? That outlier model could then train internally on X and determine which items are outliers. Or am I overthinking?
From the conversation in the issue, my understanding is slightly different. The goal of this CV is to validate anomaly detectors that are trained only on non-anomalous samples, namely the novelty detection ones. Therefore, passing a novelty detection model would not be possible in the first place.
That said, I agree that the name would suit both implementations 😁
@koaning Potentially we could have two CV strategies:
- WithoutLiersCV: takes any outlier detection model, trains on X, and excludes outliers from train_indexes.
- NoveltyDetectorCV: what's in this PR, to be able to train a novelty detection algorithm on non-anomalous labels and evaluate on both anomalous and non-anomalous ones.
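To make the first strategy concrete, here is a rough sketch of a model-based variant. The helper name `split_without_outliers` is hypothetical, and `IsolationForest` is just one example of a scikit-learn outlier detector. The sketch assumes the detector is fitted once on all of X and that flagged samples are removed only from the training indices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import KFold


def split_without_outliers(X, cv, outlier_model):
    """Hypothetical helper: drop samples flagged by `outlier_model` from each training fold."""
    X = np.asarray(X)
    # scikit-learn outlier detectors return -1 for outliers and 1 for inliers.
    is_inlier = outlier_model.fit_predict(X) == 1
    for train_idx, test_idx in cv.split(X):
        # Only the training side is filtered; test folds keep every sample.
        yield train_idx[is_inlier[train_idx]], test_idx


# Toy data: a Gaussian blob plus three far-away points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(57, 2)), [[8, 8], [-9, 7], [10, -10]]])

folds = list(
    split_without_outliers(
        X,
        KFold(n_splits=3),
        IsolationForest(contamination=0.05, random_state=0),
    )
)
```

One design choice to flag: fitting the detector on the full X lets information from the test folds influence which training samples are kept. Fitting it inside the loop on `X[train_idx]` would avoid that, at the cost of one extra fit per fold.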
Description
Introduces WithoutLiersCV as discussed in #307. To support different cross-validation strategies, the idea is to take a CV object as input and exclude the anomalous samples from the training indexes. All the splitting logic is delegated to the cv object.
Type of change
Checklist: