CSMOTE
- class capymoa.classifier.CSMOTE
Bases: MOAClassifier
CSMOTE
This strategy stores all minority-class samples in a window managed by ADWIN while a model is trained on the incoming data. When the minority sample ratio falls below the configured threshold, an online version of SMOTE is applied: a random minority sample is drawn from the window and a synthetic sample is generated from it, and this repeats until the minority sample ratio is greater than or equal to the threshold. The model is then trained on the newly generated samples.
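The rebalancing step described above can be sketched in plain Python. This is an illustrative sketch only, not the MOA implementation: the function names `synthesize` and `rebalance` are hypothetical, the window is a plain list rather than an ADWIN-managed one, and the ratio is assumed to be the minority share of all samples seen.

```python
import math
import random


def synthesize(window, neighbors, rng):
    """Create one synthetic point between a random minority sample and a near neighbor."""
    base = rng.choice(window)
    # Rank the other minority samples by Euclidean distance to the chosen one.
    others = sorted(
        (s for s in window if s is not base),
        key=lambda s: math.dist(base, s),
    )
    neighbor = rng.choice(others[:neighbors])
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


def rebalance(window, n_majority, threshold=0.5,
              min_size_allowed=100, neighbors=10, seed=0):
    """Generate synthetic minority samples until the ratio reaches the threshold.

    Assumption: ratio = minority / (minority + majority).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    rng = random.Random(seed)
    generated = []
    if len(window) < max(2, min_size_allowed):
        return generated  # too few minority samples to interpolate from
    n_minority = len(window)
    while n_minority / (n_minority + n_majority) < threshold:
        generated.append(synthesize(window, neighbors, rng))
        n_minority += 1
    return generated
```

In this sketch the synthetic samples are only returned; in the strategy described above they would then be passed to the base learner for training.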
Example usages:
>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import CSMOTE
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = CSMOTE(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
83.1
- __init__(
    schema: Schema = None,
    random_seed: int = 0,
    base_learner='trees.HoeffdingTree',
    neighbors: int = 10,
    threshold: float = 0.5,
    min_size_allowed: int = 100,
    disable_drift_detection: bool = False,
  )
Continuous Synthetic Minority Oversampling (C-SMOTE) by Bernardo et al.
- Parameters:
schema – The schema of the stream.
random_seed – The random seed passed to the MOA learner.
base_learner – The base learner to be trained. Defaults to 'trees.HoeffdingTree', as shown in the signature.
neighbors – Number of neighbors for SMOTE.
threshold – Minority sample ratio below which synthetic minority samples are generated.
min_size_allowed – Minimum number of samples in the minority class for applying SMOTE.
disable_drift_detection – If set, disables the ADWIN drift detector.
- predict(instance)
Predict the label of an instance.
The base implementation calls predict_proba() and returns the label with the highest probability.
- Parameters:
instance – The instance to predict the label for.
- Returns:
The predicted label, or None if the classifier is unable to make a prediction.
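The relationship between predict() and predict_proba() described above can be illustrated with a small helper. This is a sketch under assumptions, not the library's code: `predict_from_proba` is a hypothetical name, the probabilities are assumed to be a sequence indexed by label, and ties resolve to the lowest index.

```python
def predict_from_proba(probabilities):
    """Return the index of the most probable label, or None if there is no estimate."""
    if probabilities is None or len(probabilities) == 0:
        return None
    # max() keeps the first index it sees among equal probabilities.
    return max(range(len(probabilities)), key=lambda i: probabilities[i])
```

Returning None rather than a default label keeps "no prediction available" distinguishable from a real prediction, which matches the contract stated above.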
- predict_proba(instance)
Return probability estimates for each label.
- Parameters:
instance – The instance to estimate the probabilities for.
- Returns:
An array of probabilities for each label, or None if the classifier is unable to make a prediction.
- train(instance)
Train the classifier with a labeled instance.
- Parameters:
instance – The labeled instance to train the classifier with.
- random_seed: int
The random seed for reproducibility.
When implementing a classifier, ensure that all random number generators are seeded.
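One way to follow this advice in pure Python is to give each component its own seeded generator. The class below is a hypothetical illustration, not part of CapyMOA; it assumes the standard-library `random` module is the only source of randomness.

```python
import random


class SeededSampler:
    """Hypothetical component that owns its own seeded generator."""

    def __init__(self, random_seed: int = 0):
        self.random_seed = random_seed
        # A private generator avoids depending on the global random state.
        self._rng = random.Random(random_seed)

    def pick(self, items):
        """Return one element of items, chosen by the seeded generator."""
        return self._rng.choice(items)
```

Two samplers built with the same seed produce the same sequence of picks, which is what makes experiments repeatable.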