CSMOTE

class capymoa.classifier.CSMOTE

Bases: MOAClassifier

Continuous Synthetic Minority Oversampling Technique.

Continuous Synthetic Minority Oversampling Technique (C-SMOTE) [1] is a meta-strategy for imbalanced data streams. It stores all minority-class samples in a window managed by ADWIN while a model is trained on the incoming data. When the minority sample ratio falls below a threshold, an online version of SMOTE is applied: a random minority sample is chosen from the window and a new synthetic sample is generated from it, repeating until the minority sample ratio is greater than or equal to the threshold. The model is then trained on the newly generated samples.
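
To make the generation step concrete, here is a minimal sketch of SMOTE-style interpolation over a window of minority samples. It is not the MOA implementation: the function name synthesize, the list-of-floats sample format, and the Euclidean distance are all assumptions chosen for illustration.

```python
import math
import random


def synthesize(window, k, rng):
    """Create one synthetic minority sample (hypothetical helper).

    Assumes `window` holds at least two minority samples, each a list of floats.
    """
    # Pick a random minority sample from the window.
    base = rng.choice(window)
    # Rank the remaining samples by Euclidean distance to the chosen one.
    others = [s for s in window if s is not base]
    others.sort(key=lambda s: math.dist(base, s))
    # Choose one of the k nearest neighbors at random.
    neighbor = rng.choice(others[:k])
    # Interpolate at a random point on the segment between the two samples.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


window = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sample = synthesize(window, k=2, rng=random.Random(0))
print(sample)
```

Because the synthetic point lies on the segment between two real minority samples, it always stays inside the region those samples span.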

>>> from capymoa.classifier import CSMOTE
>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.evaluation import prequential_evaluation
>>>
>>> stream = ElectricityTiny()
>>> classifier = CSMOTE(stream.get_schema())
>>> results = prequential_evaluation(stream, classifier, max_instances=1000)
>>> print(f"{results['cumulative'].accuracy():.1f}")
83.1

__init__(schema: Schema = None, random_seed: int = 0, base_learner='trees.HoeffdingTree', neighbors: int = 10, threshold: float = 0.5, min_size_allowed: int = 100, disable_drift_detection: bool = False)

Construct C-SMOTE.

Parameters:

schema – The schema of the stream.
random_seed – The random seed passed to the MOA learner.
base_learner – The base learner to be trained. Default 'trees.HoeffdingTree'.
neighbors – Number of neighbors for SMOTE.
threshold – Minority class sample ratio threshold. Synthetic samples are generated while the minority ratio is below this value.
min_size_allowed – Minimum number of samples in the minority class for applying SMOTE.
disable_drift_detection – If set, disables the ADWIN drift detector.
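
As a rough illustration of how threshold and min_size_allowed interact, the sketch below counts how many synthetic samples would be needed for a given class balance. It assumes the ratio is minority / (minority + majority) and that synthetic samples count toward the minority total; the function synthetic_needed is hypothetical and is not part of CapyMOA or MOA.

```python
def synthetic_needed(n_minority, n_majority, threshold=0.5, min_size_allowed=100):
    """Count the synthetic samples needed to reach the threshold (hypothetical)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    # Assumption: SMOTE is skipped until enough minority samples are available.
    if n_minority < min_size_allowed or n_minority + n_majority == 0:
        return 0
    generated = 0
    # Keep generating while the minority ratio is still below the threshold.
    while (n_minority + generated) / (n_minority + generated + n_majority) < threshold:
        generated += 1
    return generated


print(synthetic_needed(100, 300))  # 200: the ratio reaches 300 / 600 = 0.5
print(synthetic_needed(50, 300))   # 0: fewer than min_size_allowed minority samples
```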

cli_help()

predict(instance: Instance) → int | None

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters: instance – The instance to predict the label for.
Returns: The predicted label, or None if the classifier is unable to make a prediction.
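
The relationship between predict() and predict_proba() described above can be sketched as follows. This is an illustrative stand-in, not the library's code: predict_from_proba is a hypothetical helper that takes the probabilities directly instead of an instance.

```python
from typing import Optional, Sequence


def predict_from_proba(proba: Optional[Sequence[float]]) -> Optional[int]:
    """Return the index of the most probable label (hypothetical helper)."""
    # No probabilities means no prediction can be made.
    if proba is None or len(proba) == 0:
        return None
    # Arg-max over label indices; ties resolve to the lowest index.
    return max(range(len(proba)), key=lambda i: proba[i])


print(predict_from_proba([0.2, 0.7, 0.1]))  # 1
print(predict_from_proba(None))             # None
```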

predict_proba(instance) → ndarray[tuple[Any, ...], dtype[float64]] | None

Return probability estimates for each label.

Parameters: instance – The instance to estimate the probabilities for.
Returns: An array of probabilities for each label, or None if the classifier is unable to make a prediction.

train(instance)

Train the classifier with a labeled instance.

Parameters: instance – The labeled instance to train the classifier with.

random_seed: int

The random seed for reproducibility.

When implementing a classifier, ensure that its random number generators are seeded.
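
One way to follow this advice in a pure-Python component is to give it a private, seeded generator rather than relying on global random state. The class below is a hypothetical sketch and does not mirror how MOA-backed learners handle their seed.

```python
import random


class SeededSampler:
    """Hypothetical component that owns a seeded random number generator."""

    def __init__(self, random_seed: int = 0):
        self.random_seed = random_seed
        # A private generator keeps results independent of global random state.
        self._rng = random.Random(random_seed)

    def pick(self, items):
        # Every random decision goes through the seeded generator.
        return self._rng.choice(items)


items = list(range(100))
a = SeededSampler(random_seed=1)
b = SeededSampler(random_seed=1)
same = [a.pick(items) for _ in range(5)] == [b.pick(items) for _ in range(5)]
print(same)  # True: equal seeds give identical draws
```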

schema: Schema

The schema representing the instances.