OnlineAdwinBagging#

class capymoa.classifier.OnlineAdwinBagging[source]#

Bases: MOAClassifier

Bagging for evolving data streams using ADWIN.

Bagging for evolving data streams using ADWIN [1] [2] is an ensemble classifier. ADWIN is a change detector and estimator that solves in a well-specified way the problem of tracking the average of a stream of bits or real-valued numbers. ADWIN keeps a variable-length window of recently seen items, with the property that the window has the maximal length statistically consistent with the hypothesis “there has been no change in the average value inside the window”.

More precisely, an older fragment of the window is dropped if and only if there is enough evidence that its average value differs from that of the rest of the window. This has two consequences: one, that change reliably declared whenever the window shrinks; and two, that at any time the average over the existing window can be reliably taken as an estimation of the current average in the stream (barring a very small or very recent change that is still not statistically visible). A formal and quantitative statement of these two points (a theorem) appears in the reference paper.

ADWIN is parameter- and assumption-free in the sense that it automatically detects and adapts to the current rate of change. Its only parameter is a confidence bound δ, indicating how confident we want to be in the algorithm’s output, inherent to all algorithms dealing with random processes. Also important, ADWIN does not maintain the window explicitly, but compresses it using a variant of the exponential histogram technique. This means that it keeps a window of length W using only O(log W) memory and O(log W) processing time per item.

ADWIN Bagging is the online bagging method of Oza and Rusell with the addition of the ADWIN algorithm as a change detector and as an estimator for the weights of the boosting method. When a change is detected, the worst classifier of the ensemble of classifier is removed and a new classifier is added to the ensemble.

>>> from capymoa.classifier import OnlineAdwinBagging
>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.evaluation import prequential_evaluation
>>>
>>> stream = ElectricityTiny()
>>> classifier = OnlineAdwinBagging(stream.get_schema())
>>> results = prequential_evaluation(stream, classifier, max_instances=1000)
>>> print(f"{results['cumulative'].accuracy():.1f}")
85.3
__init__(
schema=None,
CLI=None,
random_seed=1,
base_learner=None,
ensemble_size=100,
minibatch_size=None,
number_of_jobs=None,
)[source]#

Constructor for online ADWIN bagging.

Parameters:
  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • CLI – Command Line Interface (CLI) options for configuring the ARF algorithm. If not provided, default options will be used.

  • random_seed – Seed for the random number generator.

  • base_learner – The base learner to use. If not provided, a default Hoeffding Tree is used.

  • ensemble_size – The number of trees in the ensemble.

cli_help()[source]#
predict(instance)[source]#

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters:

instance – The instance to predict the label for.

Returns:

The predicted label or None if the classifier is unable to make a prediction.

predict_proba(instance)[source]#

Return probability estimates for each label.

Parameters:

instance – The instance to estimate the probabilities for.

Returns:

An array of probabilities for each label or None if the classifier is unable to make a prediction.

train(instance)[source]#

Train the classifier with a labeled instance.

Parameters:

instance – The labeled instance to train the classifier with.

random_seed: int#

The random seed for reproducibility.

When implementing a classifier ensure random number generators are seeded.

schema: Schema#

The schema representing the instances.