OnlineBagging

class capymoa.classifier.OnlineBagging

Bases: MOAClassifier

Incremental online bagging of Oza and Russell.

Oza and Russell developed online versions of bagging and boosting for data streams.

They show how the process of sampling bootstrap replicates from training data can be simulated in a data stream context.

They observe that the number of times any individual example is chosen for a bootstrap replicate tends to a Poisson(1) distribution as the training set grows.
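The Poisson(1) limit can be checked with a short derivation. Assume a replicate of size N is drawn with replacement from N training examples, so the number of copies K of one fixed example is Binomial(N, 1/N):

```latex
P(K = k) = \binom{N}{k}\left(\frac{1}{N}\right)^{k}\left(1 - \frac{1}{N}\right)^{N-k}

\binom{N}{k}\frac{1}{N^{k}} = \frac{1}{k!}\prod_{j=0}^{k-1}\frac{N-j}{N} \;\to\; \frac{1}{k!},
\qquad
\left(1 - \frac{1}{N}\right)^{N-k} \;\to\; e^{-1}
\qquad (N \to \infty)

\Rightarrow\quad P(K = k) \;\to\; \frac{e^{-1}}{k!}, \qquad k = 0, 1, 2, \ldots
```

In particular P(K = 0) tends to e^{-1} ≈ 0.368, so each ensemble member skips roughly 37% of the arriving examples. Because the limit does not depend on N, the length of the stream does not need to be known in advance.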

Reference: [OR] N. Oza and S. Russell. Online bagging and boosting. In Artificial Intelligence and Statistics 2001, pages 105–112. Morgan Kaufmann, 2001.

__init__(
    schema=None,
    CLI=None,
    random_seed=1,
    base_learner=None,
    ensemble_size=100,
    minibatch_size=None,
    number_of_jobs=None,
)

Construct an online bagging classifier using online bootstrap sampling.

Parameters:
  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • CLI – Command Line Interface (CLI) options for configuring the Online Bagging algorithm. If not provided, default options will be used.

  • random_seed – Seed for the random number generator.

  • base_learner – The base learner to use. If not provided, a default Hoeffding Tree is used.

  • ensemble_size – The number of base learners in the ensemble.

  • minibatch_size – The number of instances that a learner must accumulate before training.

  • number_of_jobs – The number of parallel jobs to run during the execution of the algorithm. By default, the algorithm executes tasks sequentially (i.e., with number_of_jobs=1). Increasing number_of_jobs can speed up execution on multi-core systems, but setting it to a high value may consume more system resources and memory. The parallel implementation prioritizes computational speed, so its predictive performance may differ from that of the sequential version. It’s recommended to experiment with different values to find the optimal setting based on the available hardware resources and the nature of the workload.

CLI_help()

predict(instance)

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters:

instance – The instance to predict the label for.

Returns:

The predicted label or None if the classifier is unable to make a prediction.
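A minimal sketch of the documented default behaviour of predict(): pick the label with the highest probability from predict_proba(), and return None when no estimate is available. The helper name predict_from_proba, the use of a plain list of floats, and the tie-breaking rule are assumptions for illustration; the source does not specify them.

```python
from typing import Optional, Sequence


def predict_from_proba(proba: Optional[Sequence[float]]) -> Optional[int]:
    """Return the index of the most probable label, or None without an estimate."""
    # No probability estimates means no prediction, as documented for predict().
    if proba is None or len(proba) == 0:
        return None
    # max() keeps the first maximum it meets, so ties go to the lowest label index.
    return max(range(len(proba)), key=lambda i: proba[i])


print(predict_from_proba([0.2, 0.5, 0.3]))  # 1
```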

predict_proba(instance)

Return probability estimates for each label.

Parameters:

instance – The instance to estimate the probabilities for.

Returns:

An array of probabilities for each label or None if the classifier is unable to make a prediction.
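An ensemble has to combine its members' estimates to produce this array. The source does not say how OnlineBagging does so; a common choice, assumed here purely for illustration, is to normalise each member's vote and average the results. average_member_proba is a hypothetical helper, and a member that cannot predict is represented by None.

```python
from typing import List, Optional, Sequence


def average_member_proba(
    member_probas: Sequence[Optional[Sequence[float]]], n_labels: int
) -> Optional[List[float]]:
    """Average per-member probability vectors into a single ensemble estimate.

    Each inner sequence is assumed to have length n_labels. Members that cannot
    predict (None) or give an all-zero vote are skipped; if no member can
    predict, None is returned, mirroring the documented contract.
    """
    totals = [0.0] * n_labels
    voters = 0
    for proba in member_probas:
        if proba is None:
            continue
        total = sum(proba)
        if total <= 0:
            continue
        # Normalise so that every contributing member carries equal weight.
        for i, p in enumerate(proba):
            totals[i] += p / total
        voters += 1
    if voters == 0:
        return None
    return [t / voters for t in totals]


print(average_member_proba([[1.0, 0.0], [0.0, 2.0], None], n_labels=2))  # [0.5, 0.5]
```

Normalising before averaging keeps a member that returns raw, unnormalised vote counts from dominating the ensemble.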

train(instance)

Train the classifier with a labeled instance.

Parameters:

instance – The labeled instance to train the classifier with.
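The training step described by Oza and Russell can be sketched as follows: every arriving example is presented to each ensemble member K times, with K drawn independently from Poisson(1). This is a minimal, self-contained sketch under stated assumptions, not capymoa's implementation; poisson, CountingLearner and OnlineBaggingSketch are hypothetical names, and passing K as an instance weight assumes the base learner accepts weights (otherwise it would be trained K times).

```python
import math
import random
from collections import Counter
from typing import Hashable, List


def poisson(rng: random.Random, lam: float = 1.0) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    # Multiply uniforms until the running product drops to e^-lam or below;
    # the number of extra multiplications is Poisson(lam)-distributed.
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class CountingLearner:
    """Hypothetical base learner: it only accumulates weighted label counts."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()

    def train(self, x, y: Hashable, weight: int = 1) -> None:
        # Training once with weight k stands in for training k times.
        self.label_counts[y] += weight


class OnlineBaggingSketch:
    """Online bagging in the style of Oza and Russell (illustrative only)."""

    def __init__(self, ensemble_size: int = 10, random_seed: int = 1) -> None:
        self.rng = random.Random(random_seed)  # private, seeded RNG for reproducibility
        self.members: List[CountingLearner] = [
            CountingLearner() for _ in range(ensemble_size)
        ]

    def train(self, x, y: Hashable) -> None:
        for member in self.members:
            k = poisson(self.rng, 1.0)
            # k == 0 happens with probability e^-1 (about 0.37): this member skips the example.
            if k > 0:
                member.train(x, y, weight=k)


# Usage: feed a tiny labelled stream one instance at a time.
bag = OnlineBaggingSketch(ensemble_size=5, random_seed=42)
for x, y in [([0.1], "a"), ([0.9], "b"), ([0.2], "a")]:
    bag.train(x, y)
print([dict(m.label_counts) for m in bag.members])
```

Because all draws come from one seeded generator, two ensembles built with the same random_seed and trained on the same stream end up identical.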

random_seed: int

The random seed for reproducibility.

When implementing a classifier, ensure that its random number generators are seeded.
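One way to follow this advice in pure Python, sketched under the assumption that the classifier's randomness lives in Python code: give each object its own random.Random seeded from random_seed instead of relying on the module-level generator. SeededComponent is a hypothetical name used only for illustration.

```python
import random
from typing import List


class SeededComponent:
    """Hypothetical component that owns a private RNG seeded at construction time."""

    def __init__(self, random_seed: int = 1) -> None:
        self.random_seed = random_seed
        # A dedicated generator is unaffected by other code calling random.seed().
        self._rng = random.Random(random_seed)

    def draws(self, n: int) -> List[float]:
        return [self._rng.random() for _ in range(n)]


first = SeededComponent(random_seed=7)
random.seed(123)  # reseeding the module-level generator...
random.random()   # ...or drawing from it...
second = SeededComponent(random_seed=7)
print(first.draws(3) == second.draws(3))  # ...does not change the component's stream
```

A private generator also means two classifiers running side by side do not consume each other's random numbers, so each one's results depend only on its own seed and input.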

schema: Schema

The schema representing the instances.