OnlineBagging#

class capymoa.classifier.OnlineBagging[source]#

Bases: MOAClassifier

Incremental online bagging of Oza and Russell.

Oza and Russell developed online versions of bagging and boosting for data streams. They show how the process of sampling bootstrap replicates from training data can be simulated in a data stream context, observing that the probability that any individual example is chosen for a replicate tends to a Poisson(1) distribution.

Reference: [OR] N. Oza and S. Russell. Online bagging and boosting. In Artificial Intelligence and Statistics 2001, pages 105–112. Morgan Kaufmann, 2001.
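The core idea can be sketched in plain Python (this is an illustrative stand-in, not the library's internals): each ensemble member trains on an incoming instance k ~ Poisson(1) times, which simulates drawing a bootstrap replicate without storing or replaying the stream. The `CountingLearner` stub below is hypothetical and exists only to make the sketch self-contained.

```python
import math
import random

def poisson1(rng: random.Random) -> int:
    """Sample k ~ Poisson(lambda = 1) using Knuth's algorithm."""
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

class CountingLearner:
    """Stand-in base learner that just counts how often it is trained."""
    def __init__(self):
        self.updates = 0

    def train(self, instance):
        self.updates += 1

def online_bagging_train(ensemble, instance, rng):
    # Each member sees the instance k ~ Poisson(1) times, approximating
    # a bootstrap replicate in a single pass over the stream.
    for member in ensemble:
        for _ in range(poisson1(rng)):
            member.train(instance)

rng = random.Random(1)
ensemble = [CountingLearner() for _ in range(10)]
n_instances = 5000
for instance in range(n_instances):  # a synthetic "stream"
    online_bagging_train(ensemble, instance, rng)

mean_updates = sum(m.updates for m in ensemble) / len(ensemble)
print(mean_updates / n_instances)  # close to 1.0, the Poisson(1) mean
```

Because the Poisson(1) mean is 1, each learner sees, on average, as many training examples as a classical bootstrap replicate would contain.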

__init__(
schema=None,
CLI=None,
random_seed=1,
base_learner=None,
ensemble_size=100,
minibatch_size=None,
number_of_jobs=None,
)[source]#

Construct an Online bagging classifier using online bootstrap sampling.

Parameters:
  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • CLI – Command Line Interface (CLI) options for configuring the Online Bagging algorithm. If not provided, default options will be used.

  • random_seed – Seed for the random number generator.

  • base_learner – The base learner to use. If not provided, a default Hoeffding Tree is used.

  • ensemble_size – The number of learners in the ensemble.

  • minibatch_size – The number of instances that a learner must accumulate before training.

  • number_of_jobs – The number of parallel jobs to run during the execution of the algorithm. By default, the algorithm executes tasks sequentially (i.e., with number_of_jobs=1). Increasing number_of_jobs can speed up execution on multi-core systems, but high values consume more system resources and memory. This parallel implementation prioritizes runtime performance, so predictive performance may differ from the sequential version. It is recommended to experiment with different values to find the optimal setting for the available hardware.

CLI_help()[source]#
predict(instance)[source]#
predict_proba(instance)[source]#
train(instance)[source]#