AdaptiveRandomForestClassifier#

class capymoa.classifier.AdaptiveRandomForestClassifier[source]#

Bases: MOAClassifier

Adaptive Random Forest Classifier

This class implements the Adaptive Random Forest (ARF) algorithm, which is an ensemble classifier capable of adapting to concept drift.

ARF is implemented in MOA (Massive Online Analysis) and provides several parameters for customization.

Reference:

Adaptive random forests for evolving data stream classification. Heitor Murilo Gomes, A. Bifet, J. Read, …, B. Pfahringer, G. Holmes, T. Abdessalem. Machine Learning, 106, 1469-1495, 2017.

See also capymoa.regressor.AdaptiveRandomForestRegressor. See capymoa.base.MOAClassifier for train, predict and predict_proba.

Example usage:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import AdaptiveRandomForestClassifier
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = AdaptiveRandomForestClassifier(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
87.9
__init__(
schema=None,
CLI=None,
random_seed=1,
base_learner=None,
ensemble_size=100,
max_features=0.6,
lambda_param=6.0,
minibatch_size=None,
number_of_jobs=1,
drift_detection_method=None,
warning_detection_method=None,
disable_weighted_vote=False,
disable_drift_detection=False,
disable_background_learner=False,
)[source]#

Construct an Adaptive Random Forest Classifier

Parameters:
  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • CLI – Command Line Interface (CLI) options for configuring the ARF algorithm. If not provided, default options will be used.

  • random_seed – Seed for the random number generator.

  • base_learner – The base learner to use. If not provided, a default Hoeffding Tree is used.

  • ensemble_size – The number of trees in the ensemble.

  • max_features – The maximum number of features to consider when splitting a node. If provided as a float between 0.0 and 1.0, it represents the percentage of features to consider. If provided as an integer, it specifies the exact number of features to consider. If provided as the string “sqrt”, the square root of the total number of features is used. If not provided, the default value is 0.6 (60% of the features).

  • lambda_param – The lambda parameter that controls the Poisson distribution for the online bagging simulation.

  • minibatch_size – The number of instances that a learner must accumulate before training.

  • number_of_jobs – The number of parallel jobs to run during execution. By default, tasks run sequentially (number_of_jobs=1). Increasing it can speed up execution on multi-core systems, at the cost of additional system resources and memory. The implementation is embarrassingly parallel, so computations can be distributed across multiple processing units without sacrificing predictive performance. Experiment with different values to find the optimal setting for the available hardware and workload.

  • drift_detection_method – The method used for drift detection.

  • warning_detection_method – The method used for warning detection.

  • disable_weighted_vote – Whether to disable weighted voting.

  • disable_drift_detection – Whether to disable drift detection.

  • disable_background_learner – Whether to disable background learning.
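The three accepted forms of max_features described above (fraction, exact count, or "sqrt") can be summarized with a small, hypothetical helper. This is an illustration of the documented semantics only, not capymoa's actual implementation; the function name resolve_max_features is invented for this sketch:

```python
import math

def resolve_max_features(max_features, n_features):
    """Illustrative only: map the documented max_features forms to a count."""
    if isinstance(max_features, float):
        # Fraction of the total number of features.
        return max(1, int(max_features * n_features))
    if isinstance(max_features, int):
        # Exact number of features.
        return max_features
    if max_features == "sqrt":
        # Square root of the total number of features.
        return max(1, int(math.sqrt(n_features)))
    raise ValueError(f"Unsupported max_features: {max_features!r}")

print(resolve_max_features(0.6, 10))     # 60% of 10 features -> 6
print(resolve_max_features(3, 10))       # exact count -> 3
print(resolve_max_features("sqrt", 10))  # sqrt(10) truncated -> 3
```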

CLI_help()[source]#
predict(instance)[source]#
predict_proba(instance)[source]#
train(instance)[source]#
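For context on lambda_param: ARF simulates online bagging by training each tree on each incoming instance k times, where k is drawn from a Poisson(λ) distribution. The stdlib-only sketch below illustrates that weighting step under this assumption; it uses Knuth's sampling algorithm and is not capymoa's implementation:

```python
import math
import random

def poisson(lam, rng):
    """Sample k ~ Poisson(lam) via Knuth's algorithm (illustrative)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
# With lambda_param=6.0, each ensemble member would train on an incoming
# instance k times, where k is drawn as below; the mean weight is ~6.
weights = [poisson(6.0, rng) for _ in range(10_000)]
print(sum(weights) / len(weights))  # close to 6.0
```

Larger λ values give each instance more weight on average, making individual trees adapt faster at the cost of diversity within the ensemble.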