Classifiers#

Classifiers implement the capymoa.base.Classifier interface.

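The interface exposes train, predict, and predict_proba (see capymoa.base.MOAClassifier below). As a minimal sketch of a manual test-then-train loop, assuming the stream API used by the prequential examples on this page:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import NaiveBayes
>>> stream = ElectricityTiny()
>>> learner = NaiveBayes(stream.get_schema())
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     prediction = learner.predict(instance)  # test first...
...     learner.train(instance)                 # ...then train (prequential order)
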
class capymoa.classifier.AdaptiveRandomForestClassifier[source]#

Bases: MOAClassifier

Adaptive Random Forest Classifier

This class implements the Adaptive Random Forest (ARF) algorithm, which is an ensemble classifier capable of adapting to concept drift.

ARF is implemented in MOA (Massive Online Analysis) and provides several parameters for customization.

Reference:

Adaptive random forests for evolving data stream classification. Heitor Murilo Gomes, A. Bifet, J. Read, …, B. Pfahringer, G. Holmes, T. Abdessalem. Machine Learning, 106, 1469-1495, 2017.

See also capymoa.regressor.AdaptiveRandomForestRegressor. See capymoa.base.MOAClassifier for train, predict, and predict_proba.

Example usage:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import AdaptiveRandomForestClassifier
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = AdaptiveRandomForestClassifier(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
87.9
__init__(
schema=None,
CLI=None,
random_seed=1,
base_learner=None,
ensemble_size=100,
max_features=0.6,
lambda_param=6.0,
number_of_jobs=1,
drift_detection_method=None,
warning_detection_method=None,
disable_weighted_vote=False,
disable_drift_detection=False,
disable_background_learner=False,
)[source]#

Construct an Adaptive Random Forest Classifier

Parameters:
  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • CLI – Command Line Interface (CLI) options for configuring the ARF algorithm. If not provided, default options will be used.

  • random_seed – Seed for the random number generator.

  • base_learner – The base learner to use. If not provided, a default Hoeffding Tree is used.

  • ensemble_size – The number of trees in the ensemble.

  • max_features – The maximum number of features to consider when splitting a node. A float between 0.0 and 1.0 is interpreted as a percentage of the features; an integer specifies the exact number of features; the string “sqrt” uses the square root of the total number of features. If not provided, the default is 60% (see the example after this list).

  • lambda_param – The lambda parameter that controls the Poisson distribution for the online bagging simulation.

  • number_of_jobs – The number of parallel jobs to run during execution. By default the algorithm runs sequentially (number_of_jobs=1). Increasing it can speed up execution on multi-core systems, at the cost of additional system resources and memory. The implementation is embarrassingly parallel, so computations can be distributed across multiple processing units without sacrificing predictive performance. Experiment with different values to find the optimal setting for the available hardware and workload.

  • drift_detection_method – The method used for drift detection.

  • warning_detection_method – The method used for warning detection.

  • disable_weighted_vote – Whether to disable weighted voting.

  • disable_drift_detection – Whether to disable drift detection.

  • disable_background_learner – Whether to disable background learning.
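
The three accepted forms of max_features described above, as a sketch using this constructor:

>>> learner = AdaptiveRandomForestClassifier(schema, max_features=0.6)     # 60% of the features
>>> learner = AdaptiveRandomForestClassifier(schema, max_features=5)       # exactly 5 features
>>> learner = AdaptiveRandomForestClassifier(schema, max_features="sqrt")  # square root of the total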

class capymoa.classifier.EFDT[source]#

Bases: MOAClassifier

Extremely Fast Decision Tree (EFDT) Classifier

Also referred to as the Hoeffding AnyTime Tree (HATT) classifier. In practice, despite the name, EFDTs are typically slower than a vanilla Hoeffding Tree at processing data. The speed difference comes from the split re-evaluation mechanism present in EFDT. Nonetheless, EFDT has theoretical properties that ensure it converges faster than the vanilla Hoeffding Tree to the structure that would be created by a batch decision tree model (such as Classification and Regression Trees - CART).

Keep in mind that these guarantees hold when processing a stationary data stream. When dealing with non-stationary data, EFDT is somewhat robust to concept drift as it continually revisits and updates its internal decision tree structure. Still, in such cases, the Hoeffding Adaptive Tree might be a better option, as it was specifically designed to handle non-stationarity.

Reference:

Extremely fast decision tree. Manapragada, Chaitanya, G. I. Webb, M. Salehi. ACM SIGKDD, pp. 1953-1962, 2018.

Example usage:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import EFDT
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = EFDT(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
84.39999999999999
__init__(
schema: Schema | None = None,
random_seed: int = 0,
grace_period: int = 200,
min_samples_reevaluate: int = 200,
split_criterion: str | SplitCriterion = 'InfoGainSplitCriterion',
confidence: float = 0.001,
tie_threshold: float = 0.05,
leaf_prediction: str = 'NaiveBayesAdaptive',
nb_threshold: int = 0,
numeric_attribute_observer: str = 'GaussianNumericAttributeClassObserver',
binary_split: bool = False,
max_byte_size: float = 33554433,
memory_estimate_period: int = 1000000,
stop_mem_management: bool = True,
remove_poor_attrs: bool = False,
disable_prepruning: bool = True,
)[source]#

Construct an Extremely Fast Decision Tree (EFDT) Classifier

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • grace_period – Number of instances a leaf should observe between split attempts.

  • min_samples_reevaluate – Number of instances a node should observe before re-evaluating the best split.

  • split_criterion – Split criterion to use. Defaults to InfoGainSplitCriterion.

  • confidence – Significance level to calculate the Hoeffding bound. The significance level is given by 1 - delta. Values closer to zero imply longer split decision delays (see the note after this list).

  • tie_threshold – Threshold below which a split will be forced to break ties.

  • leaf_prediction – Prediction mechanism used at the leaves (“MajorityClass” or 0, “NaiveBayes” or 1, “NaiveBayesAdaptive” or 2).

  • nb_threshold – Number of instances a leaf should observe before allowing Naive Bayes.

  • numeric_attribute_observer – The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits.

  • binary_split – If True, only allow binary splits.

  • max_byte_size – The max size of the tree, in bytes.

  • memory_estimate_period – Interval (number of processed instances) between memory consumption checks.

  • stop_mem_management – If True, stop growing as soon as memory limit is hit.

  • remove_poor_attrs – If True, disable poor attributes to reduce memory usage.

  • disable_prepruning – If True, disable merit-based tree pre-pruning.
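
Note: the Hoeffding bound used in split decisions is epsilon = sqrt(R^2 * ln(1/delta) / (2n)), where R is the range of the split criterion and n is the number of instances observed at the leaf; this is the standard bound from the Hoeffding tree literature, stated here for convenience.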

class capymoa.classifier.HoeffdingTree[source]#

Bases: MOAClassifier

Hoeffding Tree classifier.

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • grace_period – Number of instances a leaf should observe between split attempts.

  • split_criterion – Split criterion to use. Defaults to InfoGainSplitCriterion.

  • confidence – Significance level to calculate the Hoeffding bound. The significance level is given by 1 - delta. Values closer to zero imply longer split decision delays.

  • tie_threshold – Threshold below which a split will be forced to break ties.

  • leaf_prediction – Prediction mechanism used at the leaves (“MajorityClass” or 0, “NaiveBayes” or 1, “NaiveBayesAdaptive” or 2).

  • nb_threshold – Number of instances a leaf should observe before allowing Naive Bayes.

  • numeric_attribute_observer – The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits.

  • binary_split – If True, only allow binary splits.

  • max_byte_size – The max size of the tree, in bytes.

  • memory_estimate_period – Interval (number of processed instances) between memory consumption checks.

  • stop_mem_management – If True, stop growing as soon as memory limit is hit.

  • remove_poor_attrs – If True, disable poor attributes to reduce memory usage.

  • disable_prepruning – If True, disable merit-based tree pre-pruning.

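Example usage (following the same pattern as the other classifiers on this page; the resulting accuracy is not shown because it is not part of the source):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import HoeffdingTree
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = HoeffdingTree(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> accuracy = results["cumulative"].accuracy()
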
__init__(
schema: Schema | None = None,
random_seed: int = 0,
grace_period: int = 200,
split_criterion: str | SplitCriterion = 'InfoGainSplitCriterion',
confidence: float = 0.001,
tie_threshold: float = 0.05,
leaf_prediction: str = 'NaiveBayesAdaptive',
nb_threshold: int = 0,
numeric_attribute_observer: str = 'GaussianNumericAttributeClassObserver',
binary_split: bool = False,
max_byte_size: float = 33554433,
memory_estimate_period: int = 1000000,
stop_mem_management: bool = True,
remove_poor_attrs: bool = False,
disable_prepruning: bool = True,
)[source]#
class capymoa.classifier.NaiveBayes[source]#

Bases: MOAClassifier

Naive Bayes incremental learner. Performs classic Bayesian prediction under the naive assumption that all inputs are independent. Naive Bayes is a classifier known for its simplicity and low computational cost. Given n different classes, the trained classifier predicts, for every unlabeled instance I, the class C with the highest posterior probability.

Parameters:
  • schema – The schema of the stream, defaults to None.

  • random_seed – The random seed passed to the MOA learner, defaults to 0.

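Example usage (same pattern as the other classifiers on this page; the accuracy value is not shown because it is not part of the source):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import NaiveBayes
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = NaiveBayes(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> accuracy = results["cumulative"].accuracy()
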
__init__(
schema: Schema | None = None,
random_seed: int = 0,
)[source]#
class capymoa.classifier.OnlineBagging[source]#

Bases: MOAClassifier

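Online bagging ensemble classifier, which approximates bootstrap aggregation in the streaming setting by training each ensemble member on every instance with a Poisson(1)-distributed weight (see the Oza and Russell reference under OzaBoost below).

Example usage (same pattern as the other classifiers on this page; the accuracy value is not shown because it is not part of the source):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import OnlineBagging
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = OnlineBagging(schema, ensemble_size=100)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> accuracy = results["cumulative"].accuracy()
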
__init__(
schema=None,
CLI=None,
random_seed=1,
base_learner=None,
ensemble_size=100,
)[source]#
class capymoa.classifier.KNN[source]#

Bases: MOAClassifier

The default number of neighbors (k) is set to 3, rather than MOA's default of 10.

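Example usage (same pattern as the other classifiers on this page; the accuracy value is not shown because it is not part of the source):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import KNN
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = KNN(schema, k=3, window_size=1000)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> accuracy = results["cumulative"].accuracy()
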
__init__(schema=None, CLI=None, random_seed=1, k=3, window_size=1000)[source]#
class capymoa.classifier.PassiveAggressiveClassifier[source]#

Bases: SKClassifier

Streaming Passive Aggressive Classifier

This wraps sklearn.linear_model.PassiveAggressiveClassifier for ease of use in the streaming context. Some options are missing because they are not relevant in the streaming context.

Reference:

Online Passive-Aggressive Algorithms. K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer. JMLR, 2006.

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import PassiveAggressiveClassifier
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = PassiveAggressiveClassifier(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
84.3
sklearner: PassiveAggressiveClassifier#

The underlying scikit-learn object. See: sklearn.linear_model.PassiveAggressiveClassifier

__init__(
schema: Schema,
max_step_size: float = 1.0,
fit_intercept: bool = True,
loss: str = 'hinge',
n_jobs: int | None = None,
class_weight: Dict[int, float] | None | Literal['balanced'] = None,
average: bool = False,
random_seed=1,
)[source]#

Construct a passive aggressive classifier.

Parameters:
  • schema – Stream schema

  • max_step_size – Maximum step size (regularization).

  • fit_intercept – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • loss – The loss function to be used. ‘hinge’ is equivalent to PA-I in the reference paper; ‘squared_hinge’ is equivalent to PA-II.

  • n_jobs – The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • class_weight

    Preset for the sklearner.class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • average – When set to True, computes the averaged SGD weights and stores the result in the sklearner.coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

  • random_seed – Seed for the random number generator.

class capymoa.classifier.SGDClassifier[source]#

Bases: SKClassifier

Streaming stochastic gradient descent classifier.

This wraps sklearn.linear_model.SGDClassifier for ease of use in the streaming context. Some options are missing because they are not relevant in the streaming context. Furthermore, the learning rate is constant.

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = SGDClassifier(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
84.2
sklearner: SGDClassifier#

The underlying scikit-learn object

__init__(
schema: Schema,
loss: Literal['hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'] = 'hinge',
penalty: Literal['l2', 'l1', 'elasticnet'] = 'l2',
alpha: float = 0.0001,
l1_ratio: float = 0.15,
fit_intercept: bool = True,
epsilon: float = 0.1,
n_jobs: int | None = None,
learning_rate: Literal['constant', 'optimal', 'invscaling'] = 'optimal',
eta0: float = 0.0,
random_seed: int | None = None,
)[source]#

Construct stochastic gradient descent classifier.

Parameters:
  • schema – Describes the datastream’s structure.

  • loss – The loss function to be used.

  • penalty – The penalty (aka regularization term) to be used.

  • alpha – Constant that multiplies the regularization term.

  • l1_ratio – The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty is ‘elasticnet’. Values must be in the range [0.0, 1.0].

  • fit_intercept – Whether the intercept (bias) should be estimated or not. If False, the data is assumed to be already centered.

  • epsilon – Epsilon in the epsilon-insensitive loss functions; only if loss is ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’. For ‘huber’, determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold.

  • n_jobs – The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. Defaults to 1.

  • learning_rate – The learning rate schedule; one of ‘constant’, ‘optimal’, or ‘invscaling’.

  • eta0 – The initial learning rate for the ‘constant’ or ‘invscaling’ schedules. The default value is 0.0 because eta0 is not used by the default schedule, ‘optimal’.

  • class_weight

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • random_seed – Seed for reproducibility.

class capymoa.classifier.StreamingGradientBoostedTrees[source]#

Bases: MOAClassifier

Streaming Gradient Boosted Trees (SGBT) Classifier

Streaming Gradient Boosted Trees (SGBT) are trained using the weighted squared loss elicited in XGBoost. SGBT exploits trees with a replacement strategy to detect and recover from drifts, enabling the ensemble to adapt without sacrificing predictive performance.

Reference:

Gradient boosted trees for evolving data streams. Nuwan Gunasekara, Bernhard Pfahringer, Heitor Murilo Gomes, Albert Bifet. Machine Learning, Springer, 2024.

Example usages:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import StreamingGradientBoostedTrees
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingGradientBoostedTrees(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
86.3
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingGradientBoostedTrees(schema, base_learner='meta.AdaptiveRandomForestRegressor -s 10', boosting_iterations=10)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
86.8
__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='trees.FIMTDD -s VarianceReductionSplitCriterion -g 25 -c 0.05 -e -p',
boosting_iterations: int = 100,
percentage_of_features: int = 75,
learning_rate=0.0125,
disable_one_hot: bool = False,
multiply_hessian_by: int = 1,
skip_training: int = 1,
use_squared_loss: bool = False,
)[source]#

Streaming Gradient Boosted Trees (SGBT) Classifier

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default FIMTDD -s VarianceReductionSplitCriterion -g 25 -c 0.05 -e -p.

  • boosting_iterations – The number of boosting iterations.

  • percentage_of_features – The percentage of features to use.

  • learning_rate – The learning rate.

  • disable_one_hot – Whether to disable one-hot encoding for regressors that support nominal attributes.

  • multiply_hessian_by – Multiply the hessian by this value when generating weights for multiple iterations.

  • skip_training – Train on only one out of every skip_training instances. skip_training=1 means no skipping (train on all instances).

  • use_squared_loss – Whether to use squared loss for classification.

class capymoa.classifier.OzaBoost[source]#

Bases: MOAClassifier

Incremental on-line boosting classifier of Oza and Russell.

For the boosting method, Oza and Russell note that the weighting procedure of AdaBoost actually divides the total example weight into two halves – half of the weight is assigned to the correctly classified examples, and the other half goes to the misclassified examples. They use the Poisson distribution for deciding the random probability that an example is used for training, only this time the parameter changes according to the boosting weight of the example as it is passed through each model in sequence.

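A minimal sketch of this Poisson-based weighting (illustrative only, not the MOA internals; lam stands for the example's current boosting weight):

>>> import numpy as np
>>> rng = np.random.default_rng(1)
>>> lam = 1.0  # grows or shrinks as the example moves through the boosted members
>>> k = int(rng.poisson(lam))  # present the instance to the current member k times
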
Reference:

Online bagging and boosting. Nikunj Oza, Stuart Russell. Artificial Intelligence and Statistics 2001.

Example usages:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import OzaBoost
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = OzaBoost(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
88.8
__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='trees.HoeffdingTree',
boosting_iterations: int = 10,
use_pure_boost: bool = False,
)[source]#

Incremental on-line boosting classifier of Oza and Russell.

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default trees.HoeffdingTree.

  • boosting_iterations – The number of boosting iterations.

  • use_pure_boost – Boost with weights only; no Poisson sampling.

class capymoa.classifier.MajorityClass[source]#

Bases: MOAClassifier

Majority class classifier.

Always predicts the class that has been observed most frequently in the training data.

Example usages:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import MajorityClass
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = MajorityClass(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
50.2
__init__(schema: Schema | None = None)[source]#

Majority class classifier.

Parameters:

schema – The schema of the stream.

class capymoa.classifier.NoChange[source]#

Bases: MOAClassifier

NoChange classifier.

Always predicts the last class seen.

Example usages:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import NoChange
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = NoChange(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
85.9
__init__(schema: Schema | None = None)[source]#

NoChange classifier.

Parameters:

schema – The schema of the stream.

class capymoa.classifier.OnlineSmoothBoost[source]#

Bases: MOAClassifier

OnlineSmoothBoost Classifier

Incremental online boosting with theoretical justifications, proposed by Shang-Tse Chen et al.

Reference:

An Online Boosting Algorithm with Theoretical Justifications. Shang-Tse Chen, Hsuan-Tien Lin, Chi-Jen Lu. ICML, 2012.

Example usages:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import OnlineSmoothBoost
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = OnlineSmoothBoost(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
87.8
__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='trees.HoeffdingTree',
boosting_iterations: int = 100,
gamma=0.1,
)[source]#

OnlineSmoothBoost Classifier

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default trees.HoeffdingTree.

  • boosting_iterations – The number of boosting iterations (ensemble size).

  • gamma – The value of the gamma parameter (the assumed advantage of the weak learners).

class capymoa.classifier.StreamingRandomPatches[source]#

Bases: MOAClassifier

Streaming Random Patches (SRP) Classifier

Streaming Random Patches (SRP). This ensemble method uses a Hoeffding tree by default, but it can be used with any other base model (unlike random forest variations). The algorithm can simulate bagging or random subspaces; see the training_method parameter. The default configuration uses both bagging and random subspaces, namely Random Patches.

Reference:

Streaming Random Patches for Evolving Data Stream Classification. Heitor Murilo Gomes, Jesse Read, Albert Bifet. IEEE International Conference on Data Mining (ICDM), 2019.

Example usages:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import StreamingRandomPatches
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingRandomPatches(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
89.7
__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='trees.HoeffdingTree -g 50 -c 0.01',
ensemble_size=100,
max_features=0.6,
training_method: str = 'RandomPatches',
lambda_param: float = 6.0,
drift_detection_method='ADWINChangeDetector -a 1.0E-5',
warning_detection_method='ADWINChangeDetector -a 1.0E-4',
disable_weighted_vote: bool = False,
disable_drift_detection: bool = False,
disable_background_learner: bool = False,
)[source]#

Streaming Random Patches (SRP) Classifier

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default trees.HoeffdingTree -g 50 -c 0.01.

  • ensemble_size – The number of trees in the ensemble.

  • max_features – The maximum number of features to consider when splitting a node. A float between 0.0 and 1.0 is interpreted as a percentage of the features; an integer specifies the exact number of features; the string “sqrt” uses the square root of the total number of features. If not provided, the default is 60%.

  • training_method – The training method to use: RandomSubspaces (random subspaces only), Resampling (bagging), or RandomPatches (both; the default). See the example after this list.

  • lambda_param – The lambda parameter that controls the Poisson distribution for the online bagging simulation.

  • drift_detection_method – The method used for drift detection.

  • warning_detection_method – The method used for warning detection.

  • disable_weighted_vote – Whether to disable weighted voting.

  • disable_drift_detection – Whether to disable drift detection.

  • disable_background_learner – Whether to disable background learning.
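
Selecting a training method, as a sketch using this constructor (option names are taken from the parameter list above):

>>> learner = StreamingRandomPatches(schema, training_method="RandomSubspaces")  # random subspaces only
>>> learner = StreamingRandomPatches(schema, training_method="Resampling")       # bagging
>>> learner = StreamingRandomPatches(schema, training_method="RandomPatches")    # both (the default)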