StreamingRandomPatches

class capymoa.classifier.StreamingRandomPatches

Bases: MOAClassifier

Streaming Random Patches (SRP) Classifier

This ensemble method uses a Hoeffding tree as its base learner by default but, unlike random forest variations, it can be used with any other base model. It can also simulate bagging or random subspaces (see the training_method parameter). The default combines both bagging and random subspaces, an approach known as Random Patches.

Reference:

Streaming Random Patches for Evolving Data Stream Classification. Heitor Murilo Gomes, Jesse Read, Albert Bifet. IEEE International Conference on Data Mining (ICDM), 2019.

Example usage:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import StreamingRandomPatches
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingRandomPatches(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
89.7

__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='trees.HoeffdingTree -g 50 -c 0.01',
ensemble_size=100,
max_features=0.6,
training_method: str = 'RandomPatches',
lambda_param: float = 6.0,
minibatch_size=None,
number_of_jobs=None,
drift_detection_method='ADWINChangeDetector -a 1.0E-5',
warning_detection_method='ADWINChangeDetector -a 1.0E-4',
disable_weighted_vote: bool = False,
disable_drift_detection: bool = False,
disable_background_learner: bool = False,
)

Construct a Streaming Random Patches (SRP) classifier.

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default trees.HoeffdingTree -g 50 -c 0.01.

  • ensemble_size – The number of base learners (trees by default) in the ensemble.

  • max_features – The subspace size for each ensemble member. A float between 0.0 and 1.0 is interpreted as the fraction of features to consider. An integer specifies the exact number of features to consider. The string “sqrt” selects the square root of the total number of features. The default is 0.6, i.e. 60% of the features.

  • training_method – The training method to use: RandomSubspaces (random subspaces only), Resampling (bagging only) or RandomPatches (both random subspaces and bagging; the default).

  • lambda_param – The lambda parameter that controls the Poisson distribution for the online bagging simulation.

  • minibatch_size – The number of instances that a learner must accumulate before training.

  • number_of_jobs – The number of parallel jobs to run during execution. By default, the algorithm runs sequentially (i.e., with number_of_jobs=1). Increasing number_of_jobs can speed up execution on multi-core systems, at the cost of more system resources and memory. The implementation is embarrassingly parallel: its computations can be distributed across multiple processing units without sacrificing predictive performance. Experiment with different values to find the best setting for the available hardware and the workload.

  • drift_detection_method – The method used for drift detection.

  • warning_detection_method – The method used for warning detection.

  • disable_weighted_vote – Whether to disable weighted voting.

  • disable_drift_detection – Whether to disable drift detection.

  • disable_background_learner – Whether to disable background learning.
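The way max_features, training_method and lambda_param interact can be sketched in plain Python. This is only an illustration under stated assumptions, not the MOA implementation: resolve_subspace_size, poisson and draw_patch are hypothetical helpers, and the rounding and clamping rule for the subspace size is an assumption.

```python
import math
import random


def resolve_subspace_size(max_features, n_features):
    """Turn a max_features value into a feature count (assumed rounding rule)."""
    if max_features == "sqrt":
        k = round(math.sqrt(n_features))
    elif isinstance(max_features, float):
        k = round(max_features * n_features)  # fraction of the features
    else:
        k = int(max_features)  # exact number of features
    # Clamp so that at least one and at most all features are used.
    return max(1, min(k, n_features))


def poisson(lam, rng):
    """Sample from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def draw_patch(training_method, n_features, max_features, lambda_param, rng):
    """Return (feature_indices, instance_weight) for one ensemble member."""
    use_subspace = training_method in ("RandomSubspaces", "RandomPatches")
    use_resampling = training_method in ("Resampling", "RandomPatches")

    if use_subspace:
        k = resolve_subspace_size(max_features, n_features)
        features = sorted(rng.sample(range(n_features), k))
    else:
        features = list(range(n_features))  # every feature is visible

    # Online bagging simulates resampling by weighting each instance
    # with a Poisson(lambda) draw; without resampling the weight is 1.
    weight = poisson(lambda_param, rng) if use_resampling else 1
    return features, weight


rng = random.Random(0)
features, weight = draw_patch("RandomPatches", 8, 0.6, 6.0, rng)
print(len(features), weight >= 0)
```

In this sketch a weight of 0 would mean that the member skips the instance, which is how online bagging is commonly described; whether SRP treats it exactly this way is not stated here.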

CLI_help()

predict(instance)

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters:

instance – The instance to predict the label for.

Returns:

The predicted label or None if the classifier is unable to make a prediction.
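The relationship between predict() and predict_proba() described above can be sketched as follows. predict_from_proba is a hypothetical helper written for illustration, and treating an empty probability array as "no prediction" is an assumption.

```python
def predict_from_proba(probabilities):
    """Return the index of the most probable label, or None if unavailable."""
    if probabilities is None or len(probabilities) == 0:
        return None
    # Index of the largest probability; ties resolve to the lowest index.
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


print(predict_from_proba([0.1, 0.7, 0.2]))
print(predict_from_proba(None))
```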

predict_proba(instance)

Return probability estimates for each label.

Parameters:

instance – The instance to estimate the probabilities for.

Returns:

An array of probabilities for each label or None if the classifier is unable to make a prediction.
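For an ensemble, the probability estimates are typically an aggregate of the members' votes. The sketch below shows one plausible way to combine them with and without per-member weights, which is the behaviour toggled by disable_weighted_vote. combine_votes is a hypothetical helper, and both the source of the weights and the normalisation are assumptions rather than the MOA logic.

```python
def combine_votes(member_probas, member_weights=None):
    """Combine per-member probability vectors into one normalised vector."""
    if not member_probas:
        return None
    if member_weights is None:
        # Unweighted vote: every member counts equally.
        member_weights = [1.0] * len(member_probas)

    totals = [0.0] * len(member_probas[0])
    for probas, weight in zip(member_probas, member_weights):
        for label, value in enumerate(probas):
            totals[label] += weight * value

    norm = sum(totals)
    if norm == 0:
        return None  # no member produced a usable vote
    return [t / norm for t in totals]


votes = [[1.0, 0.0], [0.0, 1.0]]
print(combine_votes(votes))              # equal weights
print(combine_votes(votes, [3.0, 1.0]))  # first member trusted more
```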

train(instance)

Train the classifier with a labeled instance.

Parameters:

instance – The labeled instance to train the classifier with.
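In a streaming setting, train() is usually called right after predict() on the same instance (test-then-train), which is the idea behind prequential evaluation. The sketch below illustrates that loop with a stand-in learner. MajorityClassStub and prequential_accuracy are hypothetical, and the (features, label) tuples are a simplification of the library's instance objects.

```python
from collections import Counter


class MajorityClassStub:
    """Stand-in learner that always predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # None signals that no prediction is possible yet.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def train(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test on each instance first, then train on it; return accuracy in %."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.train(x, y)
        total += 1
    return 100.0 * correct / total if total else 0.0


stream = [([0.1], 1), ([0.4], 1), ([0.9], 0), ([0.3], 1)]
print(prequential_accuracy(stream, MajorityClassStub()))
```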

random_seed: int

The random seed for reproducibility.

When implementing a classifier, ensure that its random number generators are seeded.
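A minimal sketch of what seeding buys in a custom classifier: each learner keeps its own generator built from random_seed, so two learners with the same seed make the same random choices. SeededLearner is a hypothetical class used only for illustration and is not part of the library.

```python
import random


class SeededLearner:
    """Toy learner that keeps a private, seeded random number generator."""

    def __init__(self, random_seed=0):
        self.random_seed = random_seed
        # A private generator avoids depending on the global random state.
        self._rng = random.Random(random_seed)

    def pick_features(self, n_features, k):
        return sorted(self._rng.sample(range(n_features), k))


a = SeededLearner(random_seed=42)
b = SeededLearner(random_seed=42)
print(a.pick_features(8, 5) == b.pick_features(8, 5))
```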

schema: Schema

The schema representing the instances.