StreamRHF

class capymoa.anomaly.StreamRHF

Bases: AnomalyDetector

StreamRHF anomaly detector

StreamRHF: Streaming Random Histogram Forest for Anomaly Detection

StreamRHF is an unsupervised anomaly detection algorithm for real-time data streams. It extends Random Histogram Forests (RHF) to dynamic data streams, combining tree-based partitioning with kurtosis-driven feature selection to detect anomalies in resource-constrained streaming environments.
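To make "kurtosis-driven feature selection" concrete, here is a minimal sketch of one way a split feature can be drawn so that heavy-tailed features are favoured. It assumes the weighting described for Random Histogram Forests, where a feature is picked with probability proportional to log(kurtosis + 1). It is not taken from the StreamRHF source code; the helper names `kurtosis` and `choose_split_feature` and the toy data are hypothetical.

```python
import math
import random


def kurtosis(values):
    """Pearson (non-excess) kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # A constant feature has zero variance; treat its kurtosis as 0.
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def choose_split_feature(columns, rng):
    """Pick a feature index with probability proportional to log(kurtosis + 1).

    Assumption: this weighting mirrors the RHF description, not CapyMOA's code.
    """
    weights = [math.log(kurtosis(col) + 1.0) for col in columns]
    total = sum(weights)
    if total == 0:
        # All features are constant: fall back to a uniform choice.
        return rng.randrange(len(columns))
    r = rng.uniform(0.0, total)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if r <= cumulative:
            return index
    return len(columns) - 1  # guard against floating-point round-off


rng = random.Random(0)
# Toy data (hypothetical): one evenly spread feature, one with a single extreme value.
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
spiky = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0]
picks = [choose_split_feature([flat, spiky], rng) for _ in range(1000)]
spiky_share = picks.count(1) / len(picks)
```

In this toy setup the feature containing the extreme value has the higher kurtosis, so it is chosen for most of the draws. Splits therefore tend to land on features where outliers are more likely to be isolated.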

Reference:

STREAMRHF: Tree-Based Unsupervised Anomaly Detection for Data Streams. Stefan Nesic, Andrian Putina, Maroua Bahri, Alexis Huet, Jose Manuel Navarro, Dario Rossi, Mauro Sozio.

Example:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.anomaly import StreamRHF
>>> from capymoa.evaluation import AnomalyDetectionEvaluator
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamRHF(schema=schema, num_trees=5, max_height=3)
>>> evaluator = AnomalyDetectionEvaluator(schema)
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     proba = learner.score_instance(instance)
...     evaluator.update(instance.y_index, proba)
...     learner.train(instance)
>>> auc = evaluator.auc()
>>> print(f"AUC: {auc:.2f}")
AUC: 0.73
__init__(schema, max_height=5, num_trees=100, window_size=20, random_seed=0)

Initialize the StreamRHF learner.

:param schema: Schema of the data stream.
:param max_height: Maximum height of the trees.
:param num_trees: Number of trees in the forest.
:param window_size: Size of the sliding window.
:param random_seed: Random seed for reproducibility.

predict(instance)

Classify a single instance as normal or anomalous. This method compares the instance's anomaly score against a threshold.

:param instance: An instance from the stream.
:return: 0 if the instance is classified as normal, 1 if it is classified as anomalous.
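The thresholding step described for predict can be sketched as a small standalone function. The documentation above does not state the threshold value or whether the comparison is strict, so both the default of 0.5 and the `>` comparison are assumptions made for illustration. `label_from_score` is a hypothetical helper, not part of CapyMOA.

```python
def label_from_score(score, threshold=0.5):
    """Map an anomaly score in [0, 1] to 1 (anomalous) or 0 (normal)."""
    # threshold=0.5 and the strict comparison are assumed for illustration only;
    # the library's actual threshold is not given in this documentation.
    return 1 if score > threshold else 0


# Scores near 0 map to normal (0); scores near 1 map to anomalous (1).
labels = [label_from_score(s) for s in (0.05, 0.5, 0.93)]
```

A fixed cut-off like this is the simplest choice. When the proportion of anomalies in a stream is unknown, working with the raw output of score_instance and a ranking metric such as AUC avoids committing to a threshold.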

score_instance(instance) -> float

Score a single instance. A score close to 1 indicates an anomaly; a score close to 0 indicates a normal instance.

:param instance: An instance from the stream.
:return: Anomaly score for the instance.

train(instance)

Train the learner with a single instance.

:param instance: An instance from the stream.