RobustRandomCutForest

class capymoa.anomaly.RobustRandomCutForest

Bases: AnomalyDetector

Robust Random Cut Forest.

Robust Random Cut Forest (RRCF) [1] is an algorithm for anomaly detection in dynamic data streams. It maintains a random cut-based data structure (the forest) that acts as a compact sketch or synopsis of the input stream. Anomalies are defined non-parametrically in terms of the “externality” a new point imposes on the existing data—that is, how much the new point influences the structure of the forest.
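To make the random-cut idea concrete, here is a minimal, standard-library sketch. It is not this class's implementation: `random_cut` and `isolation_depth` are hypothetical helpers, and isolation depth is used only as a simplified stand-in for the displacement-based score that RRCF computes. The sketch assumes the cut rule from the RRCF paper (choose a dimension with probability proportional to its range, then a uniform cut value within that range) and shows that a far-away point tends to be separated after far fewer cuts than a point inside a cluster.

```python
import random


def random_cut(points, rng):
    """Split points with one random cut (hypothetical helper, not capymoa API)."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    if sum(spans) == 0:
        raise ValueError("all points are identical, so no cut can separate them")
    # Dimension chosen with probability proportional to its range.
    dim = rng.choices(range(dims), weights=spans, k=1)[0]
    # Cut value drawn uniformly within that dimension's range.
    cut = lows[dim] + rng.random() * spans[dim]
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    return dim, cut, left, right


def isolation_depth(points, target, rng):
    """Count the random cuts needed until `target` is alone on its side."""
    depth = 0
    current = points
    while len(current) > 1:
        dim, cut, left, right = random_cut(current, rng)
        # Follow the side of the cut that still contains the target.
        current = left if target[dim] <= cut else right
        depth += 1
    return depth


rng = random.Random(0)
# A small, dense grid of inliers plus one distant point.
cluster = [(i / 10, j / 10) for i in range(5) for j in range(4)]
outlier = (10.0, 10.0)
points = cluster + [outlier]


def mean_depth(target, trials=200):
    """Average isolation depth of `target` over repeated random cuts."""
    return sum(isolation_depth(points, target, rng) for _ in range(trials)) / trials


print("mean depth, outlier:", mean_depth(outlier))
print("mean depth, inlier: ", mean_depth(cluster[7]))
```

A point that is cut away early sits near the root of a tree, so inserting or removing it changes the tree's structure more than a point buried deep inside a cluster would.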

This implementation is adapted from https://klabum.github.io/rrcf/

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.anomaly import RobustRandomCutForest
>>> from capymoa.evaluation import AnomalyDetectionEvaluator
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = RobustRandomCutForest(schema, tree_size=256, n_trees=100, random_state=42)
>>> evaluator = AnomalyDetectionEvaluator(schema)
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     proba = learner.score_instance(instance)
...     evaluator.update(instance.y_index, proba)
...     learner.train(instance)
>>> auc = evaluator.auc()
>>> print(f"AUC: {auc:.2f}")
AUC: 0.56
__init__(schema: Schema, tree_size=1000, n_trees=100, random_state=42)
predict(instance: Instance) -> int | None
score_instance(instance: Instance) -> float

Returns the anomaly score for the instance.

A high score is indicative of an anomaly.

Parameters:

instance – The instance for which the anomaly score is calculated.

Returns:

The anomaly score for the instance.
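The score is a raw value with no built-in decision threshold, so turning it into a yes/no flag is left to the caller. One possible approach, sketched below under that assumption, is to compare each score against a high quantile of recently seen scores. `RollingQuantileThreshold` and its parameters (`window`, `quantile`, `warmup`) are hypothetical and not part of capymoa.

```python
from collections import deque


class RollingQuantileThreshold:
    """Hypothetical helper: flag a score that exceeds a quantile of recent scores."""

    def __init__(self, window=500, quantile=0.99, warmup=30):
        self.scores = deque(maxlen=window)  # most recent scores only
        self.quantile = quantile
        self.warmup = warmup  # minimum history before anything is flagged

    def is_anomaly(self, score):
        flagged = False
        if len(self.scores) >= self.warmup:
            ordered = sorted(self.scores)
            # Index of the chosen quantile, clamped to the last element.
            idx = min(len(ordered) - 1, int(self.quantile * len(ordered)))
            flagged = score > ordered[idx]
        # Record the score after the decision so it cannot flag itself.
        self.scores.append(score)
        return flagged


threshold = RollingQuantileThreshold(window=100, quantile=0.95, warmup=30)
history = [1.0] * 50 + [10.0, 1.0]
flags = [threshold.is_anomaly(s) for s in history]
print(flags[-2], flags[-1])
```

In a streaming loop, the value passed to `is_anomaly` would be the result of `learner.score_instance(instance)`. Sorting the window on every call keeps the sketch short; a larger window would call for an incremental quantile structure instead.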

train(instance: Instance)