StreamingIsolationForest

class capymoa.anomaly.StreamingIsolationForest

Bases: AnomalyDetector

Streaming Isolation Forest anomaly detector.

The Streaming Isolation Forest anomaly detector [1] incrementally builds an ensemble of isolation trees over a data stream. Each tree uses reservoir sampling to maintain a fixed-size window of training instances. An instance's anomaly score is computed from its average path length across all trees, normalized by the expected path length of a randomly chosen instance in a tree of the same size. Scores lie between 0 and 1, and higher values indicate a greater likelihood of anomaly.
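The normalization described above can be illustrated with the scoring rule of the classic (batch) Isolation Forest. This is a minimal sketch, not the CapyMOA implementation: whether StreamingIsolationForest uses exactly this formula is an assumption, and the names `expected_path_length` and `anomaly_score` are hypothetical helpers introduced for illustration.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_path_length(n: int) -> float:
    """Expected path length c(n) for a random instance in a tree built on n points.

    Uses the classic Isolation Forest approximation
    c(n) = 2 * H(n - 1) - 2 * (n - 1) / n, with H(i) ~ ln(i) + gamma.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths: Sequence[float], window_size: int) -> float:
    """Map per-tree path lengths to a score in (0, 1]; shorter paths score higher."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    # Assumed scoring rule: 2 ** (-average depth / expected depth).
    return 2.0 ** (-mean_depth / expected_path_length(window_size))


# An instance isolated after about 3 splits per tree versus one needing about 12.
easy_to_isolate = anomaly_score([3.0, 2.0, 4.0], window_size=256)
hard_to_isolate = anomaly_score([12.0, 11.0, 13.0], window_size=256)
```

Under this rule, an instance whose average path length equals the expected path length scores exactly 0.5, so instances that are isolated sooner than expected score above 0.5 and the rest score below it.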

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.anomaly import StreamingIsolationForest
>>> from capymoa.evaluation import AnomalyDetectionEvaluator
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingIsolationForest(schema, window_size=256, n_trees=100, seed=42)
>>> evaluator = AnomalyDetectionEvaluator(schema)
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     proba = learner.score_instance(instance)
...     evaluator.update(instance.y_index, proba)
...     learner.train(instance)
>>> auc = evaluator.auc()
>>> print(f"AUC: {auc:.2f}")
AUC: 0.61

__init__(
schema: Schema,
window_size=256,
n_trees=100,
height=None,
seed: int | None = None,
)

Construct a Streaming Isolation Forest anomaly detector.

Parameters:

schema – The schema of the stream. If not provided, it will be inferred from the data.

window_size – The size of the window for each tree.

n_trees – The number of trees in the ensemble.

height – The maximum height of each tree. If None, it will be set to log2(window_size).

seed – Random seed for reproducibility.
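To make `window_size` and the default `height` concrete, here is a minimal sketch of reservoir sampling (Algorithm R) together with the log2-based height. It is illustrative only: `default_height` and `reservoir_update` are hypothetical helpers, rounding up for window sizes that are not powers of two is an assumption, and the library's internal sampling may differ in detail.

```python
import math
import random


def default_height(window_size: int) -> int:
    """Height used when `height` is None: log2(window_size).

    Assumption: rounded up when window_size is not a power of two.
    """
    return math.ceil(math.log2(window_size))


def reservoir_update(reservoir, item, n_seen, window_size, rng):
    """Algorithm R: keep a uniform random sample of at most window_size items.

    n_seen is the number of items observed so far, including `item`.
    """
    if len(reservoir) < window_size:
        # Still filling the window: always keep the item.
        reservoir.append(item)
    else:
        # Keep the new item with probability window_size / n_seen,
        # overwriting a uniformly chosen slot.
        j = rng.randrange(n_seen)
        if j < window_size:
            reservoir[j] = item


rng = random.Random(42)
reservoir = []
for n_seen, item in enumerate(range(1000), start=1):
    reservoir_update(reservoir, item, n_seen, window_size=256, rng=rng)

height = default_height(256)
```

With the default `window_size=256`, log2 gives a height of 8, and the reservoir never grows beyond 256 instances regardless of how long the stream runs.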

predict(
instance: LabeledInstance,
) → int | None

score_instance(
instance: LabeledInstance,
) → float

Returns the anomaly score for the instance.

A high score is indicative of an anomaly.

Parameters:

instance – The instance for which the anomaly score is calculated.

Returns:

The anomaly score for the instance.
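Because the score is a continuous value, turning it into a yes/no decision requires a cutoff. The sketch below shows one simple way to do that; the function `flag_anomalies`, the threshold of 0.5, and the example scores are all hypothetical and are not part of the CapyMOA API.

```python
def flag_anomalies(scores, threshold=0.5):
    """Return True for each score strictly above the (hypothetical) threshold."""
    return [score > threshold for score in scores]


# Made-up scores standing in for values returned by score_instance().
example_scores = [0.31, 0.48, 0.77]
flags = flag_anomalies(example_scores)
```

In practice the cutoff is application-specific and is usually tuned on held-out data or chosen from the expected proportion of anomalies.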

train(instance: LabeledInstance)