StreamingIsolationForest
- class capymoa.anomaly.StreamingIsolationForest
Bases: AnomalyDetector
Streaming Isolation Forest anomaly detector. This detector builds an ensemble of isolation trees incrementally from a data stream. Each tree uses reservoir sampling to maintain a fixed-size window of training instances. The anomaly score of an instance is derived from its average path length across all trees, normalized by the expected path length for a randomly chosen instance in a tree of equivalent size. Scores lie between 0 and 1; instances that are isolated after shorter paths receive higher scores, which indicate greater anomaly likelihood.
Reference: None provided.
Example:
>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.anomaly import StreamingIsolationForest
>>> from capymoa.evaluation import AnomalyDetectionEvaluator
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingIsolationForest(schema, window_size=256, n_trees=100, seed=42)
>>> evaluator = AnomalyDetectionEvaluator(schema)
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     proba = learner.score_instance(instance)
...     evaluator.update(instance.y_index, proba)
...     learner.train(instance)
>>> auc = evaluator.auc()
>>> print(f"AUC: {auc:.2f}")
AUC: 0.61
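The description above says each tree keeps a fixed-size window of training instances through reservoir sampling. The following is a minimal standalone sketch of that idea using the classic "Algorithm R"; the `Reservoir` class and its attributes are hypothetical names for illustration and are not part of the capymoa API, whose internal implementation may differ.

```python
import random
from typing import Any, List, Optional


class Reservoir:
    """Hypothetical helper: uniform fixed-size sample over a stream (Algorithm R)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[Any] = []
        self.seen = 0  # number of stream items observed so far
        self._rng = random.Random(seed)

    def add(self, item: Any) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Fill phase: keep everything until the window is full.
            self.items.append(item)
            return
        # Replacement phase: the new item is kept with probability
        # capacity / seen and overwrites a uniformly chosen slot.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


reservoir = Reservoir(capacity=256, seed=42)
for x in range(10_000):
    reservoir.add(x)
print(len(reservoir.items), reservoir.seen)
```

Under this scheme every instance seen so far has the same probability of being in the window, so memory stays bounded by `capacity` no matter how long the stream runs.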
- __init__(schema: Schema, window_size=256, n_trees=100, height=None, seed: int | None = None)
Construct a Streaming Isolation Forest anomaly detector.
- Parameters:
- schema – The schema of the stream. If not provided, it will be inferred from the data.
- window_size – The size of the window for each tree.
- n_trees – The number of trees in the ensemble.
- height – The maximum height of each tree. If None, it will be set to log2(window_size).
- seed – Random seed for reproducibility.
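To make the default for `height` concrete, here is a small sketch of the log2(window_size) rule described above. The helper name `default_height` is hypothetical, and rounding up to an integer is an assumption; this page does not state how non-integer values are handled.

```python
import math


def default_height(window_size: int) -> int:
    """Hypothetical helper: height cap used when `height` is None.

    Assumes log2(window_size) is rounded up to the next integer.
    """
    return math.ceil(math.log2(window_size))


for w in (64, 256, 1000):
    print(w, default_height(w))
```

With the default `window_size=256` this gives a cap of 8 levels. A logarithmic cap is the usual choice for isolation trees, since anomalies are expected to be isolated near the root and deeper splits add little to the score.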
- predict(instance: LabeledInstance)
- score_instance(instance: LabeledInstance)
Returns the anomaly score for the instance.
A high score is indicative of an anomaly.
- Parameters:
instance – The instance for which the anomaly score is calculated.
- Returns:
The anomaly score for the instance.
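As an illustration of how a score in the range 0 to 1 can be obtained from path lengths, the sketch below follows the classical Isolation Forest formulation, where the mean path length is normalized by the expected path length c(n) and mapped through 2^(-x). It is an assumption that this detector uses exactly this formula; the function names `expected_path_length` and `anomaly_score` are hypothetical and not part of the capymoa API.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def expected_path_length(n: int) -> float:
    """c(n): expected path length for a random instance in a tree built from n points."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths: Sequence[float], n: int) -> float:
    """Map the mean path length over all trees to a score in (0, 1]."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / expected_path_length(n))


# Shallow paths (isolated quickly) score higher than deep paths.
shallow = anomaly_score([2, 3, 2], n=256)
deep = anomaly_score([10, 11, 12], n=256)
print(shallow > deep)
```

In this formulation an instance whose mean path length equals c(n) scores exactly 0.5, so values well above 0.5 suggest an anomaly and values well below it suggest a normal instance.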
- train(instance: LabeledInstance)