
class capymoa.anomaly.OnlineIsolationForest

Bases: AnomalyDetector

Online Isolation Forest

This class implements the Online Isolation Forest (oIFOR) algorithm, which is an ensemble anomaly detector capable of adapting to concept drift.


Online Isolation Forest. Filippo Leveni, Guilherme Weigert Cassales, Bernhard Pfahringer, Albert Bifet, and Giacomo Boracchi. International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research (PMLR), 2024.


>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.anomaly import OnlineIsolationForest
>>> from capymoa.evaluation import AnomalyDetectionEvaluator
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = OnlineIsolationForest(schema=schema)
>>> evaluator = AnomalyDetectionEvaluator(schema)
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     proba = learner.score_instance(instance)
...     evaluator.update(instance.y_index, proba)
...     learner.train(instance)
>>> auc = evaluator.auc()
>>> print(f"AUC: {auc:.2f}")
AUC: 0.52

__init__(
    schema: Schema | None = None,
    random_seed: int = 1,
    num_trees: int = 32,
    max_leaf_samples: int = 32,
    growth_criterion: Literal['fixed', 'adaptive'] = 'adaptive',
    subsample: float = 1.0,
    window_size: int = 2048,
    branching_factor: int = 2,
    split: Literal['axisparallel'] = 'axisparallel',
    n_jobs: int = 1,
)

Construct an Online Isolation Forest anomaly detector.

  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • random_seed – Random seed for reproducibility.

  • num_trees – Number of trees in the ensemble.

  • window_size – The size of the window for each tree.

  • branching_factor – Branching factor of each tree.

  • max_leaf_samples – Maximum number of samples per leaf. When this number is reached, a split is performed.

  • growth_criterion – When to perform a split. If ‘adaptive’, the max_leaf_samples threshold grows with tree depth; if ‘fixed’, the threshold is the same at every depth.

  • subsample – Probability of learning a new sample in each tree.

  • split – Type of split performed at each node. Currently only ‘axisparallel’ is supported, which is the same type used by the IsolationForest algorithm.

  • n_jobs – Number of parallel jobs.
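
The effect of growth_criterion and subsample can be illustrated with a small standalone sketch. It does not use CapyMOA; split_threshold and trees_that_learn are hypothetical helpers written for illustration only. The exact scaling rule under ‘adaptive’ is an assumption here (max_leaf_samples multiplied by branching_factor raised to the leaf depth), and the library's internal rule may differ.

```python
import random


def split_threshold(depth: int, max_leaf_samples: int = 32,
                    growth_criterion: str = "adaptive",
                    branching_factor: int = 2) -> int:
    """Hypothetical helper: samples a leaf at `depth` holds before it splits.

    ASSUMPTION: under 'adaptive' the threshold scales as
    max_leaf_samples * branching_factor ** depth. This is an illustrative
    guess, not the library's confirmed rule.
    """
    if growth_criterion == "fixed":
        return max_leaf_samples
    if growth_criterion == "adaptive":
        return max_leaf_samples * branching_factor ** depth
    raise ValueError(f"unknown growth_criterion: {growth_criterion!r}")


def trees_that_learn(num_trees: int, subsample: float,
                     rng: random.Random) -> list[int]:
    """Hypothetical helper: indices of the trees that learn the current sample.

    Each tree keeps the sample independently with probability `subsample`.
    """
    return [t for t in range(num_trees) if rng.random() < subsample]


rng = random.Random(1)
print([split_threshold(d) for d in range(4)])                            # [32, 64, 128, 256]
print([split_threshold(d, growth_criterion="fixed") for d in range(4)])  # [32, 32, 32, 32]
print(len(trees_that_learn(32, 1.0, rng)))                               # 32
```

With the default subsample of 1.0 every tree sees every sample; lowering it would make the trees learn from different random subsets of the stream.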

train(instance: Instance)

predict(instance: Instance) -> int | None

score_instance(instance: Instance) -> float64