OnlineIsolationForest

class capymoa.anomaly.OnlineIsolationForest

Bases: AnomalyDetector

Online Isolation Forest

This class implements the Online Isolation Forest (oIFOR) algorithm, which is an ensemble anomaly detector capable of adapting to concept drift.
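Isolation-based detectors score a point by how quickly the trees of the ensemble isolate it: anomalies tend to end up in shallow leaves. The sketch below shows the depth normalization used by the original Isolation Forest algorithm. It assumes that the ensemble averages per-tree depths and normalizes them the same way, which may differ from how oIFOR or capymoa actually computes `score_instance`. The helpers `harmonic`, `c` and `anomaly_score` are hypothetical names used only for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(i: int) -> float:
    # Standard approximation of the i-th harmonic number: ln(i) + gamma.
    return math.log(i) + EULER_GAMMA


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(depths: list[float], n: int) -> float:
    """Combine per-tree isolation depths into a score in (0, 1].

    Assumption: trees are combined by averaging their depths, as in the
    original Isolation Forest; oIFOR may aggregate differently.
    """
    if n < 2:
        raise ValueError("need at least two reference samples to normalize")
    mean_depth = sum(depths) / len(depths)
    return 2.0 ** (-mean_depth / c(n))


# A point isolated at the average depth scores exactly 0.5.
print(anomaly_score([c(256)] * 4, 256))
```

Under this convention, scores close to 1 mean the point was isolated much faster than average (likely anomalous), while scores well below 0.5 indicate points buried deep in the trees.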

Reference:

Online Isolation Forest. Filippo Leveni, Guilherme Weigert Cassales, Bernhard Pfahringer, Albert Bifet, and Giacomo Boracchi. International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research (PMLR), 2024.

Example:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.anomaly import OnlineIsolationForest
>>> from capymoa.evaluation import AnomalyDetectionEvaluator
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = OnlineIsolationForest(schema=schema)
>>> evaluator = AnomalyDetectionEvaluator(schema)
>>> while stream.has_more_instances():
...     instance = stream.next_instance()
...     proba = learner.score_instance(instance)
...     evaluator.update(instance.y_index, proba)
...     learner.train(instance)
>>> auc = evaluator.auc()
>>> print(f"AUC: {auc:.2f}")
AUC: 0.52

__init__(
    schema: Schema | None = None,
    random_seed: int = 1,
    num_trees: int = 32,
    max_leaf_samples: int = 32,
    growth_criterion: Literal['fixed', 'adaptive'] = 'adaptive',
    subsample: float = 1.0,
    window_size: int = 2048,
    branching_factor: int = 2,
    split: Literal['axisparallel'] = 'axisparallel',
    n_jobs: int = 1,
)

Construct an Online Isolation Forest anomaly detector.

Parameters:
  • schema – The schema of the stream. If not provided, it will be inferred from the data.

  • random_seed – Random seed for reproducibility.

  • num_trees – Number of trees in the ensemble.

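  • window_size – Number of most recent samples kept in each tree's sliding window. Older samples are forgotten, which lets the model follow concept drift.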

  • branching_factor – Number of children created each time a node of a tree is split.

  • max_leaf_samples – Maximum number of samples per leaf. When this number is reached, a split is performed.

  • growth_criterion – Rule that decides when a leaf is split. With ‘fixed’, a leaf is split as soon as it holds max_leaf_samples samples; with ‘adaptive’, this threshold grows with the depth of the leaf.

  • subsample – Probability that each tree learns a given incoming sample.

  • split – Type of split performed at each node. Currently only ‘axisparallel’ is supported: each split partitions the data along a single feature, as in the original Isolation Forest algorithm.

  • n_jobs – Number of parallel jobs.
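The `window_size` and `subsample` parameters control which samples each tree remembers. The following is a minimal sketch of that bookkeeping, assuming each tree learns an incoming sample with probability `subsample` and forgets the oldest sample once its window exceeds `window_size`; it is not capymoa's implementation, and `WindowedTreeBuffer` is a hypothetical class that stores raw samples instead of updating tree nodes.

```python
import random
from collections import deque


class WindowedTreeBuffer:
    """Hypothetical stand-in for one tree's memory of recent samples."""

    def __init__(self, window_size: int, subsample: float, rng: random.Random):
        self.window = deque()
        self.window_size = window_size
        self.subsample = subsample
        self.rng = rng

    def learn(self, x):
        """Maybe learn x; return the sample forgotten to make room, if any."""
        # The tree learns the incoming sample only with probability `subsample`.
        if self.rng.random() >= self.subsample:
            return None
        self.window.append(x)
        # Once the window overflows, the oldest sample is forgotten (unlearned).
        if len(self.window) > self.window_size:
            return self.window.popleft()
        return None


buffer = WindowedTreeBuffer(window_size=3, subsample=1.0, rng=random.Random(1))
forgotten = [buffer.learn(i) for i in range(5)]
print(forgotten)            # [None, None, None, 0, 1]
print(list(buffer.window))  # [2, 3, 4]
```

In a real tree, "forgetting" a sample would mean decrementing the counts of the leaf it fell into, so the structure keeps reflecting only the most recent part of the stream.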
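The `max_leaf_samples`, `growth_criterion`, `branching_factor` and `split` parameters together decide when and how a leaf is divided. The sketch below is a guess at that logic under stated assumptions, not capymoa's code: the rule that the ‘adaptive’ threshold doubles with each level of depth is an assumption, as is drawing the `branching_factor - 1` cut points uniformly within the leaf's observed range of one randomly chosen feature (which, for two children, matches how the original Isolation Forest picks its splits). `split_threshold`, `axis_parallel_split` and `child_index` are hypothetical helpers.

```python
import bisect
import random


def split_threshold(max_leaf_samples: int, depth: int, growth_criterion: str) -> int:
    """Samples a leaf must hold before it is split (hypothetical rule)."""
    if growth_criterion == "fixed":
        return max_leaf_samples
    if growth_criterion == "adaptive":
        # Assumption: the threshold doubles with every level of depth.
        return max_leaf_samples * 2 ** depth
    raise ValueError(f"unknown growth_criterion: {growth_criterion!r}")


def axis_parallel_split(points, branching_factor: int, rng: random.Random):
    """Pick one feature and branching_factor - 1 cut points within its range."""
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    cuts = sorted(rng.uniform(lo, hi) for _ in range(branching_factor - 1))
    return feature, cuts


def child_index(x, feature: int, cuts) -> int:
    # Index of the child interval that x falls into along the chosen feature.
    return bisect.bisect_right(cuts, x[feature])


print(split_threshold(32, 3, "fixed"), split_threshold(32, 3, "adaptive"))  # 32 256
feature, cuts = axis_parallel_split([(0.0, 10.0), (1.0, 20.0)], 2, random.Random(0))
print(feature, cuts)
```

A growing threshold keeps deep leaves from splitting as often as shallow ones, so tree depth stays modest even on long streams.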

train(instance: Instance)
predict(instance: Instance) → int | None
score_instance(instance: Instance) → float64