Drift Detection in CapyMOA

In this tutorial, we show how to conduct drift detection using CapyMOA:

  • Testing different drift detectors

  • Example using ADWIN

  • Evaluating detectors based on known drift locations


More information about CapyMOA can be found at https://www.capymoa.org

Last updated on 25/07/2024

[1]:
import numpy as np
import pandas as pd

import capymoa.drift.detectors as detectors

Basic example

  • Creating dummy data

[2]:
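# 2000 binary values (0 or 1)
data_stream = np.random.randint(2, size=2000)
# From index 999 onwards, draw integers from 6 to 11 to simulate an abrupt drift
for i in range(999, 2000):
    data_stream[i] = np.random.randint(6, high=12)

  • Basic drift detection example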

[3]:
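# Names of all drift detectors exposed by the module
all_detectors = detectors.__all__

# Count how many times each detector signals a change on the stream
n_detections = {k: 0 for k in all_detectors}
for detector_name in all_detectors:
    # Instantiate the detector with its default parameters
    detector = getattr(detectors, detector_name)()

    for i in range(2000):
        detector.add_element(float(data_stream[i]))
        if detector.detected_change():
            n_detections[detector_name] += 1

print(pd.Series(n_detections))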
ADWIN                       1
CUSUM                       2
DDM                         1
EWMAChart                   1
GeometricMovingAverage      1
HDDMAverage               154
HDDMWeighted               92
PageHinkley                 2
RDDM                        1
SEED                        3
STEPD                       1
dtype: int64
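
To make the add_element / detected_change loop above more concrete, here is a minimal sketch of a toy detector that offers the same two methods. It is purely illustrative: the class ToyMeanShiftDetector, its window and threshold parameters, and the mean-comparison rule are assumptions made for this example and are not part of CapyMOA or any of the detectors listed above. The stream is built with the standard library so that the sketch runs without extra dependencies.

```python
import random
from collections import deque


class ToyMeanShiftDetector:
    """Toy detector (hypothetical, not part of CapyMOA).

    Signals a change when the mean of the newest `window` values differs from
    the mean of the `window` values seen just before them by more than
    `threshold`.
    """

    def __init__(self, window=50, threshold=3.0):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # older values
        self.recent = deque(maxlen=window)     # newest values
        self.idx = 0                # number of elements seen so far
        self.detection_index = []   # values of idx at which a change was signalled
        self._in_change = False

    def add_element(self, value):
        self.idx += 1
        if len(self.recent) == self.window:
            # The oldest "recent" value moves into the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(value)
        self._in_change = False
        if len(self.reference) == self.window:
            ref_mean = sum(self.reference) / self.window
            rec_mean = sum(self.recent) / self.window
            if abs(rec_mean - ref_mean) > self.threshold:
                self._in_change = True
                self.detection_index.append(self.idx)
                # Start over so the same drift is not reported repeatedly.
                self.reference.clear()
                self.recent.clear()

    def detected_change(self):
        return self._in_change


# Same shape as the dummy data above: 0/1 values, then 6..11 from index 999 on.
rng = random.Random(0)
stream = [rng.randint(0, 1) for _ in range(999)]
stream += [rng.randint(6, 11) for _ in range(1001)]

detector = ToyMeanShiftDetector()
for x in stream:
    detector.add_element(float(x))
    if detector.detected_change():
        print("Change detected after", detector.idx, "elements")
```

Because the toy rule only compares two window means, it reacts a few dozen instances after the drift. The CapyMOA detectors use different statistical tests, which is one reason their detection counts in the table above differ so widely.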

Example using ADWIN

[4]:
from capymoa.drift.detectors import ADWIN

detector = ADWIN(delta=0.001)

for i in range(2000):
    detector.add_element(data_stream[i])
    if detector.detected_change():
        print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))

Change detected in data: 10 - at index: 1023
[5]:
# Detection indices
detector.detection_index
[5]:
[1024]
[6]:
# Warning indices
detector.warning_index
[6]:
[]
[7]:
# Instance counter
detector.idx
[7]:
2000

Evaluating drift detectors

Assuming the drift locations are known, you can evaluate detectors using the EvaluateDetector class.

This class takes a parameter called max_delay, which is the maximum number of instances after a drift within which a detection still counts as having detected that drift. After max_delay instances, we assume that the change is obvious and has been missed by the detector.

[8]:
from capymoa.drift.eval_detector import EvaluateDetector
[9]:
eval = EvaluateDetector(max_delay=200)

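The calc_performance method of EvaluateDetector takes two arguments for evaluating detectors:

  • The locations of the drifts

  • The locations of the detections

In the call below, the detections (preds) are passed first, followed by the drift locations (trues).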

[10]:
trues = np.array([1000])
preds = detector.detection_index

eval.calc_performance(preds, trues)
[10]:
mean_time_to_detect           24.0
missed_detection_ratio         0.0
mean_time_btw_false_alarms     NaN
no_alarms_per_episode          0.0
dtype: float64
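
The first two metrics above can be re-derived by hand. The following is a simplified sketch of the idea behind max_delay, not CapyMOA's implementation: the function simple_drift_metrics is hypothetical, and it assumes that a drift counts as detected when the first detection falls within max_delay instances after it (bounds inclusive). How CapyMOA handles edge cases, such as missed drifts or several detections per drift, may differ.

```python
def simple_drift_metrics(detections, drifts, max_delay):
    """Hypothetical helper: mean delay and missed ratio for known drifts."""
    delays = []
    missed = 0
    for drift in drifts:
        # Detections that fall inside the window [drift, drift + max_delay].
        hits = [d for d in detections if drift <= d <= drift + max_delay]
        if hits:
            # Delay of the earliest detection inside the window.
            delays.append(min(hits) - drift)
        else:
            missed += 1
    mean_time_to_detect = sum(delays) / len(delays) if delays else float("nan")
    missed_detection_ratio = missed / len(drifts) if drifts else float("nan")
    return mean_time_to_detect, missed_detection_ratio


# Values from the tutorial: one drift at 1000, one ADWIN detection at 1024.
mttd, missed_ratio = simple_drift_metrics([1024], [1000], max_delay=200)
print(mttd, missed_ratio)  # 24.0 0.0

# With a much smaller window the same detection arrives too late.
_, missed_small = simple_drift_metrics([1024], [1000], max_delay=10)
print(missed_small)  # 1.0
```

With max_delay=200 the detection at 1024 arrives 24 instances after the drift, which matches the reported mean_time_to_detect of 24.0 and missed_detection_ratio of 0.0.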