Drift Detection in CapyMOA#
In this tutorial, we show how to conduct drift detection using CapyMOA:
- Testing different drift detectors
- An example using ADWIN
- Evaluating detectors based on known drift locations
More information about CapyMOA can be found at https://www.capymoa.org
Last updated on 25/07/2024
[1]:
import numpy as np
import pandas as pd
import capymoa.drift.detectors as detectors
Basic example#
Creating dummy data
[2]:
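# Start with 2000 values drawn from {0, 1}
data_stream = np.random.randint(2, size=2000)
# From index 999 onward, overwrite with values drawn from {6, ..., 11} to create an abrupt drift
for i in range(999, 2000):
    data_stream[i] = np.random.randint(6, high=12)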
Basic drift detection example
[3]:
all_detectors = detectors.__all__

n_detections = {k: 0 for k in all_detectors}
for detector_name in all_detectors:
    # Instantiate each detector with its default parameters
    detector = getattr(detectors, detector_name)()

    for i in range(2000):
        detector.add_element(float(data_stream[i]))
        if detector.detected_change():
            n_detections[detector_name] += 1

print(pd.Series(n_detections))
ADWIN 1
CUSUM 2
DDM 1
EWMAChart 1
GeometricMovingAverage 1
HDDMAverage 154
HDDMWeighted 92
PageHinkley 2
RDDM 1
SEED 3
STEPD 1
dtype: int64
Example using ADWIN#
[4]:
from capymoa.drift.detectors import ADWIN

detector = ADWIN(delta=0.001)

for i in range(2000):
    detector.add_element(data_stream[i])
    if detector.detected_change():
        print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))
Change detected in data: 10 - at index: 1023
[5]:
# Detection indices
detector.detection_index
[5]:
[1024]
[6]:
# Warning indices
detector.warning_index
[6]:
[]
[7]:
# Instance counter
detector.idx
[7]:
2000
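To give some intuition for what an ADWIN-style detector does internally, here is a minimal sketch in plain Python. It is not CapyMOA's implementation: the class name SimpleAdwin is hypothetical, the window is stored as a plain list rather than compressed, and the cut threshold is the basic Hoeffding-style bound for values scaled to [0, 1]. The idea is to split the window at every position, compare the means of the older and newer parts, and drop the older part when the difference exceeds the threshold. The stream mimics the dummy data above, with values drawn from the standard library's random module.

```python
import math
import random


class SimpleAdwin:
    """Simplified ADWIN-style detector (illustrative only, values in [0, 1])."""

    def __init__(self, delta=0.001):
        self.delta = delta
        self.window = []

    def add_element(self, value):
        """Add one value; return True if the window was cut (change detected)."""
        self.window.append(value)
        changed = False
        cut = self._find_cut()
        while cut is not None:
            # Drop the older sub-window, then re-check what remains.
            self.window = self.window[cut:]
            changed = True
            cut = self._find_cut()
        return changed

    def _find_cut(self):
        """Return the first split index whose two means differ too much."""
        n = len(self.window)
        if n < 2:
            return None
        total = sum(self.window)
        # ln(4 / delta') with delta' = delta / n
        log_term = math.log(4.0 * n / self.delta)
        head_sum = 0.0
        for split in range(1, n):
            head_sum += self.window[split - 1]
            n0, n1 = split, n - split
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style combination of sizes
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mean0 - mean1) >= eps_cut:
                return split
        return None


# Synthetic stream: values in {0, 1} up to index 998, then {6, ..., 11}.
rng = random.Random(0)
stream = [rng.randint(0, 1) for _ in range(999)]
stream += [rng.randint(6, 11) for _ in range(1001)]

sketch = SimpleAdwin(delta=0.001)
detections = []
for i, x in enumerate(stream):
    if sketch.add_element(x / 11.0):  # scale to [0, 1] for the bound
        detections.append(i)

print(detections[:5])
```

With this stream, the first detection should appear a few dozen instances after index 999. Practical implementations keep the window in compressed form so that each update is much cheaper than the linear scan used here.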
Evaluating drift detectors#
Assuming the drift locations are known, you can evaluate detectors using the EvaluateDetector class.
This class takes a parameter called max_delay, which is the maximum number of instances after a drift within which a detection still counts as having found that drift. Beyond max_delay instances, we assume that the change is obvious and has been missed by the detector.
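The role of max_delay can be illustrated with a small sketch in plain Python. This is only one reading of the description above, not the code used by EvaluateDetector: the function name match_detections is hypothetical, and treating the window as inclusive on both ends is an assumption. Each known drift is credited with the first detection that falls within max_delay instances after it; a drift with no such detection counts as missed.

```python
def match_detections(drifts, detections, max_delay):
    """Pair each known drift with the first detection inside its window.

    Illustrative helper only. Returns (mean delay, missed ratio).
    """
    delays = []
    missed = 0
    for drift in drifts:
        # Assumption: a detection counts if it lies in [drift, drift + max_delay].
        hits = [d for d in sorted(detections) if drift <= d <= drift + max_delay]
        if hits:
            delays.append(hits[0] - drift)
        else:
            missed += 1
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    missed_ratio = missed / len(drifts) if drifts else float("nan")
    return mean_delay, missed_ratio


# A drift at 1000 detected at 1024 is found with a delay of 24 instances.
mean_delay, missed_ratio = match_detections([1000], [1024], max_delay=200)
print(mean_delay, missed_ratio)  # 24.0 0.0

# A detection at 1300 falls outside the 200-instance window, so the drift is missed.
late_delay, late_missed = match_detections([1000], [1300], max_delay=200)
print(late_missed)  # 1.0
```

A larger max_delay is more forgiving toward slow detectors, while a smaller one turns late detections into misses.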
[8]:
from capymoa.drift.eval_detector import EvaluateDetector
[9]:
eval = EvaluateDetector(max_delay=200)
The calc_performance method of EvaluateDetector takes two arguments for evaluating detectors:
- The locations of the detections
- The locations of the drifts
[10]:
trues = np.array([1000])
preds = detector.detection_index
eval.calc_performance(preds, trues)
[10]:
mean_time_to_detect 24.0
missed_detection_ratio 0.0
mean_time_btw_false_alarms NaN
no_alarms_per_episode 0.0
dtype: float64