Drift Detection in CapyMOA

In this tutorial, we show how to conduct drift detection using CapyMOA:

  • Testing different drift detectors

  • Example using ADWIN

  • Evaluating detectors based on known drift location


More information about CapyMOA can be found at https://www.capymoa.org

last update on 25/07/2024

[1]:
import numpy as np
import pandas as pd

import capymoa.drift.detectors as detectors

Basic example

  • Creating dummy data

[2]:
data_stream = np.random.randint(2, size=2000)
for i in range(999, 2000):
    data_stream[i] = np.random.randint(6, high=12)
  • Basic drift detection example

[3]:
all_detectors = detectors.__all__

n_detections = {k: 0 for k in all_detectors}
for detector_name in all_detectors:
    detector = getattr(detectors, detector_name)()

    for i in range(2000):
        detector.add_element(float(data_stream[i]))
        if detector.detected_change():
            n_detections[detector_name] += 1

print(pd.Series(n_detections))
ADWIN                       2
CUSUM                       1
DDM                         1
EWMAChart                   1
GeometricMovingAverage      1
HDDMAverage               126
HDDMWeighted               89
PageHinkley                 1
RDDM                        2
SEED                        2
STEPD                       1
ABCD                        1
dtype: int64
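
The loop above relies on only two methods, add_element and detected_change. As a rough illustration of that protocol, the sketch below implements a toy detector in plain Python. ToyMeanShiftDetector, its window and threshold parameters, and the deterministic toy_stream are all hypothetical names introduced here; this is not how any CapyMOA detector is implemented.

```python
from collections import deque


class ToyMeanShiftDetector:
    """Toy detector (hypothetical, not part of CapyMOA).

    Flags a change when the means of the older and newer halves of a
    sliding buffer differ by more than `threshold`.
    """

    def __init__(self, window=50, threshold=2.0):
        self.window = window
        self.threshold = threshold
        self.buffer = deque(maxlen=2 * window)
        self.in_change = False

    def add_element(self, value):
        self.in_change = False
        self.buffer.append(float(value))
        if len(self.buffer) < 2 * self.window:
            return  # not enough data to compare two windows yet
        values = list(self.buffer)
        old_mean = sum(values[: self.window]) / self.window
        new_mean = sum(values[self.window :]) / self.window
        if abs(new_mean - old_mean) > self.threshold:
            self.in_change = True
            self.buffer.clear()  # restart on the new concept

    def detected_change(self):
        return self.in_change


# Deterministic stand-in for the dummy data: values in {0, 1}, then 6..11
toy_stream = [i % 2 for i in range(1000)] + [6 + i % 6 for i in range(1000)]

toy_detector = ToyMeanShiftDetector()
toy_detections = []
for i, value in enumerate(toy_stream):
    toy_detector.add_element(value)
    if toy_detector.detected_change():
        toy_detections.append(i)

print(toy_detections)
```

Any object exposing these two methods could be dropped into the loop shown earlier. The CapyMOA detectors additionally keep track of attributes such as detection_index and warning_index, which are used later in this tutorial.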

Example using ADWIN

[4]:
from capymoa.drift.detectors import ADWIN

# Create a fresh ADWIN detector instead of reusing the last one from the loop above
detector = ADWIN(delta=0.001)

for i in range(2000):
    detector.add_element(data_stream[i])
    if detector.detected_change():
        print(
            "Change detected in data: " + str(data_stream[i]) + " - at index: " + str(i)
        )
Change detected in data: 1 - at index: 24
Change detected in data: 6 - at index: 1010
[5]:
# Detection indices
detector.detection_index
[5]:
[1011, 2025, 3011]
[6]:
# Warning indices
detector.warning_index
[6]:
[1009, 1010, 2022, 2023, 2024, 3009, 3010]
[7]:
# Instance counter
detector.idx
[7]:
4000

Evaluating drift detectors

Assuming the drift locations are known, you can evaluate detectors using the EvaluateDetector class.

This class takes a parameter called max_delay, which is the maximum number of instances after a drift within which a detection still counts as having detected that change. After max_delay instances, we assume that the change is obvious and has been missed by the detector.

[8]:
from capymoa.drift.eval_detector import EvaluateDetector
[9]:
eval = EvaluateDetector(max_delay=200)

The calc_performance method takes two arguments for evaluating detectors:

  • The locations of the detections

  • The locations of the drifts

[10]:
trues = np.array([1000])
preds = detector.detection_index

eval.calc_performance(preds, trues)
[10]:
mean_time_to_detect           11.0
missed_detection_ratio         0.0
mean_time_btw_false_alarms     NaN
no_alarms_per_episode          0.0
dtype: float64
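
To make the role of max_delay more concrete, here is a minimal sketch of how a detection delay and a missed-detection ratio could be derived from the two lists. The helper delay_metrics is hypothetical, and the inclusive window [drift, drift + max_delay] is an assumption; it is not the EvaluateDetector implementation and it does not attempt to reproduce the false-alarm metrics.

```python
def delay_metrics(detections, drifts, max_delay):
    """Hypothetical helper: delays of detections within max_delay of each drift."""
    delays = []
    missed = 0
    for drift in drifts:
        # Detections that fall inside the window [drift, drift + max_delay]
        in_window = [d - drift for d in detections if drift <= d <= drift + max_delay]
        if in_window:
            delays.append(min(in_window))  # first detection after the drift
        else:
            missed += 1  # nothing fired in time: the drift counts as missed
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    return mean_delay, missed / len(drifts)


# Same inputs as in the cell above
mean_delay, missed_ratio = delay_metrics([1011, 2025, 3011], [1000], max_delay=200)
print(mean_delay, missed_ratio)  # 11.0 0.0

# With a much stricter max_delay, the detection at 1011 arrives too late
strict_delay, strict_missed = delay_metrics([1011, 2025, 3011], [1000], max_delay=5)
print(strict_delay, strict_missed)  # nan 1.0
```

Under these assumptions, the detection at 1011 falls 11 instances after the drift at 1000, which agrees with the mean_time_to_detect value reported above.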

Multivariate Drift Detection

[11]:
from capymoa.drift.detectors import ABCD
from capymoa.datasets import ElectricityTiny

detector = ABCD()

## Opening a file as a stream
stream = ElectricityTiny()
[12]:
i = 0
loss_values = []
while stream.has_more_instances() and i < 5000:
    i += 1
    instance = stream.next_instance()
    detector.add_element(instance)
    loss_values.append(detector.loss())
    if detector.detected_change():
        print("Change detected at index: " + str(i))
Change detected at index: 2283
[13]:
import numpy as np
from capymoa.drift.detectors import ABCD
from capymoa.datasets import ElectricityTiny

detector = ABCD(model_id="pca")

## Building a synthetic stream: the first feature changes halfway, the second does not
stream_change = np.hstack([np.random.uniform(0, 0.5, 3000), np.random.uniform(0.5, 1.0, 3000)])
stream_nochange = np.random.uniform(0, 1.0, len(stream_change))
stream = np.vstack([stream_change, stream_nochange]).T
print(f"A {stream.shape[-1]}-dimensional stream")
A 2-dimensional stream
[14]:
i = 0
loss_values = []
while i < len(stream):
    instance = stream[i]
    i += 1
    detector.add_element(instance)
    loss_values.append(detector.loss())
    if detector.detected_change():
        print("Change detected at index: " + str(i))
Change detected at index: 3063
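
With model_id="pca", the quantity being monitored is a reconstruction loss. The sketch below illustrates, using NumPy only, why such a loss reacts to the change in this synthetic stream. It fits a one-component PCA on early data and compares the loss before and after the change. The training window of 500 instances, the single component, and the mean-squared-error loss are all assumptions made for illustration, not details of ABCD's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same construction as the synthetic stream: feature 0 shifts, feature 1 does not
feature_change = np.hstack([rng.uniform(0, 0.5, 3000), rng.uniform(0.5, 1.0, 3000)])
feature_nochange = rng.uniform(0, 1.0, len(feature_change))
X = np.vstack([feature_change, feature_nochange]).T

# Fit a 1-component PCA on the first 500 instances (pre-change data only)
train = X[:500]
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
component = vt[:1]  # shape (1, 2): the direction of largest variance


def reconstruction_loss(x):
    """Mean squared error between x and its projection onto the component."""
    centered = x - mean
    reconstructed = centered @ component.T @ component
    return float(np.mean((centered - reconstructed) ** 2))


loss_before = np.mean([reconstruction_loss(x) for x in X[500:3000]])
loss_after = np.mean([reconstruction_loss(x) for x in X[3000:]])
print(f"mean loss before change: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because the model was fitted on pre-change data, instances drawn after the shift in the first feature are reconstructed poorly, so the average loss rises. A detector can then test for a change in this univariate loss signal.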
[15]:
import matplotlib.pyplot as plt
import pandas as pd

plt.plot(pd.Series(loss_values).rolling(10).mean())
plt.title("ABCD with PCA")
plt.xlabel("# Instances")
plt.ylabel("Reconstruction loss")
[15]:
Text(0, 0.5, 'Reconstruction loss')
[Figure: rolling mean (window of 10) of the reconstruction loss against the number of instances, titled "ABCD with PCA"]

We see that a value of 1 as maximum reconstruction error is very conservative. By decreasing the maximum_absolute_value parameter, we can make change detection faster as it makes the applied statistical test more sensitive.

[16]:
detector = ABCD(model_id="pca", maximum_absolute_value=0.3)

i = 0
loss_values = []
while i < len(stream):
    instance = stream[i]
    i += 1
    detector.add_element(instance)
    loss_values.append(detector.loss())
    if detector.detected_change():
        print("Change detected at index: " + str(i))
Change detected at index: 3024
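
The effect of maximum_absolute_value can be illustrated with Bernstein's inequality, which bounds how far a sample mean of bounded values can deviate from its expectation. That ABCD's test is of this type is an assumption here (suggested by the detector's full name, Adaptive Bernstein Change Detector), and the numbers below are purely illustrative. The function bernstein_tail_bound is a hypothetical helper, not a CapyMOA function.

```python
import math


def bernstein_tail_bound(n, variance, epsilon, max_abs_value):
    """Bernstein bound on P(|sample mean - true mean| >= epsilon)
    for n independent values with |value - mean| <= max_abs_value."""
    exponent = -n * epsilon**2 / (2 * (variance + max_abs_value * epsilon / 3))
    return min(1.0, 2 * math.exp(exponent))


# Illustrative numbers (assumptions, not taken from ABCD's internals)
n, variance, epsilon = 50, 0.01, 0.1
bounds = {}
for m in (1.0, 0.3):
    bounds[m] = bernstein_tail_bound(n, variance, epsilon, m)
    print(f"max_abs_value={m}: bound={bounds[m]:.6f}")
```

For the same sample size, variance, and observed deviation, the smaller bound on the values yields a smaller probability bound. A test of this kind would therefore reject the no-change hypothesis with fewer instances, which is consistent with the earlier detection seen above (index 3024 instead of 3063).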