Drift Detection in CapyMOA

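In this tutorial, we show how to conduct drift detection using CapyMOA. We:

- run different drift detectors on a simple synthetic stream,
- walk through an example using ADWIN,
- evaluate detectors based on known drift locations, and
- detect drift in multivariate data with ABCD.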

More information about CapyMOA can be found at https://www.capymoa.org.

Last updated on 25/07/2024.

[1]:

import numpy as np
import pandas as pd

import capymoa.drift.detectors as detectors

Basic example

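Creating dummy data: the first 999 values are random integers in {0, 1} and the remaining 1001 values are random integers from 6 to 11, so the distribution changes abruptly at index 999.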

[2]:

data_stream = np.random.randint(2, size=2000)
for i in range(999, 2000):
    data_stream[i] = np.random.randint(6, high=12)

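Basic drift detection example: run every detector in capymoa.drift.detectors with its default parameters and count how many changes each one signals on this stream.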

[3]:

# Names of all detectors exposed by capymoa.drift.detectors
all_detectors = detectors.__all__

# Count how many changes each detector (default parameters) signals on the stream
n_detections = {k: 0 for k in all_detectors}
for detector_name in all_detectors:
    detector = getattr(detectors, detector_name)()

    for i in range(2000):
        detector.add_element(float(data_stream[i]))
        if detector.detected_change():
            n_detections[detector_name] += 1

print(pd.Series(n_detections))

ADWIN                       2
CUSUM                       1
DDM                         1
EWMAChart                   1
GeometricMovingAverage      1
HDDMAverage               126
HDDMWeighted               89
PageHinkley                 1
RDDM                        2
SEED                        2
STEPD                       1
ABCD                        1
dtype: int64
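Every detector above is used the same way. add_element feeds it one value, and detected_change reports whether that value triggered a drift signal. The following minimal from-scratch sketch makes the pattern concrete. It implements the textbook Page-Hinkley test for an increase in the mean and uses the same two method names. It is not CapyMOA's PageHinkley implementation. The class PageHinkleySketch and its delta, threshold and min_instances defaults are illustrative assumptions.

```python
class PageHinkleySketch:
    """Hypothetical, minimal Page-Hinkley test for an increase in the mean.

    Mirrors the add_element / detected_change pattern used in this tutorial,
    but is NOT CapyMOA's implementation.
    """

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta                # tolerated deviation from the mean
        self.threshold = threshold        # lambda: alarm when the PH statistic exceeds it
        self.min_instances = min_instances
        self._reset()

    def _reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum_sum = 0.0                # m_t: cumulative deviation from the running mean
        self.min_sum = float("inf")       # M_t: smallest m_t seen so far
        self.change = False

    def add_element(self, x):
        if self.change:                   # start a fresh test after an alarm
            self._reset()
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum_sum += x - self.mean - self.delta
        self.min_sum = min(self.min_sum, self.cum_sum)
        ph = self.cum_sum - self.min_sum  # Page-Hinkley statistic
        self.change = self.n >= self.min_instances and ph > self.threshold

    def detected_change(self):
        return self.change


# Deterministic toy stream: alternating 0/1 for 1000 values, then a jump to 8.
ph_stream = [t % 2 for t in range(1000)] + [8] * 1000

ph = PageHinkleySketch()
ph_detections = []
for t, x in enumerate(ph_stream):
    ph.add_element(x)
    if ph.detected_change():
        ph_detections.append(t)

print(ph_detections)
```

The statistic sums each value's deviation from the running mean, minus the tolerance delta. Fluctuations around a stable mean cancel out, while a sustained increase pushes the statistic past threshold within a few instances of the jump.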

Example using ADWIN

[4]:

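# Note: the ADWIN constructor below is commented out, so this loop reuses
# `detector` from the previous cell (the last detector created there), which
# has already processed the 2000 instances once. This is why the indices
# recorded below go beyond 2000. To run a fresh ADWIN detector instead,
# uncomment this line:
# detector = detectors.ADWIN(delta=0.001)

for i in range(2000):
    detector.add_element(float(data_stream[i]))
    if detector.detected_change():
        print(f"Change detected in data: {data_stream[i]} - at index: {i}")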

Change detected in data: 1 - at index: 24
Change detected in data: 6 - at index: 1010
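ADWIN (ADaptive WINdowing) keeps a variable-length window of recent values. It drops the oldest part of the window whenever two sub-windows have significantly different means.

The sketch below is a deliberately naive version of that rule, the simple ADWIN0 formulation. Its cost per element is linear in the window length, and it assumes input values in [0, 1].

Suppose the window W has length n and is split into an older part W0 (length n0) and a newer part W1 (length n1). The cut threshold is then eps_cut = sqrt(ln(4n/delta) / (2m)), with m = 1 / (1/n0 + 1/n1).

The class Adwin0Sketch is hypothetical. It is not the implementation behind CapyMOA's ADWIN, which relies on a far more efficient bucket-based window.

```python
import math


class Adwin0Sketch:
    """Hypothetical, naive ADWIN0-style detector for values in [0, 1].

    Checks every split of the window after each element; NOT CapyMOA's ADWIN.
    """

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = []
        self.change = False

    def _has_cut(self):
        """True if some split W = W0 + W1 has |mean(W0) - mean(W1)| >= eps_cut."""
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)  # ln(4 / delta'), delta' = delta / n
        head_sum = 0.0
        for n0 in range(1, n):                     # W0 = the n0 oldest values
            head_sum += self.window[n0 - 1]
            n1 = n - n0
            mu0 = head_sum / n0
            mu1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                return True
        return False

    def add_element(self, x):
        self.window.append(float(x))
        self.change = False
        # Drop the oldest values while the window still contains a significant cut.
        while self._has_cut():
            self.window.pop(0)
            self.change = True

    def detected_change(self):
        return self.change


toy_bits = [0] * 500 + [1] * 500
adwin = Adwin0Sketch(delta=0.002)
adwin_detections = []
for t, bit in enumerate(toy_bits):
    adwin.add_element(bit)
    if adwin.detected_change():
        adwin_detections.append(t)

print(adwin_detections, len(adwin.window))
```

The window shrinks until no significant split remains. As a result, the sketch may signal more than once while the old values are being flushed out.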

[5]:

# Detection indices
detector.detection_index

[5]:

[1011, 2025, 3011]

[6]:

# Warning indices
detector.warning_index

[6]:

[1009, 1010, 2022, 2023, 2024, 3009, 3010]
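Some detectors report a warning zone before they confirm a drift, and warning_index records those warnings. To show where such warnings can come from, here is a minimal from-scratch sketch of the classic DDM rule applied to a stream of 0/1 prediction errors:

- Track the error rate p and s = sqrt(p(1 - p) / n).
- Remember the smallest p + s seen so far as (p_min, s_min).
- Warn when p + s > p_min + 2 s_min.
- Signal a drift when p + s > p_min + 3 s_min.

The class DDMSketch, its min_instances default and the deterministic error pattern are illustrative assumptions, not CapyMOA's DDM.

```python
import math


class DDMSketch:
    """Hypothetical, minimal DDM-style detector over a stream of 0/1 errors."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self._reset()

    def _reset(self):
        self.n = 0
        self.p = 0.0                      # running error rate
        self.p_min = float("inf")
        self.s_min = float("inf")
        self.warning = False
        self.change = False

    def add_element(self, error):
        if self.change:                   # restart after a confirmed drift
            self._reset()
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        self.warning = self.change = False
        if self.n < self.min_instances:
            return
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.change = True
        elif self.p + s > self.p_min + self.warning_level * self.s_min:
            self.warning = True

    def detected_change(self):
        return self.change

    def detected_warning(self):
        return self.warning


# Deterministic error stream: 1 error in 10 for 1000 steps, then 6 in 10.
errors = [1 if t % 10 == 0 else 0 for t in range(1000)]
errors += [1 if t % 10 < 6 else 0 for t in range(1000, 2000)]

ddm = DDMSketch()
ddm_warnings, ddm_drifts = [], []
for t, e in enumerate(errors):
    ddm.add_element(e)
    if ddm.detected_warning():
        ddm_warnings.append(t)
    if ddm.detected_change():
        ddm_drifts.append(t)

print(ddm_warnings[:3], ddm_drifts[:1])
```

After the error rate rises, the sketch first enters the warning zone for a few instances and only then signals a drift. This mirrors why warning indices tend to precede detection indices.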

[7]:

# Instance counter
detector.idx


Evaluating drift detectors

Assuming the drift locations are known, you can evaluate detectors using the EvaluateDetector class.

This class takes a parameter called max_delay. It is the maximum number of instances after a drift within which a detection still counts as detecting that drift. If the detector has not signalled a change within max_delay instances, we assume the change has become obvious and the detector has missed it.
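Here is a minimal sketch of one possible matching rule, applied to the detection indices [1011, 2025, 3011] recorded earlier, to make the role of max_delay concrete. Each true drift is credited to the first detection that falls within max_delay instances after it. Drifts without such a detection count as missed.

The helper match_detections is hypothetical. It only computes the mean delay and the share of missed drifts, and it is not how EvaluateDetector is implemented.

```python
import math


def match_detections(drifts, detections, max_delay):
    """Hypothetical helper: credit each true drift d with the first detection
    in [d, d + max_delay]; drifts with no such detection are counted as missed.

    Returns (mean delay of credited detections, fraction of missed drifts).
    """
    delays = []
    missed = 0
    for d in drifts:
        hits = [p for p in detections if d <= p <= d + max_delay]
        if hits:
            delays.append(min(hits) - d)
        else:
            missed += 1
    mean_delay = sum(delays) / len(delays) if delays else math.nan
    return mean_delay, missed / len(drifts)


print(match_detections([1000], [1011, 2025, 3011], max_delay=200))  # (11.0, 0.0)
print(match_detections([1000], [1250], max_delay=200))              # (nan, 1.0)
```

In the second call, the only detection arrives 250 instances after the drift, beyond max_delay, so the drift counts as missed.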

[8]:

from capymoa.drift.eval_detector import EvaluateDetector

[9]:

evaluator = EvaluateDetector(max_delay=200)

The calc_performance method takes two arguments for evaluating a detector: the locations of the detections (preds) and the locations of the true drifts (trues).

[10]:

trues = np.array([1000])
preds = detector.detection_index

evaluator.calc_performance(preds, trues)

[10]:

mean_time_to_detect           11.0
missed_detection_ratio         0.0
mean_time_btw_false_alarms     NaN
no_alarms_per_episode          0.0
dtype: float64

Multivariate Drift Detection

[11]:

from capymoa.drift.detectors import ABCD
from capymoa.datasets import ElectricityTiny

detector = ABCD()

## Opening a file as a stream
stream = ElectricityTiny()

[12]:

i = 0
loss_values = []
# Process at most 5000 instances, stopping early if the stream runs out
while stream.has_more_instances() and i < 5000:
    i += 1
    instance = stream.next_instance()
    detector.add_element(instance)
    loss_values.append(detector.loss())
    if detector.detected_change():
        print("Change detected at index: " + str(i))

Change detected at index: 2283

[13]:

import numpy as np
from capymoa.drift.detectors import ABCD

detector = ABCD(model_id="pca")

# Synthetic 2-D stream: the first feature shifts from U(0, 0.5) to U(0.5, 1.0)
# after 3000 instances, while the second feature stays U(0, 1) throughout
stream_change = np.hstack(
    [np.random.uniform(0, 0.5, 3000), np.random.uniform(0.5, 1.0, 3000)]
)
stream_nochange = np.random.uniform(0, 1.0, len(stream_change))
stream = np.vstack([stream_change, stream_nochange]).T
print(f"A {stream.shape[-1]}-dimensional stream")

A 2-dimensional stream

[14]:

i = 0
loss_values = []
while i < len(stream):
    instance = stream[i]
    i += 1
    detector.add_element(instance)
    loss_values.append(detector.loss())
    if detector.detected_change():
        print("Change detected at index: " + str(i))

Change detected at index: 3063
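With model_id="pca", ABCD tracks how well a PCA model reconstructs incoming instances. The loss() values collected above are that reconstruction loss, as plotted below.

The NumPy sketch that follows illustrates only the reconstruction-loss idea, on a stream built like the one above. It fits a one-component PCA on a reference window, then compares the mean squared reconstruction error before and after the shift in the first feature.

The helpers fit_pca and reconstruction_loss, the 1000-row reference window and the single component are assumptions. ABCD's own model, windowing and statistical test are not reproduced here.

```python
import numpy as np


def fit_pca(reference, n_components=1):
    """Return (mean, components) of a PCA fitted with SVD on a reference window."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]            # rows are principal directions


def reconstruction_loss(x, mean, components):
    """Per-row mean squared error after projecting onto the components and back."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)
# Built like the tutorial's 2-D stream: the first feature shifts at row 3000.
feature0 = np.hstack([rng.uniform(0, 0.5, 3000), rng.uniform(0.5, 1.0, 3000)])
feature1 = rng.uniform(0, 1.0, 6000)
toy_stream = np.vstack([feature0, feature1]).T

mean, components = fit_pca(toy_stream[:1000])   # reference window: first 1000 rows
loss_before = reconstruction_loss(toy_stream[1000:3000], mean, components).mean()
loss_after = reconstruction_loss(toy_stream[3000:], mean, components).mean()
print(f"mean loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The single component learned before the shift mainly captures the wider second feature. It therefore cannot represent the moved first feature, and the average loss rises sharply after row 3000. This is the kind of increase a reconstruction-based detector looks for.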

[15]:

import matplotlib.pyplot as plt
import pandas as pd

plt.plot(pd.Series(loss_values).rolling(10).mean())
plt.title("ABCD with PCA")
plt.xlabel("# Instances")
plt.ylabel("Reconstruction loss")

Figure: ABCD with PCA, 10-instance rolling mean of the reconstruction loss (x-axis: # Instances, y-axis: Reconstruction loss).

We see that the maximum reconstruction error of 1 used by the detector above is very conservative. Decreasing the maximum_absolute_value parameter makes the applied statistical test more sensitive, so changes are detected sooner.
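ABCD stands for Adaptive Bernstein Change Detector, whose test builds on Bernstein's inequality. The sketch below gives a rough intuition for why a tighter bound on the loss helps. It evaluates the textbook two-sided Bernstein bound for n i.i.d. samples with variance var and |X - mu| <= b:

P(|mean - mu| >= eps) <= 2 * exp(-n * eps^2 / (2 * var + 2 * b * eps / 3))

Treating maximum_absolute_value as the range b is a simplifying assumption for illustration only. How CapyMOA's ABCD actually uses the parameter is not shown here.

```python
import math


def bernstein_bound(n, eps, var, b):
    """Two-sided Bernstein bound on P(|sample mean - mu| >= eps) for n i.i.d.
    samples with variance `var` and |X - mu| <= b."""
    return 2.0 * math.exp(-n * eps**2 / (2.0 * var + (2.0 / 3.0) * b * eps))


# Same sample size, deviation and variance; only the assumed range b changes.
for b in (1.0, 0.3):
    print(f"b = {b}: bound = {bernstein_bound(n=100, eps=0.05, var=0.01, b=b):.5f}")
```

With these illustrative numbers, lowering b from 1.0 to 0.3 shrinks the bound by more than an order of magnitude. The same shift in mean loss therefore becomes significant after fewer instances.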

[16]:

detector = ABCD(model_id="pca", maximum_absolute_value=0.3)

i = 0
loss_values = []
while i < len(stream):
    instance = stream[i]
    i += 1
    detector.add_element(instance)
    loss_values.append(detector.loss())
    if detector.detected_change():
        print("Change detected at index: " + str(i))

Change detected at index: 3024
