{ "cells": [ { "cell_type": "markdown", "id": "154b5160-fd11-4ed9-82f3-17b7bf7abf0d", "metadata": {}, "source": [ "# Drift Detection in CapyMOA\n", "\n", "In this tutorial, we show how to conduct drift detection using CapyMOA\n", "\n", "* Then test different drift detectors\n", "* Example using ADWIN\n", "* Evaluating detectors based on known drift location\n", "\n", "---\n", "\n", "*More information about CapyMOA can be found in* https://www.capymoa.org\n", "\n", "**last update on 25/07/2024**" ] }, { "cell_type": "code", "execution_count": 1, "id": "78dc8927-1bc3-4ce2-b352-ecf50ab56480", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "import capymoa.drift.detectors as detectors" ] }, { "cell_type": "markdown", "id": "432b8844-6f91-412d-ad36-3a640affc223", "metadata": {}, "source": [ "## Basic example" ] }, { "cell_type": "markdown", "id": "93224151-66bd-4124-ba0f-4ad486d5810a", "metadata": {}, "source": [ "- Creating dummy data" ] }, { "cell_type": "code", "execution_count": 2, "id": "3406740a-f265-4434-aae8-05db48de7e56", "metadata": {}, "outputs": [], "source": [ "data_stream = np.random.randint(2, size=2000)\n", "for i in range(999, 2000):\n", " data_stream[i] = np.random.randint(6, high=12)" ] }, { "cell_type": "markdown", "id": "e6aca673-6eab-42b0-8981-e5c73491243e", "metadata": {}, "source": [ "- Basic drift detection example" ] }, { "cell_type": "code", "execution_count": 3, "id": "bca87b8f-91f3-4eaf-a011-e8b274bda1f5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ADWIN 1\n", "CUSUM 2\n", "DDM 1\n", "EWMAChart 1\n", "GeometricMovingAverage 1\n", "HDDMAverage 154\n", "HDDMWeighted 92\n", "PageHinkley 2\n", "RDDM 1\n", "SEED 3\n", "STEPD 1\n", "dtype: int64\n" ] } ], "source": [ "all_detectors = detectors.__all__\n", "\n", "n_detections = {k: 0 for k in all_detectors}\n", "for detector_name in all_detectors:\n", "\n", " detector = getattr(detectors, detector_name)()\n", "\n", " for i in range(2000):\n", " detector.add_element(float(data_stream[i]))\n", " if detector.detected_change():\n", " n_detections[detector_name] += 1\n", "\n", "print(pd.Series(n_detections))" ] }, { "cell_type": "markdown", "id": "5ca1d03f-a7a2-421a-b800-ebd6a7918791", "metadata": {}, "source": [ "## Example using ADWIN" ] }, { "cell_type": "code", "execution_count": 4, "id": "5c9ade00-778f-481c-a51c-49dd199cd145", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Change detected in data: 10 - at index: 1023\n" ] } ], "source": [ "from capymoa.drift.detectors import ADWIN\n", "\n", "detector = ADWIN(delta=0.001)\n", "\n", "for i in range(2000):\n", " detector.add_element(data_stream[i])\n", " if detector.detected_change():\n", " print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "dcf26795-09bb-4508-ab36-878c4f145197", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1024]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Detection indices\n", "detector.detection_index" ] }, { "cell_type": "code", "execution_count": 6, "id": "cd7ae6be-17cb-4dec-b630-32e41531b020", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Warning indices\n", "detector.warning_index" ] }, { "cell_type": "code", "execution_count": 7, "id": "0b8a45e7-d983-4f69-a0d8-035619da3b11", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Instance counter\n", "detector.idx" ] }, { "cell_type": "markdown", "id": "71ba1b8b-751c-4ab4-a9a3-331de462b0f9", "metadata": {}, "source": [ "## Evaluating drift detectors\n", "\n", "Assuming the drift locations are known, you can evaluate detectors using **EvaluateDetector** class\n", "\n", "This class takes a parameter called **max_delay**, which is the maximum number of instances for which we consider a detector to have detected a change. After **max_delay** instances, we assume that the change is obvious and have been missed by the detector." ] }, { "cell_type": "code", "execution_count": 8, "id": "598a89e7-8460-415f-8a92-6854509e4697", "metadata": {}, "outputs": [], "source": [ "from capymoa.drift.eval_detector import EvaluateDetector" ] }, { "cell_type": "code", "execution_count": 9, "id": "4a2df820-9314-42e8-bf37-575c837ffabe", "metadata": {}, "outputs": [], "source": [ "eval = EvaluateDetector(max_delay=200)" ] }, { "cell_type": "markdown", "id": "98799472-d6cb-4cc1-912e-49d1004b3c84", "metadata": {}, "source": [ "The EvaluateDetector class takes two arguments for evaluating detectors:\n", "- The locations of the drift\n", "- The locations of the detections" ] }, { "cell_type": "code", "execution_count": 10, "id": "352a52da-71e0-4f7b-bf74-09230086b91a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mean_time_to_detect 24.0\n", "missed_detection_ratio 0.0\n", "mean_time_btw_false_alarms NaN\n", "no_alarms_per_episode 0.0\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trues = np.array([1000])\n", "preds = detector.detection_index\n", "\n", "eval.calc_performance(preds, trues)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 5 }