{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "a48e9306-f459-4d8a-8608-9bd71a7600ae", "metadata": {}, "source": [ "# 6. Exploring Advanced Features\n", "\n", "This notebook is targeted at advanced users who want, among other things, to access MOA objects directly using the Python API from capymoa. \n", "\n", "* Examples of how to use any MOA Classifier or Regressor from capymoa\n", "* An example of how preprocessing (from MOA) can be used.\n", "* Comparing a SKLearn model against a MOA model\n", "* A variation of **Tutorial 5**: `Creating a new classifier in CapyMOA` which uses MOA learners, thus accessing MOA (Java) objects directly\n", "* How to log experiments using TensorBoard alongside the PyTorch API. This extends **Tutorial 3**: `Using Pytorch with CapyMOA`\n", "* Creating a synthetic stream with concept drifts using the MOA CLI directly\n", "* An example utilising a multi-threaded ensemble\n", "\n", "---\n", "\n", "*More information about CapyMOA can be found at* https://www.capymoa.org\n", "\n", "**last update on 28/07/2024**" ] }, { "cell_type": "markdown", "id": "d2bb536e-4716-48fe-bf9b-05455b9e5a85", "metadata": {}, "source": [ "## 1. Using any MOA learner\n", "\n", "* **CapyMOA gives you access to any MOA classifier or regressor**\n", "\n", "* For some of the MOA learners there are corresponding Python objects (such as the HoeffdingTree or Adaptive Random Forest Classifier). However, MOA has over a hundred learners, and more are added constantly.\n", "\n", "* To allow advanced users to access **any** MOA learner from CapyMOA, we included the ```MOAClassifier``` and ```MOARegressor``` generic wrappers." ] }, { "cell_type": "code", "execution_count": 1, "id": "3d1a9e23-a272-4c01-ab9b-e7f3ec5f7395", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cumulative accuracy = 59.57599999999999, wall-clock time: 1.4879562854766846\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>instances</th><th>accuracy</th><th>kappa</th><th>kappa_t</th><th>kappa_m</th><th>f1_score</th><th>f1_score_0</th><th>f1_score_1</th><th>f1_score_2</th><th>f1_score_3</th><th>...</th><th>precision_1</th><th>precision_2</th><th>precision_3</th><th>precision_4</th><th>recall</th><th>recall_0</th><th>recall_1</th><th>recall_2</th><th>recall_3</th><th>recall_4</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>4500.0</td><td>50.000000</td><td>33.449507</td><td>34.858135</td><td>27.768860</td><td>45.089567</td><td>53.487412</td><td>33.012821</td><td>62.374245</td><td>13.945578</td><td>...</td><td>39.845261</td><td>67.685590</td><td>29.496403</td><td>47.644540</td><td>43.338015</td><td>57.142857</td><td>28.180575</td><td>57.835821</td><td>9.131403</td><td>64.399421</td></tr>\n",
"    <tr><th>1</th><td>9000.0</td><td>49.466667</td><td>32.401117</td><td>35.287422</td><td>28.332808</td><td>41.328952</td><td>56.127745</td><td>25.129983</td><td>61.844725</td><td>2.307692</td><td>...</td><td>34.855769</td><td>68.228404</td><td>6.185567</td><td>45.596376</td><td>41.013784</td><td>59.175084</td><td>19.647696</td><td>56.553398</td><td>1.418440</td><td>68.274303</td></tr>\n",
"    <tr><th>2</th><td>13500.0</td><td>52.422222</td><td>36.245962</td><td>38.846044</td><td>32.031746</td><td>45.928811</td><td>59.177456</td><td>25.333333</td><td>53.851590</td><td>18.657938</td><td>...</td><td>39.820359</td><td>65.017065</td><td>31.318681</td><td>51.899384</td><td>43.756287</td><td>66.071429</td><td>18.575419</td><td>45.958987</td><td>13.286713</td><td>74.888889</td></tr>\n",
"    <tr><th>3</th><td>18000.0</td><td>59.311111</td><td>46.626403</td><td>47.309353</td><td>41.632133</td><td>57.974404</td><td>59.917012</td><td>34.180139</td><td>58.620690</td><td>64.179104</td><td>...</td><td>44.223108</td><td>56.342857</td><td>75.704225</td><td>64.530457</td><td>56.441207</td><td>62.946818</td><td>27.854454</td><td>61.090458</td><td>55.699482</td><td>74.614820</td></tr>\n",
"    <tr><th>4</th><td>22500.0</td><td>64.555556</td><td>54.112305</td><td>54.258675</td><td>48.943662</td><td>63.436739</td><td>67.889126</td><td>52.967359</td><td>60.546875</td><td>63.816475</td><td>...</td><td>62.962963</td><td>61.917443</td><td>57.954545</td><td>68.314763</td><td>63.573514</td><td>70.629991</td><td>45.710627</td><td>59.235669</td><td>70.997680</td><td>71.293605</td></tr>\n",
"    <tr><th>5</th><td>27000.0</td><td>60.111111</td><td>48.455842</td><td>48.670289</td><td>44.375581</td><td>58.755137</td><td>68.893204</td><td>49.307075</td><td>58.830549</td><td>55.581395</td><td>...</td><td>58.885017</td><td>56.536697</td><td>54.566210</td><td>60.080321</td><td>58.558489</td><td>73.671096</td><td>42.409034</td><td>61.318408</td><td>56.635071</td><td>58.758837</td></tr>\n",
"    <tr><th>6</th><td>31500.0</td><td>60.666667</td><td>48.797681</td><td>49.225473</td><td>44.058154</td><td>59.622479</td><td>69.438669</td><td>47.246608</td><td>56.351039</td><td>61.557478</td><td>...</td><td>56.813820</td><td>52.928416</td><td>66.223404</td><td>58.839590</td><td>58.587577</td><td>70.227082</td><td>40.437158</td><td>60.246914</td><td>57.505774</td><td>64.520958</td></tr>\n",
"    <tr><th>7</th><td>36000.0</td><td>62.488889</td><td>51.365841</td><td>51.977240</td><td>46.666667</td><td>62.079191</td><td>67.826087</td><td>55.932203</td><td>53.044496</td><td>67.179487</td><td>...</td><td>61.682243</td><td>50.727884</td><td>67.007673</td><td>65.527489</td><td>61.832537</td><td>68.997473</td><td>51.162791</td><td>55.582822</td><td>67.352185</td><td>66.067416</td></tr>\n",
"    <tr><th>8</th><td>40500.0</td><td>57.933333</td><td>45.320224</td><td>45.852403</td><td>39.559387</td><td>56.433686</td><td>65.095729</td><td>35.967742</td><td>56.277603</td><td>61.290323</td><td>...</td><td>43.984221</td><td>57.106274</td><td>59.507830</td><td>64.431725</td><td>56.395784</td><td>75.298126</td><td>30.422920</td><td>55.472637</td><td>63.182898</td><td>57.602339</td></tr>\n",
"    <tr><th>9</th><td>45000.0</td><td>72.000000</td><td>63.510428</td><td>63.636364</td><td>59.420290</td><td>71.308445</td><td>71.458075</td><td>66.176471</td><td>72.941176</td><td>69.961977</td><td>...</td><td>70.977918</td><td>71.480583</td><td>75.000000</td><td>74.647887</td><td>70.422343</td><td>74.121680</td><td>61.983471</td><td>74.462705</td><td>65.558195</td><td>75.985663</td></tr>\n",
"    <tr><th>10</th><td>49500.0</td><td>66.177778</td><td>56.005059</td><td>55.820029</td><td>51.926721</td><td>66.012305</td><td>62.703583</td><td>61.891516</td><td>62.087186</td><td>70.573871</td><td>...</td><td>65.058480</td><td>65.096953</td><td>71.534653</td><td>69.770674</td><td>65.432668</td><td>63.900415</td><td>59.018568</td><td>59.343434</td><td>69.638554</td><td>75.262369</td></tr>\n",
"    <tr><th>11</th><td>50000.0</td><td>65.400000</td><td>54.995471</td><td>54.869565</td><td>50.712251</td><td>65.018349</td><td>61.386139</td><td>60.445682</td><td>61.479100</td><td>68.827930</td><td>...</td><td>63.265306</td><td>63.479416</td><td>70.408163</td><td>70.110957</td><td>64.466325</td><td>62.155388</td><td>57.866667</td><td>59.600998</td><td>67.317073</td><td>75.391499</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>12 rows × 23 columns</p>\n",
"</div>
" ], "text/plain": [ " instances accuracy kappa kappa_t kappa_m f1_score \\\n", "0 4500.0 50.000000 33.449507 34.858135 27.768860 45.089567 \n", "1 9000.0 49.466667 32.401117 35.287422 28.332808 41.328952 \n", "2 13500.0 52.422222 36.245962 38.846044 32.031746 45.928811 \n", "3 18000.0 59.311111 46.626403 47.309353 41.632133 57.974404 \n", "4 22500.0 64.555556 54.112305 54.258675 48.943662 63.436739 \n", "5 27000.0 60.111111 48.455842 48.670289 44.375581 58.755137 \n", "6 31500.0 60.666667 48.797681 49.225473 44.058154 59.622479 \n", "7 36000.0 62.488889 51.365841 51.977240 46.666667 62.079191 \n", "8 40500.0 57.933333 45.320224 45.852403 39.559387 56.433686 \n", "9 45000.0 72.000000 63.510428 63.636364 59.420290 71.308445 \n", "10 49500.0 66.177778 56.005059 55.820029 51.926721 66.012305 \n", "11 50000.0 65.400000 54.995471 54.869565 50.712251 65.018349 \n", "\n", " f1_score_0 f1_score_1 f1_score_2 f1_score_3 ... precision_1 \\\n", "0 53.487412 33.012821 62.374245 13.945578 ... 39.845261 \n", "1 56.127745 25.129983 61.844725 2.307692 ... 34.855769 \n", "2 59.177456 25.333333 53.851590 18.657938 ... 39.820359 \n", "3 59.917012 34.180139 58.620690 64.179104 ... 44.223108 \n", "4 67.889126 52.967359 60.546875 63.816475 ... 62.962963 \n", "5 68.893204 49.307075 58.830549 55.581395 ... 58.885017 \n", "6 69.438669 47.246608 56.351039 61.557478 ... 56.813820 \n", "7 67.826087 55.932203 53.044496 67.179487 ... 61.682243 \n", "8 65.095729 35.967742 56.277603 61.290323 ... 43.984221 \n", "9 71.458075 66.176471 72.941176 69.961977 ... 70.977918 \n", "10 62.703583 61.891516 62.087186 70.573871 ... 65.058480 \n", "11 61.386139 60.445682 61.479100 68.827930 ... 
63.265306 \n", "\n", " precision_2 precision_3 precision_4 recall recall_0 recall_1 \\\n", "0 67.685590 29.496403 47.644540 43.338015 57.142857 28.180575 \n", "1 68.228404 6.185567 45.596376 41.013784 59.175084 19.647696 \n", "2 65.017065 31.318681 51.899384 43.756287 66.071429 18.575419 \n", "3 56.342857 75.704225 64.530457 56.441207 62.946818 27.854454 \n", "4 61.917443 57.954545 68.314763 63.573514 70.629991 45.710627 \n", "5 56.536697 54.566210 60.080321 58.558489 73.671096 42.409034 \n", "6 52.928416 66.223404 58.839590 58.587577 70.227082 40.437158 \n", "7 50.727884 67.007673 65.527489 61.832537 68.997473 51.162791 \n", "8 57.106274 59.507830 64.431725 56.395784 75.298126 30.422920 \n", "9 71.480583 75.000000 74.647887 70.422343 74.121680 61.983471 \n", "10 65.096953 71.534653 69.770674 65.432668 63.900415 59.018568 \n", "11 63.479416 70.408163 70.110957 64.466325 62.155388 57.866667 \n", "\n", " recall_2 recall_3 recall_4 \n", "0 57.835821 9.131403 64.399421 \n", "1 56.553398 1.418440 68.274303 \n", "2 45.958987 13.286713 74.888889 \n", "3 61.090458 55.699482 74.614820 \n", "4 59.235669 70.997680 71.293605 \n", "5 61.318408 56.635071 58.758837 \n", "6 60.246914 57.505774 64.520958 \n", "7 55.582822 67.352185 66.067416 \n", "8 55.472637 63.182898 57.602339 \n", "9 74.462705 65.558195 75.985663 \n", "10 59.343434 69.638554 75.262369 \n", "11 59.600998 67.317073 75.391499 \n", "\n", "[12 rows x 23 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from capymoa.evaluation import prequential_evaluation\n", "from capymoa.base import MOAClassifier\n", "from capymoa.datasets import RBFm_100k\n", "# This is an import from MOA\n", "from moa.classifiers.trees import HoeffdingAdaptiveTree\n", "\n", "rbf_100k = RBFm_100k()\n", "\n", "# Creates a wrapper around the HoeffdingAdaptiveTree, which then can be used as any other capymoa classifier\n", "HAT = MOAClassifier(schema=rbf_100k.get_schema(), moa_learner=HoeffdingAdaptiveTree)\n", "\n", 
"results_HAT = prequential_evaluation(stream=rbf_100k, learner=HAT, window_size=4500, max_instances=50000)\n", "\n", "print(f\"Cumulative accuracy = {results_HAT['cumulative'].accuracy()}, wall-clock time: {results_HAT['wallclock']}\")\n", "display(results_HAT['windowed'].metrics_per_window())" ] }, { "cell_type": "markdown", "id": "3c102052-1a19-4f30-b3d1-f0163cab6af0", "metadata": {}, "source": [ "### 1.1 Checking the hyperparameters for the MOA CLI\n", "\n", "* MOA objects can be parametrized using the MOA CLI (Command Line Interface)\n", "* Sometimes you may not know the relevant parameters for ```moa_learner```. In that case, ```moa_learner.CLI_help()``` lists all the hyperparameters available for the ```moa_learner``` object." ] }, { "cell_type": "code", "execution_count": 2, "id": "3fbca563-e87f-41f2-98f2-dcad2ab65fb6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-l treeLearner (default: ARFHoeffdingTree -e 2000000 -g 50 -c 0.01)\n", "Random Forest Tree.\n", "-s ensembleSize (default: 100)\n", "The number of trees.\n", "-o mFeaturesMode (default: Percentage (M * (m / 100)))\n", "Defines how m, defined by mFeaturesPerTreeSize, is interpreted. M represents the total number of features.\n", "-m mFeaturesPerTreeSize (default: 60)\n", "Number of features allowed considered for each split. Negative values corresponds to M - m\n", "-a lambda (default: 6.0)\n", "The lambda parameter for bagging.\n", "-j numberOfJobs (default: 1)\n", "Total number of concurrent jobs used for processing (-1 = as much as possible, 0 = do not use multithreading)\n", "-x driftDetectionMethod (default: ADWINChangeDetector -a 1.0E-3)\n", "Change detector for drifts and its parameters\n", "-p warningDetectionMethod (default: ADWINChangeDetector -a 1.0E-2)\n", "Change detector for warnings (start training bkg learner)\n", "-w disableWeightedVote\n", "Should use weighted voting?\n", "-u disableDriftDetection\n", "Should use drift detection? 
If disabled then bkg learner is also disabled\n", "-q disableBackgroundLearner\n", "Should use bkg learner? If disabled then reset tree immediately.\n", "\n" ] } ], "source": [ "from moa.classifiers.meta import AdaptiveRandomForest\n", "\n", "arf = MOAClassifier(schema=rbf_100k.get_schema(), moa_learner=AdaptiveRandomForest)\n", "\n", "print(arf.CLI_help())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "55d070de-8697-4f98-a11b-eab4e3d5c281", "metadata": {}, "source": [ "## 2. Using preprocessing from MOA (filters)\n", "\n", "We are working on a more user-friendly API for preprocessing. In the meantime, this example shows how to use MOA filters from CapyMOA.\n", "\n", "* Here we use the ```NormalisationFilter``` from MOA to normalize instances in an online fashion.\n", "* MOA filter syntax wraps the whole stream, so we are always composing commands like `Filter(Stream, \n", "* Since the rbf_100k stream can be mapped to a MOA stream, we can obtain its MOA CLI creation string. 
Uncomment the print statements if you would like to inspect the actual creation strings (perhaps to copy and paste them into MOA)." ] }, { "cell_type": "code", "execution_count": 3, "id": "ae9bb646-e0d1-4de6-b5a1-cff0f0a1b172", "metadata": { "ExecuteTime": { "end_time": "2024-04-29T11:52:48.998749Z", "start_time": "2024-04-29T11:52:45.889095Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy with online normalization: 61.815\n", "Accuracy without normalization: 60.357000000000006\n" ] } ], "source": [ "from capymoa.stream import Stream\n", "from capymoa.classifier import OnlineBagging\n", "from capymoa.datasets import RBFm_100k\n", "from capymoa.evaluation import prequential_evaluation\n", "from moa.streams.filters import StandardisationFilter, NormalisationFilter\n", "from moa.streams import FilteredStream\n", "\n", "rbf_100k = RBFm_100k()\n", "\n", "# print(f'MOA creation string for data: {rbf_100k.moa_stream.getCLICreationString(rbf_100k.moa_stream.__class__)}')\n", "\n", "# Create a FilteredStream and use the NormalisationFilter\n", "rbf_stream_normalised = Stream(CLI=f\"-s ({rbf_100k.moa_stream.getCLICreationString(rbf_100k.moa_stream.__class__)}) \\\n", "-f NormalisationFilter \", moa_stream=FilteredStream())\n", "\n", "# print(f'MOA creation string for filtered version: {rbf_stream_normalised.moa_stream.getCLICreationString(rbf_stream_normalised.moa_stream.__class__)}')\n", "\n", "ob_learner_norm = OnlineBagging(schema=rbf_stream_normalised.get_schema(), ensemble_size=5)\n", "ob_learner = OnlineBagging(schema=rbf_100k.get_schema(), ensemble_size=5)\n", "\n", "ob_results_norm = prequential_evaluation(stream=rbf_stream_normalised, learner=ob_learner_norm)\n", "ob_results = prequential_evaluation(stream=rbf_100k, learner=ob_learner)\n", "\n", "\n", "print(f\"Accuracy with online normalization: {ob_results_norm['cumulative'].accuracy()}\")\n", "print(f\"Accuracy without normalization: {ob_results['cumulative'].accuracy()}\")" ] }, { "cell_type": "markdown", "id": 
"f74c58fb-dd90-49f4-8b4f-81a9e36e47ff", "metadata": {}, "source": [ "## 3. Comparing MOA and SKLearn models\n", "\n", "* This example shows how simple it is to compare a MOA classifier with a SKLearn classifier. \n", "* For the sake of this example, we use the generic wrappers\n", "* SKClassifier (and SKRegressor) are parametrized directly as part of the object initialization\n", "* MOAClassifier (and MOARegressor) are parametrized through a CLI string (a separate parameter)" ] }, { "cell_type": "code", "execution_count": 4, "id": "afe7193c-5bab-4b46-8627-c74b28a3b7c5", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from capymoa.base import SKClassifier, MOAClassifier\n", "from capymoa.datasets import CovtypeTiny\n", "from capymoa.evaluation import prequential_evaluation_multiple_learners\n", "from capymoa.evaluation.visualization import plot_windowed_results\n", "\n", "from sklearn.linear_model import SGDClassifier\n", "from moa.classifiers.trees import HoeffdingTree\n", "\n", "covt_tiny = CovtypeTiny()\n", "\n", "sk_sgd = SKClassifier(schema=covt_tiny.schema, sklearner=SGDClassifier(loss='log_loss', penalty='l1', alpha=0.001))\n", "moa_ht = MOAClassifier(schema=covt_tiny.schema, moa_learner=HoeffdingTree, CLI=\"-g 50\")\n", "\n", "results = prequential_evaluation_multiple_learners(stream=covt_tiny, learners={'sk_sgd':sk_sgd, 'moa_ht':moa_ht}, window_size=100)\n", "plot_windowed_results(results['sk_sgd'], results['moa_ht'], metric='accuracy')" ] }, { "cell_type": "markdown", "id": "df198282-7a87-4e03-ba0b-b3de3ccf9163", "metadata": {}, "source": [ "## 4. 
Creating Python learners with MOA Objects\n", "\n", "* This example follows the example from `06_new_learner`, which shows how to create a custom online bagging implementation.\n", "* Here we also create an online bagging implementation, but the base_learner is a MOA class" ] }, { "cell_type": "code", "execution_count": 5, "id": "a0a0906a-c953-4d50-8be8-c1c94e3eac4d", "metadata": {}, "outputs": [], "source": [
"from capymoa.base import Classifier, MOAClassifier\n",
"from moa.classifiers.trees import HoeffdingTree\n",
"from collections import Counter\n",
"import numpy as np\n",
"import random\n",
"import math\n",
"\n",
"def poisson(lambd, random_generator):\n",
"    if lambd < 100.0:\n",
"        product = 1.0\n",
"        _sum = 1.0\n",
"        threshold = random_generator.random() * math.exp(lambd)\n",
"        i = 1\n",
"        max_val = max(100, 10 * math.ceil(lambd))\n",
"        while i < max_val and _sum <= threshold:\n",
"            product *= (lambd / i)\n",
"            _sum += product\n",
"            i += 1\n",
"        return i - 1\n",
"    x = lambd + math.sqrt(lambd) * random_generator.gauss(0, 1)\n",
"    if x < 0.0:\n",
"        return 0\n",
"    return int(math.floor(x))\n",
"\n",
"class CustomOnlineBagging(Classifier):\n",
"    def __init__(self, schema=None, random_seed=1, ensemble_size=5, moa_base_learner_class=None, CLI_base_learner=None):\n",
"        super().__init__(schema=schema, random_seed=random_seed)\n",
"\n",
"        self.random_generator = random.Random()\n",
"        self.CLI_base_learner = CLI_base_learner\n",
"\n",
"        self.ensemble_size = ensemble_size\n",
"        self.moa_base_learner_class = moa_base_learner_class\n",
"\n",
"        # Default base learner if None is specified\n",
"        if self.moa_base_learner_class is None:\n",
"            self.moa_base_learner_class = HoeffdingTree\n",
"\n",
"        self.ensemble = []\n",
"        # Create several instances for the base_learners\n",
"        for i in range(self.ensemble_size):\n",
"            self.ensemble.append(MOAClassifier(schema=self.schema, moa_learner=self.moa_base_learner_class(), CLI=self.CLI_base_learner))\n",
"\n",
"    def __str__(self):\n",
"        return 'CustomOnlineBagging'\n",
"\n",
"    def train(self, instance):\n",
"        for i in range(self.ensemble_size):\n",
"            k = poisson(1.0, self.random_generator)\n",
"            for _ in range(k):\n",
"                self.ensemble[i].train(instance)\n",
"\n",
"    def predict(self, instance):\n",
"        predictions = []\n",
"        for i in range(self.ensemble_size):\n",
"            predictions.append(self.ensemble[i].predict(instance))\n",
"        majority_vote = Counter(predictions)\n",
"        prediction = majority_vote.most_common(1)[0][0]\n",
"        return prediction\n",
"\n",
"    def predict_proba(self, instance):\n",
"        probabilities = []\n",
"        for i in range(self.ensemble_size):\n",
"            classifier_proba = self.ensemble[i].predict_proba(instance)\n",
"            classifier_proba = classifier_proba / np.sum(classifier_proba)\n",
"            probabilities.append(classifier_proba)\n",
"        avg_proba = np.mean(probabilities, axis=0)\n",
"        return avg_proba\n",
"\n" ] }, { "cell_type": "markdown", "id": "c971ac60-0aa5-4295-be56-bcb6ee1ccb40", "metadata": {}, "source": [ "### 4.1 Testing the custom online bagging\n", "\n", "* We choose to use a HoeffdingAdaptiveTree from MOA as the base learner\n", "* We also specify the CLI commands to configure the base learner" ] }, { "cell_type": "code", "execution_count": 6, "id": "0740765c-35c0-416a-99ca-f8e55f921032", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/gomeshe/Dropbox/ciencia_computacao/dev/main-projects/CapyMOA/src/capymoa/stream/_stream.py:38: UserWarning: target variable includes 2 (< 20) unique values, inferred as categorical, set target_type = 'numeric' if you intend numeric targets\n", " warnings.warn(f'target variable includes {num_unique} (< 20) unique values, inferred as categorical, '\n" ] }, { "data": { "text/plain": [ "82.58077330508475" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from capymoa.evaluation import prequential_evaluation\n", "from capymoa.stream import stream_from_file\n", "from 
moa.classifiers.trees import HoeffdingAdaptiveTree\n", "\n", "elec_stream = stream_from_file(path_to_csv_or_arff=\"../data/electricity.csv\")\n", "\n", "# Creating a learner: using a hoeffding adaptive tree as the base learner with grace period of 50 (-g 50)\n", "NEW_OB = CustomOnlineBagging(schema=elec_stream.get_schema(), \n", "                             ensemble_size=5, \n", "                             moa_base_learner_class=HoeffdingAdaptiveTree, \n", "                             CLI_base_learner=\"-g 50\")\n", "\n", "results_NEW_OB = prequential_evaluation(stream=elec_stream, learner=NEW_OB, window_size=4500)\n", "\n", "print(f\"Accuracy: {results_NEW_OB.cumulative.accuracy()}\")" ] }, { "cell_type": "markdown", "id": "62e3e70a-2422-4b3a-b2bf-b8f96a3efdeb", "metadata": {}, "source": [ "## 5. Using TensorBoard with PyTorch in CapyMOA\n", "\n", "* One can use TensorBoard to visualize logged data in an online fashion\n", "* We go through all the steps below, including installing TensorBoard" ] }, { "cell_type": "markdown", "id": "8fda8006-e0e9-4547-a2c9-8fc43d16ca57", "metadata": {}, "source": [ "### 5.1 Install TensorBoard\n", "Clear any logs from previous runs\n", "\n", "```sh\n", "rm -rf ./runs\n", "```" ] }, { "cell_type": "code", "execution_count": 7, "id": "f11baceb-2c77-4636-8e91-d19aadf7b3b3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: tensorboard in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (2.17.0)\n", "Requirement already satisfied: absl-py>=0.4 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (2.1.0)\n", "Requirement already satisfied: grpcio>=1.48.2 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.65.1)\n", "Requirement already satisfied: markdown>=2.6.8 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (3.6)\n", "Requirement already satisfied: numpy>=1.12.0 in 
/Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.26.3)\n", "Requirement already satisfied: protobuf!=4.24.0,<5.0.0,>=3.19.6 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (4.25.4)\n", "Requirement already satisfied: setuptools>=41.0.0 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (69.5.1)\n", "Requirement already satisfied: six>1.9 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.16.0)\n", "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (0.7.2)\n", "Requirement already satisfied: werkzeug>=1.0.1 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (3.0.3)\n", "Requirement already satisfied: importlib-metadata>=4.4 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard) (7.1.0)\n", "Requirement already satisfied: MarkupSafe>=2.1.1 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard) (2.1.5)\n", "Requirement already satisfied: zipp>=0.5 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard) (3.18.2)\n" ] } ], "source": [ "!pip install tensorboard" ] }, { "cell_type": "markdown", "id": "b1e64bd6-b0c7-4296-aee7-a29985a9da21", "metadata": {}, "source": [ "### 5.2 PyTorchClassifier\n", "* We define `PyTorchClassifier` and `NeuralNetwork` classes similarly to those from **Tutorial 3**: `Using Pytorch with CapyMOA`" ] }, { "cell_type": "code", "execution_count": 8, "id": "ea9a4d94-7515-424c-a9fd-76c78ddf52d1", "metadata": {}, "outputs": [], "source": [ "from capymoa.base import Classifier\n", "import numpy as np\n", "import torch\n", "from torch import nn\n", "\n", 
"torch.manual_seed(1)\n",
"torch.use_deterministic_algorithms(True)\n",
"\n",
"# Get cpu device for training.\n",
"device = (\"cpu\")\n",
"\n",
"# Define model\n",
"class NeuralNetwork(nn.Module):\n",
"    def __init__(self, input_size=0, number_of_classes=0):\n",
"        super().__init__()\n",
"        self.flatten = nn.Flatten()\n",
"        self.linear_relu_stack = nn.Sequential(\n",
"            nn.Linear(input_size, 512),\n",
"            nn.ReLU(),\n",
"            nn.Linear(512, 512),\n",
"            nn.ReLU(),\n",
"            nn.Linear(512, number_of_classes)\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        x = self.flatten(x)\n",
"        logits = self.linear_relu_stack(x)\n",
"        return logits\n",
"\n",
"\n",
"class PyTorchClassifier(Classifier):\n",
"    def __init__(self, schema=None, random_seed=1, nn_model: nn.Module = None, optimizer=None, loss_fn=nn.CrossEntropyLoss(), device=(\"cpu\"), lr=1e-3):\n",
"        super().__init__(schema, random_seed)\n",
"        self.model = None\n",
"        self.optimizer = None\n",
"        self.loss_fn = loss_fn\n",
"        self.lr = lr\n",
"        self.device = device\n",
"\n",
"        torch.manual_seed(random_seed)\n",
"\n",
"        if nn_model is None:\n",
"            self.set_model(None)\n",
"        else:\n",
"            self.model = nn_model.to(device)\n",
"        if optimizer is None:\n",
"            if self.model is not None:\n",
"                self.optimizer = torch.optim.SGD(self.model.parameters(), lr=lr)\n",
"        else:\n",
"            self.optimizer = optimizer\n",
"\n",
"    def __str__(self):\n",
"        return str(self.model)\n",
"\n",
"    def CLI_help(self):\n",
"        return str('schema=None, random_seed=1, nn_model: nn.Module = None, optimizer=None, loss_fn=nn.CrossEntropyLoss(), device=(\"cpu\"), lr=1e-3')\n",
"\n",
"    def set_model(self, instance):\n",
"        if self.schema is None:\n",
"            moa_instance = instance.java_instance.getData()\n",
"            self.model = NeuralNetwork(input_size=moa_instance.get_num_attributes(), number_of_classes=moa_instance.get_num_classes()).to(self.device)\n",
"        elif instance is not None:\n",
"            self.model = NeuralNetwork(input_size=self.schema.get_num_attributes(), number_of_classes=self.schema.get_num_classes()).to(self.device)\n",
"\n",
"    def train(self, instance):\n",
"        if self.model is None:\n",
"            self.set_model(instance)\n",
"\n",
"        X = torch.tensor(instance.x, dtype=torch.float32)\n",
"        y = torch.tensor(instance.y_index, dtype=torch.long)\n",
"        # set the device and add a dimension to the tensor\n",
"        X, y = torch.unsqueeze(X.to(self.device), 0), torch.unsqueeze(y.to(self.device),0)\n",
"\n",
"        # Compute prediction error\n",
"        pred = self.model(X)\n",
"        loss = self.loss_fn(pred, y)\n",
"\n",
"        # Backpropagation\n",
"        loss.backward()\n",
"        self.optimizer.step()\n",
"        self.optimizer.zero_grad()\n",
"\n",
"    def predict(self, instance):\n",
"        return np.argmax(self.predict_proba(instance))\n",
"\n",
"    def predict_proba(self, instance):\n",
"        if self.model is None:\n",
"            self.set_model(instance)\n",
"        X = torch.unsqueeze(torch.tensor(instance.x, dtype=torch.float32).to(self.device), 0)\n",
"        # turn off gradient collection\n",
"        with torch.no_grad():\n",
"            pred = np.asarray(self.model(X).numpy(), dtype=np.double)\n",
"        return pred\n" ] }, { "cell_type": "markdown", "id": "2b166ade-23b3-445c-a382-6f0cb6231d66", "metadata": {}, "source": [ "### 5.3 PyTorchClassifier + the test-then-train loop + TensorBoard\n", "* Here we use an instance loop to log relevant information to TensorBoard\n", "* This information can be viewed in TensorBoard while the processing is still running" ] }, { "cell_type": "code", "execution_count": 9, "id": "9e93527d-26cb-4a0b-a4e4-2f3399724502", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processed 10000 instances\n", "Processed 20000 instances\n", "Processed 30000 instances\n", "Processed 40000 instances\n", "Processed 50000 instances\n", "Processed 60000 instances\n", "Processed 70000 instances\n", "Processed 80000 instances\n", "Processed 90000 instances\n", "Processed 100000 instances\n" ] } ], "source": [ "from capymoa.evaluation import 
ClassificationEvaluator\n",
"from capymoa.datasets import RBFm_100k\n",
"from torch.utils.tensorboard import SummaryWriter\n",
"\n",
"# Create a SummaryWriter instance.\n",
"writer = SummaryWriter()\n",
"# Re-create the stream to start from the beginning\n",
"rbf_stream = RBFm_100k()\n",
"\n",
"# Creating the evaluator\n",
"evaluator = ClassificationEvaluator(schema=rbf_stream.get_schema())\n",
"\n",
"# Creating a learner\n",
"simple_pyTorch_classifier = PyTorchClassifier(\n",
"    schema=rbf_stream.get_schema(), \n",
"    nn_model=NeuralNetwork(input_size=rbf_stream.get_schema().get_num_attributes(), \n",
"                           number_of_classes=rbf_stream.get_schema().get_num_classes()).to(device)\n",
")\n",
"\n",
"i = 0\n",
"while rbf_stream.has_more_instances():\n",
"    i += 1\n",
"    instance = rbf_stream.next_instance()\n",
"\n",
"    # Test-then-train: predict first, then update the evaluator and the model\n",
"    prediction = simple_pyTorch_classifier.predict(instance)\n",
"    evaluator.update(instance.y_index, prediction)\n",
"    simple_pyTorch_classifier.train(instance)\n",
"\n",
"    if i % 1000 == 0:\n",
"        writer.add_scalar(\"accuracy\", evaluator.accuracy(), i)\n",
"\n",
"    if i % 10000 == 0:\n",
"        print(f\"Processed {i} instances\")\n",
"\n",
"writer.add_scalar(\"accuracy\", evaluator.accuracy(), i)\n",
"# Call flush() method to make sure that all pending events have been written to disk.\n",
"writer.flush()\n",
"\n",
"# If you do not need the summary writer anymore, call close() method.\n",
"writer.close()" ] }, { "cell_type": "markdown", "id": "9da96643-1900-41e8-96aa-6af460194ac6", "metadata": {}, "source": [ "### 5.4 Run TensorBoard\n", "Now, start TensorBoard, specifying the root log directory you used above. \n", "The ``logdir`` argument points to the directory where TensorBoard will look to find \n", "event files that it can display. 
TensorBoard will recursively walk \n", "the directory structure rooted at ``logdir``, looking for ``.*tfevents.*`` files.\n", "\n", "```sh\n", "tensorboard --logdir=runs\n", "```\n", "Go to the URL it provides\n", "\n", "This dashboard shows how the accuracy changes over time. \n", "You can also use it to track training speed, learning rate, and other \n", "scalar values." ] }, { "cell_type": "markdown", "id": "38b1f9ce-c3a1-4944-8cb1-f2a84cd4ff25", "metadata": {}, "source": [ "## 6. Creating a synthetic stream with concept drifts from MOA\n", "\n", "* This demonstrates the flexibility of the API. This level of manipulation of the API is expected from experienced MOA users.\n", "* To use the API like this, the user must be familiar with how concept drifts are simulated in MOA\n", "\n", "EvaluatePrequential -l trees.HoeffdingAdaptiveTree **-s (ConceptDriftStream -s generators.AgrawalGenerator -d (generators.AgrawalGenerator -f 2) -p 5000)** -e (WindowClassificationPerformanceEvaluator **-w 100**) **-i 10000 -f 100**" ] }, { "cell_type": "code", "execution_count": 10, "id": "7b79ac8e-d7ac-48fb-b983-22301d272364", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from capymoa.stream import Stream\n", "from capymoa.classifier import OnlineBagging\n", "from capymoa.evaluation import prequential_evaluation\n", "from capymoa.evaluation.visualization import plot_windowed_results\n", "from moa.streams import ConceptDriftStream\n", "\n", "# Using the API to generate the data using the ConceptDriftStream and SEAGenerator.\n", "# The drift is positioned at instance 5000 (-p) with a drift width of 1000 (-w, which is also the default value) \n", "stream_sea1drift = Stream(moa_stream=ConceptDriftStream(), \n", "                          CLI=\"-s generators.SEAGenerator -d (generators.SEAGenerator -f 2) -p 5000 -w 1000\")\n", "\n", "OB = OnlineBagging(schema=stream_sea1drift.get_schema(), ensemble_size=10)\n", "\n", "results_sea1drift_OB = prequential_evaluation(stream=stream_sea1drift, learner=OB, window_size=100, max_instances=10000)\n", "\n", "plot_windowed_results(results_sea1drift_OB, metric='accuracy')" ] }, { "cell_type": "markdown", "id": "0f2f9fb6-0994-4f3f-aaf3-c73b09847019", "metadata": {}, "source": [ "## 7. Drift, Multi-threaded Ensemble and Results\n", "\n", "* Generate a stream with 3 drifts: 2 abrupt and 1 gradual. \n", "* Evaluate utilising test-then-train (cumulative) and windowed evaluation.\n", "* Execute a multi-threaded version of AdaptiveRandomForest.\n", "* For more on multi-threaded ensembles, see the **parallel_ensembles.ipynb** notebook" ] }, { "cell_type": "code", "execution_count": 12, "id": "3142e7e7-7175-40da-a89c-b528d71eb00c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n", "Cumulative accuracy = 89.37\n", "wallclock = 14.674108982086182 seconds\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>instances</th><th>accuracy</th><th>kappa</th><th>kappa_t</th><th>kappa_m</th><th>f1_score</th><th>f1_score_0</th><th>f1_score_1</th><th>precision</th><th>precision_0</th><th>precision_1</th><th>recall</th><th>recall_0</th><th>recall_1</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>5000.0</td><td>88.26</td><td>73.743687</td><td>74.333188</td><td>67.096413</td><td>87.033480</td><td>82.534960</td><td>91.158307</td><td>88.176897</td><td>87.951807</td><td>88.401987</td><td>85.919338</td><td>77.746637</td><td>94.092040</td></tr>\n",
"    <tr><th>1</th><td>10000.0</td><td>88.88</td><td>75.482652</td><td>76.520270</td><td>69.700272</td><td>87.903177</td><td>83.939919</td><td>91.495870</td><td>88.990120</td><td>89.305470</td><td>88.674770</td><td>86.842465</td><td>79.182561</td><td>94.502370</td></tr>\n",
"    <tr><th>2</th><td>15000.0</td><td>89.26</td><td>76.181571</td><td>77.187766</td><td>70.315091</td><td>88.238991</td><td>84.302835</td><td>91.837665</td><td>89.310872</td><td>89.454094</td><td>89.167651</td><td>87.192532</td><td>79.712548</td><td>94.672516</td></tr>\n",
"    <tr><th>3</th><td>20000.0</td><td>88.98</td><td>75.124468</td><td>75.938865</td><td>68.586089</td><td>87.733308</td><td>83.297969</td><td>91.777347</td><td>88.966743</td><td>88.932039</td><td>89.001447</td><td>86.533606</td><td>78.335234</td><td>94.731978</td></tr>\n",
"    <tr><th>4</th><td>25000.0</td><td>89.90</td><td>77.706053</td><td>78.157439</td><td>72.114854</td><td>88.955206</td><td>85.400405</td><td>92.279468</td><td>89.829793</td><td>89.623786</td><td>90.035800</td><td>88.097484</td><td>81.557151</td><td>94.637817</td></tr>\n",
"    <tr><th>5</th><td>30000.0</td><td>89.50</td><td>76.471976</td><td>76.933216</td><td>70.522179</td><td>88.417299</td><td>84.314311</td><td>92.108823</td><td>89.663782</td><td>90.102171</td><td>89.225393</td><td>87.204997</td><td>79.225154</td><td>95.184840</td></tr>\n",
"    <tr><th>6</th><td>35000.0</td><td>90.10</td><td>77.634753</td><td>78.232190</td><td>71.535365</td><td>88.949725</td><td>84.986351</td><td>92.615247</td><td>90.051553</td><td>89.922978</td><td>90.180128</td><td>87.874534</td><td>80.563542</td><td>95.185526</td></tr>\n",
"    <tr><th>7</th><td>40000.0</td><td>89.70</td><td>77.159091</td><td>77.676636</td><td>71.325167</td><td>88.687154</td><td>84.963504</td><td>92.167300</td><td>89.601454</td><td>89.318600</td><td>89.884307</td><td>87.791326</td><td>81.013363</td><td>94.569288</td></tr>\n",
"    <tr><th>8</th><td>45000.0</td><td>89.58</td><td>76.658626</td><td>77.406765</td><td>70.531674</td><td>88.460627</td><td>84.461676</td><td>92.161878</td><td>89.515036</td><td>89.337539</td><td>89.692533</td><td>87.430769</td><td>80.090498</td><td>94.771040</td></tr>\n",
"    <tr><th>9</th><td>50000.0</td><td>89.54</td><td>77.044005</td><td>77.697228</td><td>71.637744</td><td>88.656788</td><td>85.052872</td><td>91.955084</td><td>89.633308</td><td>89.909366</td><td>89.357250</td><td>87.701317</td><td>80.694143</td><td>94.708492</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " instances accuracy kappa kappa_t kappa_m f1_score \\\n", "0 5000.0 88.26 73.743687 74.333188 67.096413 87.033480 \n", "1 10000.0 88.88 75.482652 76.520270 69.700272 87.903177 \n", "2 15000.0 89.26 76.181571 77.187766 70.315091 88.238991 \n", "3 20000.0 88.98 75.124468 75.938865 68.586089 87.733308 \n", "4 25000.0 89.90 77.706053 78.157439 72.114854 88.955206 \n", "5 30000.0 89.50 76.471976 76.933216 70.522179 88.417299 \n", "6 35000.0 90.10 77.634753 78.232190 71.535365 88.949725 \n", "7 40000.0 89.70 77.159091 77.676636 71.325167 88.687154 \n", "8 45000.0 89.58 76.658626 77.406765 70.531674 88.460627 \n", "9 50000.0 89.54 77.044005 77.697228 71.637744 88.656788 \n", "\n", " f1_score_0 f1_score_1 precision precision_0 precision_1 recall \\\n", "0 82.534960 91.158307 88.176897 87.951807 88.401987 85.919338 \n", "1 83.939919 91.495870 88.990120 89.305470 88.674770 86.842465 \n", "2 84.302835 91.837665 89.310872 89.454094 89.167651 87.192532 \n", "3 83.297969 91.777347 88.966743 88.932039 89.001447 86.533606 \n", "4 85.400405 92.279468 89.829793 89.623786 90.035800 88.097484 \n", "5 84.314311 92.108823 89.663782 90.102171 89.225393 87.204997 \n", "6 84.986351 92.615247 90.051553 89.922978 90.180128 87.874534 \n", "7 84.963504 92.167300 89.601454 89.318600 89.884307 87.791326 \n", "8 84.461676 92.161878 89.515036 89.337539 89.692533 87.430769 \n", "9 85.052872 91.955084 89.633308 89.909366 89.357250 87.701317 \n", "\n", " recall_0 recall_1 \n", "0 77.746637 94.092040 \n", "1 79.182561 94.502370 \n", "2 79.712548 94.672516 \n", "3 78.335234 94.731978 \n", "4 81.557151 94.637817 \n", "5 79.225154 95.184840 \n", "6 80.563542 95.185526 \n", "7 81.013363 94.569288 \n", "8 80.090498 94.771040 \n", "9 80.694143 94.708492 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from capymoa.stream.generator import SEA\n", "from capymoa.stream.drift import DriftStream, AbruptDrift, GradualDrift\n", "from capymoa.classifier import AdaptiveRandomForestClassifier\n", "from capymoa.evaluation import prequential_evaluation\n", "from capymoa.evaluation.visualization import plot_windowed_results\n", "\n", "SEA3drifts = DriftStream(stream=[SEA(1), \n", " AbruptDrift(10000),\n", " SEA(2), \n", " GradualDrift(start=20000, end=25000), \n", " SEA(3), \n", " AbruptDrift(45000),\n", " SEA(1)])\n", "\n", "arf = AdaptiveRandomForestClassifier(schema=SEA3drifts.get_schema(), \n", " ensemble_size=100, \n", " number_of_jobs=4)\n", "\n", "results = prequential_evaluation(stream=SEA3drifts, \n", " learner=arf, \n", " window_size=5000, \n", " max_instances=50000)\n", "\n", "print(f\"Cumulative accuracy = {results.cumulative.accuracy()}\")\n", "print(f\"wallclock = {results.wallclock()} seconds\")\n", "display(results.windowed.metrics_per_window())\n", "plot_windowed_results(results, metric='accuracy')" ] }, { "cell_type": "code", "execution_count": null, "id": "c5431a62-64c5-4634-a86e-29a86f2397a7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 5 }