6. Exploring Advanced Features#

This notebook is targeted at advanced users who want, among other things, to access MOA objects directly through capymoa's Python API.

  • Examples of how to use any MOA Classifier or Regressor from capymoa

  • An example of how preprocessing (from MOA) can be used.

  • Comparing a SKLearn model against a MOA model

  • A variation of Tutorial 5: Creating a new classifier in CapyMOA which uses MOA learners, thus accessing MOA (Java) objects directly

  • How to log experiments using TensorBoard alongside the PyTorch API. This extends Tutorial 3: Using Pytorch with CapyMOA

  • Creating a synthetic stream with concept drifts using the MOA CLI directly

  • An example utilising a multi-threaded ensemble


More information about CapyMOA can be found at https://www.capymoa.org

Last updated on 28/07/2024

1. Using any MOA learner#

  • CapyMOA gives you access to any MOA classifier or regressor

  • For some of the MOA learners there are corresponding Python objects (such as the HoeffdingTree or Adaptive Random Forest Classifier). However, MOA has over a hundred learners, and more are added constantly.

  • To allow advanced users to access any MOA learner from CapyMOA, we included the MOAClassifier and MOARegressor generic wrappers.

[2]:
from capymoa.evaluation import prequential_evaluation
from capymoa.base import MOAClassifier
from capymoa.datasets import Electricity

# This is an import from MOA
from moa.classifiers.trees import HoeffdingAdaptiveTree

stream = Electricity()

# Creates a wrapper around the HoeffdingAdaptiveTree, which can then be used like any other CapyMOA classifier
HAT = MOAClassifier(schema=stream.get_schema(), moa_learner=HoeffdingAdaptiveTree)

results_HAT = prequential_evaluation(stream=stream, learner=HAT, window_size=500)

print(
    f"Cumulative accuracy = {results_HAT['cumulative'].accuracy()}, wall-clock time: {results_HAT['wallclock']}"
)
display(results_HAT["windowed"].metrics_per_window())
Cumulative accuracy = 83.38629943502825, wall-clock time: 0.6487746238708496
instances accuracy kappa kappa_t kappa_m f1_score f1_score_0 f1_score_1 precision precision_0 precision_1 recall recall_0 recall_1
0 500.0 86.0 71.762808 -9.375000 68.888889 85.886082 84.581498 87.179487 85.939394 85.333333 86.545455 85.832836 83.842795 87.822878
1 1000.0 89.2 78.456874 28.947368 78.988327 89.441189 89.285714 89.112903 89.408294 94.142259 84.674330 89.474107 84.905660 94.042553
2 1500.0 95.8 86.827579 66.129032 83.064516 93.435701 89.447236 97.378277 94.263385 91.752577 96.774194 92.622426 87.254902 97.989950
3 2000.0 77.0 54.896301 -47.435897 41.326531 78.794015 75.479744 78.342750 78.232560 64.835165 91.629956 79.363588 90.306122 68.421053
4 2500.0 86.2 71.983109 25.000000 68.636364 85.991852 84.282460 87.700535 86.009685 84.474886 87.544484 85.974026 84.090909 87.857143
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
86 43500.0 84.4 66.158171 20.408163 62.679426 85.193622 77.058824 88.181818 89.430894 100.000000 78.861789 81.339713 62.679426 100.000000
87 44000.0 77.4 35.265811 -32.941176 28.481013 74.117119 44.334975 85.821832 87.582418 100.000000 75.164835 64.240506 28.481013 100.000000
88 44500.0 72.0 39.008452 -105.882353 36.073059 74.346872 53.947368 79.885057 81.729270 96.470588 66.987952 68.187653 37.442922 98.932384
89 45000.0 77.6 52.642706 -77.777778 45.365854 76.539541 70.526316 81.935484 77.362637 76.571429 78.153846 75.733774 65.365854 86.101695
90 45312.0 76.4 52.842253 -38.823529 47.555556 76.613251 75.105485 77.566540 76.446493 70.634921 82.258065 76.780738 80.180180 73.381295

91 rows × 14 columns
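
For learners that already have a native CapyMOA wrapper, the generic MOAClassifier is not required. As a point of comparison, a minimal sketch using capymoa.classifier.HoeffdingAdaptiveTree (the pre-wrapped version of the same MOA learner, also imported in section 8 below):

from capymoa.classifier import HoeffdingAdaptiveTree as NativeHAT

# Use a fresh stream, since the one above has already been consumed.
elec = Electricity()

# The native wrapper takes the schema directly; no MOA import is needed.
native_hat = NativeHAT(schema=elec.get_schema())
results_native = prequential_evaluation(stream=elec, learner=native_hat, window_size=500)
print(f"Cumulative accuracy = {results_native['cumulative'].accuracy()}")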

1.1 Checking the hyperparameters for the MOA CLI#

  • MOA objects can be parametrized using the MOA CLI (Command Line Interface)

  • Sometimes you may not know the relevant parameters for moa_learner; moa_learner.CLI_help() presents all the hyperparameters available for the moa_learner object.

[3]:
from moa.classifiers.meta import AdaptiveRandomForest

arf = MOAClassifier(schema=stream.get_schema(), moa_learner=AdaptiveRandomForest)

print(arf.CLI_help())
-l treeLearner (default: ARFHoeffdingTree -e 2000000 -g 50 -c 0.01)
Random Forest Tree.
-s ensembleSize (default: 100)
The number of trees.
-o mFeaturesMode (default: Percentage (M * (m / 100)))
Defines how m, defined by mFeaturesPerTreeSize, is interpreted. M represents the total number of features.
-m mFeaturesPerTreeSize (default: 60)
Number of features allowed considered for each split. Negative values corresponds to M - m
-a lambda (default: 6.0)
The lambda parameter for bagging.
-j numberOfJobs (default: 1)
Total number of concurrent jobs used for processing (-1 = as much as possible, 0 = do not use multithreading)
-x driftDetectionMethod (default: ADWINChangeDetector -a 1.0E-3)
Change detector for drifts and its parameters
-p warningDetectionMethod (default: ADWINChangeDetector -a 1.0E-2)
Change detector for warnings (start training bkg learner)
-w disableWeightedVote
Should use weighted voting?
-u disableDriftDetection
Should use drift detection? If disabled then bkg learner is also disabled
-q disableBackgroundLearner
Should use bkg learner? If disabled then reset tree immediately.
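
Once the relevant flags are known, they can be set through the wrapper's CLI parameter. A minimal sketch (the flag values here are illustrative, taken from the help text above):

# Illustrative configuration: 10 trees (-s 10) and a bagging lambda of 5.0 (-a 5.0),
# matching the flags listed in the CLI help above.
arf_custom = MOAClassifier(
    schema=stream.get_schema(),
    moa_learner=AdaptiveRandomForest,
    CLI="-s 10 -a 5.0",
)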

2. Using preprocessing from MOA (filters)#

We are working on a more user-friendly API for preprocessing; this example just shows how one can use MOA filters directly.

  • Here we use NormalisationFilter filter from MOA to normalize instances in an online fashion.

  • MOA filter syntax wraps the whole stream, so we are always composing commands like Filter(Stream, ...)

  • We obtain the MOA CLI for the Electricity stream; since it can be mapped to a MOA stream (an ArffFileStream), it is possible to compose it with a filter. Uncomment the print statement if you would like to inspect the actual creation string (perhaps to copy and paste it into MOA?)

[4]:
from capymoa.stream import Stream
from capymoa.classifier import OnlineBagging
from capymoa.evaluation import prequential_evaluation
from capymoa.datasets import Electricity, get_download_dir

from moa.streams import FilteredStream

stream = Electricity()
cli = (
    f"-s (ArffFileStream -f {get_download_dir() / Electricity._filename}) "
    f" -f NormalisationFilter"
)
print(cli)
# Create a FilterStream and use the NormalisationFilter
rbf_stream_normalised = Stream(CLI=cli, moa_stream=FilteredStream())

# print(f'MOA creation string for filtered version: {rbf_stream_normalised.moa_stream.getCLICreationString(rbf_stream_normalised.moa_stream.__class__)}')

ob_learner_norm = OnlineBagging(
    schema=rbf_stream_normalised.get_schema(), ensemble_size=5
)
ob_learner = OnlineBagging(schema=stream.get_schema(), ensemble_size=5)

ob_results_norm = prequential_evaluation(
    stream=rbf_stream_normalised, learner=ob_learner_norm
)
ob_results = prequential_evaluation(stream=stream, learner=ob_learner)


print(f"Accuracy with online normalization: {ob_results_norm['cumulative'].accuracy()}")
print(f"Accuracy without normalization: {ob_results['cumulative'].accuracy()}")
-s (ArffFileStream -f /local/scratch/antonlee/datasets/electricity.arff)  -f NormalisationFilter
Accuracy with online normalization: 80.53937146892656
Accuracy without normalization: 82.06656073446328
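
As mentioned above, the composed MOA command can be inspected by printing the stream's creation string; this is the same call as the commented-out line in the cell:

# Prints the MOA CLI string that recreates the filtered stream
# (e.g. for pasting into the MOA GUI).
print(
    rbf_stream_normalised.moa_stream.getCLICreationString(
        rbf_stream_normalised.moa_stream.__class__
    )
)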

3. Comparing MOA and SKLearn models#

  • This simple example shows how easy it is to compare a MOA and a SKLearn classifier.

  • For the sake of this example, we are using the wrappers:

  • SKClassifier (and SKRegressor) are parametrized directly as part of the object initialization

  • MOAClassifier (and MOARegressor) are parametrized through a CLI (a separate parameter)

[5]:
from capymoa.base import SKClassifier, MOAClassifier
from capymoa.datasets import CovtypeTiny
from capymoa.evaluation import prequential_evaluation_multiple_learners
from capymoa.evaluation.visualization import plot_windowed_results

from sklearn.linear_model import SGDClassifier
from moa.classifiers.trees import HoeffdingTree

covt_tiny = CovtypeTiny()

sk_sgd = SKClassifier(
    schema=covt_tiny.schema,
    sklearner=SGDClassifier(loss="log_loss", penalty="l1", alpha=0.001),
)
moa_ht = MOAClassifier(schema=covt_tiny.schema, moa_learner=HoeffdingTree, CLI="-g 50")

results = prequential_evaluation_multiple_learners(
    stream=covt_tiny, learners={"sk_sgd": sk_sgd, "moa_ht": moa_ht}, window_size=100
)
plot_windowed_results(results["sk_sgd"], results["moa_ht"], metric="accuracy")
../_images/notebooks_06_advanced_API_9_0.png
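
The same pattern carries over to regression. A hedged sketch, assuming the SKRegressor and MOARegressor wrappers mirror the classifier APIs above, that the Fried regression dataset is available in capymoa.datasets, and that the cumulative results expose an rmse() metric:

from capymoa.base import SKRegressor, MOARegressor
from capymoa.datasets import Fried
from capymoa.evaluation import prequential_evaluation
from sklearn.linear_model import SGDRegressor
from moa.classifiers.trees import FIMTDD  # a MOA regression tree

fried = Fried()

# SKRegressor is parametrized via the sklearn object, MOARegressor via MOA CLI.
sk_reg = SKRegressor(schema=fried.schema, sklearner=SGDRegressor())
moa_reg = MOARegressor(schema=fried.schema, moa_learner=FIMTDD)

results_sk = prequential_evaluation(stream=fried, learner=sk_reg)
results_moa = prequential_evaluation(stream=Fried(), learner=moa_reg)
print(f"SK RMSE: {results_sk['cumulative'].rmse()}")
print(f"MOA RMSE: {results_moa['cumulative'].rmse()}")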

4. Creating Python learners with MOA Objects#

  • This example follows 06_new_learner (Tutorial 5: Creating a new classifier in CapyMOA), which shows how to create a custom online bagging implementation.

  • Here we also create an online bagging implementation, but the base_learner is a MOA class

[6]:
from capymoa.base import Classifier, MOAClassifier
from moa.classifiers.trees import HoeffdingTree
from collections import Counter
import numpy as np
import random
import math


def poisson(lambd, random_generator):
    # Draw a sample from a Poisson(lambd) distribution; online bagging uses
    # this to decide how many times each instance is used for training.
    if lambd < 100.0:
        # Inverse-transform sampling: accumulate terms of the (unnormalised)
        # Poisson pmf until the running sum exceeds a scaled uniform draw.
        product = 1.0
        _sum = 1.0
        threshold = random_generator.random() * math.exp(lambd)
        i = 1
        max_val = max(100, 10 * math.ceil(lambd))
        while i < max_val and _sum <= threshold:
            product *= lambd / i
            _sum += product
            i += 1
        return i - 1
    # For large lambd, fall back to the Gaussian approximation N(lambd, lambd).
    x = lambd + math.sqrt(lambd) * random_generator.gauss(0, 1)
    if x < 0.0:
        return 0
    return int(math.floor(x))


class CustomOnlineBagging(Classifier):
    def __init__(
        self,
        schema=None,
        random_seed=1,
        ensemble_size=5,
        moa_base_learner_class=None,
        CLI_base_learner=None,
    ):
        super().__init__(schema=schema, random_seed=random_seed)

        self.random_generator = random.Random()
        self.CLI_base_learner = CLI_base_learner

        self.ensemble_size = ensemble_size
        self.moa_base_learner_class = moa_base_learner_class

        # Default base learner if None is specified
        if self.moa_base_learner_class is None:
            self.moa_base_learner_class = HoeffdingTree

        self.ensemble = []
        # Create several instances for the base_learners
        for i in range(self.ensemble_size):
            self.ensemble.append(
                MOAClassifier(
                    schema=self.schema,
                    moa_learner=self.moa_base_learner_class(),
                    CLI=self.CLI_base_learner,
                )
            )

    def __str__(self):
        return "CustomOnlineBagging"

    def train(self, instance):
        for i in range(self.ensemble_size):
            k = poisson(1.0, self.random_generator)
            for _ in range(k):
                self.ensemble[i].train(instance)

    def predict(self, instance):
        predictions = []
        for i in range(self.ensemble_size):
            predictions.append(self.ensemble[i].predict(instance))
        majority_vote = Counter(predictions)
        prediction = majority_vote.most_common(1)[0][0]
        return prediction

    def predict_proba(self, instance):
        probabilities = []
        for i in range(self.ensemble_size):
            classifier_proba = self.ensemble[i].predict_proba(instance)
            classifier_proba = classifier_proba / np.sum(classifier_proba)
            probabilities.append(classifier_proba)
        avg_proba = np.mean(probabilities, axis=0)
        return avg_proba

4.1 Testing the custom online bagging#

  • We choose to use a HoeffdingAdaptiveTree from MOA as the base learner

  • We also specify the CLI commands to configure the base learner

[7]:
from capymoa.evaluation import prequential_evaluation
from capymoa.datasets import Electricity
from moa.classifiers.trees import HoeffdingAdaptiveTree

elec_stream = Electricity()

# Creating a learner: using a hoeffding adaptive tree as the base learner with grace period of 50 (-g 50)
NEW_OB = CustomOnlineBagging(
    schema=elec_stream.get_schema(),
    ensemble_size=5,
    moa_base_learner_class=HoeffdingAdaptiveTree,
    CLI_base_learner="-g 50",
)

results_NEW_OB = prequential_evaluation(
    stream=elec_stream, learner=NEW_OB, window_size=4500
)

print(f"Accuracy: {results_NEW_OB.cumulative.accuracy()}")
Accuracy: 85.89556850282486

5. Using TensorBoard with PyTorch in CapyMOA#

  • One can use TensorBoard to visualize logged data in an online fashion

  • We go through all the steps below, including installing TensorBoard

5.1 Install TensorBoard#

Clear any logs from previous runs

rm -rf ./runs
[8]:
!pip install tensorboard
Requirement already satisfied: tensorboard in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (2.17.1)
Requirement already satisfied: absl-py>=0.4 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (2.1.0)
Requirement already satisfied: grpcio>=1.48.2 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.66.1)
Requirement already satisfied: markdown>=2.6.8 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (3.7)
Requirement already satisfied: numpy>=1.12.0 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.26.3)
Requirement already satisfied: packaging in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (24.1)
Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (5.28.2)
Requirement already satisfied: setuptools>=41.0.0 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (69.5.1)
Requirement already satisfied: six>1.9 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.16.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (3.0.4)
Requirement already satisfied: importlib-metadata>=4.4 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard) (7.2.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard) (2.1.5)
Requirement already satisfied: zipp>=0.5 in /local/scratch/antonlee/miniconda3/envs/capymoa/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard) (3.19.2)

5.2 PyTorchClassifier#

  • We define PyTorchClassifier and NeuralNetwork classes similarly to those from Tutorial 3: Using Pytorch with CapyMOA

[9]:
from capymoa.base import Classifier
import torch
from torch import nn

torch.manual_seed(1)
torch.use_deterministic_algorithms(True)

# Get cpu device for training.
device = "cpu"


# Define model
class NeuralNetwork(nn.Module):
    def __init__(self, input_size=0, number_of_classes=0):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, number_of_classes),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


class PyTorchClassifier(Classifier):
    def __init__(
        self,
        schema=None,
        random_seed=1,
        nn_model: nn.Module = None,
        optimizer=None,
        loss_fn=nn.CrossEntropyLoss(),
        device=("cpu"),
        lr=1e-3,
    ):
        super().__init__(schema, random_seed)
        self.model = None
        self.optimizer = None
        self.loss_fn = loss_fn
        self.lr = lr
        self.device = device

        torch.manual_seed(random_seed)

        if nn_model is None:
            self.set_model(None)
        else:
            self.model = nn_model.to(device)
        if optimizer is None:
            if self.model is not None:
                self.optimizer = torch.optim.SGD(self.model.parameters(), lr=lr)
        else:
            self.optimizer = optimizer

    def __str__(self):
        return str(self.model)

    def CLI_help(self):
        return str(
            'schema=None, random_seed=1, nn_model: nn.Module = None, optimizer=None, loss_fn=nn.CrossEntropyLoss(), device=("cpu"), lr=1e-3'
        )

    def set_model(self, instance):
        if self.schema is None:
            moa_instance = instance.java_instance.getData()
            self.model = NeuralNetwork(
                input_size=moa_instance.get_num_attributes(),
                number_of_classes=moa_instance.get_num_classes(),
            ).to(self.device)
        elif instance is not None:
            self.model = NeuralNetwork(
                input_size=self.schema.get_num_attributes(),
                number_of_classes=self.schema.get_num_classes(),
            ).to(self.device)

    def train(self, instance):
        if self.model is None:
            self.set_model(instance)

        X = torch.tensor(instance.x, dtype=torch.float32)
        y = torch.tensor(instance.y_index, dtype=torch.long)
        # set the device and add a dimension to the tensor
        X, y = (
            torch.unsqueeze(X.to(self.device), 0),
            torch.unsqueeze(y.to(self.device), 0),
        )

        # Compute prediction error
        pred = self.model(X)
        loss = self.loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()

    def predict(self, instance):
        return np.argmax(self.predict_proba(instance))

    def predict_proba(self, instance):
        if self.model is None:
            self.set_model(instance)
        X = torch.unsqueeze(
            torch.tensor(instance.x, dtype=torch.float32).to(self.device), 0
        )
        # turn off gradient collection
        with torch.no_grad():
            pred = np.asarray(self.model(X).numpy(), dtype=np.double)
        return pred

5.3 PyTorchClassifier + the test-then-train loop + TensorBoard#

  • Here we use the instance loop to log relevant information to TensorBoard

  • This information can be viewed in TensorBoard while the stream is being processed

[10]:
from capymoa.evaluation import ClassificationEvaluator
from capymoa.datasets import Electricity
from torch.utils.tensorboard import SummaryWriter

# Create a SummaryWriter instance.
writer = SummaryWriter()
## Opening a file again to start from the beginning
stream = Electricity()

# Creating the evaluator
evaluator = ClassificationEvaluator(schema=stream.get_schema())

# Creating a learner
simple_pyTorch_classifier = PyTorchClassifier(
    schema=stream.get_schema(),
    nn_model=NeuralNetwork(
        input_size=stream.get_schema().get_num_attributes(),
        number_of_classes=stream.get_schema().get_num_classes(),
    ).to(device),
)

i = 0
while stream.has_more_instances():
    i += 1
    instance = stream.next_instance()

    prediction = simple_pyTorch_classifier.predict(instance)
    evaluator.update(instance.y_index, prediction)
    simple_pyTorch_classifier.train(instance)

    if i % 1000 == 0:
        writer.add_scalar("accuracy", evaluator.accuracy(), i)

    if i % 10000 == 0:
        print(f"Processed {i} instances")

writer.add_scalar("accuracy", evaluator.accuracy(), i)
# Call flush() method to make sure that all pending events have been written to disk.
writer.flush()

# If you do not need the summary writer anymore, call close() method.
writer.close()
Processed 10000 instances
Processed 20000 instances
Processed 30000 instances
Processed 40000 instances

5.4 Run TensorBoard#

Now, start TensorBoard, specifying the root log directory you used above. The logdir argument points to the directory where TensorBoard will look for event files that it can display. TensorBoard will recursively walk the directory tree rooted at logdir, looking for .*tfevents.* files.

tensorboard --logdir=runs

Go to the URL it provides

This dashboard shows how the accuracy changes over time. You can also use it to track training speed, learning rate, and other scalar values.
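
Any scalar can be logged with additional add_scalar calls inside the same loop. A minimal sketch, assuming the evaluator exposes a kappa() method (kappa appears as a column in the windowed metrics earlier in this notebook):

# Inside the test-then-train loop, log two curves under separate tags:
if i % 1000 == 0:
    writer.add_scalar("accuracy", evaluator.accuracy(), i)
    writer.add_scalar("kappa", evaluator.kappa(), i)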

6. Creating a synthetic stream with concept drifts from MOA#

  • Demonstrates the flexibility of the API; this level of manipulation is expected from experienced MOA users.

  • To use the API like this, the user must be familiar with how concept drifts are simulated in MOA

EvaluatePrequential -l trees.HoeffdingAdaptiveTree -s (ConceptDriftStream -s generators.AgrawalGenerator -d (generators.AgrawalGenerator -f 2) -p 5000) -e (WindowClassificationPerformanceEvaluator -w 100) -i 10000 -f 100

[11]:
from capymoa.stream import Stream
from capymoa.classifier import OnlineBagging
from capymoa.evaluation import prequential_evaluation
from capymoa.evaluation.visualization import plot_windowed_results
from moa.streams import ConceptDriftStream

# Using the API to generate the data using the ConceptDriftStream and SEAGenerator.
# The drift location is based on the number of instances (5000) as well as the drift width (1000, the default value)
stream_sea1drift = Stream(
    moa_stream=ConceptDriftStream(),
    CLI="-s generators.SEAGenerator -d (generators.SEAGenerator -f 2) -p 5000 -w 1000",
)

OB = OnlineBagging(schema=stream_sea1drift.get_schema(), ensemble_size=10)

results_sea1drift_OB = prequential_evaluation(
    stream=stream_sea1drift, learner=OB, window_size=100, max_instances=10000
)

plot_windowed_results(results_sea1drift_OB, metric="accuracy")
../_images/notebooks_06_advanced_API_23_0.png

7. Drift, Multi-threaded Ensemble and Results#

  • Generate a stream with 3 drifts: 2 abrupt and 1 gradual.

  • Evaluate utilising test-then-train (cumulative) and windowed evaluation.

  • Execute a multi-threaded version of AdaptiveRandomForest.

  • For more on multi-threaded ensembles, see the parallel_ensembles.ipynb notebook

[12]:
from capymoa.stream.generator import SEA
from capymoa.stream.drift import DriftStream, AbruptDrift, GradualDrift
from capymoa.classifier import AdaptiveRandomForestClassifier
from capymoa.evaluation import prequential_evaluation
from capymoa.evaluation.visualization import plot_windowed_results

SEA3drifts = DriftStream(
    stream=[
        SEA(1),
        AbruptDrift(10000),
        SEA(2),
        GradualDrift(start=20000, end=25000),
        SEA(3),
        AbruptDrift(45000),
        SEA(1),
    ]
)

arf = AdaptiveRandomForestClassifier(
    schema=SEA3drifts.get_schema(), ensemble_size=100, number_of_jobs=4
)

results = prequential_evaluation(
    stream=SEA3drifts, learner=arf, window_size=5000, max_instances=50000
)

print(f"Cumulative accuracy = {results.cumulative.accuracy()}")
print(f"wallclock = {results.wallclock()} seconds")
display(results.windowed.metrics_per_window())
plot_windowed_results(results, metric="accuracy")
Cumulative accuracy = 89.346
wallclock = 10.006356000900269 seconds
instances accuracy kappa kappa_t kappa_m f1_score f1_score_0 f1_score_1 precision precision_0 precision_1 recall recall_0 recall_1
0 5000.0 88.26 73.743687 74.333188 67.096413 87.033480 82.534960 91.158307 88.176897 87.951807 88.401987 85.919338 77.746637 94.092040
1 10000.0 88.90 75.530516 76.572393 69.771242 87.928724 83.973433 91.509867 89.020852 89.366933 88.674770 86.863069 79.193900 94.532238
2 15000.0 89.32 76.502635 77.228145 71.009772 88.411372 84.646348 91.812328 89.488370 89.975550 89.001189 87.359989 79.913138 94.806840
3 20000.0 88.46 74.512585 75.457252 68.348875 87.397210 83.280209 91.189495 88.410300 88.267813 88.552788 86.407075 78.826111 93.988039
4 25000.0 89.96 77.605382 77.865961 71.718310 88.910925 85.165485 92.412334 89.854565 89.558732 90.150398 87.986898 81.183099 94.790698
5 30000.0 89.18 75.897805 76.508901 69.759642 88.083152 84.046004 91.814193 89.119564 88.951311 89.287816 87.070568 79.653438 94.487699
6 35000.0 89.42 76.037829 76.063348 69.046226 88.089201 83.896499 92.122115 88.884746 87.436548 90.332944 87.307770 80.631949 93.983592
7 40000.0 89.86 77.447069 77.879581 71.786311 88.868885 85.092620 92.317018 89.952864 90.211970 89.693757 87.810720 80.523094 95.098345
8 45000.0 90.18 78.293807 79.247675 73.125342 89.320332 85.739181 92.511819 90.482147 91.336634 89.627660 88.187975 80.788177 95.587772
9 50000.0 89.92 77.551851 77.962396 71.764706 88.897167 85.150265 92.370572 89.890396 89.807334 89.973459 87.925646 80.952381 94.898911
../_images/notebooks_06_advanced_API_25_2.png

8. AutoML with AutoClass#

The following example shows how to use the AutoClass algorithm in CapyMOA.

  • AutoClass is configured using a json configuration file (configuration_json) and a list of classifiers (base_classifiers)

  • AutoClass can also be configured with a list of strings (base_classifiers) representing the MOA classifiers. This approach is only enticing for people who are very familiar with MOA.

  • In the example below, we also compare it against using the base classifiers individually

[13]:
from capymoa.evaluation import prequential_evaluation
from capymoa.datasets import RBFm_100k
from capymoa.automl import AutoClass
from capymoa.classifier import HoeffdingTree, HoeffdingAdaptiveTree, KNN
from capymoa.evaluation.visualization import plot_windowed_results

rbf_100k = RBFm_100k()

max_instances = 25000
window_size = 2500

ht = HoeffdingTree(schema=rbf_100k.get_schema())
hat = HoeffdingAdaptiveTree(schema=rbf_100k.get_schema())
knn = KNN(schema=rbf_100k.get_schema())
autoclass = AutoClass(
    schema=rbf_100k.get_schema(),
    configuration_json="./settings_autoclass.json",
    base_classifiers=[KNN, HoeffdingAdaptiveTree, HoeffdingTree],
)

results_ht = prequential_evaluation(
    stream=rbf_100k, learner=ht, window_size=window_size, max_instances=max_instances
)
results_hat = prequential_evaluation(
    stream=rbf_100k, learner=hat, window_size=window_size, max_instances=max_instances
)
results_knn = prequential_evaluation(
    stream=rbf_100k, learner=knn, window_size=window_size, max_instances=max_instances
)
results_autoclass = prequential_evaluation(
    stream=rbf_100k,
    learner=autoclass,
    window_size=window_size,
    max_instances=max_instances,
)

print(
    f"[HT] Cumulative accuracy = {results_ht.accuracy()}, wall-clock time: {results_ht.wallclock()}"
)
print(
    f"[HAT] Cumulative accuracy = {results_hat.accuracy()}, wall-clock time: {results_hat.wallclock()}"
)
print(
    f"[KNN] Cumulative accuracy = {results_knn.accuracy()}, wall-clock time: {results_knn.wallclock()}"
)
print(
    f"[AUTOCLASS] Cumulative accuracy = {results_autoclass.accuracy()}, wall-clock time: {results_autoclass.wallclock()}"
)
plot_windowed_results(
    results_ht, results_knn, results_hat, results_autoclass, metric="accuracy"
)
[HT] Cumulative accuracy = 53.396, wall-clock time: 0.25939011573791504
[HAT] Cumulative accuracy = 57.676, wall-clock time: 0.3388500213623047
[KNN] Cumulative accuracy = 86.956, wall-clock time: 2.3922243118286133
[AUTOCLASS] Cumulative accuracy = 86.268, wall-clock time: 68.99542450904846
../_images/notebooks_06_advanced_API_27_1.png

8.1 AutoClass alternative syntax#

Another way to configure the learners is by using a list of strings base_classifiers representing the MOA classifiers

[14]:
from capymoa.automl import AutoClass
from capymoa.datasets import RBFm_100k
from capymoa.classifier import KNN, HoeffdingTree, HoeffdingAdaptiveTree, OnlineBagging
from capymoa.evaluation import prequential_evaluation
from capymoa.evaluation.visualization import plot_windowed_results

rbf_100k = RBFm_100k()

autoclass = AutoClass(
    schema=rbf_100k.get_schema(),
    configuration_json="./settings_autoclass.json",
    base_classifiers=[KNN, HoeffdingTree, HoeffdingAdaptiveTree],
)

autoclass_MOAStrings = AutoClass(
    schema=rbf_100k.get_schema(),
    configuration_json="./settings_autoclass.json",
    base_classifiers=["lazy.kNN", "trees.HoeffdingTree", "trees.HoeffdingAdaptiveTree"],
)

results_autoClass = prequential_evaluation(
    stream=rbf_100k, learner=autoclass, window_size=100, max_instances=500
)
results_autoclass_MOAStrings = prequential_evaluation(
    stream=rbf_100k, learner=autoclass_MOAStrings, window_size=100, max_instances=500
)

results_autoclass_MOAStrings.learner = "AutoClass_MOAStrings"

plot_windowed_results(
    results_autoClass, results_autoclass_MOAStrings, metric="accuracy"
)
../_images/notebooks_06_advanced_API_29_0.png