6. Exploring Advanced Features#

This notebook is targeted at advanced users who want, among other things, to access MOA objects directly through capymoa's Python API.

  • Examples of how to use any MOA Classifier or Regressor from capymoa

  • An example of how preprocessing (from MOA) can be used.

  • Comparing a SKLearn model against a MOA model

  • A variation of Tutorial 5: Creating a new classifier in CapyMOA which uses MOA learners, thus accessing MOA (Java) objects directly

  • How to log experiments using TensorBoard alongside the PyTorch API. This extends Tutorial 3: Using Pytorch with CapyMOA

  • Creating a synthetic stream with concept drifts using the MOA CLI directly


More information about CapyMOA can be found at https://www.capymoa.org

Last updated on 03/05/2024

1. Using any MOA learner#

  • CapyMOA gives you access to any MOA classifier or regressor

  • For some of the MOA learners there are corresponding Python objects (such as the HoeffdingTree or Adaptive Random Forest Classifier). However, MOA has over a hundred learners, and more are added constantly.

  • To allow advanced users to access any MOA learner from CapyMOA, we included the MOAClassifier and MOARegressor generic wrappers.

[1]:
from capymoa.evaluation import prequential_evaluation
from capymoa.base import MOAClassifier
from capymoa.datasets import RBFm_100k
# This is an import from MOA
from moa.classifiers.trees import HoeffdingAdaptiveTree

rbf_100k = RBFm_100k()

# Creates a wrapper around the HoeffdingAdaptiveTree, which then can be used as any other capymoa classifier
HAT = MOAClassifier(schema=rbf_100k.get_schema(), moa_learner=HoeffdingAdaptiveTree)

results_HAT = prequential_evaluation(stream=rbf_100k, learner=HAT, window_size=4500, max_instances=50000)

print(f"Cumulative accuracy = {results_HAT['cumulative'].accuracy()}, wall-clock time: {results_HAT['wallclock']}")
display(results_HAT['windowed'].metrics_per_window())
Downloading RBFm_100k.arff
100% [..........................................................................] 9037474 / 9037474
Cumulative accuracy = 59.57599999999999, wall-clock time: 1.4610631465911865
classified instances classifications correct (percent) Kappa Statistic (percent) Kappa Temporal Statistic (percent) Kappa M Statistic (percent) F1 Score (percent) F1 Score for class 0 (percent) F1 Score for class 1 (percent) F1 Score for class 2 (percent) F1 Score for class 3 (percent) ... Precision for class 1 (percent) Precision for class 2 (percent) Precision for class 3 (percent) Precision for class 4 (percent) Recall (percent) Recall for class 0 (percent) Recall for class 1 (percent) Recall for class 2 (percent) Recall for class 3 (percent) Recall for class 4 (percent)
0 4500.0 50.000000 33.449507 34.858135 27.768860 45.089567 53.487412 33.012821 62.374245 13.945578 ... 39.845261 67.685590 29.496403 47.644540 43.338015 57.142857 28.180575 57.835821 9.131403 64.399421
1 9000.0 49.466667 32.401117 35.287422 28.332808 41.328952 56.127745 25.129983 61.844725 2.307692 ... 34.855769 68.228404 6.185567 45.596376 41.013784 59.175084 19.647696 56.553398 1.418440 68.274303
2 13500.0 52.422222 36.245962 38.846044 32.031746 45.928811 59.177456 25.333333 53.851590 18.657938 ... 39.820359 65.017065 31.318681 51.899384 43.756287 66.071429 18.575419 45.958987 13.286713 74.888889
3 18000.0 59.311111 46.626403 47.309353 41.632133 57.974404 59.917012 34.180139 58.620690 64.179104 ... 44.223108 56.342857 75.704225 64.530457 56.441207 62.946818 27.854454 61.090458 55.699482 74.614820
4 22500.0 64.555556 54.112305 54.258675 48.943662 63.436739 67.889126 52.967359 60.546875 63.816475 ... 62.962963 61.917443 57.954545 68.314763 63.573514 70.629991 45.710627 59.235669 70.997680 71.293605
5 27000.0 60.111111 48.455842 48.670289 44.375581 58.755137 68.893204 49.307075 58.830549 55.581395 ... 58.885017 56.536697 54.566210 60.080321 58.558489 73.671096 42.409034 61.318408 56.635071 58.758837
6 31500.0 60.666667 48.797681 49.225473 44.058154 59.622479 69.438669 47.246608 56.351039 61.557478 ... 56.813820 52.928416 66.223404 58.839590 58.587577 70.227082 40.437158 60.246914 57.505774 64.520958
7 36000.0 62.488889 51.365841 51.977240 46.666667 62.079191 67.826087 55.932203 53.044496 67.179487 ... 61.682243 50.727884 67.007673 65.527489 61.832537 68.997473 51.162791 55.582822 67.352185 66.067416
8 40500.0 57.933333 45.320224 45.852403 39.559387 56.433686 65.095729 35.967742 56.277603 61.290323 ... 43.984221 57.106274 59.507830 64.431725 56.395784 75.298126 30.422920 55.472637 63.182898 57.602339
9 45000.0 72.000000 63.510428 63.636364 59.420290 71.308445 71.458075 66.176471 72.941176 69.961977 ... 70.977918 71.480583 75.000000 74.647887 70.422343 74.121680 61.983471 74.462705 65.558195 75.985663
10 49500.0 66.177778 56.005059 55.820029 51.926721 66.012305 62.703583 61.891516 62.087186 70.573871 ... 65.058480 65.096953 71.534653 69.770674 65.432668 63.900415 59.018568 59.343434 69.638554 75.262369
11 50000.0 65.400000 54.995471 54.869565 50.712251 65.018349 61.386139 60.445682 61.479100 68.827930 ... 63.265306 63.479416 70.408163 70.110957 64.466325 62.155388 57.866667 59.600998 67.317073 75.391499

12 rows × 23 columns
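
The MOARegressor wrapper works the same way. Below is a minimal sketch (not part of the original notebook); it assumes the Fried regression dataset from capymoa.datasets, MOA's FIMTDD tree regressor, and that the regression evaluator exposes rmse():

from capymoa.base import MOARegressor
from capymoa.datasets import Fried
from capymoa.evaluation import prequential_evaluation
# This is an import from MOA
from moa.classifiers.trees import FIMTDD

fried = Fried()

# Wrap a MOA regressor exactly as we wrapped the classifier above
fimtdd = MOARegressor(schema=fried.get_schema(), moa_learner=FIMTDD)

results_fimtdd = prequential_evaluation(stream=fried, learner=fimtdd, window_size=5000)
print(f"RMSE = {results_fimtdd['cumulative'].rmse()}")  # rmse() assumed on the regression evaluator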

1.1 Checking the hyperparameters for the MOA CLI#

  • MOA objects can be parametrized using the MOA CLI (Command Line Interface)

  • Sometimes you may not know the relevant parameters for a moa_learner; moa_learner.CLI_help() presents all the hyperparameters available for the moa_learner object.

[2]:
from moa.classifiers.meta import AdaptiveRandomForest

arf = MOAClassifier(schema=rbf_100k.get_schema(), moa_learner=AdaptiveRandomForest)

print(arf.CLI_help())
-l treeLearner (default: ARFHoeffdingTree -e 2000000 -g 50 -c 0.01)
Random Forest Tree.
-s ensembleSize (default: 100)
The number of trees.
-o mFeaturesMode (default: Percentage (M * (m / 100)))
Defines how m, defined by mFeaturesPerTreeSize, is interpreted. M represents the total number of features.
-m mFeaturesPerTreeSize (default: 60)
Number of features allowed considered for each split. Negative values corresponds to M - m
-a lambda (default: 6.0)
The lambda parameter for bagging.
-j numberOfJobs (default: 1)
Total number of concurrent jobs used for processing (-1 = as much as possible, 0 = do not use multithreading)
-x driftDetectionMethod (default: ADWINChangeDetector -a 1.0E-3)
Change detector for drifts and its parameters
-p warningDetectionMethod (default: ADWINChangeDetector -a 1.0E-2)
Change detector for warnings (start training bkg learner)
-w disableWeightedVote
Should use weighted voting?
-u disableDriftDetection
Should use drift detection? If disabled then bkg learner is also disabled
-q disableBackgroundLearner
Should use bkg learner? If disabled then reset tree immediately.
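
Knowing these flags, the learner can be configured by passing a CLI string to the wrapper, just as with any other hyperparameter. A short sketch (the flag values are illustrative; -s sets the ensemble size and -j the number of jobs, as listed in the help output above):

# Configure ARF via the MOA CLI: 10 trees (-s) and 4 concurrent jobs (-j)
arf10 = MOAClassifier(schema=rbf_100k.get_schema(),
                      moa_learner=AdaptiveRandomForest,
                      CLI="-s 10 -j 4")
results_arf10 = prequential_evaluation(stream=rbf_100k, learner=arf10, max_instances=10000)
print(f"ARF(10) accuracy = {results_arf10['cumulative'].accuracy()}")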

2. Using preprocessing from MOA (filters)#

We are working on a more user-friendly API for preprocessing; this example just shows how one can use MOA filters directly.

  • Here we use the NormalisationFilter from MOA to normalize instances in an online fashion.

  • MOA's filter syntax wraps the whole stream, so we are always composing commands like `Filter(Stream, ...)`.

  • We obtain the MOA CLI creation string from the rbf_100k stream; since it can be mapped to a MOA stream, that string is available. Uncomment the print statements if you would like to inspect the actual creation strings (perhaps to copy and paste them into MOA?)

[3]:
from capymoa.stream import Stream
from capymoa.classifier import OnlineBagging
from capymoa.evaluation import prequential_evaluation
from moa.streams.filters import StandardisationFilter, NormalisationFilter
from moa.streams import FilteredStream

rbf_100k = RBFm_100k()

# print(f'MOA creation string for data: {rbf_100k.moa_stream.getCLICreationString(rbf_100k.moa_stream.__class__)}')

# Create a FilterStream and use the NormalisationFilter
rbf_stream_normalised = Stream(CLI=f"-s ({rbf_100k.moa_stream.getCLICreationString(rbf_100k.moa_stream.__class__)}) \
-f NormalisationFilter ", moa_stream=FilteredStream())

# print(f'MOA creation string for filtered version: {rbf_stream_normalised.moa_stream.getCLICreationString(rbf_stream_normalised.moa_stream.__class__)}')

ob_learner_norm = OnlineBagging(schema=rbf_stream_normalised.get_schema(), ensemble_size=5)
ob_learner = OnlineBagging(schema=rbf_100k.get_schema(), ensemble_size=5)

ob_results_norm = prequential_evaluation(stream=rbf_stream_normalised, learner=ob_learner_norm)
ob_results = prequential_evaluation(stream=rbf_100k, learner=ob_learner)


print(f"Accuracy with online normalization: {ob_results_norm['cumulative'].accuracy()}")
print(f"Accuracy without normalization: {ob_results['cumulative'].accuracy()}")
Accuracy with online normalization: 61.815
Accuracy without normalization: 60.357000000000006
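
The StandardisationFilter imported above can be swapped in the same way; only the filter name in the -f portion of the CLI changes. A sketch mirroring the cell above:

# Same composition as before, but standardising (z-score) instead of normalising
rbf_stream_standardised = Stream(CLI=f"-s ({rbf_100k.moa_stream.getCLICreationString(rbf_100k.moa_stream.__class__)}) \
-f StandardisationFilter ", moa_stream=FilteredStream())

ob_learner_std = OnlineBagging(schema=rbf_stream_standardised.get_schema(), ensemble_size=5)
ob_results_std = prequential_evaluation(stream=rbf_stream_standardised, learner=ob_learner_std)
print(f"Accuracy with online standardisation: {ob_results_std['cumulative'].accuracy()}")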

3. Comparing MOA and SKLearn models#

  • This simple example shows how easy it is to compare a MOA and an SKLearn classifier.

  • For the sake of this example, we are using the generic wrappers:

  • SKClassifier (and SKRegressor) are parametrized directly as part of the object initialization

  • MOAClassifier (and MOARegressor) are parametrized through a CLI (a separate parameter)

[4]:
from capymoa.base import SKClassifier, MOAClassifier
from capymoa.datasets import CovtypeTiny
from capymoa.evaluation import prequential_evaluation_multiple_learners
from capymoa.evaluation.visualization import plot_windowed_results

from sklearn.linear_model import SGDClassifier
from moa.classifiers.trees import HoeffdingTree

covt_tiny = CovtypeTiny()

sk_sgd = SKClassifier(schema=covt_tiny.schema, sklearner=SGDClassifier(loss='log_loss', penalty='l1', alpha=0.001))
moa_ht = MOAClassifier(schema=covt_tiny.schema, moa_learner=HoeffdingTree, CLI="-g 50")

results = prequential_evaluation_multiple_learners(stream=covt_tiny, learners={'sk_sgd':sk_sgd, 'moa_ht':moa_ht}, window_size=100)
plot_windowed_results(results['sk_sgd'], results['moa_ht'], ylabel="accuracy")
Downloading covtype_n1000.arff
100% [..............................................................................] 21381 / 21381
[Plot: windowed accuracy (per 100-instance window) for sk_sgd and moa_ht]
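
Besides plotting, the dictionary returned by prequential_evaluation_multiple_learners can be inspected directly. A small sketch, assuming each entry has the same structure as a single prequential_evaluation result:

# Each entry mirrors a prequential_evaluation result ('cumulative', 'windowed', ...)
for name, res in results.items():
    print(f"{name}: cumulative accuracy = {res['cumulative'].accuracy()}")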

4. Creating Python learners with MOA Objects#

  • This example follows 06_new_learner (Tutorial 5: Creating a new classifier in CapyMOA), which shows how to create a custom online bagging implementation.

  • Here we also create an online bagging implementation, but the base_learner is a MOA class

[5]:
from capymoa.base import Classifier, MOAClassifier
from moa.classifiers.trees import HoeffdingTree
from collections import Counter
import numpy as np
import random
import math

def poisson(lambd, random_generator):
    # Draw a Poisson(lambd) sample: inverse-transform search for small lambd,
    # normal approximation for large lambd (mirrors MOA's MiscUtils.poisson)
    if lambd < 100.0:
        product = 1.0
        _sum = 1.0
        threshold = random_generator.random() * math.exp(lambd)
        i = 1
        max_val = max(100, 10 * math.ceil(lambd))
        while i < max_val and _sum <= threshold:
            product *= (lambd / i)
            _sum += product
            i += 1
        return i - 1
    x = lambd + math.sqrt(lambd) * random_generator.gauss(0, 1)
    if x < 0.0:
        return 0
    return int(math.floor(x))

class CustomOnlineBagging(Classifier):
    def __init__(self, schema=None, random_seed=1, ensemble_size=5, moa_base_learner_class=None, CLI_base_learner=None):
        super().__init__(schema=schema, random_seed=random_seed)

        self.random_generator = random.Random()
        self.CLI_base_learner = CLI_base_learner

        self.ensemble_size = ensemble_size
        self.moa_base_learner_class = moa_base_learner_class

        # Default base learner if None is specified
        if self.moa_base_learner_class is None:
            self.moa_base_learner_class = HoeffdingTree

        self.ensemble = []
        # Create several instances for the base_learners
        for i in range(self.ensemble_size):
            self.ensemble.append(MOAClassifier(schema=self.schema, moa_learner=self.moa_base_learner_class(), CLI=self.CLI_base_learner))

    def __str__(self):
        return 'CustomOnlineBagging'

    def train(self, instance):
        for i in range(self.ensemble_size):
            k = poisson(1.0, self.random_generator)
            for _ in range(k):
                self.ensemble[i].train(instance)

    def predict(self, instance):
        predictions = []
        for i in range(self.ensemble_size):
            predictions.append(self.ensemble[i].predict(instance))
        majority_vote = Counter(predictions)
        prediction = majority_vote.most_common(1)[0][0]
        return prediction

    def predict_proba(self, instance):
        probabilities = []
        for i in range(self.ensemble_size):
            classifier_proba = self.ensemble[i].predict_proba(instance)
            classifier_proba = classifier_proba / np.sum(classifier_proba)
            probabilities.append(classifier_proba)
        avg_proba = np.mean(probabilities, axis=0)
        return avg_proba
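
A quick sanity check on the poisson helper (not part of the original example): the empirical mean of the samples should be close to lambd.

rng = random.Random(1)
samples = [poisson(1.0, rng) for _ in range(100000)]
print(sum(samples) / len(samples))  # expected to be close to 1.0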


4.1 Testing the custom online bagging#

  • We choose to use a HoeffdingAdaptiveTree from MOA as the base learner

  • We also specify the CLI commands to configure the base learner

[6]:
from capymoa.evaluation import prequential_evaluation
from capymoa.stream import stream_from_file
from moa.classifiers.trees import HoeffdingAdaptiveTree

elec_stream = stream_from_file(path_to_csv_or_arff="../data/electricity.csv")

# Creating a learner: using a hoeffding adaptive tree as the base learner with grace period of 50 (-g 50)
NEW_OB = CustomOnlineBagging(schema=elec_stream.get_schema(),
                             ensemble_size=5,
                             moa_base_learner_class=HoeffdingAdaptiveTree,
                             CLI_base_learner="-g 50")

results_NEW_OB = prequential_evaluation(stream=elec_stream, learner=NEW_OB, window_size=4500)

results_NEW_OB['cumulative'].accuracy()
[6]:
82.81912076271186
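
For reference, the built-in OnlineBagging from capymoa.classifier can be run on the same stream with the same window size. A sketch, assuming prequential_evaluation restarts the stream before evaluating:

from capymoa.classifier import OnlineBagging

OB_builtin = OnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5)
results_OB_builtin = prequential_evaluation(stream=elec_stream, learner=OB_builtin, window_size=4500)
print(results_OB_builtin['cumulative'].accuracy())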

5. Using TensorBoard with PyTorch in CapyMOA#

  • One can use TensorBoard to visualize logged data in an online fashion

  • We go through all the steps below, including installing TensorBoard

5.1 Install TensorBoard#

Clear any logs from previous runs

rm -rf ./runs
[1]:
!pip install tensorboard
Requirement already satisfied: tensorboard in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (2.16.2)
Requirement already satisfied: absl-py>=0.4 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (2.1.0)
Requirement already satisfied: grpcio>=1.48.2 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.62.2)
Requirement already satisfied: markdown>=2.6.8 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (3.6)
Requirement already satisfied: numpy>=1.12.0 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.24.1)
Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (5.26.1)
Requirement already satisfied: setuptools>=41.0.0 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (68.2.2)
Requirement already satisfied: six>1.9 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (1.16.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from tensorboard) (3.0.2)
Requirement already satisfied: importlib-metadata>=4.4 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard) (7.1.0)
Requirement already satisfied: MarkupSafe>=2.1.1 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard) (2.1.3)
Requirement already satisfied: zipp>=0.5 in /Users/gomeshe/miniconda3/envs/capymoa/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard) (3.18.1)

5.2 PyTorchClassifier#

  • We define PyTorchClassifier and NeuralNetwork classes similarly to those from Tutorial 3: Using Pytorch with CapyMOA

[8]:
from capymoa.base import Classifier
import numpy as np
import torch
from torch import nn

torch.manual_seed(1)
torch.use_deterministic_algorithms(True)

# Get cpu device for training.
device = ("cpu")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self, input_size=0, number_of_classes=0):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, number_of_classes)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


class PyTorchClassifier(Classifier):
    def __init__(self, schema=None, random_seed=1, nn_model: nn.Module = None, optimizer=None, loss_fn=nn.CrossEntropyLoss(), device=("cpu"), lr=1e-3):
        super().__init__(schema, random_seed)
        self.model = None
        self.optimizer = None
        self.loss_fn = loss_fn
        self.lr = lr
        self.device = device

        torch.manual_seed(random_seed)

        if nn_model is None:
            self.set_model(None)
        else:
            self.model = nn_model.to(device)
        if optimizer is None:
            if self.model is not None:
                self.optimizer = torch.optim.SGD(self.model.parameters(), lr=lr)
        else:
            self.optimizer = optimizer

    def __str__(self):
        return str(self.model)

    def CLI_help(self):
        return str('schema=None, random_seed=1, nn_model: nn.Module = None, optimizer=None, loss_fn=nn.CrossEntropyLoss(), device=("cpu"), lr=1e-3')

    def set_model(self, instance):
        if self.schema is None:
            moa_instance = instance.java_instance.getData()
            self.model = NeuralNetwork(input_size=moa_instance.get_num_attributes(), number_of_classes=moa_instance.get_num_classes()).to(self.device)
        elif instance is not None:
            self.model = NeuralNetwork(input_size=self.schema.get_num_attributes(), number_of_classes=self.schema.get_num_classes()).to(self.device)

    def train(self, instance):
        if self.model is None:
            self.set_model(instance)

        X = torch.tensor(instance.x, dtype=torch.float32)
        y = torch.tensor(instance.y_index, dtype=torch.long)
        # set the device and add a dimension to the tensor
        X, y = torch.unsqueeze(X.to(self.device), 0), torch.unsqueeze(y.to(self.device),0)

        # Compute prediction error
        pred = self.model(X)
        loss = self.loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()

    def predict(self, instance):
        return np.argmax(self.predict_proba(instance))

    def predict_proba(self, instance):
        if self.model is None:
            self.set_model(instance)
        X = torch.unsqueeze(torch.tensor(instance.x, dtype=torch.float32).to(self.device), 0)
        # turn off gradient collection
        with torch.no_grad():
            pred = np.asarray(self.model(X).numpy(), dtype=np.double)
        return pred

5.3 PyTorchClassifier + the test-then-train loop + TensorBoard#

  • Here we use the instance loop to log relevant information to TensorBoard

  • This information can be viewed in TensorBoard while the processing is happening

[9]:
from capymoa.evaluation import ClassificationEvaluator
from capymoa.datasets import RBFm_100k
from torch.utils.tensorboard import SummaryWriter

# Create a SummaryWriter instance.
writer = SummaryWriter()
## Opening a file again to start from the beginning
rbf_stream = RBFm_100k()

# Creating the evaluator
evaluator = ClassificationEvaluator(schema=rbf_stream.get_schema())

# Creating a learner
simple_pyTorch_classifier = PyTorchClassifier(
    schema=rbf_stream.get_schema(),
    nn_model=NeuralNetwork(input_size=rbf_stream.get_schema().get_num_attributes(),
                           number_of_classes=rbf_stream.get_schema().get_num_classes()).to(device)
)

i = 0
while rbf_stream.has_more_instances():
    i += 1
    instance = rbf_stream.next_instance()

    prediction = simple_pyTorch_classifier.predict(instance)
    evaluator.update(instance.y_index, prediction)
    simple_pyTorch_classifier.train(instance)

    if i % 1000 == 0:
        writer.add_scalar("accuracy", evaluator.accuracy(), i)

    if i % 10000 == 0:
        print(f"Processed {i} instances")

writer.add_scalar("accuracy", evaluator.accuracy(), i)
# Call flush() method to make sure that all pending events have been written to disk.
writer.flush()

# If you do not need the summary writer anymore, call close() method.
writer.close()
Processed 10000 instances
Processed 20000 instances
Processed 30000 instances
Processed 40000 instances
Processed 50000 instances
Processed 60000 instances
Processed 70000 instances
Processed 80000 instances
Processed 90000 instances
Processed 100000 instances

5.4 Run TensorBoard#

Now, start TensorBoard, specifying the root log directory you used above. The logdir argument points to the directory where TensorBoard will look for event files that it can display. TensorBoard will recursively walk the directory structure rooted at logdir, looking for .*tfevents.* files.

tensorboard --logdir=runs

Go to the URL it provides

This dashboard shows how the accuracy changes over time. You can also use it to track training speed, learning rate, and other scalar values.
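
Other scalars exposed by the evaluator can be logged inside the same loop; a brief sketch (assuming the evaluator exposes kappa() alongside accuracy()):

if i % 1000 == 0:
    writer.add_scalar("accuracy", evaluator.accuracy(), i)
    writer.add_scalar("kappa", evaluator.kappa(), i)  # kappa() assumed available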

6. Creating a synthetic stream with concept drifts from MOA#

  • Demonstrates the flexibility of the API; this level of manipulation is expected from experienced MOA users.

  • To use the API like this, the user must be familiar with how concept drifts are simulated in MOA

EvaluatePrequential -l trees.HoeffdingAdaptiveTree -s (ConceptDriftStream -s generators.AgrawalGenerator -d (generators.AgrawalGenerator -f 2) -p 5000) -e (WindowClassificationPerformanceEvaluator -w 100) -i 10000 -f 100
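
The stream portion of the command above can be reproduced through the API by handing the inner ConceptDriftStream fragment to a wrapped MOA stream; a sketch mirroring that command (the cell below does the same with SEAGenerator):

from capymoa.stream import Stream
from moa.streams import ConceptDriftStream

# API equivalent of the ConceptDriftStream portion of the command above
stream_agrawal1drift = Stream(moa_stream=ConceptDriftStream(),
                              CLI="-s generators.AgrawalGenerator -d (generators.AgrawalGenerator -f 2) -p 5000")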

[10]:
from capymoa.stream import Stream
from capymoa.classifier import OnlineBagging
from capymoa.evaluation import prequential_evaluation
from capymoa.evaluation.visualization import plot_windowed_results
from moa.streams import ConceptDriftStream

# Using the API to generate the data using the ConceptDriftStream and SEAGenerator.
# The drift location is based on the number of instances (5000) as well as the drift width (1000, the default value)
stream_sea1drift = Stream(moa_stream=ConceptDriftStream(),
                          CLI="-s generators.SEAGenerator -d (generators.SEAGenerator -f 2) -p 5000 -w 1000")

OB = OnlineBagging(schema=stream_sea1drift.get_schema(), ensemble_size=10)

results_sea1drift_OB = prequential_evaluation(stream=stream_sea1drift, learner=OB, window_size=100, max_instances=10000)

plot_windowed_results(results_sea1drift_OB)
[Plot: windowed accuracy for Online Bagging on the SEA stream with a drift at instance 5000]