BatchClassifier

class capymoa.base.BatchClassifier

Bases: Classifier, Batch, ABC

Base class for classifiers that support mini-batches.

Supported by:

Evaluators that support batch classifiers will call the batch_train() and batch_predict_proba() methods instead of train() and predict_proba():

>>> import numpy as np
>>> import torch
>>> from capymoa.base import BatchClassifier
>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.evaluation import prequential_evaluation
>>>
>>> batch_size = 500
>>> class MyBatchClassifier(BatchClassifier):
...     def batch_train(self, x, y):
...         print(f"batch_train x: {x.shape} {x.dtype}")
...         print(f"batch_train y: {y.shape} {y.dtype}")
...
...     def batch_predict_proba(self, x):
...         print(f"batch_predict_proba x: {x.shape} {x.dtype}")
...         return torch.zeros((x.shape[0], self.schema.get_num_classes()))
...
>>> stream = ElectricityTiny()
>>> learner = MyBatchClassifier(stream.schema)
>>> _ = prequential_evaluation(
...     stream,
...     learner,
...     batch_size=batch_size,
...     max_instances=721
... )
batch_predict_proba x: torch.Size([500, 6]) torch.float32
batch_train x: torch.Size([500, 6]) torch.float32
batch_train y: torch.Size([500]) torch.int64
batch_predict_proba x: torch.Size([221, 6]) torch.float32
batch_train x: torch.Size([221, 6]) torch.float32
batch_train y: torch.Size([221]) torch.int64

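You can also collect batches manually: group instances with itertools.batched (available from Python 3.12) or the batched helper imported from capymoa._utils in the example, then use np.stack to assemble each group into a matrix: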

>>> # itertools.batched is not available in Python < 3.12, so use CapyMOA's helper
>>> from capymoa._utils import batched
>>> stream.restart() # streams are stateful, so restart it
>>> for i, batch in enumerate(batched(stream, 100)):
...     x = np.stack([instance.x for instance in batch])
...     y = np.stack([instance.y_index for instance in batch])
...     x = torch.from_numpy(x).to(learner.device, learner.x_dtype)
...     y = torch.from_numpy(y).to(learner.device, learner.y_dtype)
...     learner.batch_train(x, y)
...     break
batch_train x: torch.Size([100, 6]) torch.float32
batch_train y: torch.Size([100]) torch.int64
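
Where neither itertools.batched nor the CapyMOA helper is at hand, the same grouping can be written with itertools.islice. The following is a minimal sketch, not CapyMOA's implementation; batched_fallback is a hypothetical name introduced here for illustration, and a plain range stands in for the stream.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def batched_fallback(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Yield successive tuples of up to ``n`` items; the last may be shorter."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    while True:
        # Pull at most n items; an empty tuple means the source is exhausted.
        batch = tuple(islice(iterator, n))
        if not batch:
            return
        yield batch


# 721 items in batches of 500: one full batch and a remainder of 221,
# matching the batch sizes printed in the prequential example above.
sizes = [len(batch) for batch in batched_fallback(range(721), 500)]
print(sizes)  # [500, 221]
```

The final, shorter batch is yielded rather than dropped, which is why a learner should not assume every batch has exactly batch_size rows.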

The default implementations of train(), predict(), and predict_proba() call the batch variants with a batch of size 1. This lets parts of CapyMOA that expect a classifier to train and predict on single instances work with batch classifiers.

>>> instance = next(stream)
>>> learner.train(instance)
batch_train x: torch.Size([1, 6]) torch.float32
batch_train y: torch.Size([1]) torch.int64
>>> learner.predict(instance)
batch_predict_proba x: torch.Size([1, 6]) torch.float32
np.int64(0)
>>> learner.predict_proba(instance)
batch_predict_proba x: torch.Size([1, 6]) torch.float32
array([0., 0.], dtype=float32)

__init__(schema: Schema, random_seed: int = 1)

batch_predict(x: Tensor) → Tensor

Predict the labels for a batch of instances.

Parameters:

x – Batch of x_dtype valued feature vectors (batch_size, num_features)

Returns:

Predicted batch of y_dtype valued labels (batch_size,).
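
One plausible way to turn a batch of class probabilities into a batch of labels is an argmax over the class axis. The sketch below assumes that behaviour; the source does not confirm how batch_predict() is implemented. It uses NumPy arrays as a stand-in for tensors so that it runs without torch, and labels_from_proba is a hypothetical helper name.

```python
import numpy as np


def labels_from_proba(proba: np.ndarray) -> np.ndarray:
    """Return the index of the most probable class for each row."""
    # proba has shape (batch_size, num_classes); reduce over the class axis.
    # On ties, argmax returns the first (lowest) class index.
    return np.argmax(proba, axis=1).astype(np.int64)


proba = np.array([[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]], dtype=np.float32)
labels = labels_from_proba(proba)
print(labels.shape, labels.dtype)  # (3,) int64
```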

abstract batch_predict_proba(x: Tensor) → Tensor

Predict the probabilities of the classes for a batch of instances.

Parameters:

x – Batch of x_dtype valued feature vectors (batch_size, num_features)

Returns:

Batch of x_dtype valued predicted probabilities (batch_size, num_classes).

abstract batch_train(x: Tensor, y: Tensor) → None

Train with a batch of instances.

Parameters:
  • x – Batch of x_dtype valued feature vectors (batch_size, num_features)

  • y – Batch of y_dtype valued labels (batch_size,).

predict(instance: Instance) → int | None

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters:

instance – The instance to predict the label for.

Returns:

The predicted label or None if the classifier is unable to make a prediction.

predict_proba(instance: Instance) → ndarray[Any, dtype[float64]] | None

Calls batch_predict_proba() with a batch of size 1.

train(instance: LabeledInstance) → None

Calls batch_train() with a batch of size 1.
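
The single-instance wrappers can be pictured as adding a leading batch dimension of size 1 before delegating, and removing it again from the result. The sketch below illustrates that idea only: RecordingLearner is a hypothetical stand-in, not CapyMOA's class, it takes a raw feature vector and label index instead of an Instance, and it uses NumPy arrays in place of tensors.

```python
import numpy as np


class RecordingLearner:
    """Hypothetical stand-in that records the batch shapes it receives."""

    def __init__(self, num_classes: int):
        self.num_classes = num_classes
        self.seen = []

    def batch_train(self, x: np.ndarray, y: np.ndarray) -> None:
        self.seen.append((x.shape, y.shape))

    def batch_predict_proba(self, x: np.ndarray) -> np.ndarray:
        return np.zeros((x.shape[0], self.num_classes), dtype=np.float32)

    def train(self, x_vec, y_index: int) -> None:
        # (num_features,) -> (1, num_features), and a scalar label -> (1,)
        x = np.asarray(x_vec, dtype=np.float32)[None, :]
        y = np.asarray([y_index], dtype=np.int64)
        self.batch_train(x, y)

    def predict_proba(self, x_vec) -> np.ndarray:
        x = np.asarray(x_vec, dtype=np.float32)[None, :]
        # Drop the batch dimension again: (1, num_classes) -> (num_classes,)
        return self.batch_predict_proba(x)[0]


learner = RecordingLearner(num_classes=2)
learner.train([0.0] * 6, 1)
single_proba = learner.predict_proba([0.0] * 6)
print(learner.seen, single_proba.shape)
```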

device: device = device(type='cpu')

Device on which the batch will be processed.

random_seed: int

The random seed for reproducibility.

When implementing a classifier, ensure that every random number generator it uses is seeded with this value.
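
A common way to follow this advice is to give each learner its own generator seeded from random_seed, instead of relying on global random state. The sketch below shows the pattern with the standard library only; SeededGuesser is a hypothetical class for illustration and does not subclass any CapyMOA type.

```python
import random


class SeededGuesser:
    """Hypothetical learner-like class that guesses labels at random."""

    def __init__(self, num_classes: int, random_seed: int = 1):
        self.num_classes = num_classes
        self.random_seed = random_seed
        # A private generator: other code calling the global `random`
        # functions cannot disturb this learner's sequence.
        self._rng = random.Random(random_seed)

    def guess(self) -> int:
        return self._rng.randrange(self.num_classes)


a = SeededGuesser(num_classes=2, random_seed=1)
b = SeededGuesser(num_classes=2, random_seed=1)
run_a = [a.guess() for _ in range(10)]
run_b = [b.guess() for _ in range(10)]
print(run_a == run_b)  # True
```

A learner built on torch or NumPy would seed those libraries' generators in the same spirit.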

schema: Schema

The schema representing the instances.

x_dtype: dtype = torch.float32

Data type for the input features.

y_dtype: dtype = torch.int64

Data type for the target value/labels.