ocl#

This module contains built-in datastreams for online continual learning (OCL).

In OCL, datastreams are irreversible sequences of examples drawn from a non-stationary data distribution. Learners in OCL can learn from only a single pass through the datastream but are expected to perform well on any portion of it.

Portions of the datastream where the data distribution is relatively stationary are called tasks.

A common way to construct an OCL dataset for experimentation is to group the classes of a standard classification dataset into tasks. In this class-incremental scenario, the learner is presented with a sequence of tasks, each introducing a new subset of the classes.
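The grouping step can be sketched in a few lines. The helper below is hypothetical (it is not part of CapyMOA); it only illustrates how class labels might be shuffled and partitioned into equal-sized tasks, mirroring the kind of schedule shown for SplitMNIST:

```python
import random

def make_task_schedule(num_classes, num_tasks, seed=0):
    """Hypothetical helper: shuffle class labels, then split them
    into equal-sized tasks for a class-incremental scenario."""
    rng = random.Random(seed)
    labels = list(range(num_classes))
    rng.shuffle(labels)
    per_task = num_classes // num_tasks
    return [set(labels[i * per_task:(i + 1) * per_task])
            for i in range(num_tasks)]

# Ten classes split into five tasks of two classes each.
schedule = make_task_schedule(num_classes=10, num_tasks=5)
```

Each element of `schedule` is a set of class labels, analogous to the `task_schedule` attribute shown below.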

For example, SplitMNIST splits the MNIST dataset into five tasks of two classes each:

>>> from capymoa.datasets.ocl import SplitMNIST
>>> scenario = SplitMNIST()
>>> scenario.task_schedule
[{1, 4}, {5, 7}, {9, 3}, {0, 8}, {2, 6}]

To get the usual CapyMOA stream object for training:

>>> instance = scenario.train_streams[0].next_instance()
>>> instance
LabeledInstance(
    Schema(SplitMNISTTrain),
    x=[0. 0. 0. ... 0. 0. 0.],
    y_index=1,
    y_label='1'
)

CapyMOA streams flatten the data into a feature vector:

>>> instance.x.shape
(784,)
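The 784-dimensional vector is a flattened 28 x 28 image, so a model that needs the spatial layout can reshape it. A minimal sketch, using a zero vector as a stand-in for `instance.x`:

```python
import numpy as np

x_flat = np.zeros(784)          # stand-in for instance.x
image = x_flat.reshape(28, 28)  # recover the 28 x 28 spatial layout
```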

You can access the PyTorch datasets for each task:

>>> x, y = scenario.test_tasks[0][0]
>>> x.shape
torch.Size([1, 28, 28])
>>> y
1
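Because each test task is an ordinary PyTorch dataset, it can be wrapped in a `DataLoader` for batched evaluation. A sketch, using a `TensorDataset` of zeros as a stand-in for `scenario.test_tasks[0]`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for scenario.test_tasks[0]: 16 fake 1x28x28 images with labels.
task = TensorDataset(torch.zeros(16, 1, 28, 28),
                     torch.zeros(16, dtype=torch.long))
loader = DataLoader(task, batch_size=8)

x_batch, y_batch = next(iter(loader))
```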

Classes#

SplitCIFAR10: Split CIFAR-10 dataset for online class incremental learning.

SplitCIFAR100: Split CIFAR-100 dataset for online class incremental learning.

SplitFashionMNIST: Split Fashion MNIST dataset for online class incremental learning.

SplitMNIST: Split MNIST dataset for online class incremental learning.

TinySplitMNIST: A lower-resolution, smaller version of the SplitMNIST dataset for testing.