datasets
Use built-in datasets for online continual learning.
In online continual learning (OCL), datastreams are irreversible sequences of examples drawn from a non-stationary data distribution. An OCL learner makes only a single pass through the datastream, yet it is expected to perform well on any portion of it.
Portions of the datastream where the data distribution is relatively stationary are called tasks.
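The single-pass protocol described above can be sketched in plain Python. This is only an illustration under stated assumptions: `NearestMeanLearner`, `make_examples`, and the two-task schedule are hypothetical toy constructs and not part of CapyMOA.

```python
import random


class NearestMeanLearner:
    """Hypothetical toy learner: keeps a running mean per class (1-D features)."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def train(self, x, y):
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None
        # Pick the seen class whose running mean is closest to x.
        return min(self.counts, key=lambda c: abs(x - self.sums[c] / self.counts[c]))


def make_examples(classes, n, rng):
    """Draw n (x, y) pairs; class c is centred at float(c) with small noise."""
    examples = []
    for _ in range(n):
        y = rng.choice(sorted(classes))
        examples.append((y + rng.uniform(-0.2, 0.2), y))
    return examples


rng = random.Random(0)
task_schedule = [{0, 1}, {2, 3}]  # assumed two-task schedule for illustration
train_tasks = [make_examples(c, 50, rng) for c in task_schedule]
test_tasks = [make_examples(c, 20, rng) for c in task_schedule]

learner = NearestMeanLearner()
accuracy = []  # accuracy[i][j]: accuracy on task j after training on task i
for train_task in train_tasks:
    for x, y in train_task:  # single pass: each example is seen exactly once
        learner.train(x, y)
    row = []
    for test_task in test_tasks:
        correct = sum(learner.predict(x) == y for x, y in test_task)
        row.append(correct / len(test_task))
    accuracy.append(row)
```

Evaluating on every task after each training task yields a matrix of accuracies, which shows how well the learner performs on both past and future portions of the datastream. In this sketch, accuracy on the second task is zero until its classes have been seen.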
A common way to construct an OCL dataset for experimentation is to group the classes of a classification dataset into tasks. In this setup, known as the class-incremental scenario, the learner is presented with a sequence of tasks, each containing a new subset of the classes.
For example, SplitMNIST splits the MNIST dataset into five tasks, each containing two classes:
>>> from capymoa.ocl.datasets import SplitMNIST
>>> scenario = SplitMNIST(preload_test=False)
>>> scenario.task_schedule
[{1, 4}, {5, 7}, {9, 3}, {0, 8}, {2, 6}]
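A task schedule like the one above is a partition of the class labels into equal-sized groups. The following is a minimal sketch of how such a partition could be built. `make_task_schedule` and its `seed` parameter are hypothetical names, and the way CapyMOA actually orders or shuffles classes is not assumed here.

```python
import random


def make_task_schedule(num_classes, num_tasks, seed=0):
    """Partition class indices 0..num_classes-1 into num_tasks equal-sized sets."""
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")
    classes = list(range(num_classes))
    random.Random(seed).shuffle(classes)  # assumed: random class order per seed
    size = num_classes // num_tasks
    return [set(classes[i:i + size]) for i in range(0, num_classes, size)]


# Ten classes grouped into five tasks of two classes each.
schedule = make_task_schedule(num_classes=10, num_tasks=5)
```

Every class appears in exactly one task, so the learner never sees a class outside the task it belongs to.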
To draw a training instance from the usual CapyMOA stream object:
>>> instance = scenario.stream.next_instance()
>>> instance
LabeledInstance(
    Schema(SplitMNIST10/5),
    x=[0. 0. 0. ... 0. 0. 0.],
    y_index=4,
    y_label='4'
)
CapyMOA streams flatten the data into a feature vector:
>>> instance.x.shape
(784,)
You can access the PyTorch datasets for each task:
>>> x, y = scenario.test_tasks[0][0]
>>> x.shape
torch.Size([1, 28, 28])
>>> y
1
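The two shapes shown above are consistent: a 1 x 28 x 28 image holds 784 values. A minimal sketch of the flattening in plain Python follows. The `flatten` helper is hypothetical, and the row-major ordering it uses is an assumption rather than a statement about CapyMOA's internals.

```python
def flatten(image):
    """Flatten a nested [channels][height][width] list into one flat list."""
    return [value for channel in image for row in channel for value in row]


# Hypothetical all-zero image with the same shape as an MNIST tensor: 1 x 28 x 28.
image = [[[0.0] * 28 for _ in range(28)]]
features = flatten(image)
```

Here `len(features)` equals 1 * 28 * 28, which matches the length of `instance.x`.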
Classes
Split MNIST dataset for online class-incremental learning.
Rotated MNIST where each task applies a fixed image rotation.
A lower resolution and smaller version of the SplitMNIST dataset for testing.
Domain-incremental TinyMNIST where each task applies a fixed image rotation.
CIFAR100 encoded by a Vision Transformer (ViT).
CIFAR10 encoded by a Vision Transformer (ViT).
Domain-incremental CIFAR-100 ViT variant with 20 classes per task.
Split Fashion MNIST dataset for online class-incremental learning.
Domain-incremental FashionMNIST where each task applies a fixed image rotation.
Split CIFAR-10 dataset for online class-incremental learning.
Split CIFAR-100 dataset for online class-incremental learning.
Domain-incremental CIFAR-100 variant with 20 classes per task.
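Several entries above are domain-incremental: the label set stays the same across tasks while the input distribution changes, for instance through a fixed rotation per task. The sketch below illustrates the idea in plain Python. The `rotate90` helper and the schedule of 90-degree steps are assumptions for illustration; the rotation angles used by the actual datasets are not stated here.

```python
def rotate90(image, turns):
    """Rotate a 2-D list clockwise by 90 degrees, `turns` times."""
    for _ in range(turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


# Hypothetical schedule: task i rotates every image by i * 90 degrees.
image, label = [[1, 2], [3, 4]], 7
tasks = [(rotate90(image, turns), label) for turns in range(4)]
```

Each task presents the same label with a differently rotated input, in contrast to the class-incremental scenario where each task introduces new labels.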