ocl
This module contains built-in datastreams for online continual learning (OCL).
In OCL, datastreams are irreversible sequences of examples drawn from a non-stationary data distribution. Learners in OCL can learn from only a single pass through the datastream, yet are expected to perform well on any portion of it.
Portions of the datastream where the data distribution is relatively stationary are called tasks.
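The single-pass constraint is usually evaluated in a test-then-train (prequential) fashion: each example is predicted on before it is learned from, and is never revisited. A minimal sketch of that loop, using a hypothetical majority-class `Learner` as a stand-in for a real model (none of these names come from capymoa):

```python
# Hypothetical stand-in learner: predicts the most frequently seen label.
class Learner:
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Before any training there is nothing to predict with.
        return max(self.counts, key=self.counts.get) if self.counts else None

    def train(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


# An irreversible sequence of (x, y) examples; synthetic data for illustration.
stream = [((i,), i % 2) for i in range(6)]

learner = Learner()
correct = 0
for x, y in stream:
    correct += learner.predict(x) == y  # test first ...
    learner.train(x, y)                 # ... then train; the example is gone after this
print(correct / len(stream))            # fraction of correct test-then-train predictions
```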
A common way to construct an OCL dataset for experimentation is to group the classes of a standard classification dataset into tasks. In this class-incremental scenario, the learner is presented with a sequence of tasks, each containing a new subset of the classes.
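That grouping can be sketched in a few lines. The helper below is illustrative only (not the capymoa implementation); the class order is shuffled with an arbitrary seed, as class-incremental benchmarks often randomize it:

```python
import random

# Ten class labels, shuffled with an arbitrary seed (assumption: benchmarks
# typically randomize class order per experiment seed).
classes = list(range(10))
random.Random(0).shuffle(classes)

# Partition the shuffled classes into five tasks of two classes each.
num_tasks = 5
per_task = len(classes) // num_tasks
task_schedule = [
    set(classes[i * per_task:(i + 1) * per_task]) for i in range(num_tasks)
]
print(task_schedule)
```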
For example, SplitMNIST splits the MNIST dataset into five tasks where each task contains two classes:
>>> from capymoa.datasets.ocl import SplitMNIST
>>> scenario = SplitMNIST()
>>> scenario.task_schedule
[{1, 4}, {5, 7}, {9, 3}, {0, 8}, {2, 6}]
To get the usual CapyMOA stream object for training:
>>> instance = scenario.train_streams[0].next_instance()
>>> instance
LabeledInstance(
Schema(SplitMNISTTrain),
x=[0. 0. 0. ... 0. 0. 0.],
y_index=1,
y_label='1'
)
CapyMOA streams flatten the data into a feature vector:
>>> instance.x.shape
(784,)
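If a model expects image-shaped input, the flat 784-dimensional vector can be reshaped back to the original 28x28 layout. A small sketch with a synthetic vector standing in for `instance.x`:

```python
import numpy as np

# Synthetic zeros standing in for a flattened MNIST instance (instance.x).
x = np.zeros(784)

# Recover the 28x28 image layout expected by image models.
image = x.reshape(28, 28)
print(image.shape)  # (28, 28)
```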
You can access the PyTorch datasets for each task:
>>> x, y = scenario.test_tasks[0][0]
>>> x.shape
torch.Size([1, 28, 28])
>>> y
1
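Because the per-task datasets follow the PyTorch dataset protocol, they can be batched with a standard ``DataLoader``. A sketch using a synthetic ``TensorDataset`` in place of ``scenario.test_tasks[0]``:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a test task: eight fake 1x28x28 grayscale images.
images = torch.zeros(8, 1, 28, 28)
labels = torch.zeros(8, dtype=torch.long)
dataset = TensorDataset(images, labels)

# Batch the task for evaluation.
loader = DataLoader(dataset, batch_size=4)
for x, y in loader:
    print(x.shape)  # each batch: torch.Size([4, 1, 28, 28])
```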
Classes

- Split CIFAR-10 dataset for online class incremental learning.
- Split CIFAR-100 dataset for online class incremental learning.
- Split Fashion MNIST dataset for online class incremental learning.
- Split MNIST dataset for online class incremental learning.
- A lower resolution and smaller version of the SplitMNIST dataset for testing.