SplitCIFAR10
- class capymoa.ocl.datasets.SplitCIFAR10
Bases: _BuiltInCIScenario
Split CIFAR-10 dataset for online class incremental learning.
References:
Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images.
- __init__(
- num_tasks: int | None = None,
- shuffle_tasks: bool = True,
- shuffle_data: bool = True,
- seed: int = 0,
- directory: Path = get_download_dir(),
- auto_download: bool = True,
- train_transform: Callable[[Any], Tensor] | None = None,
- test_transform: Callable[[Any], Tensor] | None = None,
- normalize_features: bool = False,
- preload_test: bool = True,
- preload_train: bool = False,
Create a new online continual learning datamodule.
- Parameters:
num_tasks – The number of tasks to partition the dataset into, defaults to default_task_count.
shuffle_tasks – Should the contents and order of the tasks be shuffled, defaults to True.
shuffle_data – Should the training dataset be shuffled, defaults to True.
seed – Seed for shuffling the tasks, defaults to 0.
directory – The directory to download the dataset to, defaults to capymoa.datasets.get_download_dir().
auto_download – Should the dataset be automatically downloaded if it does not exist, defaults to True.
train_transform – A transform to apply to the training dataset, defaults to default_train_transform.
test_transform – A transform to apply to the test dataset, defaults to default_test_transform.
normalize_features – Should the features be normalized. This normalization step is applied after all other transformations.
preload_test – Should the test dataset be preloaded into CPU memory. Helps with memory locality and speed, but increases memory usage. Preloading the test dataset is recommended since it is small and is used multiple times in evaluation.
preload_train – Should the training dataset be preloaded into CPU memory. Helps with memory locality and speed, but increases memory usage. Preloading the training dataset is not recommended, since it is large and each sample is only seen once in online continual learning.
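A minimal construction sketch (the keyword arguments simply restate the defaults documented above; downloading requires network access):

```python
from capymoa.ocl.datasets import SplitCIFAR10

# Build the scenario: CIFAR-10 split into 5 tasks, with the class-to-task
# assignment and the order of training examples both shuffled.
scenario = SplitCIFAR10(
    num_tasks=5,          # default_task_count
    shuffle_tasks=True,
    shuffle_data=True,
    seed=0,
    auto_download=True,
)
```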
- test_loaders(
- batch_size: int,
- **kwargs: Any,
Get the test streams for the scenario.
- Parameters:
batch_size – The number of instances collected into each batch.
kwargs – Additional keyword arguments to pass to the DataLoader.
- Returns:
A data loader for each task.
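A usage sketch, assuming each returned loader behaves like a standard PyTorch DataLoader yielding (features, labels) batches, and that `model` is a classifier you supply (hypothetical, not part of CapyMOA):

```python
import torch
from capymoa.ocl.datasets import SplitCIFAR10

scenario = SplitCIFAR10()

# Evaluate `model` on each task's held-out test data.
for task_id, loader in enumerate(scenario.test_loaders(batch_size=256)):
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            predictions = model(x).argmax(dim=1)
            correct += (predictions == y).sum().item()
            total += y.numel()
    print(f"task {task_id}: accuracy {correct / total:.3f}")
```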
- train_loaders(
- batch_size: int,
- **kwargs: Any,
Get the training streams for the scenario.
The order of the tasks is fixed and does not change between iterations. The datasets themselves are shuffled in __init__() if shuffle_data is set to True; shuffling happens once up front because the order of the data matters in online learning, where the learner sees each example only once.
- Parameters:
batch_size – The number of instances collected into each batch.
kwargs – Additional keyword arguments to pass to the DataLoader.
- Returns:
A data loader for each task.
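A sketch of a single online pass, assuming `learner.train_step` is a placeholder for whatever update your continual learner performs on a mini-batch (hypothetical name, not part of CapyMOA):

```python
from capymoa.ocl.datasets import SplitCIFAR10

scenario = SplitCIFAR10()

# Each training example is seen exactly once, task after task.
for task_id, loader in enumerate(scenario.train_loaders(batch_size=32)):
    for x, y in loader:
        learner.train_step(x, y)  # placeholder for your online update rule
```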
- default_task_count: int = 5
The default number of tasks in the dataset.
- default_test_transform: Callable[[Any], Tensor] = ToTensor()
The default transform to apply to the test dataset.
- default_train_transform: Callable[[Any], Tensor] = ToTensor()
The default transform to apply to the training dataset.
- mean: Sequence[float] | None = [0.491, 0.482, 0.447]
The mean of the features in the dataset used for normalization.
- num_classes: int = 10
The number of classes in the dataset.
- std: Sequence[float] | None = [0.247, 0.243, 0.262]
The standard deviation of the features in the dataset used for normalization.
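These are the statistics applied when features are normalized. The sketch below shows an equivalent explicit transform built with torchvision; passing it via train_transform/test_transform is an assumption about typical usage rather than a documented recipe:

```python
from torchvision import transforms
from capymoa.ocl.datasets import SplitCIFAR10

# Explicit per-channel normalization using the class-level statistics.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=SplitCIFAR10.mean, std=SplitCIFAR10.std),
])
scenario = SplitCIFAR10(train_transform=transform, test_transform=transform)
```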
- stream: Stream[LabeledInstance]
Stream containing each task in sequence.
- task_schedule: Sequence[Set[int]]
A sequence of sets containing the classes for each task.
In online continual learning your learner may not have access to this attribute. It is provided for evaluation and debugging.
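A small inspection sketch; with the default five tasks over ten classes, each set should hold two class labels. The stream access assumes the usual CapyMOA stream interface (has_more_instances / next_instance):

```python
from capymoa.ocl.datasets import SplitCIFAR10

scenario = SplitCIFAR10()

# Which classes make up each task?
for task_id, classes in enumerate(scenario.task_schedule):
    print(f"task {task_id}: classes {sorted(classes)}")

# The same data viewed as a single sequential CapyMOA stream.
if scenario.stream.has_more_instances():
    instance = scenario.stream.next_instance()
    print(instance.y_index)  # integer class label of the first instance
```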