data
Utilities for continual learning when using PyTorch datasets.
Functions
- capymoa.ocl.util.data.class_incremental_schedule(
      num_classes: int,
      num_tasks: int,
      shuffle: bool = True,
      generator: Generator = torch.default_generator,
  )
Returns a class schedule for class incremental learning.
>>> class_incremental_schedule(9, 3, shuffle=False)
[{0, 1, 2}, {3, 4, 5}, {8, 6, 7}]
>>> class_incremental_schedule(9, 3, generator=torch.Generator().manual_seed(0))
[{8, 0, 2}, {1, 3, 7}, {4, 5, 6}]
- Parameters:
num_classes – The number of classes in the dataset.
num_tasks – The number of tasks to divide the classes into.
shuffle – When False, the classes occur in numerical order of their labels. When True, the classes are shuffled.
generator – The random number generator used for shuffling, defaults to torch.default_generator
- Returns:
A list of sets of class labels, one set per task.
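The scheduling idea described above can be sketched in a few lines. This is an illustrative re-implementation, not the library's code: the name `sketch_class_schedule` is hypothetical, and the sketch assumes that `num_classes` divides evenly by `num_tasks` (how the real function handles a remainder is not stated here).

```python
import torch


def sketch_class_schedule(num_classes, num_tasks, shuffle=True, generator=None):
    """Hypothetical sketch: split class labels into equally sized tasks."""
    # Assumption: only evenly divisible class counts are supported here.
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")

    if shuffle:
        # Random permutation of the labels 0..num_classes-1.
        order = torch.randperm(num_classes, generator=generator)
    else:
        # Labels in numerical order.
        order = torch.arange(num_classes)

    per_task = num_classes // num_tasks
    # Cut the label order into consecutive chunks, one set per task.
    return [set(chunk.tolist()) for chunk in order.split(per_task)]


ordered = sketch_class_schedule(9, 3, shuffle=False)
shuffled = sketch_class_schedule(9, 3, generator=torch.Generator().manual_seed(0))
```

Passing a seeded `torch.Generator` makes the shuffled schedule reproducible across runs, which is useful when comparing learners on the same task order.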
- capymoa.ocl.util.data.class_incremental_split(
      dataset: Dataset[Tuple[Tensor, Tensor]],
      num_tasks: int,
      shuffle_tasks: bool = True,
      generator: Generator = torch.default_generator,
  )
Divide a dataset into multiple tasks for class incremental learning.
>>> import torch
>>> from torch.utils.data import TensorDataset
>>> x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> y = torch.tensor([0, 1, 2, 3])
>>> dataset = TensorDataset(x, y)
>>> tasks, schedule = class_incremental_split(dataset, 2, shuffle_tasks=False)
>>> schedule
[{0, 1}, {2, 3}]
>>> tasks[0][0]
(tensor([1, 2]), tensor(0))
>>> tasks[1][0]
(tensor([5, 6]), tensor(2))
- Parameters:
dataset – The dataset to divide.
num_tasks – The number of tasks to divide the dataset into.
shuffle_tasks – When False, the classes occur in numerical order of their labels. When True, the classes are shuffled.
generator – The random number generator used for shuffling, defaults to torch.default_generator
- Returns:
A tuple containing the list of tasks and the class schedule.
- capymoa.ocl.util.data.class_schedule_to_task_mask(
      class_schedule: Sequence[Set[int]],
      num_classes: int,
  )
Convert a class schedule to a list of boolean masks.
This is useful when implementing multi-headed neural networks for task incremental learning.
>>> class_schedule_to_task_mask([{0, 1}, {2, 3}], 4)
tensor([[ True,  True, False, False],
        [False, False,  True,  True]])
- Parameters:
num_classes – The total number of classes.
class_schedule – A sequence of sets containing class indices defining task order and composition.
- Returns:
A boolean mask of shape (num_tasks, num_classes)
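To illustrate why such a mask helps with multi-headed networks, the sketch below uses it to restrict predictions to the classes of each sample's task. The mask is copied from the example above; the logits, the task identifiers, and the masking approach (filling disallowed classes with negative infinity) are assumptions chosen for illustration, not something the library prescribes.

```python
import torch

# Mask copied from the documented example: 2 tasks, 4 classes.
task_mask = torch.tensor(
    [[True, True, False, False],
     [False, False, True, True]]
)

# Made-up logits for a batch of two samples over all four classes.
logits = torch.tensor(
    [[0.1, 0.3, 2.0, 0.5],
     [1.5, 0.2, 0.4, 0.9]]
)
# Assumed task identity of each sample (known in task incremental learning).
task_ids = torch.tensor([0, 1])

# Pick one mask row per sample; shape (batch, num_classes).
sample_mask = task_mask[task_ids]
# Classes outside the sample's task can never win the argmax.
masked_logits = logits.masked_fill(~sample_mask, float("-inf"))
predictions = masked_logits.argmax(dim=1)
```

Without the mask, the first sample would be assigned class 2, which belongs to the other task; with it, the prediction stays inside the head for task 0.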
- capymoa.ocl.util.data.get_class_indices(targets: LongTensor) -> dict[int, LongTensor]
Return a dictionary mapping each class label to the indices of the samples with that label.
>>> targets = torch.tensor([0, 1, 0, 1, 2])
>>> get_class_indices(targets)
{0: tensor([0, 2]), 1: tensor([1, 3]), 2: tensor([4])}
- Parameters:
targets – A 1D tensor containing the class labels.
- Returns:
A dictionary mapping each class label to a tensor of sample indices.
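One way to build such a mapping is to compare the targets against each distinct label. The following is a minimal sketch under that assumption; `sketch_class_indices` is a hypothetical name and may differ from how the library computes the result.

```python
import torch


def sketch_class_indices(targets):
    """Hypothetical sketch: map each class label to its sample indices."""
    result = {}
    # torch.unique returns the distinct labels in sorted order.
    for label in torch.unique(targets):
        # Positions where the target equals this label, as a 1D tensor.
        positions = torch.nonzero(targets == label, as_tuple=True)[0]
        result[int(label)] = positions
    return result


class_indices = sketch_class_indices(torch.tensor([0, 1, 0, 1, 2]))
```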
- capymoa.ocl.util.data.get_targets( ) -> LongTensor
Return the targets of a dataset as a 1D tensor.
If the dataset has a targets attribute, it is used.
Otherwise, the targets are extracted from the dataset by iterating over it.
- Parameters:
dataset – The dataset to get the targets from.
- Returns:
A 1D tensor containing the targets of the dataset.
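The two-step lookup described above (use a `targets` attribute when present, otherwise iterate) might look roughly like this. `sketch_get_targets` is a hypothetical name, and the sketch assumes every item of the dataset is a `(features, label)` pair.

```python
import torch
from torch.utils.data import TensorDataset


def sketch_get_targets(dataset):
    """Hypothetical sketch: read targets from an attribute or by iteration."""
    targets = getattr(dataset, "targets", None)
    if targets is not None:
        # Fast path: many vision datasets expose their labels directly.
        return torch.as_tensor(targets, dtype=torch.long)
    # Slow path: visit every item and collect its label.
    labels = [int(dataset[i][1]) for i in range(len(dataset))]
    return torch.tensor(labels, dtype=torch.long)


features = torch.zeros(3, 2)
labels = torch.tensor([2, 0, 1])
# TensorDataset has no `targets` attribute, so the slow path is used.
collected = sketch_get_targets(TensorDataset(features, labels))
```

The slow path loads every sample, so for large datasets that decode images on access it can be noticeably slower than reading a precomputed attribute.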
- capymoa.ocl.util.data.partition_by_schedule(
      dataset: Dataset[Tuple[Tensor, Tensor]],
      class_schedule: Sequence[Set[int]],
      shuffle: bool = False,
      rng: Generator = torch.default_generator,
  )
Divide a dataset into multiple datasets based on a class schedule.
In class incremental learning, a task is a dataset containing a subset of the classes in the original dataset. This function divides a dataset into multiple tasks, each containing a subset of the classes.
- Parameters:
dataset – The dataset to divide.
class_schedule – A sequence of sets containing class indices defining task order and composition.
shuffle – If True, the samples in each task are shuffled.
rng – The random number generator used for shuffling, defaults to torch.default_generator
- Returns:
A list of datasets, each corresponding to a task.
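The partitioning described above can be approximated with `torch.utils.data.Subset`. This is a sketch under stated assumptions rather than the library's implementation: `sketch_partition` is a hypothetical name, and the targets are passed in explicitly instead of being extracted from the dataset.

```python
import torch
from torch.utils.data import Subset, TensorDataset


def sketch_partition(dataset, targets, class_schedule, shuffle=False, rng=None):
    """Hypothetical sketch: one Subset per task in the class schedule."""
    tasks = []
    for classes in class_schedule:
        wanted = torch.tensor(sorted(classes))
        # Indices of all samples whose label belongs to this task.
        indices = torch.nonzero(torch.isin(targets, wanted), as_tuple=True)[0]
        if shuffle:
            # Reorder the samples within the task only.
            indices = indices[torch.randperm(len(indices), generator=rng)]
        tasks.append(Subset(dataset, indices.tolist()))
    return tasks


x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
y = torch.tensor([0, 1, 2, 3])
task_datasets = sketch_partition(TensorDataset(x, y), y, [{0, 1}, {2, 3}])
```

Because `Subset` stores only indices, the tasks share the underlying data with the original dataset instead of copying it.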