data#

Utilities for continual learning when using PyTorch datasets.

Functions#

capymoa.ocl.util.data.class_incremental_schedule(
num_classes: int,
num_tasks: int,
shuffle: bool = True,
generator: Generator = torch.default_generator,
) → Sequence[Set[int]][source]#

Returns a class schedule for class incremental learning.

>>> class_incremental_schedule(9, 3, shuffle=False)
[{0, 1, 2}, {3, 4, 5}, {8, 6, 7}]
>>> class_incremental_schedule(9, 3, generator=torch.Generator().manual_seed(0))
[{8, 0, 2}, {1, 3, 7}, {4, 5, 6}]
Parameters:
  • num_classes – The number of classes in the dataset.

  • num_tasks – The number of tasks to divide the classes into.

  • shuffle – When False, classes are assigned to tasks in numerical order of their labels. When True, classes are shuffled before being divided into tasks.

  • generator – The random number generator used for shuffling, defaults to torch.default_generator.

Returns:

A list of sets of class labels, one set per task.

capymoa.ocl.util.data.class_incremental_split(
dataset: Dataset[Tuple[Tensor, Tensor]],
num_tasks: int,
shuffle_tasks: bool = True,
generator: Generator = torch.default_generator,
) → tuple[Sequence[Dataset[Tuple[Tensor, Tensor]]], Sequence[Set[int]]][source]#

Divide a dataset into multiple tasks for class incremental learning.

>>> from torch.utils.data import TensorDataset
>>> x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> y = torch.tensor([0, 1, 2, 3])
>>> dataset = TensorDataset(x, y)
>>> tasks, schedule = class_incremental_split(dataset, 2, shuffle_tasks=False)
>>> schedule
[{0, 1}, {2, 3}]
>>> tasks[0][0]
(tensor([1, 2]), tensor(0))
>>> tasks[1][0]
(tensor([5, 6]), tensor(2))
Parameters:
  • dataset – The dataset to divide.

  • num_tasks – The number of tasks to divide the dataset into.

  • shuffle_tasks – When False, classes are assigned to tasks in numerical order of their labels. When True, classes are shuffled before being divided into tasks.

  • generator – The random number generator used for shuffling, defaults to torch.default_generator.

Returns:

A tuple containing the list of tasks and the class schedule.
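
For context, a typical task-by-task training loop over the returned tasks might look like the following sketch (illustrative only: model, loss_fn, optimizer, and train_dataset are placeholders, not part of this module):

from torch.utils.data import DataLoader

tasks, schedule = class_incremental_split(train_dataset, num_tasks=5)
for task_id, task in enumerate(tasks):
    loader = DataLoader(task, batch_size=32, shuffle=False)
    for x, y in loader:
        logits = model(x)          # placeholder network
        loss = loss_fn(logits, y)  # placeholder loss function
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()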

capymoa.ocl.util.data.class_schedule_to_task_mask(
class_schedule: Sequence[Set[int]],
num_classes: int,
) → BoolTensor[source]#

Convert a class schedule to a boolean mask for each task, stacked into a single tensor of shape (num_tasks, num_classes).

This is useful when implementing multi-headed neural networks for task incremental learning.

>>> class_schedule_to_task_mask([{0, 1}, {2, 3}], 4)
tensor([[ True,  True, False, False],
        [False, False,  True,  True]])
Parameters:
  • class_schedule – A sequence of sets containing class indices defining task order and composition.

  • num_classes – The total number of classes.

Returns:

A boolean mask of shape (num_tasks, num_classes).
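
For example, the mask for the current task can be used to restrict a network's output logits to that task's classes. The following is a minimal sketch (logits here is a random placeholder standing in for a batch of network outputs):

import torch

masks = class_schedule_to_task_mask([{0, 1}, {2, 3}], 4)
logits = torch.randn(8, 4)         # placeholder batch of network outputs
task_id = 1
masked = logits.masked_fill(~masks[task_id], float("-inf"))
prediction = masked.argmax(dim=1)  # predictions restricted to classes {2, 3}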

capymoa.ocl.util.data.get_class_indices(targets: LongTensor) → dict[int, LongTensor][source]#

Return a dictionary mapping each class label to the indices of its samples.

>>> targets = torch.tensor([0, 1, 0, 1, 2])
>>> get_class_indices(targets)
{0: tensor([0, 2]), 1: tensor([1, 3]), 2: tensor([4])}
Parameters:

targets – A 1D tensor containing the class labels.

Returns:

A dictionary mapping each class label to a 1D tensor of sample indices.

capymoa.ocl.util.data.get_targets(
dataset: Dataset[Tuple[Tensor, Tensor]],
) → LongTensor[source]#

Return the targets of a dataset as a 1D tensor.

  • If the dataset has a targets attribute, it is used.

  • Otherwise, the targets are extracted from the dataset by iterating over it.

Parameters:

dataset – The dataset to get the targets from.

Returns:

A 1D tensor containing the targets of the dataset.
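
For example, with a TensorDataset (which has no targets attribute) the labels are recovered by iterating over the dataset, as in this small sketch based on the behaviour described above:

import torch
from torch.utils.data import TensorDataset

x = torch.randn(4, 2)
y = torch.tensor([0, 1, 2, 3])
targets = get_targets(TensorDataset(x, y))  # expected to equal y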

capymoa.ocl.util.data.partition_by_schedule(
dataset: Dataset[Tuple[Tensor, Tensor]],
class_schedule: Sequence[Set[int]],
shuffle: bool = False,
rng: Generator = torch.default_generator,
) → Sequence[Dataset[Tuple[Tensor, Tensor]]][source]#

Divide a dataset into multiple datasets based on a class schedule.

In class incremental learning, a task is a dataset containing a subset of the classes in the original dataset. This function divides a dataset into multiple tasks, each containing a subset of the classes.

Parameters:
  • dataset – The dataset to divide.

  • class_schedule – A sequence of sets containing class indices defining task order and composition.

  • shuffle – If True, the samples in each task are shuffled.

  • rng – The random number generator used for shuffling, defaults to torch.default_generator.

Returns:

A list of datasets, each corresponding to a task.
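
A sketch combining this function with class_incremental_schedule (the toy TensorDataset below is an assumption for illustration; any map-style dataset of (input, label) pairs should work):

import torch
from torch.utils.data import TensorDataset

x = torch.randn(6, 2)
y = torch.tensor([0, 0, 1, 1, 2, 2])
schedule = class_incremental_schedule(num_classes=3, num_tasks=3, shuffle=False)
tasks = partition_by_schedule(TensorDataset(x, y), schedule)
first_x, first_y = tasks[0][0]  # a sample from the first task; first_y should be tensor(0)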