data#

Utilities for continual learning when using PyTorch datasets.

Functions#

capymoa.ocl.util.data.class_incremental_schedule(
num_classes: int,
num_tasks: int,
shuffle: bool = True,
generator: Generator = torch.default_generator,
) → Sequence[Set[int]][source]#

Returns a class schedule for class incremental learning.

>>> class_incremental_schedule(9, 3, shuffle=False)
[{0, 1, 2}, {3, 4, 5}, {8, 6, 7}]
>>> class_incremental_schedule(9, 3, generator=torch.Generator().manual_seed(0))
[{8, 0, 2}, {1, 3, 7}, {4, 5, 6}]
Parameters:
  • num_classes – The number of classes in the dataset.

  • num_tasks – The number of tasks to divide the classes into.

  • shuffle – When False, classes are assigned to tasks in numerical order of their labels. When True, classes are shuffled before being divided into tasks.

  • generator – The random number generator used for shuffling, defaults to torch.default_generator.

Returns:

A list of sets of class labels, one set per task.

capymoa.ocl.util.data.class_incremental_split(
dataset: Dataset[Tuple[Tensor, Tensor]],
num_tasks: int,
shuffle_tasks: bool = True,
generator: Generator = torch.default_generator,
) → tuple[Sequence[Dataset[Tuple[Tensor, Tensor]]], Sequence[Set[int]]][source]#

Divide a dataset into multiple tasks for class incremental learning.

>>> from torch.utils.data import TensorDataset
>>> x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> y = torch.tensor([0, 1, 2, 3])
>>> dataset = TensorDataset(x, y)
>>> tasks, schedule = class_incremental_split(dataset, 2, shuffle_tasks=False)
>>> schedule
[{0, 1}, {2, 3}]
>>> tasks[0][0]
(tensor([1, 2]), tensor(0))
>>> tasks[1][0]
(tensor([5, 6]), tensor(2))
Parameters:
  • dataset – The dataset to divide.

  • num_tasks – The number of tasks to divide the dataset into.

  • shuffle_tasks – When False, classes are assigned to tasks in numerical order of their labels. When True, classes are shuffled before being divided into tasks.

  • generator – The random number generator used for shuffling, defaults to torch.default_generator.

Returns:

A tuple containing the list of tasks and the class schedule.
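
For context, a typical task-by-task training loop over the returned tasks might look like the following sketch (illustrative only: model, loss_fn, optimizer, and train_dataset are placeholders, not part of this module):

from torch.utils.data import DataLoader

tasks, schedule = class_incremental_split(train_dataset, num_tasks=5)
for task_id, task in enumerate(tasks):
    loader = DataLoader(task, batch_size=32, shuffle=False)
    for x, y in loader:
        logits = model(x)          # placeholder network
        loss = loss_fn(logits, y)  # placeholder loss function
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()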

capymoa.ocl.util.data.class_schedule_to_task_mask(
class_schedule: Sequence[Set[int]],
num_classes: int,
) → BoolTensor[source]#

Convert a class schedule to a boolean mask for each task, stacked into a single tensor of shape (num_tasks, num_classes).

This is useful when implementing multi-headed neural networks for task incremental learning.

>>> class_schedule_to_task_mask([{0, 1}, {2, 3}], 4)
tensor([[ True,  True, False, False],
        [False, False,  True,  True]])
Parameters:
  • class_schedule – A sequence of sets containing class indices defining task order and composition.

  • num_classes – The total number of classes.

Returns:

A boolean mask of shape (num_tasks, num_classes).
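
For example, the mask for the current task can be used to restrict a network's output logits to that task's classes. The following is a minimal sketch (logits here is a random placeholder standing in for a batch of network outputs):

import torch

masks = class_schedule_to_task_mask([{0, 1}, {2, 3}], 4)
logits = torch.randn(8, 4)         # placeholder batch of network outputs
task_id = 1
masked = logits.masked_fill(~masks[task_id], float("-inf"))
prediction = masked.argmax(dim=1)  # predictions restricted to classes {2, 3}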

capymoa.ocl.util.data.get_class_indices(targets: LongTensor) → dict[int, LongTensor][source]#

Return a dictionary mapping each class label to the indices of its samples.

>>> targets = torch.tensor([0, 1, 0, 1, 2])
>>> get_class_indices(targets)
{0: tensor([0, 2]), 1: tensor([1, 3]), 2: tensor([4])}
Parameters:

targets – A 1D tensor containing the class labels.

Returns:

A dictionary mapping each class label to a 1D tensor of sample indices.

capymoa.ocl.util.data.get_targets(
dataset: Dataset[Tuple[Tensor, Tensor]],
) → LongTensor[source]#

Return the targets of a dataset as a 1D tensor.

  • If the dataset has a targets attribute, it is used.

  • Otherwise, the targets are extracted from the dataset by iterating over it.

Parameters:

dataset – The dataset to get the targets from.

Returns:

A 1D tensor containing the targets of the dataset.
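
For example, with a TensorDataset (which has no targets attribute) the labels are recovered by iterating over the dataset, as in this small sketch based on the behaviour described above:

import torch
from torch.utils.data import TensorDataset

x = torch.randn(4, 2)
y = torch.tensor([0, 1, 2, 3])
targets = get_targets(TensorDataset(x, y))  # expected to equal y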

capymoa.ocl.util.data.partition_by_schedule(
dataset: Dataset[Tuple[Tensor, Tensor]],
class_schedule: Sequence[Set[int]],
shuffle: bool = False,
rng: Generator = torch.default_generator,
) → Sequence[Dataset[Tuple[Tensor, Tensor]]][source]#

Divide a dataset into multiple datasets based on a class schedule.

In class incremental learning, a task is a dataset containing a subset of the classes in the original dataset. This function divides a dataset into multiple tasks, each containing a subset of the classes.

Parameters:
  • dataset – The dataset to divide.

  • class_schedule – A sequence of sets containing class indices defining task order and composition.

  • shuffle – If True, the samples in each task are shuffled.

  • rng – The random number generator used for shuffling, defaults to torch.default_generator.

Returns:

A list of datasets, each corresponding to a task.
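
A sketch combining this function with class_incremental_schedule (the toy TensorDataset below is an assumption for illustration; any map-style dataset of (input, label) pairs should work):

import torch
from torch.utils.data import TensorDataset

x = torch.randn(6, 2)
y = torch.tensor([0, 0, 1, 1, 2, 2])
schedule = class_incremental_schedule(num_classes=3, num_tasks=3, shuffle=False)
tasks = partition_by_schedule(TensorDataset(x, y), schedule)
first_x, first_y = tasks[0][0]  # a sample from the first task; first_y should be tensor(0)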